Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redo constraints (since columns can only take one constraint at create time) #876

Merged
merged 3 commits into from
Dec 17, 2024

Conversation

benc-db
Copy link
Collaborator

@benc-db benc-db commented Dec 12, 2024

Description

When I went to apply the constraint logic from the previous PR, I discovered that Databricks does not like receiving multiple constraints for a single column in the table create syntax; however, you can add arbitrarily many table level constraints, so in this PR I aim to do the following:

  • if a column has a not_null constraint, directly set not_null on the column
  • any other column constraint gets converted to a TypedConstraint, a subclass of ModelLevelConstraint that enforces the requirements of the particular constraint, and holds the rendering logic. The conversion required setting 'columns=[name]'. Because python makes operator overloading a pain in the butt, I'm going back to OOP for the render logic.
  • If a model level constraint is not_null, convert to setting not_null on each affected column, since Databricks does not support not_null constraint definition as a table constraint
  • any other model constraint gets converted to a TypedConstraint

Checklist

  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

Copy link
Contributor

@alexguo-db alexguo-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

def get_enriched_columns(
existing_columns: list[DatabricksColumn], model_columns: dict[str, dict[str, Any]]
) -> list[DatabricksColumn]:
def parse_columns_and_constraints(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used anywhere?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only in my branch where I found out it doesn't really work. I tried to isolate model and column constraint parsing/processing before, but when I tried to use the constraint code on my local branch, I discovered that Databricks does not like multiple column constraints on a single column; also dbt supports specifying a bunch of null constraints at the model level. Hence why I have to do this less elegant, 'process them all at the same time' approach :/.

@benc-db benc-db merged commit 87ec5d1 into 1.10.latest Dec 17, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants