-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix dbt incremental_strategy behavior by fixing schema table existing check #530
Fix dbt incremental_strategy behavior by fixing schema table existing check #530
Conversation
It's unclear how the change you're proposing accomplishes your stated goal. It seems if anything, that your change will try to compare a potentially mixed-case schema against a list that I think is natively lower-cased. |
@benc-db 🔹 Normal Expected Case:Defining the schema in lowercase
🔹 Abnormal Unexpected Case:Defining the schema in uppercase
This happens because when doing an incremental update, it checks for the existence of the table. If the schema is passed in uppercase, it is determined that the table does not exist and thus a recreate is triggered. While it is possible to fix this on the incremental side, I thought it would be more considerate to check beforehand if the schema defined in profiles.yml exactly matches.This is what I would like to solve and the change that I make. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you that the problem exists, but the code you're proposing to remove should be exactly preventing what you describe, as it compares the fully lower-cased version of the schema in question to the lower-cased name of every schema in the database. Given that, as you said, Databricks returns the schema name lower-cased, how does removing schema.lower() achieve your ends? Have you tested that using your local version fixes your problem?
Thank you for your reply. I am attempting to integration test, but I haven't yet been able to create the environment for that.I am trying to do following document way. What I want to achieve is that if the 'profiles' schema is passed in uppercase, it would be treated as a non-existent schema, leading to a connection failure and a fail state. Currently, even if passed in uppercase, it does not fail to connect, which causes issues in subsequent processes like I shared. By removing the 'lower', I believe it will compare the uppercase 'profiles' schema passed with the lowercase schema retrieved from Databricks, and fail the connection as a non-existent schema. |
So, I'm now understanding that you are trying to make it fail at the schema step so that you don't recreate the table. I didn't realize that was your goal initially. Ok, so I think the fix that is more inline with our principles is to find out why the table check is failing and fix that instead. In general we trying to make everything lowercase for comparison because, to my knowledge, Databricks is not supposed to be case sensitive. I think getting rid of the lowercasing here will be too disruptive. Can we instead identify why the table check is failing? |
Yea I apologize for the confusion, of course we can identify and can find the ideal way to solve this issue. It will be possible that user lost history data. So I think better to solve this issue some how.Thank you! |
I have created the new PR to investigate this schema issue. Could you also check this PR? Currently I am investigating by passing schema.This PR help to pass default schema option and will be useful to investigate this issue . |
Hi @benc-db I fix this issue could you please check this change? I have test it locally and confirm the issue has been fixed. I pass the uppercase of schema and confirm instead of recreate table, append the rows into existing tables To pass the uppercase of schema from dbt integration test , I need to fix existing integration environment. I have created that change as different PR. So please check this PR as well
Checked uppercase of schema has been passed from integration test and list table check has been done by lowercase converted schema.Table existing check has been passed and this change increment rows into existing table instead of recreate table
Checked append the row as expected. In the second log of the table history, 「WRITE」 is recorded.
I also try to test without lower convert from list tables method. I confirmed instead of append rows, table has been recreated and integration test is failed as we expected. In the second log of the table history, 「CREATE OR REPLACE」 is recorded.
|
I think this is the same issue |
Sorry for the delay, was taking holidays. Will take a look today. |
@benc-db Thank you! I modified the query and table history images to make the verification results easier to understand. |
Hello @benc-db Thank you for the other PR check that I requested ! I appreciate if you check this PR as well when you have time. Thank you. |
@case-k-git please resolve the conflict, and then I will run tests and review :) |
@benc-db Thank you fix the conflict |
Thanks for your help and patience. Will merge today, and this will be in the next release (which I expect will come out next week). |
@benc-db Thank you !! |
Resolves #
#523
Description
I have made a modification to consider case sensitivity when checking for the presence of a schema. Databricks manages schemas in lowercase, so if there isn't an exact match, it can cause problems in subsequent processes. For example, when setting
materialized='incremental'
in the SQL Config for an append operation, it checks for the existence of the table before appending the differential data. If the schema is passed inuppercase
in this case, since the schema on Databricks is inlowercase
, the check for the table's existence will fail, and instead of appending, it will run recreate the existing table even if using incremental option. We have now fixed it to ensure an exact match during the initial schema check.Checklist
CHANGELOG.md
and added information about my change to the "dbt-databricks next" section.