-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
materializations: improved handling for "ambiguous" materialized fields #1200
Conversation
48c4443
to
a448184
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
c = &pm.Response_Validated_Constraint{ | ||
Type: pm.Response_Validated_Constraint_FIELD_FORBIDDEN, | ||
Reason: fmt.Sprintf( | ||
"Flow collection field '%s' is ambiguous with fields already being materialized as '%s' in the destination. Consider using an alternate, unambiguous projection of this field to allow it to be materialized", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 all this was no-doubt tedious to implement, but will be really helpful for users trying to understand what's happening and why.
c = &pm.Response_Validated_Constraint{ | ||
Type: pm.Response_Validated_Constraint_FIELD_OPTIONAL, | ||
Reason: fmt.Sprintf( | ||
"Flow collection field '%s' would be materialized as '%s', which is ambiguous with fields [%s]. Only a single field from this set should be selected. Consider using alternate projections if you want to materialize more than one of these fields", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noting this error is the same as the one in the new bindings case. Maybe it could be extracted into a common constant, but ... that confuses Go tooling and there's just one instance of duplication so probably best to leave as-is 🤷. Maybe just comment both to note that they're effectively the same reason and language should be changed together ?
Reason: "This field is a key in the current materialization", | ||
} | ||
} else { | ||
// TODO(whb): Really this should be "FIELD_RECOMMENDED", but that is not a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lemme know if you think we need to implement this.
The materializations for some fields may be ambiguous if the destination systems are for example case insensitive, and a field like "thisField" will be materialized as the same column as "thisfield". Presently this is a concern for `materialize-redshift`, which treats everything as lowercase, and `materialize-snowflake` because of the historical way we have quoted identifiers. Historically we have not handled these cases well, and would end up with a error during apply because of trying to create a duplicate column. More recently, changes were made to outright forbid these ambiguous fields, and instead require alternate projections of them in order to be materialized. Forbidding the fields was simple, but excessive since users generally want to use field selection (rather than projections on collections) to pick which ambiguous field they actually want as the quick and easy way to get something working. This change enables that by implementing the following behaviors: * For a new binding of a materialization, ambiguous fields will all be optional if they aren't otherwise forbidden. The constraint description will say that only one of them should be selected to materialize, since we don't have any kind of mutual exclusion constraint. If a user selects more than one of these fields for a new materialization, it will fail to apply. * Once an ambiguous field is selected for materialization, all others from that set of ambiguous fields will be forbidden. If a binding is published with no ambiguous field selected, they will all still be constrained as optional so that it is possible to select one for an existing binding. * In the event of a pre-existing table with no previously persisted spec for that binding, the ambiguous fields will still be set as optional, because we don't know which projection the existing column should correspond to without a field selection. This is different than other pre-existing columns which we will recommend by default, but seems to be necessary given the ambiguous nature of the columns.
… default value Materializations do not support collections with nullable keys, and will generally fail if a null value is every encountered for a collection key. This changes validation to enforce that a nullable collection key not be allowed, unless it has a default value annotation. If there is a default value annotation, the connector is assured that it will never actually see a null value, and it will not try to drop NOT NULL constraints from the materialized field in future `Apply` calls.
3807100
to
012c82c
Compare
Description:
The materializations for some fields may be ambiguous if the destination systems are for example case insensitive, and a field like "thisField" will be materialized as the same column as "thisfield". Presently this is a concern for
materialize-redshift
, which treats everything as lowercase, andmaterialize-snowflake
because of the historical way we have quoted identifiers.Historically we have not handled these cases well, and would end up with a error during apply because of trying to create a duplicate column. More recently, changes were made to outright forbid these ambiguous fields, and instead require alternate projections of them in order to be materialized. Forbidding the fields was simple, but excessive since users generally want to use field selection (rather than projections on collections) to pick which ambiguous field they actually want as the quick and easy way to get something working.
This change enables that by implementing the following behaviors:
Also, a change is included to enforce that materialized collection keys are not
null
, or are nullable with a default value annotation as part ofValidate
calls. Materializations that use typed columns will currently fail in unspecified ways if anull
value is actually present in a collection, and this ensures thatValidate
will not allow key fields that may benull
.Workflow steps:
Materialize a collection with fields that the destination system considers ambiguous. A blind publication of the materialization will not fail, but will also not include any of the ambiguous fields by default. Selecting one of them via field selection will work, and
forbidden
constraints will then result for other fields that are amiguous with the selected field.Documentation links affected:
(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)
Notes for reviewers:
There's some test snapshot update churn, since I re-ran all the tests with the latest binaries that include the
flow_truncated
metadata field.This change isdata:image/s3,"s3://crabby-images/d0bb7/d0bb7f7625ca5bf5c3cf7a2b7a514cf841ab8395" alt="Reviewable"