Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: Source Microsoft Onedrive #1199

Closed
wants to merge 52 commits into from
Closed

Conversation

flavioriper
Copy link
Contributor

@flavioriper flavioriper commented Jan 22, 2024

Description:

(Describe the high level scope of new or changed features)

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

(anything that might help someone review this PR)


This change is Reviewable

@jonwihl
Copy link
Contributor

jonwihl commented Jan 22, 2024

LGTM

williamhbaker and others added 25 commits January 25, 2024 09:14
Updates the connector to use `stateKey` for each binding to track its state.

This includes a migration for existing captures. Once all existing captures have started up with
this new connector and we are confident that no additional old captures will be re-enabled, we can
remove the migration code.
The materializations for some fields may be ambiguous if the destination systems are for example
case insensitive, and a field like "thisField" will be materialized as the same column as
"thisfield". Presently this is a concern for `materialize-redshift`, which treats everything as
lowercase, and `materialize-snowflake` because of the historical way we have quoted identifiers.

Historically we have not handled these cases well, and would end up with a error during apply
because of trying to create a duplicate column. More recently, changes were made to outright forbid
these ambiguous fields, and instead require alternate projections of them in order to be
materialized. Forbidding the fields was simple, but excessive since users generally want to use
field selection (rather than projections on collections) to pick which ambiguous field they actually
want as the quick and easy way to get something working.

This change enables that by implementing the following behaviors:
* For a new binding of a materialization, ambiguous fields will all be optional if they aren't
  otherwise forbidden. The constraint description will say that only one of them should be selected
  to materialize, since we don't have any kind of mutual exclusion constraint. If a user selects
  more than one of these fields for a new materialization, it will fail to apply.
* Once an ambiguous field is selected for materialization, all others from that set of ambiguous
  fields will be forbidden. If a binding is published with no ambiguous field selected, they will
  all still be constrained as optional so that it is possible to select one for an existing binding.
* In the event of a pre-existing table with no previously persisted spec for that binding, the
  ambiguous fields will still be set as optional, because we don't know which projection the
  existing column should correspond to without a field selection. This is different than other
  pre-existing columns which we will recommend by default, but seems to be necessary given the
  ambiguous nature of the columns.
… default value

Materializations do not support collections with nullable keys, and will generally fail if a null
value is every encountered for a collection key.

This changes validation to enforce that a nullable collection key not be allowed, unless it has a
default value annotation. If there is a default value annotation, the connector is assured that it
will never actually see a null value, and it will not try to drop NOT NULL constraints from the
materialized field in future `Apply` calls.
mdibaiee and others added 23 commits January 25, 2024 09:14
…or float64

We used to create number columns as "bignumeric", so allowing "bignumeric" for these columns allows
for backward compatibility. See #859.
…keyword

Users may wish to pre-create indices with specific mappings as keywords instead of text, or alter
their mappings after the fact.
…ceStates`

`updateResourceStates` may remove bindings from the checkpoint, and these removals won't be reflected
in persisted merge patches in future document checkpoints. Because of this, we should always emit a
full update to the state checkpoint in `Pull` after `updateResourceStates` has run to persist any
changes in the state due to bindings being removed.
We used to create number columns as "bignumeric" (see
#859.), and now create strings formatted as integer
columns as "bignumeric". Allowing "bignumeric" columns to be either integer or number compatible
allows both new and old materializations to work.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants