Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

materialize-snowflake: unset internal state for empty bindings #1218

Merged
merged 1 commit into from
Jan 29, 2024

Conversation

mdibaiee
Copy link
Member

@mdibaiee mdibaiee commented Jan 26, 2024

Description:

  • Until now, if a binding had some documents in one transaction, but no documents in subsequent transactions, we would not clear its queries from the checkpoint, so on every call to Acknowledge we would end up running the last set of queries for that binding again and again until it had new documents.
  • Unset the internal state of bindings which have no documents so no queries are run for those in subsequent
    transactions

Tested by reproducing the original issue, and confirming the issue no longer exists with this pull-request

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

(anything that might help someone review this PR)


This change is Reviewable

@jgraettinger
Copy link
Member

This seems to remove the binding state for binding X, while preparing a transaction to commit where binding Y, Z, are updated (but not X). Right?

Why is the post-Acknowledge state clearing mechanism insufficient? Is it because the runtime state has the binding removed per the post-Acknowelge flow, but the connector's view of the state doesn't ? I'm also assuming (but didn't verify) that it's not using merge-patch state updates.

This would be a harder mistake to make, if a) the connector issued merge-patch updates of its state, and 2) issued merge-patch updates like deletions were coded up side-by-side with the corresponding operation updating the connector's internal state view. Agree? Is that an achievable factoring without a ton of effort?

@mdibaiee mdibaiee force-pushed the mahdi/snowflake-state-fix branch from e6634b6 to dbb6292 Compare January 29, 2024 15:57
@mdibaiee
Copy link
Member Author

mdibaiee commented Jan 29, 2024

@jgraettinger that does make sense actually. We do use merge patch updates, but at the point where we clear the checkpoint of bindings that have been processed, we were not clearing it up from the connector's current state. I just moved the deletion of the binding state to the point where we cleanup the checkpoint.

@@ -568,6 +568,7 @@ func (d *transactor) Acknowledge(ctx context.Context) (*pf.ConnectorState, error
var checkpointClear = make(checkpoint)
for _, b := range d.bindings {
checkpointClear[b.target.StateKey] = nil
delete(d.cp, b.target.StateKey)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, I don't think we need the if len(item.Query) == 0 { check above anymore, on line 531.

Copy link
Member

@williamhbaker williamhbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, assuming @jgraettinger is good with the update per his comment.

I had a thought that constructing an entirely separate "checkpoint" object was even a bit more than we needed to do, and could instead more directly update the internal state to clear the completed queries and then marshal that internal state into the checkpoint to emit. But I guess this would be slightly less efficient, and kind of annoying to deal with the nil checkpoints values we'd need for merge patches to work out.

Copy link
Member

@jgraettinger jgraettinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mdibaiee mdibaiee merged commit 13e3818 into main Jan 29, 2024
43 of 44 checks passed
@mdibaiee mdibaiee deleted the mahdi/snowflake-state-fix branch January 29, 2024 17:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants