feat(low code): Add GroupingPartitionRouter #354
base: main
Why do we assume this? Couldn't we group objects like {"key1": "value1", "key2": "value2"}?
To be clear, I'm fine with not supporting that in the first iteration, but I think we would need to:
Both ListPartitionRouter and SubstreamPartitionRouter return partitions as dictionaries with a single key. If a partition router were to return multiple keys, we couldn't guarantee that all partition keys would be consistently present in every partition. Some keys might be missing in the first batch and appear in the next one, making grouping unreliable.
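A minimal sketch of the single-key grouping this reasoning relies on. It assumes partitions are plain dicts such as {"board_id": 1} (the `board_id` key and the `group_partitions` helper are hypothetical, not the CDK's slice objects or API). Because every partition carries the same single key, each batch of `group_size` partitions collapses cleanly into one dict that maps that key to a list of values.

```python
from typing import Any, Dict, Iterable, Iterator, List


def group_partitions(
    partitions: Iterable[Dict[str, Any]], group_size: int
) -> Iterator[Dict[str, List[Any]]]:
    """Batch single-key partitions into groups of at most ``group_size`` values.

    Hypothetical helper: partitions are plain dicts such as {"board_id": 1},
    not the slice objects a real partition router yields.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    key = None
    batch: List[Any] = []
    for partition in partitions:
        # Assumes exactly one key per partition, as ListPartitionRouter and
        # SubstreamPartitionRouter produce; this unpacking raises ValueError otherwise.
        [(key, value)] = partition.items()
        batch.append(value)
        if len(batch) == group_size:
            yield {key: batch}
            batch = []
    if batch:
        # Flush the final, possibly smaller, group.
        yield {key: batch}
```

With multi-key partitions there is no single key to group under, which is why the assumption above matters.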
I'm good with that. Does that mean there should be an error if the partition router returns partitions with multiple keys? I would prefer to fail loudly, especially since we allow a CustomPartitionRouter.
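One way to fail loudly, sketched under the assumption that validation wraps whatever partitions the underlying router (including a CustomPartitionRouter) yields. `validate_single_key` and `PartitionKeyError` are hypothetical names, not existing CDK APIs. As an extra assumption, the sketch also rejects a key that changes between partitions, since groups would otherwise mix values from different keys. Because it is a generator, the error surfaces as soon as the offending partition is read instead of being silently grouped.

```python
from typing import Any, Dict, Iterable, Iterator, Optional


class PartitionKeyError(ValueError):
    """Hypothetical error type for partitions that cannot be grouped."""


def validate_single_key(partitions: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Pass partitions through unchanged, failing loudly on ungroupable ones."""
    expected_key: Optional[str] = None
    for index, partition in enumerate(partitions):
        if len(partition) != 1:
            raise PartitionKeyError(
                f"partition {index} has keys {sorted(partition)}; "
                "grouping requires exactly one key per partition"
            )
        (key,) = partition  # iterating a dict yields its keys
        if expected_key is None:
            expected_key = key
        elif key != expected_key:
            # Hypothetical extra check: all partitions must share the same key.
            raise PartitionKeyError(
                f"partition {index} uses key {key!r}, expected {expected_key!r}"
            )
        yield partition
```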
This is interesting to me because it feels like there is a gap between the state of the parent and what is actually emitted, and I'm not sure that is fine. Let me explain:
For example, given group_size = 2, we could have the following situation:
If we were to request the state between T0 and T1, the state would actually be wrong, because from the child's perspective we haven't consumed parent_1 yet. I don't know if any process requests the state at that point. I could easily see that if the parent stream fails between T0 and T1, the sync would stop and maybe the state would be updated at that moment. However, I don't know if that is actually the case. Do we have something like that?
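A minimal, hypothetical model of this gap, assuming the parent state advances to a record's cursor value as soon as that record is read, while the child only consumes parents once a full group is emitted. The `updated_at` cursor field, the record ids, and `simulate_state_gap` are illustrative. With group_size = 2, the first read (roughly T0) already moves the parent state to parent_1 while the child has consumed nothing; only once parent_2 completes the group (roughly T1) do the two agree.

```python
from typing import Any, Dict, List, Optional, Tuple


def simulate_state_gap(
    parent_records: List[Dict[str, Any]], group_size: int
) -> List[Tuple[Optional[int], List[str]]]:
    """Record (parent state, parents consumed by the child) after each parent read.

    Hypothetical model: a trailing partial group is left pending, because
    this only tracks reads, not the final flush.
    """
    timeline: List[Tuple[Optional[int], List[str]]] = []
    parent_state: Optional[int] = None
    consumed: List[str] = []
    pending: List[str] = []
    for record in parent_records:
        parent_state = record["updated_at"]  # parent cursor moves immediately
        pending.append(record["id"])
        if len(pending) == group_size:
            consumed.extend(pending)  # the child only sees full groups
            pending = []
        timeline.append((parent_state, list(consumed)))
    return timeline
```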
You're right. I'll update the GroupingPartitionRouter to return the state for the last emitted parent record to ensure state consistency.
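A sketch of the proposed fix, assuming the router keeps two snapshots: the parent state after the most recently read record, and the state as of the last emitted group, exposing only the latter. `GroupingStateSketch`, `stream_groups`, `get_stream_state`, and the `updated_at` cursor are hypothetical and do not implement the CDK's interfaces.

```python
import copy
from typing import Any, Dict, Iterable, Iterator, List, Optional


class GroupingStateSketch:
    """Hypothetical router that only exposes state for emitted groups."""

    def __init__(self, group_size: int) -> None:
        self.group_size = group_size
        self._read_state: Optional[Dict[str, Any]] = None  # after the last *read* parent
        self._emitted_state: Optional[Dict[str, Any]] = None  # as of the last *emitted* group

    def stream_groups(self, parent_records: Iterable[Dict[str, Any]]) -> Iterator[List[str]]:
        pending: List[str] = []
        for record in parent_records:
            self._read_state = {"updated_at": record["updated_at"]}
            pending.append(record["id"])
            if len(pending) == self.group_size:
                # Snapshot at hand-off, so state never covers unemitted parents.
                self._emitted_state = copy.deepcopy(self._read_state)
                yield pending
                pending = []
        if pending:
            self._emitted_state = copy.deepcopy(self._read_state)
            yield pending

    def get_stream_state(self) -> Optional[Dict[str, Any]]:
        return self._emitted_state
```

For example, if the parent stream fails after parent_3 is read but before its group is emitted, `get_stream_state()` still returns the state of the last emitted group ({"updated_at": 2} in the test), which addresses the T0/T1 concern. Taking the snapshot when the consumer asks for the next group, rather than at hand-off, would instead track "processed" rather than "emitted" parents.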