source-hubspot-native: ignore search results with timestamps more recent than the requested window #2210
Description:
The search API is used to request a stream of "delayed" records, held back by a 1-hour horizon, to account for the eventual consistency of the HubSpot APIs in general.
Sometimes a record in our search results is returned out of order with respect to its "updated at" timestamp. When this happens, the record's "updated at" timestamp is fairly near the present and beyond the upper limit that was requested.
We can only speculate as to why this happens. One possibility is that the record is updated around the same time as we make our search request: it is included in the search results based on its original timestamp, but we receive the updated record in the position where the original one should have been.
Currently the connector crashes when this happens, retries when restarted, and eventually makes progress. But we've seen cases where it happens so often that it limits the connector's progress, and it is generally confusing for users to see. This change replaces that strategy with client-side filtering that excludes records with timestamps more recent than what was requested.
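As a rough illustration of the client-side filtering described above (not the connector's actual code), the check could look something like the sketch below. The `Record` type, its `updated_at` field, and the `filter_search_results` helper are hypothetical names for this example. It also assumes that the upper bound is inclusive and that an absent bound means no filtering is applied.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional


@dataclass
class Record:
    # Hypothetical record shape, for illustration only.
    id: str
    updated_at: datetime


def filter_search_results(
    records: Iterable[Record],
    until: Optional[datetime],
) -> Iterator[Record]:
    """Yield search results, skipping any newer than the requested window.

    Assumption: `until` is an inclusive upper bound. When it is None,
    no upper bound was requested and every record is passed through.
    """
    for record in records:
        if until is not None and record.updated_at > until:
            # The record was modified after the requested window, so skip
            # it instead of failing. It should come back in a later window
            # that covers its new timestamp.
            continue
        yield record


until = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
results = [
    Record("a", datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc)),
    Record("b", datetime(2024, 1, 1, 12, 30, tzinfo=timezone.utc)),  # too recent
    Record("c", until),  # exactly on the bound, kept
]
kept_ids = [r.id for r in filter_search_results(results, until)]
```

Skipping rather than raising relies on the skipped record showing up again in a later search whose window includes its updated timestamp.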
The upper timestamp limit (the `until` argument to our search API function) is actually optional, and you may be wondering when it would be absent. Custom record types, line items, and products are only obtainable through the search API, so these records use the search API for the non-delayed stream too. If we end up getting results out of order in these cases, the connector will continue to crash and we'll have to deal with that later. I haven't seen the error with these stream types, so I'm not particularly worried about it right now.
Workflow steps:
(How does one use this feature, and how has it changed)
Documentation links affected:
(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)
Notes for reviewers:
(anything that might help someone review this PR)