
source-hubspot-native: ignore search results with timestamps more recent than the requested window #2210

Merged: williamhbaker merged 1 commit into main from wb/hsn-search-ordering on Dec 16, 2024


williamhbaker commented on Dec 16, 2024

Description:

The HubSpot search API is used to request a stream of "delayed" records, which are held back by a 1 hour horizon to account for the general eventual consistency of the HubSpot APIs.
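As a rough illustration of the 1 hour horizon, the bounds of each delayed search request could be computed as below. This is a minimal Python sketch, not the connector's code: `DELAYED_HORIZON` and `next_delayed_window` are hypothetical names, and `cursor` (the latest "updated at" timestamp already captured) is an assumed input.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Hypothetical constant: how far behind "now" the delayed stream is held back.
DELAYED_HORIZON = timedelta(hours=1)


def next_delayed_window(
    cursor: datetime, now: datetime
) -> Optional[Tuple[datetime, datetime]]:
    """Return (since, until) bounds for the next delayed search request.

    Only records updated at least DELAYED_HORIZON ago are requested, giving
    HubSpot time to become consistent. Returns None when nothing is old
    enough to request yet.
    """
    until = now - DELAYED_HORIZON
    if cursor >= until:
        return None
    return cursor, until


if __name__ == "__main__":
    now = datetime(2024, 12, 16, 12, 0, tzinfo=timezone.utc)
    cursor = datetime(2024, 12, 16, 10, 0, tzinfo=timezone.utc)
    # Requests records updated between 10:00 and 11:00 UTC.
    print(next_delayed_window(cursor, now))
```

Holding `until` an hour behind `now` is what makes it an explicit upper limit, which is the limit the out-of-order records described next end up exceeding.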

Sometimes a record in our search results is returned out of order with respect to its "updated at" timestamp. When this happens, the results contain a record with an "updated at" timestamp fairly near the present, beyond the upper limit that was requested.
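To make the symptom concrete, the sketch below finds the first result whose "updated at" timestamp is older than the one before it. It assumes, for illustration, that search results come back sorted ascending by that timestamp; `first_out_of_order` is a hypothetical helper, not part of the connector.

```python
from datetime import datetime
from typing import List, Optional


def first_out_of_order(timestamps: List[datetime]) -> Optional[int]:
    """Return the index of the first timestamp older than its predecessor,
    or None if the sequence is in ascending order.

    Comparing adjacent pairs is enough: the first drop anywhere in the
    sequence is also the first adjacent drop.
    """
    for i in range(1, len(timestamps)):
        if timestamps[i] < timestamps[i - 1]:
            return i
    return None
```

For a window requested up to 11:00, a page of 10:00, 10:15, 11:55, 10:30 gives index 3: the 11:55 record, updated near the present and beyond the requested limit, makes the record after it look out of order.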

We can only speculate as to why this happens. One possibility is that the record is being updated at around the same time as our search request: it gets included in the search results based on its original timestamp, but the updated record is returned in the position where the original should have been.

Currently the connector crashes when this happens; after restarting it retries and eventually makes progress. But we've seen cases where this happens so often that it limits the connector's progress, and it is generally confusing for users to see. So this change modifies the strategy to filter on the client side, excluding records with timestamps more recent than what was requested.
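The client-side filtering amounts to dropping anything newer than the requested upper limit. A minimal sketch, assuming a hypothetical `SearchRecord` type with an `updated_at` field; treating the upper bound as inclusive is also an assumption here, not something the PR states.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class SearchRecord:
    # Hypothetical minimal record shape: an ID plus its "updated at" timestamp.
    id: str
    updated_at: datetime


def filter_to_window(records: List[SearchRecord], until: datetime) -> List[SearchRecord]:
    """Keep only records whose "updated at" timestamp is at or before `until`."""
    return [r for r in records if r.updated_at <= until]
```

Applied to a page of records updated at 10:00, 10:15, 11:55, and 10:30 with `until` at 11:00, this keeps the 10:00, 10:15, and 10:30 records, which are in order again, instead of crashing on the 11:55 one.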

The upper timestamp limit (the `until` argument to our search API function) is actually optional, and you may be wondering when it would be absent: custom record types, line items, and products are only obtainable through the search API, so these records use the search API for the non-delayed stream too. If we end up getting out-of-order results in these cases, the connector will continue to crash and we'll have to deal with that later. I haven't seen the error with these stream types, so I'm not particularly worried about it right now.
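Because `until` is optional, the filter can only apply when an upper limit was actually requested. The sketch below uses hypothetical names and bare timestamps in place of records. It filters when `until` is present and otherwise falls through to an ordering check that raises, mirroring the "continue to crash" behavior for the non-delayed search streams.

```python
from datetime import datetime
from typing import List, Optional


def process_page(timestamps: List[datetime], until: Optional[datetime]) -> List[datetime]:
    """Filter to the requested window when `until` is given, then fail on
    any remaining out-of-order results.

    With until=None (non-delayed search streams), nothing is filtered, so an
    out-of-order page still raises, i.e. the connector would crash as before.
    """
    if until is not None:
        timestamps = [ts for ts in timestamps if ts <= until]
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur < prev:
            raise RuntimeError(f"search result out of order: {cur} after {prev}")
    return timestamps
```

With `until` at 11:00, a page of 10:00, 10:15, 11:55, 10:30 is filtered to 10:00, 10:15, 10:30 and passes; with `until=None` the same page raises `RuntimeError`.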




Alex-Bair left a comment


LGTM

williamhbaker merged commit 5f388c2 into main on Dec 16, 2024
74 of 80 checks passed
williamhbaker deleted the wb/hsn-search-ordering branch on December 16, 2024, 20:26