From 51ce03f05eee9bb60639709c9e047e2dd44f911a Mon Sep 17 00:00:00 2001
From: Will Baker
Date: Mon, 16 Dec 2024 14:14:16 -0500
Subject: [PATCH] source-hubspot-native: ignore search results with timestamps
 more recent than the requested window

The search API is used for requesting a stream of "delayed" records,
held back by a 1 hour horizon, to account for the eventual consistency
of the HubSpot APIs in general.

Sometimes we see that a record in our search result is returned out of
order with respect to its "updated at" timestamp. When this happens,
there ends up being a record with an "updated at" timestamp fairly near
the present, and outside the upper limit that was requested. We can only
speculate as to why this happens, but it could be because the record is
getting updated around the same time as we make our search request: it
is included in the search results based on its original timestamp, but
we get the updated record in the place where the original one should
have been.

Currently the strategy is for the connector to crash when this happens,
at which point it will retry when restarted and eventually make
progress. But we've seen cases where it happens so often that it limits
the connector's progress and is just generally confusing for users to
see, so the strategy is being modified to implement client-side
filtering to exclude records with timestamps more recent than what was
requested.

The upper timestamp limit `until` argument to our search API function is
actually optional and you may be wondering when it would be absent:
Custom record types, line items, and products are only obtainable
through the search API, so these records actually use the search API for
the non-delayed stream too. If we end up getting results out of order
for these cases I guess we'll just continue to crash and have to deal
with that later. I haven't seen the error with these stream types so I'm
not particularly worried about it right now.
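
As a rough illustration of the filtering described above, here is a
standalone sketch. It is not the connector code: `Record`,
`filter_window`, `ts`, and the sample data are hypothetical names made
up for this note, and only the two checks mirror the change itself.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Record:
    # Hypothetical stand-in for a search result; not the connector's model.
    id: str
    mod_time: datetime


def filter_window(results: List[Record], until: Optional[datetime]) -> List[Record]:
    """Drop results modified after `until`; require the rest to be ascending."""
    kept: List[Record] = []
    max_updated = datetime.min.replace(tzinfo=timezone.utc)
    for r in results:
        if until and r.mod_time > until:
            # Newer than the requested window: skip it instead of crashing.
            continue
        if r.mod_time < max_updated:
            raise Exception(
                f"last modified date {r.mod_time} is before {max_updated} for {r.id}"
            )
        max_updated = r.mod_time
        kept.append(r)
    return kept


def ts(hour: int) -> datetime:
    # Helper for building sample timestamps on a single day.
    return datetime(2024, 12, 16, hour, tzinfo=timezone.utc)


# Record "b" was modified after the window closed and arrives out of order.
results = [Record("a", ts(9)), Record("b", ts(14)), Record("c", ts(10))]
kept_ids = [r.id for r in filter_window(results, until=ts(11))]
```

With `until` set, `kept_ids` holds "a" and "c". With `until=None` the
same input still raises, which corresponds to the non-delayed case
described above.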
---
 source-hubspot-native/source_hubspot_native/api.py | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/source-hubspot-native/source_hubspot_native/api.py b/source-hubspot-native/source_hubspot_native/api.py
index 95d71cc080..8ee00e3aee 100644
--- a/source-hubspot-native/source_hubspot_native/api.py
+++ b/source-hubspot-native/source_hubspot_native/api.py
@@ -555,9 +555,15 @@ async def fetch_search_objects(
 
             for r in result.results:
                 this_mod_time = r.properties.hs_lastmodifieddate
+
+                if until and this_mod_time > until:
+                    log.info(
+                        "ignoring search result with record modification time that is later than maximum search window",
+                        {"id": r.id, "this_mod_time": this_mod_time, "until": until},
+                    )
+                    continue
+
                 if this_mod_time < max_updated:
-                    # This should never happen since results are requested in
-                    # ASCENDING order by the last modified time.
                     raise Exception(f"last modified date {this_mod_time} is before {max_updated} for {r.id}")
 
                 max_updated = this_mod_time