Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FDB-427 Limit memory usage on remotefdb.read #96

Merged
merged 6 commits into from
Mar 15, 2025

Conversation

ChrisspyB
Copy link
Member

(Only relevant to RemoteFDB)

Introduces the ReadLimiter class, which tracks the total size of fields already requested from the store servers and prevents requesting more fields until the exisitng ones have been consumed. The FDBRemoteDataHandle is responsible for signalling to the ReadLimiter when its field has been consumed.

RemoteStores no longer directly send their request to a server, they now first send it to the ReadLimiter class, which holds onto the request until there is sufficient memory to process it.

@codecov-commenter
Copy link

codecov-commenter commented Mar 11, 2025

Codecov Report

Attention: Patch coverage is 7.84314% with 94 lines in your changes missing coverage. Please review.

Project coverage is 63.92%. Comparing base (c914500) to head (90f8f56).

Files with missing lines Patch % Lines
src/fdb5/remote/client/ReadLimiter.cc 5.88% 80 Missing ⚠️
src/fdb5/remote/client/RemoteStore.cc 0.00% 11 Missing ⚠️
src/fdb5/remote/RemoteFieldLocation.cc 0.00% 2 Missing ⚠️
src/fdb5/remote/client/ReadLimiter.h 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop      #96      +/-   ##
===========================================
- Coverage    64.25%   63.92%   -0.33%     
===========================================
  Files          278      280       +2     
  Lines        15713    15806      +93     
  Branches      1632     1644      +12     
===========================================
+ Hits         10096    10104       +8     
- Misses        5617     5702      +85     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ChrisspyB ChrisspyB requested a review from danovaro March 12, 2025 00:15
@ChrisspyB ChrisspyB marked this pull request as ready for review March 12, 2025 11:49
@@ -93,7 +93,8 @@ eckit::DataHandle* RemoteFieldLocation::dataHandle() const {
remote.query("internalHost", "");
}
remote.query("internalScheme", "");
FieldLocation* loc = FieldLocationFactory::instance().build(scheme, remote, offset_, length_, remapKey_);
std::unique_ptr<FieldLocation> loc(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB: this was a fly-by fix of a memory-leak.

@ChrisspyB ChrisspyB requested a review from simondsmart March 13, 2025 11:04
Copy link
Contributor

@simondsmart simondsmart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some simple comments added.

But overall, I don't understand why this is a thing. We have (size limited) queues for received messages, and for data waiting to be read. Why is that not catching the right stuff?

// When a RemoteStore is destroyed, it must evict any unconsumed requests.
// If all went well, there will be no requests to evict, but we must check (consumer may have abandoned the
// request).
/// @todo: This is somewhat pointless right now because the RemoteStores appear to be infinitely long lived...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems surprising. Are these not owned by something with a finite lifetime?

Copy link
Member Author

@ChrisspyB ChrisspyB Mar 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not as far as I can tell:

RemoteStore& RemoteStore::get(const eckit::URI& uri) {
    static std::mutex storeMutex_;
    // we memoise one read store for each endpoint. Do not need to have one for each key
    static std::map<std::string, std::unique_ptr<RemoteStore>> readStores_;

    std::lock_guard<std::mutex> lock(storeMutex_);
    const std::string& endpoint = uri.hostport();
    auto it                     = readStores_.find(endpoint);
    if (it != readStores_.end()) {
        return *(it->second);
    }

    return *(readStores_[endpoint] = std::make_unique<RemoteStore>(uri, Config()));
}

Where RemoteStore::get is a static function


// Attempt to send the next request in the queue. Returns true if a request was sent.
// If not enough memory is available, or there is no next request, returns false.
bool tryNextRequest();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit odd that the return value for this method doesn't seem to ever be used

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured there could one day be contexts where it was useful to know whether the request was actually sent. But can make void if this is premature.

@ChrisspyB ChrisspyB force-pushed the fix/FDB-427-memory-limit branch from 9e7ff15 to c612266 Compare March 13, 2025 14:12
@ChrisspyB
Copy link
Member Author

We would like to have more fine grained control of queue sizes. We have decided this will be implemented in a followup PR, for the sake of expediency. I will just make some small changes here first before we merge.

@ChrisspyB ChrisspyB force-pushed the fix/FDB-427-memory-limit branch from 58f1e43 to 34d522f Compare March 14, 2025 18:20
@danovaro danovaro merged commit f3cf7a6 into develop Mar 15, 2025
146 checks passed
@danovaro danovaro deleted the fix/FDB-427-memory-limit branch March 15, 2025 15:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants