
Alloc metadata storage scheme. #282

Closed
wants to merge 4 commits

Conversation

@dsm9000 (Contributor) commented Aug 14, 2023

No description provided.

@deadalnix (Contributor) left a comment

This does not implement any kind of feature, so there is not much feedback I can provide. To know whether the architecture works, you need to have the code do something. Only then can we judge the code.

@@ -217,13 +280,22 @@ public:
return (bits >> 48) & Mask;
}

uint allocate() {
uint allocate(bool isAppendable = false, bool isFinalizable = false) {
Contributor

This is unlikely to be the API we want. We are going to allocate slots in bulk and store them in a per-thread cache, as per #269, so this API already doesn't work, unless we want to have 4 caches, one for every combination of flags, which doesn't seem workable.
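To make the concern concrete, here is a purely illustrative sketch of a per-thread cache that hands out pre-allocated slots per size class; the names (ThreadBin, sizeClassCount) are hypothetical and not taken from #269 or this PR. If the flags are baked into a slot when it is carved out, a cached slot only fits requests with the same flags, so each size class would need one bin per flag combination:

// Purely illustrative; names and layout are hypothetical, not the #269 design.
enum uint sizeClassCount = 39; // placeholder value for illustration only

struct ThreadBin {
    uint[] cachedSlots; // slot indices allocated in bulk from the arena
}

// If allocate() fixes isAppendable/isFinalizable at slot-allocation time,
// each size class needs one bin per flag combination: 2 flags -> 2^2 = 4 bins.
ThreadBin[4][sizeClassCount] bins;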

Contributor Author

Why would it be necessary to have separate caches for the combinations of flags when caching per thread? The flags are only set when a slot is occupied.

Also, I thought we were going to start with the simplest mechanisms and then refine?

Contributor

What do you gain by increasing the cache size by 4x?

Contributor Author

@deadalnix do you have a description of the intended cache scheme somewhere that would explain why using this scheme would require a 4x larger cache? Slabs were not even mentioned in #269 or the linked snippets from jemalloc.

Contributor Author

@deadalnix I finally understood what you meant here, following our Slack conversation about when in the pipeline the metadata ought to be set (i.e. in the tcache rather than in Extent or the arena).

}

@property
ref _meta extMeta() {
Contributor

This isn't used anywhere; why add it?

}

@property
bool isFinalizable(uint index) {
Contributor

I highly doubt this is the API we want. In what context will we want to ask the question in this manner? When collecting, we'll probably go over the whole slab and either ask the question or not, depending on the size class, right off the bat. That question would be easy to answer if the code actually did something.

Contributor Author

isFinalizable is intended to be used when destroying a slab allocation (in which case the finalizer is expected to be present in the tail of the block, and is invoked). isAppendable is intended to be used when carrying out an operation which requires appendability on a slab allocation (or when invoking the predicate which determines whether such operations may be used).
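As a rough sketch of that destroy path (the helpers slotIndex, slotAddress and slotSize are hypothetical names, not part of this PR), assuming the finalizer is stored as a function pointer in the tail of the block:

alias Finalizer = void function(void* ptr);

void destroySlabAlloc(ref Extent e, void* ptr) {
    auto index = e.slotIndex(ptr); // hypothetical: which slot ptr belongs to
    if (e.isFinalizable(index)) {
        // The finalizer is expected to sit at the very end of the block.
        auto tail = cast(ubyte*) e.slotAddress(index) + e.slotSize() - Finalizer.sizeof;
        auto finalizer = *(cast(Finalizer*) tail);
        finalizer(ptr);
    }
    // The slot is then released like any other slab allocation.
}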

@dsm9000 (Contributor Author) Aug 14, 2023

Please note that it is impossible to answer a question such as isFinalizable simply by looking at a slab's size class -- a slab alloc in that class may nonetheless not contain a finalizer. Similarly for isAppendable. I did not exclude these classes from containing "non-special" allocs (i.e. allocs without a finalizer or length appendix).

Comment on lines 184 to 188
@property
ulong slabSlots() const {
assert(isSlab(), "slabSlots accessed on non slab!");
return ulong(binInfos[sizeClass].slots);
}
Contributor

Every time this is called, you risk taking a cache miss. There is no reason to make it look like member data, and more generally, I doubt it's actually what's needed, because I doubt most of these computations ought to happen in this class. In fact, #283 kind of gives it away: the computation there is redundant with the computation here, so even without the code actually doing anything, it's already suspicious.

Contributor Author

How can one determine which meta bits are valid for a given slab alloc size class without knowing the slot size? It doesn't seem that there is any convenient place to cache it locally in Extent. And where would you have the computation happen, if not in Extent?

Contributor Author

Also, there is nothing redundant about this piece vs. #283. In the latter, we determine the effective size a slab alloc requires in order to support the flags shown here. Let's illustrate with an excerpt from the size class dumper:

id: 0   size: 8          needPages: 1    slots: 512
id: 1   size: 16        needPages: 1    slots: 256
id: 2   size: 24        needPages: 3    slots: 512
id: 3   size: 32        needPages: 1    slots: 128
id: 4   size: 40        needPages: 5    slots: 512
id: 5   size: 48        needPages: 3    slots: 256
id: 6   size: 56        needPages: 7    slots: 512
id: 7   size: 64        needPages: 1    slots: 64
id: 8   size: 80        needPages: 5    slots: 256
id: 9   size: 96        needPages: 3    slots: 128
id: 10  size: 112       needPages: 7    slots: 256

With the given scheme, classes 0, 2, 4, and 6 leave no room for any flags;
classes 1, 5, 8, and 10 allow only one flag per slab slot;
all others permit two flags per slot.

In #283 we compute any up-sizing required for a slab alloc to be able to use the requested flags, whereas here we access the flags themselves.
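Reading the dump, the pattern tracks the slot count of each class. A compact, purely illustrative restatement (flagsPerSlot is a name made up for this sketch, not code from the PR):

// Illustrative restatement of the dump above: available flag bits per slot
// appear to depend only on how many slots the slab holds.
uint flagsPerSlot(uint slots) {
    if (slots > 256) return 0; // 512-slot classes: 0, 2, 4, 6
    if (slots > 128) return 1; // 256-slot classes: 1, 5, 8, 10
    return 2;                  // 128 slots or fewer: all remaining classes
}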

Contributor Author

Thinking about it, I can hardcode the classes in the predicates in Extent; this is probably what you meant when you said "redundant". But this would leave less flexibility to modify the size classes later. Is this what we want?

Contributor

Which means you can do (sc & 0x01) || (sc > 6) to know which sizes are suitable for appendable allocations. You can use a unittest to ensure the implementation matches the intended behavior (256 entries or less).
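A minimal sketch of such a unittest, assuming the suitability criterion is "256 slots or fewer" as discussed above and that binInfos (seen in the earlier snippet) covers the slab size classes; isAppendableSizeClass is an illustrative name, not code from this PR:

bool isAppendableSizeClass(uint sc) {
    return (sc & 0x01) || (sc > 6);
}

unittest {
    // Cross-check the hardcoded shortcut against the slot counts in binInfos.
    foreach (uint sc; 0 .. cast(uint) binInfos.length) {
        auto expected = binInfos[sc].slots <= 256;
        assert(isAppendableSizeClass(sc) == expected,
               "isAppendableSizeClass disagrees with binInfos");
    }
}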

@dsm9000 (Contributor Author) Aug 14, 2023

@deadalnix Excellent point, thank you! I will add this and some tests.

@deadalnix (Contributor)

I'm going to close this. There are too many PRs in parallel that are not moving.

@deadalnix deadalnix closed this Aug 16, 2023