Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#3854] feat(core): Add a cache service to cache the mapping of name identifer to id #3986

Draft
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

yuqi1129
Copy link
Contributor

What changes were proposed in this pull request?

Cache the mapping of name identifier to id.

Why are the changes needed?

To enhance the speed of API.

Fix: #3854

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

N/A

@yuqi1129 yuqi1129 self-assigned this Jun 27, 2024
@yuqi1129 yuqi1129 closed this Jul 17, 2024
@yuqi1129 yuqi1129 reopened this Jul 17, 2024
@yuqi1129 yuqi1129 requested a review from mchades July 29, 2024 07:37
@yuqi1129
Copy link
Contributor Author

yuqi1129 commented Aug 7, 2024

@jerqi @mchades
Could you assist in reviewing this PR?

@jerqi
Copy link
Contributor

jerqi commented Aug 7, 2024

How to handle rename operation?

@yuqi1129
Copy link
Contributor Author

yuqi1129 commented Aug 7, 2024

How to handle rename operation?

Please see the code:

image

}

public Long get(EntityIdentifier key) {
return ident2IdCache.getIfPresent(key);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the cache is not hit, does the upper layer need to retrieve it from the underlying storage by itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in some scenarios we only need to judge whether the value associated with the key exists.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused. Shouldn't the cacheService automatically help the caller to query the underlying layer when the cache expires? But I didn't see the logic of querying from the underlying storage in your cache.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use

  public Long get(EntityIdentifier key, Function<EntityIdentifier, Long> mappingFunction) {
    return ident2IdCache.get(key, mappingFunction);
  }

instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your class name is NameIdMappingService, not a caching service. Caching seems to be just an optimization technique in your service, so I think the cache should be transparent to the caller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your class name is NameIdMappingService, not a caching service. Caching seems to be just an optimization technique in your service, so I think the cache should be transparent to the caller.

I don't understand your words clearly. Could you explain them in more detail?

Since the name of the class is NameIdMappingService for the caller, he only knows the id-name mapping and could not sense the cache mechanism, so what's the problem

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As your current implementation, Long get(EntityIdentifier key) makes the caller feel that they can directly obtain the id through the key, but in fact it only performs a cache query. That's my point.

@yuqi1129 yuqi1129 closed this Aug 8, 2024
@yuqi1129 yuqi1129 reopened this Aug 8, 2024
@yuqi1129 yuqi1129 closed this Aug 8, 2024
@yuqi1129 yuqi1129 reopened this Aug 8, 2024
Copy link
Contributor

@jerqi jerqi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From high perspective:

  1. Do you need id-name name-id mapping? Maybe we need the cache have id and name as the key.
  2. Should cache service be coupled with jdbc storage backend?
  3. Could you extract more common methods?

@@ -123,36 +126,54 @@ public boolean exists(NameIdentifier ident, Entity.EntityType entityType) throws
@Override
public <E extends Entity & HasIdentifier> void insert(E e, boolean overwritten)
throws EntityAlreadyExistsException, IOException {
EntityType entityType;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EntityType entityType = e.getType();

} finally {
// Remove the entity from the cache again because we may add the cache during the deletion
// process
NameIdMappingService.getInstance().invalidateWithPrefix(entityIdentifier);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should follow the pattern.
We write the slow storage first, then write the fast storage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you provide more details for the sentence We write the slow storage first, then write the fast storage.?

public void testGetInstance() throws IOException {
NameIdMappingService instance = NameIdMappingService.getInstance();

EntityIdentifier makeLakeIdent1 =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makeLake -> metalake.

@yuqi1129
Copy link
Contributor Author

  • Do you need id-name name-id mapping? Maybe we need the cache to have id and name as the key.
  • Should cache service be coupled with jdbc storage backend?
  • Could you extract more common methods?

Maybe we need the cache have id and name as the key.

I am unable to envision this scenario at the moment. Can you give an example here?

Should cache service be coupled with jdbc storage backend?

Let me give it a try. It seems that we need to an abstract the class or proxy that delegate EntityStore.

Could you extract more common methods?

OK

@yuqi1129 yuqi1129 added 0.6.1 and removed branch-0.6 labels Aug 15, 2024
@jerqi
Copy link
Contributor

jerqi commented Mar 11, 2025

This isn't good enough. Maybe we need a pluggable cache framework to let production users to implement their cache. Maybe Xiaomi wants to implement Redis cache. cc @lw-yang

@yuqi1129
Copy link
Contributor Author

This isn't good enough. Maybe we need a pluggable cache framework to let production users to implement their cache. Maybe Xiaomi wants to implement Redis cache. cc @lw-yang

This PR is just make it up-to-date and the design is under discussion.

@jerqi
Copy link
Contributor

jerqi commented Mar 11, 2025

This isn't good enough. Maybe we need a pluggable cache framework to let production users to implement their cache. Maybe Xiaomi wants to implement Redis cache. cc @lw-yang

This PR is just make it up-to-date and the design is under discussion.

OK, but you should make this draft first.

@yuqi1129 yuqi1129 marked this pull request as draft March 11, 2025 09:09
@lw-yang
Copy link
Contributor

lw-yang commented Mar 11, 2025

This isn't good enough. Maybe we need a pluggable cache framework to let production users to implement their cache. Maybe Xiaomi wants to implement Redis cache. cc @lw-yang

Yes, we need a pluggable caching system that can implement the Redis cache for use in the production environment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] Storage system needs a id-name mapping cache
4 participants