Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#2780] Improvement(server,core): Move tree lock from rest api to the corresponding implementation to minimize tree lock range. #2873

Merged
merged 14 commits into from
Feb 28, 2025

Conversation

yuqi1129
Copy link
Contributor

What changes were proposed in this pull request?

Modify the rest API and move the tree lock to the core module.

Why are the changes needed?

The rest API should not be locked entirely by a tree lock.

Fix: #2780

Does this PR introduce any user-facing change?

N/A.

How was this patch tested?

N/A.

@yuqi1129 yuqi1129 self-assigned this Apr 10, 2024
@yuqi1129 yuqi1129 closed this Apr 10, 2024
@yuqi1129 yuqi1129 reopened this Apr 10, 2024
@yuqi1129 yuqi1129 closed this Apr 11, 2024
@yuqi1129 yuqi1129 reopened this Apr 11, 2024
@yuqi1129 yuqi1129 closed this Apr 11, 2024
@yuqi1129 yuqi1129 reopened this Apr 11, 2024
});
Table loadTable = dispatcher.loadTable(tableIdent);

Partition p =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we have moved the logic about tree lock into dispatcher, for partitioning-related logic, we can't completely remove tree lock from APIs.

This seems to be not very elegant, but I can't find a better one till now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change here will possibly lead to inconsistency, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm working on it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of this, I would rethink the necessity of this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I have some doubts about whether PR is necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can defer this and work on others firstly.

@jerryshao
Copy link
Contributor

@yuqi1129 I'm going to close this first. We can reopen this when we feel necessary to change and have a better solution.

@jerryshao jerryshao closed this Apr 16, 2024
@yuqi1129
Copy link
Contributor Author

to

Got it

@jerryshao
Copy link
Contributor

I think you can do some fine-grained access control for the lock scope, rather than blindly wrapping the whole logic into one lock. For example, like "load fileset", you only need to lock the doWithCatalog operation, no need to lock the whole logic. For "create fileset", you can lock the catalog for property validation, and again lock the fileset for fileset creation. Can you please carefully check the code, and optimize them as possible as you can.

@yuqi1129
Copy link
Contributor Author

I think you can do some fine-grained access control for the lock scope, rather than blindly wrapping the whole logic into one lock. For example, like "load fileset", you only need to lock the doWithCatalog operation, no need to lock the whole logic. For "create fileset", you can lock the catalog for property validation, and again lock the fileset for fileset creation. Can you please carefully check the code, and optimize them as possible as you can.

@jerryshao

I have encountered several code as following:

image

Is this behavior acceptable? The consequence is complex and the goal of tree lock is to guarantee the behavior in
the method is determinate. According to the code above, I have not seen the effect of the tree lock, and seems to be the same effect without any lock.

Could you show your review on this point? If this is acceptable, I will take it and change many places accordingly.

@jerryshao
Copy link
Contributor

I see your point, the original code may have some problems, or it can be optimized. I think it would be better to optimize (guarantee the consistency) as much as we can. If it cannot be achieved, then the change should be no worse than the original code.

}

@Override
public User getUser(String metalake, String user)
throws NoSuchUserException, NoSuchMetalakeException {
return userGroupManager.getUser(metalake, user);
return TreeLockUtils.doWithTreeLock(
AuthorizationUtils.ofGroup(metalake, user),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here.

}
throw new RuntimeException(e);
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need add lock for this part, I feel it is not so necessary, what do you think? @mchades

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. There should be no race condition for a mocked catalog. And the test result only represents the status at that time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.

Comment on lines +644 to +660
try {
catalogWrapper.doWithPropertiesMeta(
f -> {
Pair<Map<String, String>, Map<String, String>> alterProperty =
getCatalogAlterProperty(changes);
validatePropertyForAlter(
f.catalogPropertiesMetadata(),
alterProperty.getLeft(),
alterProperty.getRight());
return null;
});
} catch (IllegalArgumentException e1) {
throw e1;
} catch (Exception e) {
LOG.error("Failed to alter catalog {}", ident, e);
throw new RuntimeException(e);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add this part into the lock?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is no need to put it into the lock, the fact that this part of logic is between loadCatalogAndWrap and updataCatalog, unless we split these two logic and lock them separately can we remove the code out of lock range.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can split into two locks, it should be fine.

@jerryshao
Copy link
Contributor

@mchades can you please help to review the CatalogManager part, this part is a bit complex after you introduced the in use mechanism.

@jerqi can you please review access control part?

LockType.WRITE,
() ->
TreeLockUtils.doWithTreeLock(
NameIdentifier.of(AuthorizationUtils.ofRoleNamespace(metalake).levels()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should role, not role namespace. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fine as it will not change the name of role.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we use role namepace not role here because we would access multiple roles (List) in this method. For ease of use, we lock the role parent directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jerqi Please help to confirm this problem

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer locking role namespace. Because we should align to other code.

LockType.WRITE,
() ->
TreeLockUtils.doWithTreeLock(
NameIdentifier.of(AuthorizationUtils.ofRoleNamespace(metalake).levels()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here.

Comment on lines +644 to +660
try {
catalogWrapper.doWithPropertiesMeta(
f -> {
Pair<Map<String, String>, Map<String, String>> alterProperty =
getCatalogAlterProperty(changes);
validatePropertyForAlter(
f.catalogPropertiesMetadata(),
alterProperty.getLeft(),
alterProperty.getRight());
return null;
});
} catch (IllegalArgumentException e1) {
throw e1;
} catch (Exception e) {
LOG.error("Failed to alter catalog {}", ident, e);
throw new RuntimeException(e);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can split into two locks, it should be fine.

.withHiddenProperties(
getHiddenPropertyNames(
return TreeLockUtils.doWithTreeLock(
NameIdentifier.of(ident.namespace().levels()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious why do we need to add the write in the catalog level? I think it should be fine to add the write lock in the schema level, am I right?

Copy link
Contributor Author

@yuqi1129 yuqi1129 Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following scenario will cause problems if we use write lock in the schema level,assuming the schema name is schema1

TheadA                                                 ThreadB

Hold schema1  write lock

Rename schema1 to schema2                

Store schema2 to storage                 hold schema2 write lock   

Block due to some reason                rename schema2 to schema1
return                                   store schema2 to storage 
   x                                                       return  
                                                 

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This problem only happens for rename, I think we don't support schema rename. Besides, we can treat rename and other change separately, if the changes contain rename, then we use parent level write lock, otherwise we use its own write lock. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me verify this and check if this can be applied to catalogs, tables, and other objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the code and determined if we need to lock the parent or itself.

@@ -194,70 +198,75 @@ public Table createTable(
public Table alterTable(NameIdentifier ident, TableChange... changes)
throws NoSuchTableException, IllegalArgumentException {
validateAlterProperties(ident, HasPropertyMetadata::tablePropertiesMetadata, changes);
return TreeLockUtils.doWithTreeLock(
NameIdentifier.of(ident.namespace().levels()),
LockType.WRITE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here, why alter operation needs parent write lock?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@yuqi1129
Copy link
Contributor Author

I think we can split into two locks, it should be fine.

Changed.

Comment on lines +72 to +75
TreeLockUtils.doWithTreeLock(
AuthorizationUtils.ofRole(metalake, role),
LockType.READ,
() -> roleEntitiesToGrant.add(roleManager.getRole(metalake, role)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess with the overhead of requesting multiple read lock, the performance may not be as good as request one parent read lock. I would be inclined to use the old way, what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Locking the parent node with read type is NOT correct logically. For example, if we lock the parent lock with a read lock, other threads can still modify the role when this method is loading the role, so if we are going to use the namespace lock, the lock type should be changed to WRITE. Is that acceptable to you?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Write lock is not acceptable.

@jerryshao jerryshao merged commit 9f5f3ee into apache:main Feb 28, 2025
28 checks passed
Abyss-lord pushed a commit to Abyss-lord/gravitino that referenced this pull request Mar 2, 2025
…to the corresponding implementation to minimize tree lock range. (apache#2873)

### What changes were proposed in this pull request?

Modify the rest API and move the tree lock to the core module. 

### Why are the changes needed?

The rest API should not be locked entirely by a tree lock.

Fix: apache#2780 

### Does this PR introduce _any_ user-facing change?

N/A.

### How was this patch tested?

N/A.

---------

Co-authored-by: Jerry Shao <jerryshao@datastrato.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Improvement] Minimize the scope of using tree lock
4 participants