-
Notifications
You must be signed in to change notification settings - Fork 416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#6561] improvement(core): Cache Hadoop Filesystem instance on Gravitino server to improve the performance #6619
base: main
Are you sure you want to change the base?
Conversation
@yuqi1129 PTAL, thanks |
I will take some time to verify whether this works in the loud filesystem as the related ITs are not triggered by default. |
* @return The FileSystem instance. | ||
* @throws IOException If the FileSystem instance cannot be created. | ||
*/ | ||
default FileSystem getFileSystem( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this method only be used by the Gravitino server?
@Nonnull Path path, @Nonnull Map<String, String> config, boolean disableCache) | ||
throws IOException { | ||
// disable cache | ||
config.put(String.format("fs.%s.impl.disable.cache", scheme()), String.valueOf(disableCache)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the config already contains key fs.%s.impl.disable.cache
? will we cover it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will overwrite. I think we should prioritize the disableCache parameter.
ok |
@xloya @jerryshao I don't see any issues. Can you take a look? |
If I understand correctly, the current version of FileSystem Provider will also be used for Hadoop GVFS. I think using Hadoop's own cache here will cause security issues, because the authenticated FileSystem is no longer encapsulated by GVFS, and others can arbitrarily obtain the authenticated FileSystem through |
thanks, Let me check it again. |
On the Gravitino server side, the fs cache is only used by the Gravitino server. Why can't we use the file system cache? @xloya , can you help to clarify it more clearly. In GVFS client, indeed, there will be a security vulnerability if the fs cache is enabled. |
The GVFS side uses its own maintained cache, and when GVS initializing Filesystem, it will force not to use FS cache |
What I mean is that we could enable Hadoop Filesystem cache in the server side, but the current
|
What changes were proposed in this pull request?
Cache Hadoop Filesystem instance on Gravitino server to improve the performance
Why are the changes needed?
Fix: #6561
Does this PR introduce any user-facing change?
N/A
How was this patch tested?
N/A