Turn [FUSE] into a VFS plugin ('fixed'...) #193
Comments
The current 'stand-alone server' implementation is over at https://github.com/genodelabs/genode-world/tree/master/src/server/fuse_fs Doing the transformation will...
|
So, to keep this ticket in sync with recent developments on the ML: It seems the patch for Fuse-as-Server no longer quite applies to Fuse-as-Plugin, due to a chicken-and-egg problem in the latter scenario (the chicken and the egg being: the VFS plug-in depends on LibC, and LibC is initialized only after all VFS plug-ins are ready :-). Thus the original patch should be expanded to make sure Fuse no longer relies on LibC. I thought about 3 possibilities: 1) (as suggested in the linked email) patching to use VFS calls instead of LibC (but this probably requires amending each file system in turn: Fuse-NTFS, Fuse-BFS, and so on, unless they all use the exact same calls that can be stubbed in a common way?), or 2) finding a workaround, e.g. loading the VFS plug-in after the server is fully initialized, dlopen()-style, or 3) sticking to the Fuse-server, which seems to still work and pass tests, despite the underlying complexity (a C program calls FS, which calls Fuse-Server, which calls LibC NTFS etc code, which calls VFS, which reads the block). Thinking about it, Genode Labs might be too polite to say so, but one has to wonder if I'm right to push for implementing this in Genode world :-). After all, this is of little interest to the community; it might make more sense for everyone that I keep this off-repo, in my own code? That way I can rush the higher-priority stuff to get my code to production (which does not completely preclude my returning to this ticket later), without involving Genode Labs (I'll have more exciting challenges to offer them later, after all, as I'm preparing some thoughts about implementing an indexing layer on top of FS ;) ). I'll give 1) and 2) above a shot (for a limited time), and revert to 3) if it turns out to make more sense for everyone. EDIT: So I started on 1), not the actual .c stubbing (let alone implementing the required parts), just the .h headers :
I thought it'd get easier as I went along but the opposite is happening, the more progress I make, the more requirements from contrib/ntfs (and a few from fuse-lib) I encounter... |
Thank you for the very good summary.
That's what I'm hoping for, at least for regular file systems that operate on some kind of block device/image. Intuitively, I'd expect not much more than open, read, write, seek, sync, close. But I may be too optimistic. ;-)
It is unfortunately not merely a matter of initialization order. I tried to hint at the problem of the distinction between the two libc execution contexts (kernel, application) on the mailing list (but this may have been too confusing). But as a rule, since the VFS plugin is executed as part of the libc kernel, it cannot play the role of a regular application context calling into the "libc kernel".
That would be very welcome, especially if you give the first option a try. I sense that this would be the most sustainable and clean approach.
You may consider just using the regular libc headers as is. There is no need to stub the interfaces after all. For a first take, you may copy the include path definitions from the libc's import file (https://github.com/genodelabs/genode/blob/master/repos/libports/lib/import/import-libc.mk). In my view, one only needs to replace the implementation of the libc operations called by the fuse code. Once a fuse plugin calls, e.g., open, there will be an undefined reference at link time. Your job would be resolving this reference by providing a custom implementation. To make the distinction between the regular libc and this stubbed version a bit clearer, you may consider renaming those I/O operations of interest when compiling the fuse code by adding compile options like:
This way, all operations of your binary interface will be nicely prefixed. |
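As an illustration of the renaming idea described above, here is a minimal C++ sketch. The `#define` has the same effect as passing `-Dpread=fuselibc_pread` on the compiler command line. The `fuselibc_` prefix follows the naming used later in this thread, while `fake_device` and `read_signature` are made-up names for the example, not part of any real code base.

```cpp
#include <cstddef>
#include <cstring>

// Same effect as compiling the third-party sources with -Dpread=fuselibc_pread.
#define pread fuselibc_pread

// Stand-in for the backing store (assumption: a real plug-in would read
// from a VFS file or a block session instead).
static const char fake_device[] = "NTFS    fake boot sector";

// Custom implementation that satisfies the renamed reference at link time.
extern "C" long fuselibc_pread(int /*fd*/, void *buf, std::size_t count, long offset)
{
    const std::size_t size = sizeof(fake_device) - 1;
    if (offset < 0 || static_cast<std::size_t>(offset) >= size)
        return 0;
    const std::size_t avail = size - static_cast<std::size_t>(offset);
    const std::size_t n = count < avail ? count : avail;
    std::memcpy(buf, fake_device + offset, n);
    return static_cast<long>(n);
}

// "Third-party" code, left unmodified: it still spells the call pread(),
// but the preprocessor turns it into a call to fuselibc_pread().
long read_signature(char *out)
{
    return pread(3, out, 4, 0);
}

#undef pread  // the rename only applies to the third-party part
```

The point of the prefix is that a forgotten stub shows up as an undefined `fuselibc_*` symbol instead of being silently resolved against the real libc.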
tl;dr: Details: Turns out if I remove the "LIBS = libc" line and provide the libc includes separately, it still links (?) and still runs (??)... and still aborts at the same place in the libC ; I guess it must be resolving LibC symbols from the "test-vfs-libc" component ? So turns out the "-Dopen=..." aliasing trick is not optional but mandatory ; was happy to have read about that trick too, wouldn't have thought about it otherwise! :-) So using aliasing on a half dozen symbols, I get symbols not found (at runtime), as expected. I provide stubs, and... Still run into LibC. Hmmm. Must still be using other APIs, in addition to the ones I "hijacked". At that point I'm wondering if I'm stuck, because I get no indication of which libc functions are (still) being used. After getting a string of identical libc/kernel errors I decided to dive into the NTFS code to determine what its needs are ; fortunately I found a shortcut, only had one file (unix_io.c) to review. Collecting the whole lot here did the trick:
I then get this:
So now, gotta figure out how to fill the "stat" structure, and then implement pread/pwrite/read/write (probably will be inspired from lwext4/block.cc ?). Regarding the overall impossibility of using libc in a VFS plug-in, that's important to know ; I plan to design an "attribute indexer" VFS plug-in, whose work will consist of "snooping" on file operations, forward file ops to their intended recipient, and dupe-write file-op metadata to a separate recipient. |
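For the "fill the stat structure" step mentioned above, a rough sketch could look like the following. Everything here is an assumption for illustration: the helper name `fuselibc_fill_stat` is hypothetical, and reporting the image as a regular file (rather than a block device) is a guess at what the NTFS code would accept, not something confirmed in this thread.

```cpp
#include <cstdint>
#include <cstring>
#include <sys/stat.h>

// Hypothetical helper: describe a disk image of 'block_count' blocks of
// 'block_size' bytes each as a regular file.
int fuselibc_fill_stat(std::uint64_t block_count, std::uint32_t block_size,
                       struct stat *out)
{
    if (!out || block_size == 0)
        return -1;

    const std::uint64_t bytes = block_count * block_size;

    std::memset(out, 0, sizeof(*out));
    out->st_mode    = S_IFREG | 0600;              // assumption: regular file
    out->st_nlink   = 1;
    out->st_size    = static_cast<off_t>(bytes);
    out->st_blksize = block_size;                  // preferred I/O size
    out->st_blocks  = static_cast<blkcnt_t>(bytes / 512); // conventionally 512-byte units
    return 0;
}
```

A caller would obtain the block count and block size from whatever backs the image and pass them in, so the helper itself stays free of any session handling.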
Hi, I think you could try to add to your vfs library
|
That's true. If the application is linked against the libc, the dynamic linker will indeed happily resolve the symbols. I haven't seen that coming.
It may also be insightful to review the undefined symbols of the fuse library using I think that @tomga's suggestions sounds even better. If it works, it will be watertight. |
tl,dr: Stuck on an LD crash ; dunno how to diagnose it but will try to figure something out. Details: Went through a few easy 'adventures' yesterday and today, but now hitting a more difficult one. After implementing Block support and so on I got the code working as discussed, Once I saw the NTFS code seemed satisfied with stat(), seek(), read() LD crashes, whether calling from the ctor or from open_dir:
Test-libc loads ld.lib.so at offset 0x30000, so the IP resolves to:
That's a one-liner function:
I can somewhat imagine that a symbol is missing, and LD crashes due to some Misc stuff: *) tried the "no undefined" LD trick, could come in handy indeed!, but gcc does not like it. Been digging a bit, can't formulate in a way that GCC likes (might be more like "whole-archive" or some such) *) I noticed I have to be careful when building a Packet_descriptor, careful not to issue a Packet_descriptor::READ for 0 blocks, otherwise qemu writes out this AHCI error....
and freezes (not sure where the freeze comes from, probably from qemu rather than Genode?) *) for my attr indexer project coming down the line: I'm still putting my ideas in shape; I realized that I should formulate things from the very beginning:
EDIT : here's nm on the VFS plug-in: EDIT 2: Ok, was reviewing the log output in the 'success' case, and now I think there's a hint as to why it fails 8-) tick193_success_when_called_late.txt In the success case, the "config" ROM is read before I do my Fuse/NTFS stuff. It seems to be the opposite in the "crash" case. Hmmm, on second-second thought, it might not be so much related to "config", but (again) to LibC init, which occurs at about the same time. It might be that in the "early" case, I'm missing some LibC symbols that are not provided by my code/stubs, but they are available in the "late" case. I wonder why LD does not clearly error out in that case though. Maybe I should get the code in shape and zip it up here. EDIT 3: Ok, it's definitely a problem with malloc(), which makes perfect sense (I didn't provide it yet, so it's using the one from the not-yet-initialized libc, a completely obvious train wreck in hindsight). It was easy enough to pinpoint with some tracing too. |
Great to read that you got the first life signs of the fuse code!
The IP in child.h does not make any sense because your program is not dealing with the creation of a child component. In my experience, most crashes within ld.lib.so are caused by the corruption of a data structure at the component/application level. Note that the implementations of utilities like the various allocators, avl_tree, etc. reside in the dynamic linker. So if the component code (like the vfs) uses such a utility and messes up, one will observe a crash in ld.lib.so despite the dynamic linker being innocent. Your mentioning of For libc-using applications, global constructors do work because the libc calls them (https://github.com/genodelabs/genode/blob/master/repos/libports/src/lib/libc/component.cc#L65). So libc-using applications are fine. But the global ctors are notably executed after the creation of the libc kernel (which includes the vfs). Hence, the vfs (and its plugins) must not depend on global constructors. (For more background about global constructors and Genode, you may take genodelabs/genode#3509 and the referenced issues as a starting point, but maybe it's better not to enter this rat hole too deeply.) These constraints for VFS plugins are a bit painful and I'm feeling bad for you hitting such walls. But let me assure you that they are not arbitrary. Also, this is just a suspicion of mine. (Maybe the problem is unrelated to the global ctors.) Regarding the use of the block session, I think that the code you are taking as reference is very much outdated. Nowadays, we don't deal with the packet-stream API directly because it requires the manual handling of many complicated corner cases. Please better have a look at the modern |
Yay, back on track, after realizing the crash was occurring in LibC malloc() -- as you suspected Norman, that corrupts the instruction pointer and makes it look like LD is to blame. But now, after adding the whole malloc shebang, the ntfs_mount() function almost completes: it gets to the very last pread (the one that reads just 8 bytes instead of a whole sector), and then crashes as before. So I suspect it's crashing in some sort of clean-up code, like in free() (except I did implement free()). I might post an update later today or tomorrow announcing I fixed that one last crash, fingers crossed. Don't worry, I'll tell you if I have second thoughts about this adventure 8-), but it looks like you were right, I'm very close now: it seems I just have to solve this one last crash, and then file-system mounting is implemented, and I just have to implement the half-dozen VFS plug-in hooks (opendir, unlink, etc). That should just consist of adapting the old FS implementation to its new VFS-plugin home. Misc: I also found out how to enable full-fledged ntfs logging to (emulated) stderr. Though it's less useful than I expected (ntfs-3g logging focuses on what is read, rather than tracing code coverage). I think we're good re. static C++ ctors: there are only global pointers (pointer to Env, e.g.), and their value gets assigned "late", in component-construct. To be absolutely sure, I suppose we could return to the LibC-based FUSE and instrument call_global_constructors-or-what's-his-name(), but FUSE is pure C code, and I'm not seeing hints of missing init. I usually keep an eye out for anything "static" supposed to be initialized even before main() is called (I had bad experiences for years in Haiku with that, so I 100% sympathize with Genode's view on that kind of malpractice ^^) Edit: ideally some day I could do without global vars, and do like the BSD audio_drv which embeds Genode::env inside the BSD structs; added a ToDo for that.
Looked at Block::Connection a bit, seems I should wrap my head around the latter example especially, in part_block ; added a TODO as a reminder that my current code is temporary (though block retrieval seems to work well so far, with the NTFS code being satisfied of what it sees in the boot sector, in the inodes, and so on) and to urgently clean it up once it gets in semi-working shape. |
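The "only global pointers, assigned late" approach described above can be sketched as follows. This is a toy illustration, not the plug-in's actual code: `Toy_env`, `plugin_init`, `env_ready` and `env_id` are invented names, and `Toy_env` merely stands in for whatever environment object the plug-in receives.

```cpp
// Hypothetical stand-in for the environment object handed to the plug-in.
struct Toy_env { int id; };

// A plain pointer is zero-initialized when the binary is loaded, so it does
// not rely on any global constructor being executed.
static Toy_env *global_env = nullptr;

// Called explicitly from the plug-in's entry point (the "late" assignment).
void plugin_init(Toy_env &env) { global_env = &env; }

// Accessors check that initialization really happened before use.
bool env_ready() { return global_env != nullptr; }
int  env_id()    { return global_env ? global_env->id : -1; }
```

By contrast, a global object with a non-trivial constructor would only be valid once the global ctors have run, which is the ordering problem discussed in this thread.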
Oh yeah, forgot to mention something significant : I cannot alias free() like I've been doing so far with read()/write()/etc, because that ends up aliasing not only the LibC free(), but also a Genode symbol (from memory, it looks like "Genode::Avl_allocator::free()"), which makes the linking stage fail, as it looks for "AvAlloc::fuselibc_free()" instead of the correct symbol. I've since realized that I can get away with not aliasing some symbols, and so I'm doing that in the case of free(), I'm overriding the same-named symbol, and it seems to get picked in priority over the LibC symbol. But if the above case ends up being the culprit for trouble I'll take a second look ; maybe I'll move the "-Dfree=fuselibc_free" away from the makefile, and into the .cc files in the vfs plug-in case (the ntfs-3g library case can probably remain as-is). |
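The collision described above can be reproduced in a few lines. The sketch below is only a toy: `Toy_allocator` is a made-up class standing in for a framework class that has a member named `free()`, and the stringify helpers exist solely to make the renaming visible.

```cpp
#include <string>

#define STRINGIFY_RAW(x) #x
#define STRINGIFY(x) STRINGIFY_RAW(x)

// What -Dfree=fuselibc_free does to every source file it is applied to.
#define free fuselibc_free

// Toy stand-in for a framework class with a member called free().
// With the macro active, the compiler actually sees a member named
// fuselibc_free, so the symbol it asks the linker for no longer matches
// the one exported by the framework library.
struct Toy_allocator
{
    bool free(void *) { return true; }
};

// Reports the identifier the compiler really sees in place of "free".
std::string name_seen_by_compiler() { return STRINGIFY(free); }

#undef free
```

Because the preprocessor works on tokens and knows nothing about scopes, a `-D` rename of a very common identifier hits every use of that token, which is why restricting the option to selected source files is attractive.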
Thanks for keeping the story going. It's really nice to watch! :)
Now I can exhale again.
The build system allows you to supply compiler opens for a specific compilation unit, which may become handy in this case. E.g., if you have, let's say, a fuse.cc file, you can specify:
|
There! Fixed it all. With the progress on ntfs side, the code was eventually transitioning from ntfs-3g to libfuse proper, and I had patched the other two makefiles but not this one... So copy-pasted the makefile stuff to that third makefile and now I can see again (after a hiatus for a few days) the "test failed exception" message, from test-libc component. Phew. (and that's still with the plain "unaliased" free() symbol hack ;-) Will have to refactor all three makefiles so that they lift their aliasing definitions from an "include'ed" .inc common file or something, but that can wait. |
Latest : to pass the first (mkdir) test of the test-suite, I have to implement write support. But here, I have to take a little detour, as fuse/ntfs makes writes whose beginning and end offsets are not aligned to 512-byte sector boundaries, which neither the old Block API nor new Block/Job API seem to support (which makes sense). So I'll add some logic in the fake-libc to 1) align offset to 512 bytes boundaries, and 2) perform a read of the sector before performing a write to it, so that I write back its own data (instead of blank zeros) in the "gray zone" between aligned and non-aligned offsets, in addition of course to writing the actual ntfs data at the "hot non-aligned zone" itself. |
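The read-modify-write logic outlined above could look roughly like this. It is a self-contained sketch against an in-memory toy device (`Toy_disk`, `write_unaligned` and the `0xAA` fill pattern are all invented for the example), not the actual fake-libc code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t SECTOR = 512;

// Toy block device that only accepts whole-sector transfers.
struct Toy_disk
{
    std::vector<std::uint8_t> data;

    explicit Toy_disk(std::size_t sectors) : data(sectors * SECTOR, 0xAA) {}

    bool read_sector(std::size_t n, std::uint8_t *dst) const
    {
        if ((n + 1) * SECTOR > data.size()) return false;
        std::memcpy(dst, data.data() + n * SECTOR, SECTOR);
        return true;
    }

    bool write_sector(std::size_t n, const std::uint8_t *src)
    {
        if ((n + 1) * SECTOR > data.size()) return false;
        std::memcpy(data.data() + n * SECTOR, src, SECTOR);
        return true;
    }
};

// Write 'len' bytes at an arbitrary byte offset, one sector at a time.
// Returns the number of bytes written, or -1 on error.
long write_unaligned(Toy_disk &disk, std::size_t offset,
                     const void *buf, std::size_t len)
{
    const std::uint8_t *src = static_cast<const std::uint8_t *>(buf);
    std::size_t done = 0;

    while (done < len) {
        const std::size_t pos    = offset + done;
        const std::size_t sector = pos / SECTOR;   // aligned sector index
        const std::size_t in_off = pos % SECTOR;   // start inside the sector
        const std::size_t chunk  = std::min(SECTOR - in_off, len - done);

        std::uint8_t tmp[SECTOR];

        // Partially covered sector: fetch the old content first so the
        // untouched bytes are written back unchanged. A fully covered
        // sector needs no prior read.
        if (chunk != SECTOR && !disk.read_sector(sector, tmp))
            return -1;

        std::memcpy(tmp + in_off, src + done, chunk);

        if (!disk.write_sector(sector, tmp))
            return -1;

        done += chunk;
    }
    return static_cast<long>(done);
}
```

For example, a 4-byte write at offset 510 touches the last two bytes of sector 0 and the first two of sector 1, and both sectors keep their previous content everywhere else.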
This is similar to what the block VFS plugin (https://github.com/genodelabs/genode/blob/master/repos/os/src/lib/vfs/block_file_system.h) is already doing (albeit in a way that could be much improved, see genodelabs/genode#2263). To avoid solving the same problem twice, what do you think about the idea of letting the fuse plugin operate on another file of the VFS instead of a block session? In the VFS config, the path to the "block-device" file could be specified as an XML attribute of the fuse plugin, e.g., (just a sketch)
I think this approach brings two benefits. First, it allows any file present in the VFS to be used as a disk image, e.g., a file mounted as a For an example of a VFS plugin that already works like this, you may take a look at the vfs_ttf plugin at gems/src/lib/vfs/ttf/ (https://github.com/genodelabs/genode/tree/master/repos/gems/src/lib/vfs/ttf). It reads the path for a TTF file from its config, accesses the TTF data from this file via the VFS, and provides a pseudo file system by itself. |
Ok, I'm good on the "read" side of things : found a class named Genode::Readonly_file that allows to read data from the blocks-as-file, and I'm (again) at the point where FUSE wants to write. On the write side though, there is indeed a class "Genode::New_file", but it calls ftruncate(), seek/append etc, so probably won't do for write-in-place operations. I'll look into writing my own "Existing_file" class or something, tomorrow :-) |
Thanks for continuing the story. That sounds very promising! The vfs.h header that you found is merely a convenience wrapper around the raw vfs mechanisms to accommodate the few cases where we use the VFS directly from native Genode applications w/o libc. The raw vfs would be too low-level to be useful (or enjoyable) for the application level. You are right that there is no convenience wrapper for random-access writes. That's just because we currently have no Genode-native applications that require such patterns. Most file-system-heavy regular applications use the libc after all. You may opt to come up with a new class that models your use case. But since you are working at the vfs "plumbing" level anyway, you could also choose to call the raw vfs API directly. I'd recommend trying the latter first. The main advantage of using the vfs API directly in your case is that it gives you the chance to keep up the asynchronous nature of the requests that your plugin receives. You may have noticed that the That's why I find the distinction between application-level and "plumbing" level useful. At the application level, the blocking semantics of a loop like https://github.com/genodelabs/genode/blob/master/repos/os/include/os/vfs.h#L457 are nice because they greatly simplify the application code that would otherwise be bothered with maintaining state machines. But at the plumbing level where we are now, it is better to think of the code as a big state machine. E.g., if a read takes multiple steps, this fact should best be expressed by state variables.
I should note that I'm not sure how compatible the fuse API is with the asynchronous way of operation described above. If the fuse file system issues blocking calls, there is probably no simple way to implement it. If so, please don't let my remarks above bother you too much. ;-) |
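To make the "state variables instead of blocking loops" idea a bit more concrete, here is a small toy sketch. It does not use the real VFS interfaces: `Toy_backend`, `Read_job` and their methods are invented for the example and only mimic the general shape of queueing a request first and completing it later.

```cpp
#include <cstddef>
#include <cstring>

// Toy backend that needs several polls before its data becomes available.
struct Toy_backend
{
    int         polls_until_ready;
    const char *content;

    // Always accepts the request in this toy.
    bool queue_read() { return true; }

    // Returns false while the data is not ready yet; never blocks.
    bool complete_read(char *dst, std::size_t len)
    {
        if (polls_until_ready > 0) { --polls_until_ready; return false; }
        std::memcpy(dst, content, len);
        return true;
    }
};

// A multi-step read expressed as explicit state rather than a blocking loop.
struct Read_job
{
    enum class State { IDLE, QUEUED, DONE };

    State state = State::IDLE;
    char  buf[8] = {};

    // Advances by at most one step and returns immediately.
    // Returns true if the job made progress.
    bool execute(Toy_backend &backend)
    {
        switch (state) {
        case State::IDLE:
            if (!backend.queue_read()) return false;
            state = State::QUEUED;
            return true;
        case State::QUEUED:
            if (!backend.complete_read(buf, 4)) return false;
            state = State::DONE;
            return true;
        case State::DONE:
            return false;
        }
        return false;
    }
};
```

The caller invokes `execute()` whenever the backend signals that something may have changed, and simply returns to its event loop when no progress was possible. Whether third-party FUSE code can be driven this way is exactly the open question raised above.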
Latest:
Re. async nature of VFS plug-ins: Makes perfect sense to me, Makes complete sense especially for open-ended wait delays like keyboard input in a terminal, Is that condition detectable only at a low level (say, ahci_drv), or can that be Or maybe a way smarter scheme (if it works at all), would be to detect the condition Anyway that'll be quite an adventure -- for now I'm worried about preparing an official |
I like your idea to test all preconditions for the non-blocking operations of the fuse code before calling it. But should the 3rd-party code have any sequential inter-data-dependencies (reading something, immediately doing something else with the result), we are probably out of luck. In this case, one might end up pursuing a cooperative task-scheduling scheme (using setjmp/longjmp), like we do internally in the libc between the kernel and application contexts. But I guess we are getting too fancy now. Please excuse the distraction. It is of course perfectly fine to let the plugin issue blocking operations and be happy for now. To be perfectly honest, the rump VFS plugin that we use in Sculpt every day is also using blocking I/O at the backend. The limitations I described apply here just as well.
Please don't worry too much. If the FUSE plugin is able to access a file system that we cannot access otherwise, this is clearly an improvement over the status quo. To avoid wrong expectations by any users, it's best to document the state of affairs and your doubts in the README. If others find the plugin useful, or are even willing to invest time in further improving it, all the better! :) |
@ttcoder, it is probably not important now as you are at a much further point, but I checked this as I was sure that it should work in principle. When I wrote previous proposal for option for
and this yields:
for my test program where |
@tomga That syntax works indeed! The resulting errors are massive though, a couple dozen screens' worth :-o So, here's the snapshot of the code as it is today. It's very naïve code, where each hook calls fuse()->op directly without doing book-keeping of open nodes. It has lots of shortcuts and missing bits, even though it 'passes' the tests. I intend to extend the "test-libc" testsuite to expose the weaknesses, and then I'll be in a good place to fix them. I might take a detour though : with my newfound understanding of Genode, I feel like I could tackle other components needed for my project, e.g. accessing extended attributes via ioctl() calls, and then return to this ticket in a bit :-) Here's the diff of existing files... And the archive of new files... |
@ttcoder I'm glad that it worked for you and that (in some way) it helped. Regarding missing symbols from |
@ttcoder it's nice to see the “FUSE train” moving once again. Enthusiastically, I imported your attached archives but I was not able to build and execute the test scenario (vfs_fuse.mk and fuselibc stubs are missing AFAICT) right away - which is perfectly fine given the WIP nature of the code. Since I have not looked at the FUSE code in quite a while, I took the opportunity and fixed up the old fuse_fs implementations for ext2 and exfat. I'll prepare a commit for inclusion in world that contains those fixes (I still have to properly test them) and removes libc_fuse. Even if those implementations are eventually replaced by the VFS fuse plugin based ones, we could keep them around for now. Now speaking of the VFS plugin, some time ago I attempted to turn the (The long-term solution would be to make the libc more modular in this regard, e.g. separating I/O monitoring and VFS handling, but I fear that is substantially more involved ☺.) |
Glad to see you enter the fray Josef! The previous attach of vfs_fuse.mk might not have come through, here's an attach again, of today's version: Notes:
I see you've fixed the 'server' type Fuse, adding with_libc statements; I might have been misleading previously, as in my first message I recommended doing that (and even provided a patch), but now I realize this type of patching can be thrown away, as it does not survive the transition to making a VFS plug-in, which does not require with_libc, sorry for not mentioning that :-/ Though now that you do have a working 'fuse-server', you can use that for regression testing when doing the conversion to a 'fuse-vfs' plug-in, so in that way it could be worth the energy. Let's keep in touch and discuss (here or discord/revolt.chat at your convenience), I want to help as much as I can. Tomorrow I should be online starting in the early afternoon (euro time like you guys in Dresden). |
Thanks.
Yes, this is indeed just to get the server working again - while there I already identified 3 bugs that prevented the libc_vfs_fs_test from succeeding. So, as far as this test is concerned, the
In hindsight it was not that laborious but without the demand… it is what it is.
I think for now using the issue tracker is fine. While looking at the direct usage of |
Hello Genode team! So I've uploaded the plug-in to the "ge-drivers" folder of my repo, here's the history : https://chiselapp.com/user/ttcoder/repository/genode-haiku/timeline?udc=1&ss=m&n=3&y=ci&advm=0&chng=genode-haiku%2Fge-drivers%2F* (github seems to "eat" the trailing star "*", make sure to add it to the URL or the page will be blank) I will maintain the plug-in at that repo from now on. I won't take it upon myself to close this ticket, as I believe you guys might want to retain the option to host some FUSE code in genode-world? I will label this as somewhat 'fixed' to facilitate triage though. Thanks for all the help, I wouldn't have made it to the finish line without it! |
This relates to the patch in genodelabs/genode#3383
Before applying it, FUSE has to be modernized and become a VFS plugin usable within the 'vfs' server (or directly inside the client), instead of being a stand-alone component/server.
It will then be easier to see if some/all of the patch still applies.