Kernel page cache and FUSE

Hi all,

I really like FUSE and I've written my first FUSE filesystem called s3backer <http://code.google.com/p/s3backer/>. All this filesystem does is contain a single normal file which is backed by a network remote data store (Amazon S3). The file is divided up into blocks (typically will be same size as kernel page size) and then you do a loopback mount of a normal filesystem on top of this file.

As the "upper" filesystem reads and writes blocks, the "lower" s3backer filesystem reads and writes over the network. I'm sure you've seen a similar arrangement before in other FUSE filesystems. The result is that you treat the single file in the FUSE filesystem more like a hard disk type block device, where the "hard disk" storage is remotely located over the network.

My questions all relate to kernel caching of the file data in this scenario. I'm mostly ignorant about how exactly Linux kernel caching works. And this is a complicated scenario because it involves two filesystems (the "upper" one and the "lower" one) and a loopback mount...

How do the kernel page cache, and the data blocks read from/written to the FUSE filesystem (on behalf of the "upper" filesystem) interact?

How does the kernel handle caching of the underlying file data blocks when doing a loopback mount? Does the fact that the underlying file is within a FUSE filesystem matter at all?

If I create a bunch of swap space, will the kernel take advantage of it and therefore do more caching of the FUSE file's data blocks? Or will the kernel refuse to cache file data blocks in swap because it treats the FUSE file like a hard disk because another filesystem is loopback-mounted on top of it?

Please see this discussion on the wiki <http://code.google.com/p/s3backer/wiki/ManPage> for more info on some of the questions raised.

Thanks,
-Archie

P.S. Apologies if this topic has already been addressed; the sourceforge mailing list search seems broken.

--
Archie L. Cobbs

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
fuse-devel mailing list
fuse-devel@...
https://lists.sourceforge.net/lists/listinfo/fuse-devel
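
To make the block layout in the message above concrete, here is a minimal sketch of how a byte range in the single backing file might map onto fixed-size blocks. It is an illustration only and not s3backer's actual code: the function name `blocks_for_range` and the 4096-byte default block size are assumptions made for the example.

```python
from typing import List, Tuple


def blocks_for_range(offset: int, size: int, block_size: int = 4096) -> List[Tuple[int, int, int]]:
    """Split a byte range into per-block pieces.

    Returns a list of (block_index, start_within_block, length) tuples,
    one for every block that the range [offset, offset + size) touches.
    The names and the default block size are hypothetical.
    """
    if offset < 0 or size < 0 or block_size <= 0:
        raise ValueError("offset and size must be >= 0 and block_size > 0")

    pieces = []
    pos = offset
    end = offset + size
    while pos < end:
        index = pos // block_size          # which block this byte lives in
        start = pos % block_size           # where in that block we begin
        length = min(block_size - start, end - pos)
        pieces.append((index, start, length))
        pos += length
    return pieces
```

A read or write that is aligned to the block size touches whole blocks; an unaligned one touches partial blocks at either end, and a filesystem built this way would presumably need a read-modify-write for those partial blocks.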
Re: Kernel page cache and FUSE

(Sorry about the late answer, it must have slipped my attention).

On Fri, 18 Jul 2008, Archie Cobbs wrote:
> I really like FUSE and I've written my first FUSE filesystem called
> s3backer <http://code.google.com/p/s3backer/>. All this filesystem does is
> contain a single normal file which is backed by a network remote data store
> (Amazon S3). The file is divided up into blocks (typically will be same size
> as kernel page size) and then you do a loopback mount of a normal filesystem
> on top of this file.

Note, that loop over fuse is something not "supported" in the sense that I haven't really thought about all the nasty corner cases that happen when the machine is out of memory and is trying to free some up by writing out dirty data through the loopback device and then through fuse.

This is generally a difficult problem, and even your suggested "open file with O_DIRECT" thing wouldn't solve it, as the cached data belongs to the filesystem, not to the loop device.

> As the "upper" filesystem reads and writes blocks, the "lower" s3backer
> filesystem reads and writes over the network. I'm sure you've seen a similar
> arrangement before in other FUSE filesystems. The result is that you treat
> the single file in the FUSE filesystem more like a hard disk type block
> device, where the "hard disk" storage is remotely located over the network.

Why not NBD? That has been designed especially for this.

> My questions all relate to kernel caching of the file data in this scenario.
> I'm mostly ignorant about how exactly Linux kernel caching works. And this
> is a complicated scenario because it involves two filesystems (the "upper"
> one and the "lower" one) and a loopback mount...
>
> How do the kernel page cache, and the data blocks read from/written to the
> FUSE filesystem (on behalf of the "upper" filesystem) interact?
>
> How does the kernel handle caching of the underlying file data blocks when
> doing a loopback mount? Does the fact that the underlying file is within a
> FUSE filesystem matter at all?

Yes, it matters, when writing out file backed dirty data to free up memory, the kernel has complicated mechanisms to prevent deadlocks when more memory is needed to complete the write.

When fuse is involved, the kernel doesn't have any idea that the allocation by the filesystem is special and needs to complete in order to complete the original write request.

Miklos
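
Since the concern in the reply above is specifically dirty data waiting to be written out, one way to watch it while testing a loop-over-FUSE setup is to read the `Dirty` and `Writeback` counters from `/proc/meminfo`. The following is a small sketch under that assumption; the helper names `parse_meminfo` and `dirty_summary` are made up for this example, and the counters are system-wide, so they do not separate the upper filesystem's pages from anything else.

```python
from typing import Dict, Optional


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field_name: numeric_value}.

    Most fields are reported in kB; the unit suffix is ignored here.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue                      # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def dirty_summary(text: str) -> Dict[str, Optional[int]]:
    """Return only the Dirty and Writeback counters (None if missing)."""
    fields = parse_meminfo(text)
    return {key: fields.get(key) for key in ("Dirty", "Writeback")}


def read_dirty_summary(path: str = "/proc/meminfo") -> Dict[str, Optional[int]]:
    """Convenience wrapper that reads the counters from a live Linux system."""
    with open(path, "r") as handle:
        return dirty_summary(handle.read())
```

Calling `read_dirty_summary()` repeatedly during a large write to the upper filesystem shows how much dirty data is outstanding at each moment.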
Re: Kernel page cache and FUSE

On Mon, Aug 25, 2008 at 5:37 AM, Miklos Szeredi <miklos@...> wrote:
> > (Amazon S3). The file is divided up into blocks (typically will be same size
> > as kernel page size) and then you do a loopback mount of a normal filesystem
> > on top of this file.
>
> Note, that loop over fuse is something not "supported" in the sense
> that I haven't really thought about all the nasty corner cases that
> happen when the machine is out of memory and is trying to free some up
> by writing out dirty data through the loopback device and then through
> fuse.

Yes, this is an interesting/murky area. I guess it depends on how robust the algorithm for writing back dirty blocks is. For example, in this scenario the system will need to write back certain dirty pages (upper filesystem files) before it can write back other dirty pages (lower filesystem files). So as long as the algorithm keeps trying and cycling through, it should eventually perform correctly... where "correctly" means that if there is any possible way to free memory the system will eventually figure it out.

> This is generally a difficult problem, and even your suggested "open
> file with O_DIRECT" thing wouldn't solve it, as the cached data
> belongs to the filesystem, not to the loop device.

You're referring to the upper filesystem, correct? From my tests it appears that 'direct_io' does indeed prevent any files from a FUSE mount from being cached.

For the upper filesystem, mounting with 'sync' should help the "upper" caching problem I'd imagine...

Earlier I created a patch to mount(8) and losetup(8) to add a "direct" flag (here <http://article.gmane.org/gmane.linux.utilities.util-linux-ng/1731>)... but then realized that FUSE doesn't support opening files with O_DIRECT... and in any case, the 'direct_io' option does the same thing.

> > As the "upper" filesystem reads and writes blocks, the "lower" s3backer
> > filesystem reads and writes over the network. I'm sure you've seen a similar
> > arrangement before in other FUSE filesystems. The result is that you treat
> > the single file in the FUSE filesystem more like a hard disk type block
> > device, where the "hard disk" storage is remotely located over the network.
>
> Why not NBD? That has been designed especially for this.

s3backer is designed to work specifically with Amazon S3, which uses HTTP for access and has weaker guarantees on data and timing than a "normal" block device.

> > How does the kernel handle caching of the underlying file data blocks when
> > doing a loopback mount? Does the fact that the underlying file is within a
> > FUSE filesystem matter at all?
>
> Yes, it matters, when writing out file backed dirty data to free up
> memory, the kernel has complicated mechanisms to prevent deadlocks
> when more memory is needed to complete the write.
>
> When fuse is involved, the kernel doesn't have any idea that the
> allocation by the filesystem is special and needs to complete in order
> to complete the original write request.

If the kernel asks a filesystem to write a dirty page, and the filesystem comes back with ENOMEM, will the kernel try again later?

-Archie

--
Archie L. Cobbs
Re: Kernel page cache and FUSE

On Mon, 25 Aug 2008, Archie Cobbs wrote:
> > This is generally a difficult problem, and even your suggested "open
> > file with O_DIRECT" thing wouldn't solve it, as the cached data
> > belongs to the filesystem, not to the loop device.
>
> You're referring to the upper filesystem, correct? From my tests it appears
> that 'direct_io' does indeed prevent any files from a FUSE mount from being
> cached.

Right, but caching isn't really problematic, only caching "dirty" data is. And fuse doesn't do that normally (now that it supports writable mmaps, having dirty pages can happen, but obviously mmap is not practical without caching).

> For the upper filesystem, mounting with 'sync' should help the "upper"
> caching problem I'd imagine...

Oh, OK. Yes, that should help, though I don't know how the 'sync' option is actually implemented. The page cache is still probably involved in that case.

> > When fuse is involved, the kernel doesn't have any idea that the
> > allocation by the filesystem is special and needs to complete in
> > order to complete the original write request.
>
> If the kernel asks a filesystem to write a dirty page, and the filesystem
> comes back with ENOMEM, will the kernel try again later?

No, but even that's not the biggest problem, because ENOMEM will happen only when the machine is _really_ out of memory, and then nothing better can be done anyway.

What is really bad is if the allocation deadlocks completely, because it's waiting for memory to be freed up, and that memory happens to be the memory that is currently being written out.

This used to be a big problem, but the dirty memory limiting was made more robust so it cannot use up all the memory in the system. But I think there are still corner cases where an allocation can hang on writeback in the loop over fuse scenario.

Miklos
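
The "dirty memory limiting" mentioned above is exposed on Linux through tunables such as `/proc/sys/vm/dirty_ratio` and `/proc/sys/vm/dirty_background_ratio`, both expressed as percentages. The sketch below shows only the basic arithmetic of turning those percentages into rough limits; it is a simplification, since the kernel's real calculation works on its own notion of dirtyable memory and applies further per-device logic. The function name `dirty_limits_kb` and the input values are assumptions for illustration.

```python
from typing import Dict


def dirty_limits_kb(dirtyable_kb: int, dirty_ratio: int,
                    dirty_background_ratio: int) -> Dict[str, int]:
    """Approximate dirty-memory limits from percentage tunables.

    dirtyable_kb is whatever amount of memory the caller considers
    eligible to hold dirty pages (an assumption of this sketch).
    Returns the level at which background writeback would start and
    the level at which writers would be throttled, both in kB.
    """
    for ratio in (dirty_ratio, dirty_background_ratio):
        if not 0 <= ratio <= 100:
            raise ValueError("ratios are percentages between 0 and 100")
    if dirtyable_kb < 0:
        raise ValueError("dirtyable_kb must be >= 0")

    return {
        "background_kb": dirtyable_kb * dirty_background_ratio // 100,
        "throttle_kb": dirtyable_kb * dirty_ratio // 100,
    }
```

Because both limits are a fraction of memory rather than all of it, some memory stays available for the allocations needed to complete writeback, which is the robustness improvement described in the message; it does not by itself rule out the remaining corner cases.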