Question about FUSE performance with Linux block cache

Hi,

I have some questions about some performance figures I am seeing that I believe have some relation to FUSE. I don't know that much about how the Linux VFS layer works, but my supposition is that the issue is influenced more by how the Linux block cache works than by FUSE. I'll ask here anyway, because I suspect the expertise exists to answer my question even if it's only tangentially FUSE-related. Please feel free to send me elsewhere if there is a better place to ask this question!

I have been experimenting with s3backer (http://code.google.com/p/s3backer) to see whether, on a server colocated in a data center (not EC2!), it would perform well enough to support a reasonable network filesystem, so that I could expand the 'disk space' available on my server indefinitely.

The way s3backer works is:

- It is a FUSE filesystem that supports a single file, of a size the user specifies, which is broken up into blocks that are stored on Amazon S3 as individual keys. Thus any write to this file results in storing one or more files (usually 4k blocks, to match the kernel's block cache block size) to S3, and reading from this file results in reading one or more of these files back from S3.
- The user then makes a 'normal' filesystem of their choosing in this S3-backed file.
- The user loopback mounts this file, and voila, they have a filesystem that takes no space on their own system, instead storing all of its data in S3.

This is a really elegant solution to the problem of storing data on S3. S3 has some really unfortunate limitations (it doesn't allow partial writes of files: if you want to change 1 byte of a 1 GB file, you have to re-upload the entire 1 GB!), and s3backer gets around them by breaking the file up into many small pieces, so any modification requires writing a much smaller amount of data to S3: just the 4k blocks containing the changed data.
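To make the block mapping concrete, here is a minimal Python sketch of how a byte range in the single s3backer file could map onto per-block S3 keys. It assumes fixed 4096-byte blocks as described above; the key naming (`block_key`: a prefix plus a zero-padded hex block number) and all function names are hypothetical illustrations, not s3backer's actual code.

```python
from typing import List

BLOCK_SIZE = 4096  # 4k blocks, matching the kernel's block cache block size


def block_key(prefix: str, index: int) -> str:
    """Hypothetical S3 key for block `index`: prefix + zero-padded hex number.
    (s3backer's real naming scheme may differ.)"""
    return f"{prefix}{index:08x}"


def blocks_touched(offset: int, length: int, block_size: int = BLOCK_SIZE) -> range:
    """Indices of all blocks overlapped by a write of `length` bytes at `offset`."""
    if length <= 0:
        return range(0)
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)


def keys_for_write(prefix: str, offset: int, length: int,
                   block_size: int = BLOCK_SIZE) -> List[str]:
    """S3 keys whose objects must be re-uploaded after the given write."""
    return [block_key(prefix, i) for i in blocks_touched(offset, length, block_size)]


def blocks_in_file(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks (and hence S3 objects) needed to hold the whole file."""
    return -(-file_size // block_size)  # ceiling division


if __name__ == "__main__":
    GiB = 1024 ** 3
    # Changing 1 byte in the middle of a 1 GiB file touches exactly one block.
    print(keys_for_write("blk-", GiB // 2, 1))
    # A write straddling a block boundary touches two blocks.
    print(list(blocks_touched(4095, 2)))
    # Reading a 5 GiB file back means one GET per block.
    print(blocks_in_file(5 * GiB))
```

Run as a script, it prints a single key (`blk-00020000`), the block list `[0, 1]`, and 1310720: one 4 KiB upload instead of 1 GiB, and over a million GETs to read a 5 GiB file. Note that a write covering only part of a block still needs the rest of that block's contents before the whole block can be re-uploaded; the sketch only computes which blocks are affected.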
It is also elegant because it allows the re-use of so much existing filesystem infrastructure: without writing any actual filesystem code, we can re-use existing robust, mature, performant filesystems like xfs and ext3.

OK, that's how s3backer works. The issue I am having with it is primarily performance. (There are other issues related to the fragmentation of files into multiple blocks on S3, including the fact that these files are no longer accessible via a web browser as they would be if they were stored as real files, and the performance and cost implications of having to do over one million individual HTTP GET requests to stream a 5 GB file from S3 this way.) I am finding that flushing changed blocks to S3 is pretty slow compared to what I think ought to be achievable.

Here are some raw numbers that I extracted from a small amount of testing:

- With s3backer, I found that when flushing about 85 MB of file data to S3 (approximately 21,000 4k blocks, requiring 21,000 HTTP PUT requests), I get only about 50 kBytes per second of throughput to S3.
- I ran some of my own tests of writing files to S3 from the same system, using a multithreaded program I wrote that writes a file of configurable size a configurable number of times using a configurable number of threads, thus issuing many HTTP PUT requests simultaneously.
  I found that with this program:
  - Writing the same 4k file 500 times using 1 thread (thus serializing the HTTP PUTs) took 78 seconds, achieving 25 kBytes per second throughput to S3
  - Writing the same 4k file 500 times using 10 threads took 3.92 seconds, achieving 510 kBytes per second throughput to S3
  - Writing the same 4k file 500 times using 100 threads took 0.869 seconds, achieving 2,301 kBytes per second throughput to S3
  - Writing a 2,000k file once (the same amount of data as a 4k file written 500 times) using 1 thread took 1.93 seconds, achieving 1,036 kBytes per second throughput to S3

So, to summarize:

- s3backer seems to write data to S3 at only about 50 kBytes per second
- My testing shows that this is in the range of serialized writes of 4k block files to S3 (25 kBytes per second), and far below what parallel writes achieve
- Parallel writes of 4k block files are dramatically faster: 100 simultaneous writes achieved 2,301 kBytes per second, about 46x s3backer's rate and roughly 90x my single-threaded rate
- Thus, I conclude that s3backer is writing blocks to S3 one at a time instead of in parallel

Now my question is: why would s3backer behave this way? I have looked at the s3backer code, and it seems clear to me that it does not serialize its writes; if multiple requests to write blocks were issued simultaneously (via simultaneous FUSE write() calls), s3backer would issue these requests to S3 simultaneously. I could be misunderstanding the code, though; perhaps Archie Cobbs could comment?

If this is truly the case, then the serialization must be happening elsewhere. I have a guess:

- Since s3backer is being used by the VFS layer as a loopback-mounted file, it will be asked to write blocks only when the VFS layer flushes them
- Probably the VFS layer only issues one block flush request at a time? This seems to make sense; I would not expect flushing multiple blocks simultaneously (via multiple kernel threads) to make a lot of sense when the blocks are being flushed to a hard drive (which would be the normal use case).
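As a rough sanity check of this hunch, the following Python snippet recomputes throughput from the timings in the tests above and derives the per-PUT latency implied by the single-threaded run. The model it uses (each PUT must finish before the next one starts, so throughput is roughly block size divided by per-PUT latency) is a simplifying assumption, not a measurement; the names `Run`, `throughput`, and `serial_ceiling` are illustrative only.

```python
from dataclasses import dataclass

BLOCK_K = 4           # block size in k, as in the tests
PUTS = 500            # number of 4k PUTs in each multi-write test
S3BACKER_RATE = 50.0  # kBytes/s observed with s3backer


@dataclass
class Run:
    label: str
    total_k: float  # amount of data uploaded, in k
    seconds: float  # wall-clock time for the whole run


RUNS = [
    Run("4k x 500, 1 thread", BLOCK_K * PUTS, 78.0),
    Run("4k x 500, 10 threads", BLOCK_K * PUTS, 3.92),
    Run("4k x 500, 100 threads", BLOCK_K * PUTS, 0.869),
    # One file holding 500 x 4k = 2,000k, uploaded with a single PUT;
    # 2,000k / 1.93 s gives the reported 1,036 kBytes/s.
    Run("2,000k x 1, 1 thread", BLOCK_K * PUTS, 1.93),
]


def throughput(run: Run) -> float:
    """Observed throughput in kBytes per second."""
    return run.total_k / run.seconds


def serial_ceiling(block_k: float, put_latency_s: float) -> float:
    """Best-case rate when only one PUT is ever in flight: one block per round trip."""
    return block_k / put_latency_s


if __name__ == "__main__":
    for run in RUNS:
        print(f"{run.label:24s} {throughput(run):8.1f} kBytes/s")
    latency = RUNS[0].seconds / PUTS  # average time per PUT when serialized
    print(f"implied PUT latency: {latency * 1000:.0f} ms")
    print(f"serialized ceiling: {serial_ceiling(BLOCK_K, latency):.1f} kBytes/s")
    best = throughput(RUNS[2])
    print(f"100 threads vs 1 thread: {best / throughput(RUNS[0]):.0f}x")
    print(f"100 threads vs s3backer: {best / S3BACKER_RATE:.0f}x")
```

Run as a script, it reports an implied latency of about 156 ms per PUT and a serialized ceiling of about 25.6 kBytes per second, the same order as s3backer's 50 kBytes per second, while 100 parallel PUTs come out roughly 90x faster than the single-threaded run and about 46x faster than s3backer. That pattern fits the idea that something above s3backer hands it one block at a time, though it cannot show where the serialization happens.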
Or maybe it's the loopback mount mechanism that enforces this serialization? In either case, the effect is the same: the VFS block cache flushes blocks one at a time to the same device.

Any insights into this? If my hunch is correct, then there is no way to parallelize these block flush operations, and the best that could be done would be to make the block cache flush as fast as possible: copy the flushed blocks into a memory buffer, return from the FUSE write call as quickly as possible, and use multiple threads to flush these buffered block writes to S3 with as much parallelism as possible (all of this would be done inside the s3backer code itself).

Thanks, and best wishes,
Bryan

------------------------------------------------------------------------
Bryan Ischo
bryan@...
2001 Mazda 626 GLX
Hamilton, New Zealand
http://www.ischo.com
RedHat Fedora Core 5
------------------------------------------------------------------------
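The workaround proposed at the end of the message (copy each flushed block into memory, return from write() immediately, and let several threads upload in parallel) could look roughly like the Python sketch below. It is a minimal illustration under stated assumptions, not s3backer's code: `WriteBackCache`, `write_block`, and the `upload` callback (a stand-in for the HTTP PUT to S3) are hypothetical names. One detail matters for correctness: a block rewritten while its previous version is still uploading must not be uploaded concurrently, or the older PUT could finish last and leave stale data in S3, so the sketch tracks in-flight blocks and re-queues them once their upload completes.

```python
import queue
import threading
from typing import Callable, Dict, Optional, Set


class WriteBackCache:
    """Hypothetical write-back buffer: write_block() copies the block and returns
    immediately; a pool of worker threads performs the slow uploads in parallel."""

    def __init__(self, upload: Callable[[int, bytes], None], num_workers: int = 10):
        self._upload = upload                    # stand-in for the HTTP PUT to S3
        self._queue: "queue.Queue[Optional[int]]" = queue.Queue()
        self._pending: Dict[int, bytes] = {}     # newest not-yet-uploaded data per block
        self._in_flight: Set[int] = set()        # blocks currently being uploaded
        self._lock = threading.Lock()
        self._workers = [threading.Thread(target=self._worker, daemon=True)
                         for _ in range(num_workers)]
        for t in self._workers:
            t.start()

    def write_block(self, index: int, data: bytes) -> None:
        """Called from the FUSE write path; never waits for S3."""
        with self._lock:
            # Queue the block only if it is neither queued nor uploading already;
            # otherwise just replace the buffered data with the newer version.
            needs_queue = index not in self._pending and index not in self._in_flight
            self._pending[index] = bytes(data)   # private copy of the caller's buffer
            if needs_queue:
                self._queue.put(index)

    def _worker(self) -> None:
        while True:
            index = self._queue.get()
            try:
                if index is None:                # shutdown signal
                    return
                with self._lock:
                    data = self._pending.pop(index)
                    self._in_flight.add(index)
                try:
                    self._upload(index, data)
                finally:
                    with self._lock:
                        self._in_flight.discard(index)
                        if index in self._pending:  # rewritten during the upload
                            self._queue.put(index)
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        """Block until every buffered write has been handed to `upload`."""
        self._queue.join()

    def close(self) -> None:
        """Flush, then stop all worker threads."""
        self.flush()
        for _ in self._workers:
            self._queue.put(None)
        for t in self._workers:
            t.join()


if __name__ == "__main__":
    uploaded: Dict[int, bytes] = {}
    put_lock = threading.Lock()

    def fake_put(index: int, data: bytes) -> None:
        with put_lock:                           # pretend this is an S3 PUT
            uploaded[index] = data

    cache = WriteBackCache(fake_put, num_workers=8)
    for i in range(100):
        cache.write_block(i, bytes([i]) * 4096)
    cache.close()
    print(len(uploaded), "blocks uploaded")
```

A design caveat: once write() returns before the data reaches S3, a crash or network failure can lose blocks that the filesystem above believes are safely written, so a real implementation would also need retries, a bound on buffered memory, and a way to make flush/fsync wait for outstanding uploads. The sketch omits all of these; an `upload` that raises simply drops that block and ends its worker thread.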