Question about FUSE performance with Linux block cache

by Bryan Ischo

Hi, I have some questions about some performance figures I am seeing that
I believe have some relation to FUSE.  Although I don't know that much about
how the Linux VFS layer works, my supposition is that the issue is more
influenced by how the Linux block cache works than by FUSE.  I'll ask here
anyway, because I suspect the expertise exists to answer my question even if
it's only tangentially FUSE related.  Please feel free to send me
elsewhere if there is a better place to ask this question!

I have been experimenting with s3backer
(http://code.google.com/p/s3backer) to see whether, on a server colocated
in a data center (not EC2!), it would perform well enough to serve as a
reasonable network filesystem, so that I could expand the 'disk space'
available on my server indefinitely.

The way s3backer works is:

- It is a FUSE filesystem that exposes a single file of a user-specified
size.  The file is broken up into blocks, each of which is stored on
Amazon S3 as an individual key.  Thus any write to this file results in
storing one or more files to S3 (usually 4k blocks, to match the kernel's
block cache block size), and reading from this file results in reading
one or more of these files back from S3.

- The user then makes a 'normal' filesystem of their choosing in this
S3-backed file.

- The user loopback mounts this file, and voila, they have a filesystem
that takes no space on their own system, instead storing all of its data
in S3.
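To make the block mapping concrete, here is a minimal Python sketch of how a
byte range in the single backing file translates into block indices.  It
assumes a fixed 4096-byte block size, and the key format produced by the
hypothetical object_key() helper is an illustrative naming scheme, not
necessarily the one s3backer actually uses.

```python
BLOCK_SIZE = 4096  # assumed block size, matching the 4k figure above


def blocks_for_range(offset: int, length: int, block_size: int = BLOCK_SIZE) -> range:
    """Return the indices of every block touched by [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)


def object_key(index: int, prefix: str = "blk-") -> str:
    """Hypothetical S3 key for a block; the naming scheme is illustrative only."""
    return f"{prefix}{index:08x}"


if __name__ == "__main__":
    # A 10-byte write straddling the first block boundary touches two blocks.
    print([object_key(i) for i in blocks_for_range(4090, 10)])
    # Reading a whole 5 GiB file touches one block (one GET) per 4 KiB.
    print(len(blocks_for_range(0, 5 * 1024**3)))
```

The second print shows why streaming a large file this way costs so many
requests: every 4 KiB of data is a separate object.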

This is a really elegant solution to the problem of storing data on S3.  S3
has some really unfortunate limitations (it doesn't allow partial writes of
files: if you want to change 1 byte of a 1 GB file, you have to re-upload
the entire 1 GB!), and s3backer gets around them by breaking the file up
into many small pieces, so any modification requires writing a much smaller
amount of data to S3: just the 4k blocks containing the changed data.
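The saving comes from read-modify-write at block granularity.  The Python
sketch below illustrates the idea under stated assumptions: a plain dict
stands in for S3 (block index to block bytes), the hypothetical write_bytes()
helper is not s3backer's code, and a real backend would issue one GET and
one PUT per touched block instead of dict operations.

```python
BLOCK_SIZE = 4096  # assumed block size


def write_bytes(store: dict, offset: int, data: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Apply a write to a block store via read-modify-write.

    `store` is a dict standing in for S3 (index -> block bytes).
    Returns the number of bytes that would be uploaded.
    """
    uploaded = 0
    pos = 0
    while pos < len(data):
        index = (offset + pos) // block_size
        start = (offset + pos) % block_size
        chunk = data[pos:pos + block_size - start]
        # Read the existing block (never-written blocks read as zeros) and patch it.
        block = bytearray(store.get(index, bytes(block_size)))
        block[start:start + len(chunk)] = chunk
        store[index] = bytes(block)  # "PUT" the whole block back
        uploaded += block_size
        pos += len(chunk)
    return uploaded


if __name__ == "__main__":
    fake_s3 = {}
    # Changing a single byte in the middle of a 1 GB file re-uploads one 4k block.
    print(write_bytes(fake_s3, 500_000_000, b"x"))
```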

It is also elegant because it allows the re-use of so much existing
filesystem infrastructure: there is no need to write any actual filesystem
code, because we can re-use existing robust, mature, performant filesystems
like xfs and ext3.

OK, that's how s3backer works.  Now, the issue I am having with it is
primarily performance (there are other issues related to the
fragmentation of files into multiple blocks on S3, including the fact that
these files are no longer accessible via a web browser as they would be if
they were stored as real files, and also the performance and cost
implications of having to do over one million individual HTTP GET requests
to stream a 5 GB file from S3 in this way).  I am finding that flushing
changed blocks to S3 is much slower than what I think ought to be
achievable.  Here are some raw numbers that I extracted from a small
amount of testing:

- With s3backer, I found that when flushing about 85 MB of file data to S3
(this represents approximately 21,000 4k blocks, requiring 21,000 HTTP PUT
requests), I get only about 50 kBytes per second of throughput to S3.

- I ran some of my own tests for writing files to S3 on the same system,
using a multithreaded program that I wrote that can write a file of
configurable size a configurable number of times using a configurable
number of threads, thus simultaneously issuing many HTTP PUT requests.  I
found that with this program:

  - Writing the same 4k file 500 times using only 1 thread (and thus
serializing the HTTP PUTs) took 78 seconds, achieving 25 kBytes per
second throughput to S3

  - Writing the same 4k file 500 times using 10 threads took 3.92 seconds,
achieving 510 kBytes per second throughput to S3

  - Writing the same 4k file 500 times using 100 threads took 0.869
seconds, achieving 2,301 kBytes per second throughput to S3

  - Writing a single 2,000k file (equal in total size to the 4k file
written 500 times) using 1 thread took 1.93 seconds, achieving
1,036 kBytes per second throughput to S3
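As a sanity check, each throughput figure above is total data divided by
elapsed time.  This short Python snippet recomputes them using only the
quoted sizes and timings; it assumes the single-file run wrote 2,000k
(4k x 500), since that is the size the quoted 1,036 kBytes per second
implies for 1.93 seconds.

```python
# Measured runs quoted above: (description, total kBytes written, elapsed seconds).
RUNS = [
    ("4k x 500, 1 thread", 4 * 500, 78.0),
    ("4k x 500, 10 threads", 4 * 500, 3.92),
    ("4k x 500, 100 threads", 4 * 500, 0.869),
    ("2,000k file, 1 thread", 2000, 1.93),
]


def throughput_kbps(total_kbytes: float, seconds: float) -> float:
    """Average throughput in kBytes per second."""
    return total_kbytes / seconds


if __name__ == "__main__":
    # Use the serialized 4k run as the baseline for the speedup column.
    baseline = throughput_kbps(RUNS[0][1], RUNS[0][2])
    for name, kbytes, secs in RUNS:
        rate = throughput_kbps(kbytes, secs)
        print(f"{name:24s} {rate:8.1f} kB/s  ({rate / baseline:5.1f}x serial)")
```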

So, to summarize:

- s3backer seems to write data to S3 at only about 50 kBytes per
second
- My testing shows that this is on the same order as serialized writes of
4k block files to S3 (about 25 kBytes per second)
- Parallel writes of 4k block files perform far better than serialized
writes (100 simultaneous writes achieved about 2,300 kBytes per second,
roughly 90x the serialized rate and 46x the rate s3backer achieved)
- Thus, I conclude that s3backer is writing blocks to S3 one at a time
instead of in parallel
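The reasoning here, that small PUTs are latency-bound so total time depends
on how many requests are in flight at once, can be illustrated with a
simulated benchmark.  This is a sketch, not the test program described
above: fake_put() is a hypothetical stand-in that sleeps for an assumed
10 ms round trip instead of talking to S3.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Assumed round-trip latency of one small HTTP PUT; a stand-in, not a measurement.
FAKE_PUT_LATENCY = 0.01


def fake_put(block_index: int) -> int:
    """Hypothetical stand-in for uploading one 4k block: it just waits out the latency."""
    time.sleep(FAKE_PUT_LATENCY)
    return block_index


def upload_all(num_blocks: int, workers: int) -> float:
    """Upload num_blocks blocks using `workers` threads; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces every upload to finish before the timer stops.
        list(pool.map(fake_put, range(num_blocks)))
    return time.perf_counter() - start


if __name__ == "__main__":
    for workers in (1, 10, 100):
        elapsed = upload_all(100, workers)
        print(f"{workers:3d} workers: {elapsed:.2f} s")
```

With one worker the elapsed time is roughly num_blocks times the latency;
adding workers divides it until some other limit (bandwidth, CPU, server
throttling) takes over.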

Now my question is: why would s3backer behave this way?

I have looked at the s3backer code and it seems clear to me that it does
not serialize its writes; if multiple requests to write blocks are issued
simultaneously (via simultaneous FUSE write() calls), s3backer issues
these requests simultaneously to S3.  I could be misunderstanding the code,
though; perhaps Archie Cobbs could comment?

If this is truly the case, then the serialization must be happening
elsewhere.  I have a guess:

- Since s3backer is being used by the VFS layer as a loopback-mounted
file, it will be asked to write blocks only when the VFS layer flushes
them

- Perhaps the VFS layer only issues one block flush request at a
time?  This seems plausible; I would expect that flushing multiple
blocks simultaneously (via multiple kernel threads) wouldn't make a lot of
sense when the blocks are being flushed to a hard drive (which would be
the normal use case).  Or maybe it's the loopback mount mechanism that
enforces this serialization?  In either case, the effect is the same: the
VFS block cache flushes blocks one at a time to the same device.

Any insights into this?  If my hunch is correct, then there is no way to
parallelize these block flush operations, and the best that could be done
would be to make each block cache flush operation as fast as possible:
copy the flushed blocks into a memory buffer, return from the FUSE write
call as quickly as possible, and use multiple threads to flush these
buffered block writes to S3 with as much parallelism as possible (all of
this would be accomplished inside the s3backer code itself).
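The buffering workaround described above could look roughly like the Python
sketch below.  It is an illustration under stated assumptions, not s3backer
code: WriteBackCache and its upload callable are hypothetical names, with
upload standing in for an S3 PUT.  write() only copies the block into memory
and queues it, so the FUSE write can return quickly, while a pool of worker
threads performs the uploads in parallel.

```python
import queue
import threading
from typing import Callable, Dict, Set


class WriteBackCache:
    """Sketch of a write-back block buffer drained by parallel upload threads."""

    def __init__(self, upload: Callable[[int, bytes], None], num_workers: int = 10):
        self._upload = upload                # hypothetical stand-in for an S3 PUT
        self._dirty: Dict[int, bytes] = {}   # block index -> newest unsent data
        self._pending: Set[int] = set()      # indices queued or currently uploading
        self._lock = threading.Lock()
        self._queue: "queue.Queue[int]" = queue.Queue()
        for _ in range(num_workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def write(self, index: int, data: bytes) -> None:
        """Called on the FUSE write path: copy the block and return immediately."""
        with self._lock:
            self._dirty[index] = bytes(data)  # keep only the newest contents
            if index not in self._pending:
                self._pending.add(index)
                self._queue.put(index)

    def _worker(self) -> None:
        while True:
            index = self._queue.get()
            with self._lock:
                data = self._dirty.pop(index)
            try:
                # Retries are omitted: if upload raises, this block is dropped
                # and this worker thread exits.
                self._upload(index, data)
            finally:
                with self._lock:
                    if index in self._dirty:
                        # Rewritten while uploading: queue it again so the newest
                        # data is sent, with at most one upload per block in flight.
                        self._queue.put(index)
                    else:
                        self._pending.discard(index)
                self._queue.task_done()

    def flush(self) -> None:
        """Block until every buffered block has been uploaded."""
        self._queue.join()


if __name__ == "__main__":
    import time

    def slow_put(index: int, data: bytes) -> None:
        time.sleep(0.01)  # pretend network latency

    cache = WriteBackCache(slow_put, num_workers=10)
    for i in range(100):
        cache.write(i, bytes(4096))  # returns without waiting for the network
    cache.flush()
```

Allowing only one upload per block in flight matters: if two uploads of the
same block could race, an older copy might reach S3 after a newer one.  As
with any write-back cache, data still buffered is lost if the process dies
before it is uploaded.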

Thanks, and best wishes,
Bryan

------------------------------------------------------------------------
Bryan Ischo                bryan@...            2001 Mazda 626 GLX
Hamilton, New Zealand      http://www.ischo.com     RedHat Fedora Core 5
------------------------------------------------------------------------

_______________________________________________
fuse-devel mailing list
fuse-devel@...
https://lists.sourceforge.net/lists/listinfo/fuse-devel