Rsync compression problem - sometimes ineffective?

by Bodle, Donald E


I'm running rsync 2.6.9-1.el4.rf on CentOS 4.4 on both the client and the
remote server, backing up user data from two different clients with the
following:

su - $HOSTID -c 'rsync -azr --timeout=600 --log-file=$DEBUGFILE \
    --log-file-format="%o %f %b %l %i" --stats --delete --bwlimit=$BANDWDT \
    --rsh="ssh -P ____" $STAGE $TARGET:$TARGETDIR'

I use "bytes sent"/"literal data" from the --stats output as a rough
estimate of how effective compression is (I know "bytes sent" includes
some overhead). Most days I see reasonable compression, as in these
entries from our summary (X MBytes compressed = bytes sent; X MBytes
uncompressed = literal data):

rsync $HOSTID transferred 46.20 MBytes compressed (210.45 MBytes uncompressed)
52 minutes and 6 seconds
45.50 kBps
6,896 files changed out of 81,720 total files (8.44%)

or

rsync $HOSTID transferred 543.53 MBytes compressed (3.66 GBytes uncompressed)
2 hours, 16 minutes and 38 seconds
89.12 kBps
7,343 files changed out of 79,944 total files (9.19%)

On other days I see no evidence of compression at all, for example:

rsync $HOSTID transferred 52.10 MBytes compressed (50.06 MBytes uncompressed)
59 minutes and 48 seconds
53.98 kBps
5,350 files changed out of 80,257 total files (6.67%)

or similarly this:

rsync $HOSTID transferred 1007.55 MBytes compressed (1004.59 MBytes uncompressed)
3 hours, 38 minutes and 47 seconds
92.27 kBps
9,888 files changed out of 79,306 total files (12.47%)
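To put the four summaries on the same scale, you can divide bytes sent by literal data for each run. A minimal sketch with awk; the ratio helper is hypothetical (not part of rsync), and it assumes the 3.66 GBytes figure means 3.66 * 1024 MBytes.

```shell
#!/bin/sh
# ratio SENT LITERAL: hypothetical helper, not part of rsync.
# Prints bytes sent divided by literal data (both must use the same units).
ratio() {
    awk -v sent="$1" -v literal="$2" 'BEGIN { printf "%.2f\n", sent / literal }'
}

ratio 46.20 210.45      # good day
ratio 543.53 3747.84    # good day; assumes 3.66 GBytes = 3.66 * 1024 MBytes
ratio 52.10 50.06       # no apparent compression
ratio 1007.55 1004.59   # no apparent compression
```

These come out at 0.22, 0.15, 1.04 and 1.00. Because bytes sent also carries protocol overhead, a ratio slightly above 1.0 is what you would expect if the literal data on those days was essentially incompressible.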


My initial thought was that the days with no apparent compression were
days when most of the changed files were either small files (gzipping a
small ASCII file can double its size) or already-compressed files. So
far, though, I haven't been able to confirm this. I'm also not sure the
logic applies, since rsync compresses data blocks (at least as I
understand it), and those blocks would be fairly consistent in size (I
think). Is this general understanding of rsync's compression correct?
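The small-file and already-compressed hypotheses can be checked in isolation. rsync's -z uses zlib's deflate, the same algorithm gzip uses, so a rough stand-in is to gzip three kinds of data and compare sizes: a tiny ASCII file, random bytes (standing in for already-compressed files such as .zip or .jpg), and highly repetitive text. This is only an approximation: rsync compresses the literal data it sends as a stream rather than gzipping files one at a time, and the gz_size helper is hypothetical.

```shell
#!/bin/sh
# gz_size FILE: hypothetical helper; bytes gzip (deflate) emits for FILE.
gz_size() {
    gzip -c "$1" | wc -c | tr -d ' '
}

tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/small.txt"                              # tiny ASCII file
head -c 1048576 /dev/urandom > "$tmp/random.bin"                  # stand-in for already-compressed data
yes 'the same line of text' | head -n 50000 > "$tmp/text.txt"    # highly redundant text

for f in small.txt random.bin text.txt; do
    echo "$f: $(wc -c < "$tmp/$f" | tr -d ' ') -> $(gz_size "$tmp/$f") bytes"
done
rm -rf "$tmp"
```

The tiny file grows (header and trailer outweigh any savings), the random data comes out marginally larger than it went in, and only the repetitive text shrinks substantially.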

I searched the samba.org local archives first, and then Internet wide,
using +rsync +compression +problem, but didn't find any similar posts.
Less restrictive searches didn't help any either.  I also didn't see
anything in the FAQ or current issues and debugging areas.

Has anyone seen this sort of behaviour before? Can you suggest additional
diagnostics to try? What additional information would help support my
contention that this is related to the data being changed on those
"uncompressed" days?
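One way to support the "it depends on what changed" theory is to tally each day's log by file extension and compare good days with bad ones. The sketch below assumes the log was written with the --log-file-format shown above and that every entry starts with rsync's usual date, time and [pid] prefix (check a real log line first); the ext_summary name is hypothetical. It relies on %b being the bytes actually transferred for a file and %l being the file's length.

```shell
#!/bin/sh
# ext_summary LOGFILE: hypothetical helper. Summarises an rsync --log-file written with
#   --log-file-format="%o %f %b %l %i"
# assuming every entry looks like
#   2008/06/12 13:35:00 [4321] send path/to/file.zip 1024 1030 <f.st......
# (date, time, [pid], then %o %f %b %l %i). Check this against a real log first.
ext_summary() {
    awk '
    $4 == "send" && NF >= 8 {
        name = $5
        for (i = 6; i <= NF - 3; i++)       # rebuild names containing spaces
            name = name " " $i              # (runs of spaces collapse to one)
        n = split(name, parts, "/")
        base = parts[n]
        ext = (match(base, /\.[^.]+$/) ? tolower(substr(base, RSTART + 1)) : "(none)")
        files[ext]++
        sent[ext] += $(NF - 2)              # %b: bytes actually transferred
        size[ext] += $(NF - 1)              # %l: length of the file
    }
    END {
        for (e in files)
            printf "%-10s %6d files %12.0f bytes sent %14.0f bytes long\n",
                e, files[e], sent[e], size[e]
    }' "$1" | sort -k4,4nr
}
```

Run it as ext_summary on a good day's log and on a bad day's log; if the bad days are dominated by extensions such as zip, gz or jpg, that supports the already-compressed explanation.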

Thanks

Donald E. Bodle, Jr.
Sr. Systems Developer
The Reynolds and Reynolds Co.
(937) 485-1954

Are you okay with today, if tomorrow is the end?
                    - Superchick (So Bright)


--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Re: Rsync compression problem - sometimes ineffective?

by Matt McCutchen

On Thu, 2008-06-12 at 13:35 -0400, Bodle, Donald E wrote:
> Using "bytes sent"/"literal data" from statistics as a rough estimation
> (I know there is overhead in the bytes sent) of the effectiveness of
> compression, most days I see reasonable compression

> My initial thought was that days of no apparent compression were when
> the majority of the changed files were small files (like when gzipping a
> small ASCII file doubles its size) or already compressed files.  But so
> far I haven't been able to confirm this.  I'm not sure this logic
> applies since rsync compresses data blocks (at least as I understand
> it), and those blocks would be fairly consistent in size (I think).  Is
> this general understanding of rsync's compression correct?

My guess is that the files are already compressed.

To see the actual size (compressed, if applicable) of the delta rsync is
sending for each file, use the %b log option, e.g.
--out-format='%b %i %n%L'.  You can compare those numbers with and
without compression to see which deltas aren't compressing as well as
you expect.  Unfortunately, %b only seems to work on a run that really
updates a destination, so you'll have to use a throwaway destination
(perhaps with --compare-dest to the real one) for the tests; %b ought to
work in --only-write-batch mode as well, but apparently doesn't.  To
investigate why a particular delta isn't compressing, you could use
rdiff to write the delta to a file and then look at the data inside.
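Both suggestions can be sketched roughly as follows. The rsync command lines in the comments only illustrate the throwaway-destination approach (host and paths are made up), and poorly_compressed and inspect_delta are hypothetical helpers. The first pairs up the %b values from a run without -z and a run with -z and prints files whose delta shrank by less than 10%; the second uses rdiff (librsync) to write a delta to a file and reports how well that delta gzips, as a rough proxy for how -z would fare on it.

```shell
#!/bin/sh
# Capture two throwaway runs that differ only in -z (host and paths are made up):
#   rsync -a  --out-format='%b %i %n%L' --compare-dest=/real/dest src/ host:/tmp/try-plain/ > plain.log
#   rsync -az --out-format='%b %i %n%L' --compare-dest=/real/dest src/ host:/tmp/try-z/ > z.log

# poorly_compressed PLAIN_LOG Z_LOG: hypothetical helper. Prints each file whose
# -z run sent more than 90% of the bytes the plain run sent. PLAIN_LOG must not be empty.
poorly_compressed() {
    awk '
    !/^[0-9]+ [^ ]+ / { next }            # skip stats and anything not in %b %i %n form
    {
        b = $1
        name = $0
        sub(/^[0-9]+ [^ ]+ /, "", name)   # what remains is %n%L
        if (FNR == NR)
            plain[name] = b               # first file: run without -z
        else if ((name in plain) && plain[name] > 0 && b / plain[name] > 0.9)
            printf "%s\t%.0f -> %.0f bytes\n", name, plain[name], b
    }' "$1" "$2"
}

# inspect_delta OLD NEW OUT: hypothetical helper built on rdiff (librsync).
# Writes the delta that turns OLD into NEW to OUT (which must not exist yet)
# and reports how well it gzips. Look inside afterwards with e.g. od -c OUT.
inspect_delta() {
    sigdir=$(mktemp -d)
    if rdiff signature "$1" "$sigdir/sig" && rdiff delta "$sigdir/sig" "$2" "$3"; then
        echo "delta: $(wc -c < "$3" | tr -d ' ') bytes," \
             "gzipped: $(gzip -c "$3" | wc -c | tr -d ' ') bytes"
    fi
    rm -rf "$sigdir"
}
```

Running poorly_compressed plain.log z.log lists the candidates; inspect_delta on an old and new copy of one of them shows whether its delta is mostly incompressible literal data.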

Matt

