Bad AFS performance over wide area due to packet fragmentation problems

by Rainer Toebbicke

It's not the first time that people at external sites complain about
bad AFS performance from our servers.

Usually it boils down to this: every now and then one of the routers
or firewalls in the external networking clouds starts to choke on IP
packet fragmentation. When we repeat the tests with "-nojumbo",
everything works like a charm. The problem is that it is always the
users who detect this, it takes some time until the problem surfaces
at the sysadmin level, and the networking guys swear that nothing has
changed in ages.

As '-nojumbo' has a measurable price on our own local network, where
fragmentation does not cause any problems, we hesitate to run
everything in that mode.

Ideally, RX would be adaptive and stop using jumbograms when they
cause problems, but I understand that the algorithm to detect those
reliably can be challenging.

How about a config file a la NetRestrict then that turns off
jumbograms to external sites, or allows them to others?
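
A minimal sketch of what such a per-network policy lookup might look like, written in Python for brevity. The file format, the `jumbo`/`nojumbo` keywords, and the function names are all hypothetical; this is not an existing OpenAFS feature, and the addresses are documentation-only example ranges.

```python
import ipaddress

# Hypothetical policy text: one rule per line, "jumbo" or "nojumbo"
# followed by a network in CIDR notation. '#' starts a comment.
EXAMPLE_POLICY = """
# local networks where fragmentation is known to work
jumbo   192.0.2.0/24
jumbo   198.51.100.0/24
# everything else
nojumbo 0.0.0.0/0
"""

def parse_policy(text):
    """Return a list of (network, allow_jumbo) pairs from the policy text."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        action, net = line.split()
        if action not in ("jumbo", "nojumbo"):
            raise ValueError(f"unknown action: {action}")
        rules.append((ipaddress.ip_network(net), action == "jumbo"))
    return rules

def allow_jumbo(rules, peer, default=False):
    """Most specific matching network wins; otherwise use `default`."""
    addr = ipaddress.ip_address(peer)
    best = None
    for net, allowed in rules:
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, allowed)
    return default if best is None else best[1]
```

With the example policy, a peer at 192.0.2.17 would be allowed jumbograms while any address outside the two listed networks would not.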

Any brilliant ideas?

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland
Phone: +41 22 767 8985       Fax: +41 22 767 7155
_______________________________________________
OpenAFS-devel mailing list
OpenAFS-devel@...
https://lists.openafs.org/mailman/listinfo/openafs-devel

Re: Bad AFS performance over wide area due to packet fragmentation problems

by Jeffrey Altman-2

Rainer Toebbicke wrote:

> It's not the first time that people at external sites complain about bad
> AFS performance from our servers.
>
> Usually it boils down to this: every now and then one of the routers or
> firewalls in the external networking clouds starts to choke on IP packet
> fragmentation. When we repeat the tests with "-nojumbo", everything
> works like a charm. The problem is that it is always the users who
> detect this, it takes some time until the problem surfaces at the
> sysadmin level, and the networking guys swear that nothing has changed
> in ages.
>
> As '-nojumbo' has a measurable price on our own local network where
> fragmentation does not exhibit any problem we hesitate to run everything
> in that mode.
>
> Ideally, RX would be adaptive and stop using jumbograms when they cause
> problems, but I understand that the algorithm to detect those reliably
> can be challenging.
>
> How about a config file a la NetRestrict then that turns off jumbograms
> to external sites, or allows them to others?
>
> Any brilliant ideas?
Rainer:

There is a bug in Rx related to -nojumbo that Derrick identified
yesterday and for which we are testing a patch.  There are three
calls to rxi_AdjustDgramPackets() which are passed the constant
RX_MAX_FRAGS (4) but should instead be passed 'rxi_nSendFrags',
which is set to 1 when -nojumbo is enabled.  Because this value is
always RX_MAX_FRAGS, the sender always attempts to send a jumbogram,
even when it has explicitly been configured not to.
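
The pattern of the bug described above can be illustrated with a toy model. This is a sketch in Python, not the Rx source: the `Sender` class and its methods are hypothetical stand-ins, and only the names RX_MAX_FRAGS and rxi_nSendFrags (modelled here as `n_send_frags`) come from the message.

```python
RX_MAX_FRAGS = 4  # compile-time maximum named in the message above

class Sender:
    """Toy model of a sender configured with or without -nojumbo."""

    def __init__(self, nojumbo):
        # Models rxi_nSendFrags: 1 when -nojumbo is given, else the maximum.
        self.n_send_frags = 1 if nojumbo else RX_MAX_FRAGS

    def max_frags_buggy(self):
        # Bug pattern: the constant is used regardless of configuration.
        return RX_MAX_FRAGS

    def max_frags_fixed(self):
        # Fix pattern: the runtime setting is honoured.
        return self.n_send_frags

def sends_jumbograms(max_frags):
    """More than one fragment per packet means jumbograms are in use."""
    return max_frags > 1
```

In this model a sender created with `nojumbo=True` still ends up sending jumbograms through the buggy path, and stops doing so through the fixed one.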

The patch can be found at:

/afs/andrew.cmu.edu/usr/shadow/rx-nojumbo-really.diff

As another workaround, the client can set the RxMaxMTU size to 1431.
This value is small enough to disable the sending of jumbograms.

Prior to 1.5.34 the Windows client set RxMaxMTU size to 1260, which
in effect disabled the use of jumbograms on both the clients and
the servers.  This was done to permit OAFW to work with the Cisco
VPN 4.x client.  I removed this in 1.5.34 because the Cisco VPN
5.x client had been widely deployed and does not suffer from the
inability to send IP fragments.

Unfortunately, there are a large number of other network devices
that also have trouble transmitting IP fragments.  Removing the
RxMaxMTU restriction enabled the OAFW client to once again start
sending jumbograms even though the RxNoJumbo setting is enabled.  The
above patch fixes this problem.

Jeffrey Altman

Re: Bad AFS performance over wide area due to packet fragmentation problems

by Derrick Brashear

> Ideally, RX would be adaptive and stop using jumbograms when they cause
> problems, but I understand that the algorithm to detect those reliably can
> be challenging.

I'd actually like to do something like this. Incidentally, we have path
MTU discovery code for some platforms (in 1.5.x only and not currently
enabled; look at the ADAPT_PMTU ifdef'd code), but until it's there for
more than Solaris and Linux there's limited value to it.

Re: Bad AFS performance over wide area due to packet fragmentation problems

by Harald Barth-2


> As '-nojumbo' has a measurable price on our own local network where
> fragmentation does not exhibit any problem we hesitate to run
> everything in that mode.

I run with -nojumbo and with RX_MAX_FRAG patched to 1.

On which clients/servers do you see that this would have a performance
penalty?

Last time I checked on Linux, there was no difference between letting
the RX code produce 4 UDP packets of ~1400 bytes each, which then become
as many Ethernet frames, and letting RX produce 1 packet of ~5600 bytes
which the OS then fragments into 4 Ethernet packets. Yes, there was a
difference back in the days of SunOS 4.1.4....

Harald.

Re: Bad AFS performance over wide area due to packet fragmentation problems

by Jeffrey Hutzelman

--On Monday, August 25, 2008 07:51:37 PM +0200 Harald Barth <haba@...>
wrote:

>
>> As '-nojumbo' has a measurable price on our own local network where
>> fragmentation does not exhibit any problem we hesitate to run
>> everything in that mode.
>
> I run with -nojumbo and with RX_MAX_FRAG patched to 1.
>
> On which clients/servers do you see that this would have a performance
> penalty?
>
> Last time I checked on Linux, there was no difference between letting
> the RX code produce 4 UDP packets of ~1400 bytes each, which then become
> as many Ethernet frames, and letting RX produce 1 packet of ~5600 bytes
> which the OS then fragments into 4 Ethernet packets. Yes, there was a
> difference back in the days of SunOS 4.1.4....

There is a significant difference, because if the OS fragments the packet,
then all four fragments must make it to the other end in order for anything
to be received.  If any of the fragments is dropped (say, due to network
congestion) then the entire packet (all four fragments) needs to be
retransmitted.  This sort of behavior makes congestion worse, and is why
path MTU discovery is so important.
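
To put rough numbers on this, here is a small Python sketch. It assumes each fragment or packet is lost independently with the same probability, that a lost datagram is resent whole until it arrives, and it ignores header overhead; these are simplifying assumptions, not measurements, and the function names are made up for illustration.

```python
def delivery_probability(loss, fragments):
    """Probability that all `fragments` pieces of one datagram arrive."""
    return (1.0 - loss) ** fragments

def expected_bytes_sent(payload, fragments, loss):
    """Expected payload bytes put on the wire until delivery succeeds.

    The datagram is split into `fragments` pieces; losing any one piece
    forces the whole datagram to be sent again (geometric retries).
    """
    return payload / delivery_probability(loss, fragments)

# One ~5600-byte datagram fragmented by the OS into 4 pieces ...
fragmented = expected_bytes_sent(5600, fragments=4, loss=0.05)
# ... versus 4 independent ~1400-byte packets, each resent on its own.
separate = 4 * expected_bytes_sent(1400, fragments=1, loss=0.05)
```

Under these assumptions `fragmented` is always larger than `separate` for any non-zero loss rate, and the gap widens as loss increases, which is the congestion-amplifying effect described above.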

-- Jeff

Re: Bad AFS performance over wide area due to packet fragmentation problems

by Jeffrey Altman-2

Harald Barth wrote:
>> As '-nojumbo' has a measurable price on our own local network where
>> fragmentation does not exhibit any problem we hesitate to run
>> everything in that mode.
>
> I run with -nojumbo and with RX_MAX_FRAG patched to 1.

The fact that you had to set RX_MAX_FRAG to 1 in order to make -nojumbo
work was a bug that has now been fixed.

Re: Bad AFS performance over wide area due to packet fragmentation problems

by Harald Barth-2

> > I run with -nojumbo and with RX_MAX_FRAG patched to 1.
>
> The fact that you had to set RX_MAX_FRAG to 1 in order to make -nojumbo
> work was a bug that has now been fixed.

If that is what "-nojumbo" was supposed to do, what is it called when you
stuff (or do not stuff) several (small) rx-packets into one UDP packet?

All this stuff confuses me and a lot of network equipment, and I always
try to turn it OFF. Working MTU discovery for the whole path would be
nice. Can we do better than the usual ICMP-based one that seldom works
because of firewalls, NAT and the like?

Harald.

Re: Bad AFS performance over wide area due to packet fragmentation problems

by Jeffrey Hutzelman

--On Monday, August 25, 2008 11:56:12 PM +0200 Harald Barth <haba@...>
wrote:

> All this stuff confuses me and a lot of network equipment, and I always
> try to turn it OFF. Working MTU discovery for the whole path would be
> nice. Can we do better than the usual ICMP-based one that seldom works
> because of firewalls, NAT and the like?

No, we cannot.  Where there is an intermediate segment whose MTU is lower
than those of the segments directly attached to the endpoints, path MTU
discovery requires the cooperation of the network.  The ICMP
destination-unreachable fragmentation-needed subcode is the mechanism
provided by the Internet Protocol for performing path MTU discovery.

For all that they are evil, NATs do not break path MTU discovery.
What breaks path MTU discovery is:

- Overzealous firewalls which block ICMP destination-unreachable codes,
  generally because they have been configured by someone who does not
  really understand networking and has heard that ICMP is evil.

- Incorrectly-configured routers which directly connect segments with
  differing MTUs, but do not generate ICMP destination-unreachable
  fragmentation-needed messages when necessary.

Both of these generally completely break TCP when there is an intermediate
segment with a lower MTU.  Thus, if you have a broken deployment, it should
be noticed pretty quickly and should break things other than AFS.
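
The mechanism and its two failure modes can be sketched with a small simulation in Python. The path is modelled simply as a list of link MTUs, and the function names are hypothetical; the only protocol fact relied on is that the ICMP fragmentation-needed message reports the MTU of the link that was too small.

```python
def send_with_df(size, link_mtus):
    """Simulate sending a `size`-byte packet with Don't Fragment set.

    Returns None if the packet fits every link, otherwise the MTU of the
    first link that is too small (what the router would report via ICMP).
    """
    for mtu in link_mtus:
        if size > mtu:
            return mtu
    return None

def discover_path_mtu(link_mtus, icmp_blocked=False, max_probes=16):
    """Return the discovered path MTU, or None if discovery stalls."""
    size = link_mtus[0]  # start from the locally attached segment
    for _ in range(max_probes):
        reported = send_with_df(size, link_mtus)
        if reported is None:
            return size          # packet got through at this size
        if icmp_blocked:
            return None          # dropped silently; sender learns nothing
        size = reported          # retry at the size the router reported
    return None
```

For a path of `[1500, 1400, 1500]` the sketch converges on 1400 when ICMP gets through, and stalls when it is blocked, which is the "completely breaks" case for any protocol that relies on the feedback.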


-- Jeff

Re: Bad AFS performance over wide area due to packet fragmentation problems

by Derrick Brashear

On Mon, Aug 25, 2008 at 5:56 PM, Harald Barth <haba@...> wrote:
>> > I run with -nojumbo and with RX_MAX_FRAG patched to 1.
>>
>> The fact that you had to set RX_MAX_FRAG to 1 in order to make -nojumbo
>> work was a bug that has now been fixed.
>
> If that is what "-nojumbo" was supposed to do, what is it called when you
> stuff (or do not stuff) several (small) rx-packets into one UDP packet?

Well, the issue was that the number of fragments allowed in a packet
(which for n > 1 is a jumbogram) was always initialized from a macro,
instead of from the reduced value set in global (rx) variables when
-nojumbo was given.

Effectively this meant in those cases -nojumbo did not disable jumbograms.

What I would like to see is simple large packet support. Regardless of
path MTU, doing so would be no less likely to work (note that I did
not say no worse) than today. However, I feel that unless we can work
out path MTU discovery on more platforms, allowing this would be
detrimental. The only compromise would be to rework the "MTU
advertising" code we have now:

- Default behavior: no jumbograms, no large datagrams.
- Override with -jumbo: allow jumbograms, fall back to old behavior.
- Supported PMTU discovery: allow large datagrams up to max MTU if
  discovered MTU is the same as or greater than the max advertised MTU.

Note that this would potentially allow one side supporting pmtu to
transmit large datagrams while the other side might need to transmit
only non-jumbogram single fragment packets in reply.
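
One possible reading of the proposed behaviour, as a Python sketch. The `SendPolicy` type, the function, and its parameters are hypothetical; the sketch assumes that "max MTU" means the peer's maximum advertised MTU, that `discovered_mtu` is None on platforms without PMTU discovery, and that the -jumbo override is independent of PMTU support. The proposal itself does not settle those points.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SendPolicy:
    jumbograms: bool                      # multi-fragment Rx jumbograms allowed
    large_datagram_limit: Optional[int]   # None: do not send large datagrams

def choose_policy(jumbo_flag, discovered_mtu, max_advertised_mtu):
    """Pick what one side may transmit under the proposed defaults."""
    large_limit = None
    # Large datagrams only when PMTU discovery works on this platform and
    # the discovered path MTU is at least the max advertised MTU.
    if discovered_mtu is not None and discovered_mtu >= max_advertised_mtu:
        large_limit = max_advertised_mtu
    # Jumbograms stay off unless explicitly requested with -jumbo.
    return SendPolicy(jumbograms=jumbo_flag, large_datagram_limit=large_limit)
```

Because each side evaluates this independently, one peer can end up with a large datagram limit while the other is restricted to single-fragment packets, matching the asymmetry noted above.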

Re: Bad AFS performance over wide area due to packet fragmentation problems

by Rainer Toebbicke

Harald Barth wrote:

>> As '-nojumbo' has a measurable price on our own local network where
>> fragmentation does not exhibit any problem we hesitate to run
>> everything in that mode.
>
> I run with -nojumbo and with RX_MAX_FRAG patched to 1.
>
> On which clients/servers do you see that this would have a performance
> penalty?
>
> Last time I checked on Linux, there was no difference between letting
> the RX code produce 4 UDP packets of ~1400 bytes each, which then become
> as many Ethernet frames, and letting RX produce 1 packet of ~5600 bytes
> which the OS then fragments into 4 Ethernet packets. Yes, there was a
> difference back in the days of SunOS 4.1.4....
>
> Harald.
>

Harald,

I used an RX memory-to-memory transfer program with exact timing.
Yes, there is a measurable difference, of the order of 10-20%.

This was early this year, under 1.4.4 or 1.4.6, on reasonably powered
Linux machines. Admittedly under laboratory conditions: machines on the
same switch, identical configurations that give a good speed match and
hence small queue sizes. Transfer rates in excess of 110 MB/s. Realistic
conditions usually produce a much wider spread.

Not that I'm in favour of jumbograms, just that they still do a good
job. The RX implementation certainly deserves some more attention with
respect to queue handling, window adjustments and the like; the gain
may then turn out smaller.

For wide-area the gain is probably negligible, hence turning them off
is likely to get rid of a problem for free. I haven't yet got around
to trying that RX patch Derrick mentioned.



--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland
Phone: +41 22 767 8985       Fax: +41 22 767 7155