Bad AFS performance over wide area due to packet fragmentation problems

It's not the first time that people at external sites have complained about bad AFS performance from our servers.

Usually, it boils down to this: every now and then one of the routers or firewalls in the external networking clouds starts to choke on IP packet fragmentation. When we repeat the tests with "-nojumbo", everything works like a charm. The problem is that every time it's the users who detect this, it takes some time until the problem surfaces at the sysadmin level, and the networking guys swear that nothing has changed in ages.

As '-nojumbo' has a measurable price on our own local network, where fragmentation does not cause any problem, we hesitate to run everything in that mode.

Ideally, RX would be adaptive and stop using jumbograms when they cause problems, but I understand that an algorithm to detect those reliably can be challenging.

How about a config file a la NetRestrict, then, that turns off jumbograms to external sites, or allows them to others?

Any brilliant ideas?

--
Rainer Toebbicke
European Laboratory for Particle Physics (CERN) - Geneva, Switzerland
Phone: +41 22 767 8985 Fax: +41 22 767 7155
_______________________________________________
OpenAFS-devel mailing list
OpenAFS-devel@...
https://lists.openafs.org/mailman/listinfo/openafs-devel
|
|
Re: Bad AFS performance over wide area due to packet fragmentation problems

Rainer Toebbicke wrote:
> It's not the first time that people at external sites complain about bad
> AFS performance from our servers.
>
> Usually, it boils down to that every now and then one of the routers or
> firewalls in the external networking clouds starts to choke on IP packet
> fragmentation - when we repeat the tests with "-nojumbo" everything
> works like a charm. The problem is that every time it's the users who
> detect this, it takes some time until the problem surfaces at the
> sysadmin level, and the networking guys swear that there wasn't a change
> since ages.
>
> As '-nojumbo' has a measurable price on our own local network where
> fragmentation does not exhibit any problem we hesitate to run everything
> in that mode.
>
> Ideally, RX would be adaptive and stop using jumbograms when they cause
> problems, but I understand that the algorithm to detect those reliably
> can be challenging.
>
> How about a config file a la NetRestrict then that turns off jumbograms
> to external sites, or allows them to others?
>
> Any brilliant ideas?

There is a bug in Rx related to -nojumbo that Derrick identified yesterday and for which we are testing a patch. There are three calls to rxi_AdjustDgramPackets() which are passed the constant RX_MAX_FRAGS (4) but should instead be passed 'rxi_nSendFrags', which is set to 1 when -nojumbo is enabled. Because the value passed is always RX_MAX_FRAGS, the sender always attempts to send a jumbogram, even when that entity has explicitly said it does not want to.

The patch can be found at:

/afs/andrew.cmu.edu/usr/shadow/rx-nojumbo-really.diff

As another workaround, the client can set the RxMaxMTU size to 1431. This value is small enough to disable the sending of jumbograms.

Prior to 1.5.34 the Windows client set the RxMaxMTU size to 1260, which in effect disabled the use of jumbograms on both the clients and the servers. This was done to permit OAFW to work with the Cisco VPN 4.x client. I removed this in 1.5.34 because the Cisco VPN 5.x client had been widely deployed and does not suffer from the inability to send IP fragments. Unfortunately, there are a large number of other network devices that also have trouble transmitting IP fragments. Removing the RxMaxMTU restriction enabled the OAFW client to once again start sending jumbograms even though the RxNoJumbo setting is enabled. The above patch fixes this problem.

Jeffrey Altman
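To make the bug described above concrete, here is a minimal sketch in Python. It is a toy model, not the Rx source: only the names RX_MAX_FRAGS and rxi_nSendFrags come from the message. The function `adjust_dgram_packets`, its signature, and the 1400-byte fragment payload are hypothetical stand-ins chosen for illustration.

```python
RX_MAX_FRAGS = 4  # compile-time constant named in the message


def send_frag_limit(nojumbo: bool) -> int:
    """Toy model of the global rxi_nSendFrags: 1 when -nojumbo is given."""
    return 1 if nojumbo else RX_MAX_FRAGS


def adjust_dgram_packets(frags: int, frag_payload: int = 1400) -> int:
    """Hypothetical stand-in: largest datagram (bytes) the sender may build.

    The real rxi_AdjustDgramPackets() is not reproduced here; the 1400-byte
    payload per fragment is an assumption for illustration only.
    """
    return frags * frag_payload


def max_datagram_buggy(nojumbo: bool) -> int:
    # Bug pattern: the constant is passed, so the -nojumbo setting is ignored.
    return adjust_dgram_packets(RX_MAX_FRAGS)


def max_datagram_fixed(nojumbo: bool) -> int:
    # Fix pattern: pass the runtime limit, which honours -nojumbo.
    return adjust_dgram_packets(send_frag_limit(nojumbo))


if __name__ == "__main__":
    for flag in (False, True):
        print(flag, max_datagram_buggy(flag), max_datagram_fixed(flag))
```

In this model the buggy path yields the same multi-fragment datagram size whether or not -nojumbo is set, which matches the symptom reported in the thread.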
|
|
Re: Bad AFS performance over wide area due to packet fragmentation problems

> Ideally, RX would be adaptive and stop using jumbograms when they cause
> problems, but I understand that the algorithm to detect those reliably can
> be challenging.

I'd actually like to do something like this. Incidentally, we have path MTU discovery code for some platforms (not currently enabled, in 1.5.x only; look at the ADAPT_PMTU ifdef'd code), but until it's there for more than Solaris and Linux there's limited value to it.
|
|
Re: Bad AFS performance over wide area due to packet fragmentation problems

> As '-nojumbo' has a measurable price on our own local network where
> fragmentation does not exhibit any problem we hesitate to run
> everything in that mode.

I run with -nojumbo and with RX_MAX_FRAG patched to 1.

On which clients/servers do you see a performance penalty from this?

Last time I checked on Linux, there was no difference between letting the RX code produce 4 UDP packets of ~1400 bytes each, which then become as many Ethernet frames, and letting RX produce 1 packet of ~5600 bytes, which the OS then fragments into 4 Ethernet packets. Yes, there was a difference back in the days of SunOS 4.1.4....

Harald.
|
|
Re: Bad AFS performance over wide area due to packet fragmentation problems

--On Monday, August 25, 2008 07:51:37 PM +0200 Harald Barth <haba@...> wrote:

>> As '-nojumbo' has a measurable price on our own local network where
>> fragmentation does not exhibit any problem we hesitate to run
>> everything in that mode.
>
> I run with -nojumbo and with RX_MAX_FRAG patched to 1.
>
> On which clients/servers do you see that this would have a performance
> penalty?
>
> Last time I checked on Linux there was no difference in letting the RX
> code produce 4 UDP packets a ~1400 bytes which then are as many
> Ethernet frames compared to let RX produce 1 packet a ~5600 bytes and
> then the OS fragment it into 4 ethernet packets. Yes, there was a
> difference back in the days of SunOS 4.1.4....

There is a significant difference, because if the OS fragments the packet, then all four fragments must make it to the other end in order for anything to be received. If any of the fragments is dropped (say, due to network congestion), then the entire packet (all four fragments) needs to be retransmitted. This sort of behavior makes congestion worse, and is why path MTU discovery is so important.

-- Jeff
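The cost of losing one fragment can be put into rough numbers. The sketch below is a back-of-the-envelope model, not a measurement: it assumes each fragment is lost independently with the same probability and that the sender retransmits until delivery succeeds. The function names are made up for this example.

```python
def datagram_delivery_probability(fragment_loss: float, fragments: int) -> float:
    """Chance that every fragment of one datagram arrives (independent loss)."""
    return (1.0 - fragment_loss) ** fragments


def expected_frames_fragmented(fragment_loss: float, fragments: int) -> float:
    """Expected frames on the wire when the OS fragments one large datagram.

    A single lost fragment forces all fragments to be resent, so the whole
    group is repeated 1 / P(all arrive) times on average.
    """
    return fragments / datagram_delivery_probability(fragment_loss, fragments)


def expected_frames_independent(fragment_loss: float, packets: int) -> float:
    """Expected frames when each small packet is retransmitted on its own."""
    return packets / (1.0 - fragment_loss)


if __name__ == "__main__":
    for loss in (0.001, 0.01, 0.05, 0.10):
        print(loss,
              round(expected_frames_fragmented(loss, 4), 3),
              round(expected_frames_independent(loss, 4), 3))
```

Under these assumptions the fragmented case always puts at least as many frames on the wire as the unfragmented one, and the gap widens as loss increases, which is the congestion feedback described above.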
|
|
Re: Bad AFS performance over wide area due to packet fragmentation problems

Harald Barth wrote:
>> As '-nojumbo' has a measurable price on our own local network where
>> fragmentation does not exhibit any problem we hesitate to run
>> everything in that mode.
>
> I run with -nojumbo and with RX_MAX_FRAG patched to 1.

The fact that you had to set RX_MAX_FRAG to 1 in order to make -nojumbo work was a bug that has now been fixed.
|
|
Re: Bad AFS performance over wide area due to packet fragmentation problems

> > I run with -nojumbo and with RX_MAX_FRAG patched to 1.
>
> The fact that you had to set RX_MAX_FRAG to 1 in order to make -nojumbo
> work was a bug that has now been fixed.

If that is what "-nojumbo" was supposed to do, what is it called when you stuff (or do not stuff) several (small) rx-packets into one UDP packet?

All this stuff confuses me and a lot of network equipment, and I always try to turn it OFF. Working MTU discovery for the whole path would be nice. Can we do better than the usual ICMP-based one that seldom works because of firewalls, NAT and the like?

Harald.
|
|
Re: Bad AFS performance over wide area due to packet fragmentation problems

--On Monday, August 25, 2008 11:56:12 PM +0200 Harald Barth <haba@...> wrote:

> All this stuff confuses me and a lot of network equipment and I always
> try to turn it OFF. Working MTU discovery for the whole path would be
> nice. Can we do better than the usual ICMP based one that seldom works
> because of firewalls NAT and the like?

No, we cannot. Where there is an intermediate segment whose MTU is lower than those of the segments directly attached to the endpoints, path MTU discovery requires the cooperation of the network. The ICMP destination-unreachable fragmentation-needed subcode is the mechanism provided by the Internet Protocol for performing path MTU discovery.

For all that they are evil, NATs do not break path MTU discovery. What breaks path MTU discovery are

- Overzealous firewalls which block ICMP destination-unreachable codes, generally because they have been configured by someone who does not really understand networking and has heard that ICMP is evil.

- Incorrectly-configured routers which directly connect segments with differing MTUs, but do not generate ICMP destination-unreachable fragmentation-needed messages when necessary.

Both of these generally completely break TCP when there is an intermediate segment with a lower MTU. Thus, if you have a broken deployment, it should be noticed pretty quickly and should break things other than AFS.

-- Jeff
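The mechanism described above can be sketched as a small simulation. This is a pure-Python toy with no real network I/O: the hop list, the function name, and the `icmp_blocked` switch are assumptions made for the example. It models a sender that sets "don't fragment", shrinks its packet size to the next-hop MTU reported in each fragmentation-needed message, and learns nothing when a firewall discards those messages.

```python
from typing import List, Optional


def discover_path_mtu(path_mtus: List[int], local_mtu: int,
                      icmp_blocked: bool = False) -> Optional[int]:
    """Simulate ICMP-based path MTU discovery along a list of hop MTUs.

    Returns the discovered path MTU, or None when an oversized probe is
    dropped and the fragmentation-needed reply never reaches the sender.
    """
    size = local_mtu
    while True:  # terminates: size strictly decreases and the path is finite
        # First hop that cannot forward a packet of this size unfragmented.
        bottleneck = next((mtu for mtu in path_mtus if mtu < size), None)
        if bottleneck is None:
            return size  # the probe fit through every hop
        if icmp_blocked:
            return None  # silently dropped: the sender cannot adapt
        size = bottleneck  # the ICMP reply carried the next-hop MTU


if __name__ == "__main__":
    print(discover_path_mtu([1500, 1400, 1500], 1500))
    print(discover_path_mtu([1500, 1400, 1500], 1500, icmp_blocked=True))
```

The second call shows the failure mode from the list above: with the ICMP replies filtered, the sender keeps using a size that one segment cannot carry.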
|
|
Re: Bad AFS performance over wide area due to packet fragmentation problems

On Mon, Aug 25, 2008 at 5:56 PM, Harald Barth <haba@...> wrote:
>> > I run with -nojumbo and with RX_MAX_FRAG patched to 1.
>>
>> The fact that you had to set RX_MAX_FRAG to 1 in order to make -nojumbo
>> work was a bug that has now been fixed.
>
> If that is what "-nojumbo" was supposed to do, what is it called when you
> stuff (or do not stuff) several (small) rx-packets into one UDP packet?

Well, the issue was that the number of fragments allowed in a packet (which for n > 1 is a jumbogram) was always initialized from a macro, instead of from the reduced value set in global (rx) variables when -nojumbo was given. Effectively this meant that in those cases -nojumbo did not disable jumbograms.

What I would like to see is simple large packet support. Regardless of path mtu, doing so would be no less likely to work (note that I did not say no worse) than today. However, I feel that unless we can work out path mtu discovery on more platforms, allowing this would be detrimental. The only compromise would be to rework the "mtu advertising" code we have now:

- Default behavior: no jumbograms, no large datagrams.
- Override with -jumbo: allow jumbograms, fall back to old behavior.
- Supported pmtu discovery: allow large datagrams up to max mtu if the discovered mtu is the same as or greater than the max advertised mtu.

Note that this would potentially allow one side supporting pmtu to transmit large datagrams while the other side might need to transmit only non-jumbogram single fragment packets in reply.
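The three rules proposed above could be expressed as a small decision function. The following is only a sketch of that proposal in Python, not existing OpenAFS behaviour: the type and function names are invented, and treating the -jumbo override and the pmtu rule as independent of each other is an assumption, since the message does not say how they combine.

```python
from typing import NamedTuple, Optional


class SendPolicy(NamedTuple):
    jumbograms: bool       # several rx packets/fragments in one datagram
    large_datagrams: bool  # single datagrams larger than the default size


def choose_send_policy(jumbo_flag: bool,
                       discovered_pmtu: Optional[int],
                       max_advertised_mtu: int) -> SendPolicy:
    """Apply the proposed rules; every name here is hypothetical.

    discovered_pmtu is None on platforms without path MTU discovery.
    """
    # Rule 3: large datagrams only when discovery confirms the path can
    # carry at least the largest MTU we advertise.
    large_ok = (discovered_pmtu is not None
                and discovered_pmtu >= max_advertised_mtu)
    # Rules 1 and 2: jumbograms are off unless explicitly requested.
    return SendPolicy(jumbograms=jumbo_flag, large_datagrams=large_ok)


if __name__ == "__main__":
    print(choose_send_policy(False, None, 1500))   # default behaviour
    print(choose_send_policy(True, None, 1500))    # -jumbo override
    print(choose_send_policy(False, 9000, 1500))   # pmtu supports large sends
```

Each peer would evaluate this on its own, which is how one side could end up sending large datagrams while the other replies with single-fragment packets, as noted in the message.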
|
|
Re: Bad AFS performance over wide area due to packet fragmentation problems

Harald Barth wrote:
>> As '-nojumbo' has a measurable price on our own local network where
>> fragmentation does not exhibit any problem we hesitate to run
>> everything in that mode.
>
> I run with -nojumbo and with RX_MAX_FRAG patched to 1.
>
> On which clients/servers do you see that this would have a performance
> penalty?
>
> Last time I checked on Linux there was no difference in letting the RX
> code produce 4 UDP packets a ~1400 bytes which then are as many
> Ethernet frames compared to let RX produce 1 packet a ~5600 bytes and
> then the OS fragment it into 4 ethernet packets. Yes, there was a
> difference back in the days of SunOS 4.1.4....
>
> Harald.

Harald,

I used an RX memory-to-memory transfer program with exact timing. Yes, there is a measurable difference, on the order of 10-20%. This was early this year, under 1.4.4 or 1.4.6, on reasonably powered Linux machines. Admittedly, this was under laboratory conditions: machines on the same switch, with identical configurations that give a good speed match and hence small queue sizes. Transfer rates were in excess of 110 MB/s.

Realistic conditions usually produce a much wider spread.

Not that I'm in favour of jumbograms, just that they still do a good job. The RX implementation certainly deserves some more attention with respect to queue handling, window adjustments and the like; the gain may then turn out to be smaller.

For wide-area use the gain is probably negligible, hence turning them off is likely to get rid of a problem for free.

I haven't yet got around to trying the RX patch Derrick mentioned.

--
Rainer Toebbicke