I/O Scheduling results in poor responsiveness

I/O Scheduling results in poor responsiveness

by Nathan Grennan-2

     Why is the command below all that is needed to bring the system to
its knees? Why doesn't the io scheduler, CFQ, which is supposed to be
all about fairness, keep it from starving other processes? For example,
if I open a new file in vim and hold down "i" while this is running, the
display of new "i"s pauses for seconds, sometimes until the dd write is
completely finished. Likewise, applications like firefox, thunderbird,
xchat, and pidgin stop refreshing for 10+ seconds.

  dd if=/dev/zero of=test-file bs=2M count=2048

  I understand the main difference between using oflag=direct or not
comes down to whether the io scheduler is used and whether the file is
cached. I can see this clearly by watching "cached" rise without
oflag=direct, stay the same with it, and go way down when I delete the
file after running dd without oflag=direct.

  The system in question is running Fedora 8. It is an E6600 with 4 GB
of memory and 2x300 GB Seagate SATA drives. The drives are set up with
md RAID 1, and the filesystem is ext3. But I also see this on plenty of
other systems with more CPU, less CPU, less memory, RAID, and no RAID.
 
  I have tried various tweaks to sys.vm settings, and tried changing the
scheduler to as or deadline. Nothing seems to get it to behave, other
than oflag=direct.

  Using dd if=/dev/zero is just an easy test case.  I see this when
copying large files, creating large files, and using virtualization
software that does heavy i/o on large files.


 
  The command below seems to result in cpu idle 0 and io wait 100%, as
shown by "vmstat 1".

dd if=/dev/zero of=test-file bs=2M count=2048

2048+0 records in
2048+0 records out
4294967296 bytes (4.3 GB) copied, 94.7903 s, 45.3 MB/s


  The command below seems to work much better for responsiveness. The
cpu idle will be around 50%, and the io wait will be around 50%.

dd if=/dev/zero of=test-file2 bs=2M count=2048 oflag=direct

2048+0 records in
2048+0 records out
4294967296 bytes (4.3 GB) copied, 115.733 s, 37.1 MB/s
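
The two throughput figures can be re-derived from the byte counts and
elapsed times that dd printed. The sketch below only redoes that
arithmetic; the function name throughput_mb_s is made up for
illustration, and it assumes dd's "MB" means decimal megabytes
(1,000,000 bytes).

```shell
# Recompute dd's reported throughput from bytes copied and elapsed time.
# Assumes decimal megabytes (1 MB = 1,000,000 bytes).
throughput_mb_s() {
    # $1 = bytes copied, $2 = elapsed seconds
    LC_ALL=C awk -v bytes="$1" -v secs="$2" \
        'BEGIN { printf "%.1f\n", bytes / secs / 1000000 }'
}

# Numbers taken from the two dd runs above.
buffered=$(throughput_mb_s 4294967296 94.7903)
direct=$(throughput_mb_s 4294967296 115.733)

echo "buffered: ${buffered} MB/s"
echo "direct:   ${direct} MB/s"
```

So the direct I/O run gives up some throughput in exchange for the
better responsiveness described above.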

--
fedora-list mailing list
fedora-list@...
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list

Re: I/O Scheduling results in poor responsiveness

by Les Mikesell-2

Nathan Grennan wrote:

>     Why is the command below all that is needed to bring the system to
> it's knees? Why doesn't the io scheduler, CFQ, which is supposed to be
> all about fairness starve other processes? Example, if I open a new file
> in vim, and hold down "i" while this is running it will pause the
> display of new "i"s for seconds, sometimes until the dd write is
> completely finished. Another example is applications like firefox,
> thunderbird, xchat, and pidgin will stop refreshing for 10+ seconds.
>
>  dd if=/dev/zero of=test-file bs=2M count=2048
>
>  I understand the main difference between using oflag=direct or not
> relates to if the io scheduler is used, and if the file is cached or
> not. I can see this clearly by watching cached rise without
> oflag=direct, stay the same with it, and go way down when I delete the
> file after running dd without oflag=direct.
>
>  The system in question is running Fedora 8. It is an E6600, 4gb memory,
> and 2x300gb Seagate sata drives. The drives are setup with md raid 1,
> and the filesystem is ext3. But I also see this with plenty of other
> systems with more cpu, less cpu, less memory, raid, and no raid.

Can you compare to systems with SCSI drives?  I think this is telling
you that your disk controller is eating all the CPU when the controller
and DMA should be doing all the work.

--
   Les Mikesell
    lesmikesell@...

Re: I/O Scheduling results in poor responsiveness

by Nathan Grennan-2

Les Mikesell wrote:
> Can you compare to systems with SCSI drives?  I think this is telling
> you that your disk controller is eating all the CPU when the
> controller and DMA should be doing all the work.
>
  Are you saying you think that the controller isn't using DMA? You
think the controller or driver is just poorly written?

  I will try a SCSI system, but the closest I have is a CentOS 4.6
machine, which is using a kernel based on 2.6.9. It is also a server, so I
can't run firefox, thunderbird, xchat, or pidgin on it.

Re: I/O Scheduling results in poor responsiveness

by Les Mikesell-2

Nathan Grennan wrote:
>
>> Can you compare to systems with SCSI drives?  I think this is telling
>> you that your disk controller is eating all the CPU when the
>> controller and DMA should be doing all the work.
>>
>  Are you saying you think that the controller isn't using DMA? You think
> the controller or driver is just poorly written?

I'm not sure how to tell - but disk activity shouldn't take a lot of CPU
other than tying it up in iowait if it doesn't have anything else to do.

>  I will try a SCSI system, but the closest I have is a CentOS 4.6
> machine, which using a kernel based on 2.6.9. It is also a server, so I
> can't run firefox, thunderbird, xchat, or pidgin on it.

There are two things that could be going wrong. One is that the driver
is keeping the CPU too busy to do anything else. The other is that the
system might need to page in some needed code or flush a work buffer
before doing anything else (neither of which seems likely in your vi
insert-character example) while the disk heads are far away and busy
with the writing. The latter case could be helped by putting the OS on a
different drive from the data that receives the large writes.

--
    Les Mikesell
     lesmikesell@...

Re: I/O Scheduling results in poor responsiveness

by Pasi Kärkkäinen

On Tue, Mar 04, 2008 at 11:37:31PM -0800, Nathan Grennan wrote:

>     Why is the command below all that is needed to bring the system to
> it's knees? Why doesn't the io scheduler, CFQ, which is supposed to be
> all about fairness starve other processes? Example, if I open a new file
> in vim, and hold down "i" while this is running it will pause the
> display of new "i"s for seconds, sometimes until the dd write is
> completely finished. Another example is applications like firefox,
> thunderbird, xchat, and pidgin will stop refreshing for 10+ seconds.
>
>  dd if=/dev/zero of=test-file bs=2M count=2048
>
>  I understand the main difference between using oflag=direct or not
> relates to if the io scheduler is used, and if the file is cached or
> not. I can see this clearly by watching cached rise without
> oflag=direct, stay the same with it, and go way down when I delete the
> file after running dd without oflag=direct.
>
>  The system in question is running Fedora 8. It is an E6600, 4gb
> memory, and 2x300gb Seagate sata drives. The drives are setup with md
> raid 1, and the filesystem is ext3. But I also see this with plenty of
> other systems with more cpu, less cpu, less memory, raid, and no raid.
>

What motherboard/chipset do you have? Which SATA chipset?

Are you using NCQ?

Did you try limiting the memory to 2G or even 1G?

Are you running a 32-bit or 64-bit OS?

>  I have tried various tweaks to sys.vm settings, tried changing the
> scheduler to as or deadline. Nothing seem to get it to behave, other
> than oflag=direct.
>

Did you also try noop?

 
>  Using dd if=/dev/zero is just an easy test case.  I see this when
> copying large files, creating large files, and using virtualization
> software that does heavy i/o on large files.
>
>
>
>  The command below seems to result in cpu idle 0 and io wait 100%. As
> shown by "vmstat 1"
>

Maybe also try iostat. It may show you something more important in this
case.

There are also some caching/flushing related vm parameters which might
affect these things.

-- Pasi

Re: I/O Scheduling results in poor responsiveness

by Chris Snook

Nathan Grennan wrote:

>     Why is the command below all that is needed to bring the system to
> it's knees? Why doesn't the io scheduler, CFQ, which is supposed to be
> all about fairness starve other processes? Example, if I open a new file
> in vim, and hold down "i" while this is running it will pause the
> display of new "i"s for seconds, sometimes until the dd write is
> completely finished. Another example is applications like firefox,
> thunderbird, xchat, and pidgin will stop refreshing for 10+ seconds.
>
>  dd if=/dev/zero of=test-file bs=2M count=2048
>
>  I understand the main difference between using oflag=direct or not
> relates to if the io scheduler is used, and if the file is cached or
> not. I can see this clearly by watching cached rise without
> oflag=direct, stay the same with it, and go way down when I delete the
> file after running dd without oflag=direct.
>
>  The system in question is running Fedora 8. It is an E6600, 4gb memory,
> and 2x300gb Seagate sata drives. The drives are setup with md raid 1,
> and the filesystem is ext3. But I also see this with plenty of other
> systems with more cpu, less cpu, less memory, raid, and no raid.
>
>  I have tried various tweaks to sys.vm settings, tried changing the
> scheduler to as or deadline. Nothing seem to get it to behave, other
> than oflag=direct.
>
>  Using dd if=/dev/zero is just an easy test case.  I see this when
> copying large files, creating large files, and using virtualization
> software that does heavy i/o on large files.
>
>
>
>  The command below seems to result in cpu idle 0 and io wait 100%. As
> shown by "vmstat 1"
>
> dd if=/dev/zero of=test-file bs=2M count=2048
>
> 2048+0 records in
> 2048+0 records out
> 4294967296 bytes (4.3 GB) copied, 94.7903 s, 45.3 MB/s
>
>
>  The command below seems to work much better for responsiveness. The cpu
> idle will be around 50, and the io wait will be around 50.
>
> dd if=/dev/zero of=test-file2 bs=2M count=2048 oflag=direct
>
> 2048+0 records in
> 2048+0 records out
> 4294967296 bytes (4.3 GB) copied, 115.733 s, 37.1 MB/s
>

CFQ is optimized for throughput, not latency.  When you're doing dd without
oflag=direct, you're dirtying memory faster than it can be written to disk, so
pdflush will spawn up to 8 threads (giving it 8 threads' worth of CFQ time),
which starves out vim's extremely frequent syncing of its session file.

The 8 threaded behavior of pdflush is a bit of a hack, and upstream is working
on pageout improvements that should obviate it, but that work is still experimental.

vim's behavior is a performance/robustness tradeoff, and is expected to be slow
when the system is doing a lot of I/O.

As for your virtualization, this is why most virtualization software (including
Xen and KVM) allows you to use a block device, such as a logical volume, to
which it can do direct I/O, which takes pdflush out of the picture.

Ultimately, if latency is a high priority for you, you should switch to the
deadline scheduler.
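
How much dirty memory can pile up before writeback kicks in is governed
by a handful of vm settings under /proc/sys/vm. The sketch below just
prints them, as a starting point for anyone who wants to experiment. The
function name show_dirty_settings and its optional directory argument
are made up for illustration (the argument exists only so the function
can be pointed somewhere other than /proc/sys/vm); it does not change
anything on the system.

```shell
# Print the writeback thresholds that control how much dirty page cache may
# accumulate and how long it may sit before being flushed.
show_dirty_settings() {
    vm_dir="${1:-/proc/sys/vm}"   # optional override, for illustration/testing
    for knob in dirty_background_ratio dirty_ratio \
                dirty_expire_centisecs dirty_writeback_centisecs; do
        if [ -r "$vm_dir/$knob" ]; then
            printf '%s = %s\n' "$knob" "$(cat "$vm_dir/$knob")"
        else
            printf '%s = (not available)\n' "$knob"
        fi
    done
}

show_dirty_settings

# Current amount of dirty and in-progress writeback memory, if available.
if [ -r /proc/meminfo ]; then
    grep -E '^(Dirty|Writeback):' /proc/meminfo || true
fi
```

Running this in one terminal while the buffered dd runs in another
should show Dirty climbing. The settings can be changed as root with
sysctl -w (for example vm.dirty_ratio); which values actually help is
something to test, not something this sketch claims to know.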

-- Chris

Re: I/O Scheduling results in poor responsiveness

by Nathan Grennan-2

Pasi Kärkkäinen wrote:
>
> What motherboard/chipset do you have? which sata chipset?
>
>  
ASUS P5B Premium

ICH8R in AHCI mode

> Are you using ncq?
>  
Yes, I tried turning it off.
> Did you try limiting the memory to 2G or even 1G ?
>
>  
No, I haven't tried that one, though had thought about it.
> Are you running 32bit or 64bit OS?
>
>  
x86_64 across the board. I have thought of trying it on i686 systems,
though the only ones I have running Fedora 8 are laptops.
>
> Did you also try noop?
>
>  
No, I thought about trying that.
>  
> Maybe also try iostat.. maybe it shows you something more/important in this
> case.
>  
I have looked at it, but not in detail.
> There are also some caching/flushing related vm parameters which might
> affect these things..
>  
That is what I meant by sys.vm settings, but after some reading this
morning I might have been configuring them in the wrong direction. In
this case less may be more.

Re: I/O Scheduling results in poor responsiveness

by Tom Horsley-3

On Tue, 04 Mar 2008 23:37:31 -0800
Nathan Grennan <fedora-list@...> wrote:

> Why is the command below all that is needed to bring the system to
> it's knees?

I see a lot of variation in this kind of thing. Copying a big
backup file to my external USB hard drive made my system stutter
a bit (though nothing like what you describe), but I have recently
dd'ed up some disk images for xen, and the system worked perfectly
well during that operation.

This was on Fedora 8 x86_64 with SATA disks.

Re: I/O Scheduling results in poor responsiveness

by Bruno Wolff III

On Wed, Mar 05, 2008 at 14:35:41 -0500,
  Chris Snook <csnook@...> wrote:
>
> Ultimately, if latency is a high priority for you, you should switch to the
> deadline scheduler.

Is there an easy way to do this? My desktop seems pretty sluggish since
switching to rawhide and I suspect it is disk IO related. I'd like to
try out another scheduler and see if it helps.

Re: I/O Scheduling results in poor responsiveness

by Chris Snook

Bruno Wolff III wrote:
> On Wed, Mar 05, 2008 at 14:35:41 -0500,
>   Chris Snook <csnook@...> wrote:
>> Ultimately, if latency is a high priority for you, you should switch to the
>> deadline scheduler.
>
> Is there an easy way to do this? My desktop seems pretty sluggish since
> switching to rawhide and I suspect it is disk IO related. I'd like to
> try out another scheduler and see if it helps.

At the grub screen, hit 'a' to append kernel arguments, and add
elevator=deadline to the list of parameters.  If you like the results,
you can add it permanently by editing /boot/grub/grub.conf.
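
For a quick experiment without rebooting, the scheduler can usually also
be inspected and switched per device through sysfs. The sketch below
assumes the disk is sda and that the kernel exposes
/sys/block/sda/queue/scheduler, where the active scheduler is shown in
square brackets; the helper name active_scheduler is made up for
illustration.

```shell
# Extract the active scheduler from the contents of a queue/scheduler file,
# where the kernel marks the one in use with square brackets.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Report the active scheduler for one block device (sda is an assumption).
dev="sda"
sched_file="/sys/block/$dev/queue/scheduler"
if [ -r "$sched_file" ]; then
    echo "$dev: $(active_scheduler "$(cat "$sched_file")")"
else
    echo "$dev: no scheduler file at $sched_file"
fi

# To switch at runtime (as root), without rebooting:
#   echo deadline > /sys/block/sda/queue/scheduler
```

A change made this way lasts only until the next reboot, so the
elevator=deadline kernel argument is still the way to make it permanent.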

It's also possible that your sluggish rawhide performance is due to all
the extra debug options that are turned on in the rawhide kernel.  I've
seen overhead as high as 30% on some workloads.  There's been some
discussion of adding a 'nodebug' kernel variant to rawhide that's
compiled with roughly the same options as the stable Fedora kernel, but
I don't know when or if that's going to happen.

        -- Chris

Re: I/O Scheduling results in poor responsiveness

by Bill Davidsen

Nathan Grennan wrote:

>     Why is the command below all that is needed to bring the system to
> it's knees? Why doesn't the io scheduler, CFQ, which is supposed to be
> all about fairness starve other processes? Example, if I open a new file
> in vim, and hold down "i" while this is running it will pause the
> display of new "i"s for seconds, sometimes until the dd write is
> completely finished. Another example is applications like firefox,
> thunderbird, xchat, and pidgin will stop refreshing for 10+ seconds.
>
>  dd if=/dev/zero of=test-file bs=2M count=2048
>
>  I understand the main difference between using oflag=direct or not
> relates to if the io scheduler is used, and if the file is cached or
> not. I can see this clearly by watching cached rise without
> oflag=direct, stay the same with it, and go way down when I delete the
> file after running dd without oflag=direct.
>
>  The system in question is running Fedora 8. It is an E6600, 4gb memory,
> and 2x300gb Seagate sata drives. The drives are setup with md raid 1,
> and the filesystem is ext3. But I also see this with plenty of other
> systems with more cpu, less cpu, less memory, raid, and no raid.
>
>  I have tried various tweaks to sys.vm settings, tried changing the
> scheduler to as or deadline. Nothing seem to get it to behave, other
> than oflag=direct.
>
Known problem with the io schedulers, and discussed from time to time on
the RAID list. The current io schedulers don't split drive access fairly
between reads and writes, so when a huge batch of writes is queued,
reads suffer. In your case, the vi problem may be an issue of doing a
write to the file and that write being at the end of the io queue.

Note: the optimization is for throughput, not responsiveness; you may
see more pleasing results with the deadline scheduler. You also may want
to look at using NCQ and setting the queue_depth in /sys. I can't
explain it without looking up the details, so there's something for you
to check.
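
As a starting point for that check: on SCSI and libata disks the queue
depth is normally exposed as /sys/block/<dev>/device/queue_depth. The
sketch below only reads it. The function name queue_depth, the device
name sda, and the optional second argument (an alternate sysfs root,
there only so the function can be exercised against a test directory)
are assumptions for illustration.

```shell
# Print the command queue depth of a disk. A value of 1 means at most one
# command is outstanding at a time, i.e. NCQ is effectively not in use.
queue_depth() {
    dev="$1"
    sys_root="${2:-/sys}"     # optional override, for illustration/testing
    f="$sys_root/block/$dev/device/queue_depth"
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "unknown"
        return 1
    fi
}

queue_depth sda || echo "sda has no readable queue_depth attribute here"

# To change it (as root), write a new value to the same file, e.g.:
#   echo 1 > /sys/block/sda/device/queue_depth
```

Whether a smaller depth helps responsiveness on a given controller is
something to measure with the dd test, not a given.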

--
Bill Davidsen <davidsen@...>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

Re: I/O Scheduling results in poor responsiveness

by Nathan Grennan-2

Chris Snook wrote:

> At the grub screen, hit 'a' to append kernel arguments, and add
> elevator=deadline to the list of parameters.  If you like the results,
> you can add it permanently by editing /boot/grub/grub.conf.
>
> It's also possible that your sluggish rawhide performance is due to
> all the extra debug options that are turned on in the rawhide kernel.  
> I've seen overhead as high as 30% on some workloads.  There's been
> some discussion of adding a 'nodebug' kernel variant to rawhide that's
> compiled with roughly the same options as the stable Fedora kernel,
> but I don't know when or if that's going to happen.
>
  I figured it all out. The main issue was my use of a Firefox 3.0
nightly. From Firefox 3.0b3 on, Firefox is very fsync happy: every time
you load a page it calls fsync about eight times. Do that in the middle
of a big write and performance goes all to hell. I have filed a bug
upstream, https://bugzilla.mozilla.org/show_bug.cgi?id=421482 .

  I ran across this idea by reading http://kerneltrap.org/node/14148 .
The first e-mail from Ingo, a reply to one of his own earlier e-mails,
mentions a case a lot like mine: quad-core, 4 GB of RAM, and 30-second
pauses in vim. He mentions that vim uses fsync. He mentions an option,
but it isn't good enough. You also have to change swapsync, as below.

 I then straced vim and found any hiccups in the output were directly
related to when vim ran fsync. I set the options, and the problem seemed
to go away.

 Finally I turned to Firefox 3.0 nightly, and straced it. I found the
problem. I then went back to Firefox 2.0.0.12, and straced it. I found
it didn't have the same problem. So as nice as Firefox 3.0b3 or later
is, it is a recipe for unhappiness.

set swapsync=sync
set nofsync

  But this is just a symptom of a bigger problem. As the Kernel Trap url
above mentions, ext3 + fsync = crappiness. So my next step will be to
talk to the right developers, learn as much as possible, and see if a
solution can be found. Otherwise I may completely give up on ext3 and
move to another filesystem.
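
For anyone wanting to repeat the strace step, here is a rough sketch of
one way to summarize such a trace. It assumes the log was captured with
strace's -T option, which appends the time spent in each call as
<seconds> at the end of the line; the function name summarize_syncs and
the file name vim.trace are made up for illustration.

```shell
# Count fsync/fdatasync calls in an strace log and report the slowest one.
# Expects lines produced with -T (time spent in the call as <seconds>).
summarize_syncs() {
    LC_ALL=C awk '
        # Count each call once, on the line where it starts.
        /(fsync|fdatasync)\(/ { n++ }
        # The duration is on the same line, or on the "resumed" line when
        # strace -f splits a call across two lines.
        /(fsync|fdatasync)(\(| resumed>)/ {
            if (match($0, /<[0-9.]+>$/)) {
                t = substr($0, RSTART + 1, RLENGTH - 2) + 0
                if (t > max) max = t
            }
        }
        END { printf "calls=%d slowest=%.3fs\n", n, max }
    ' "$1"
}

# Capture a log first, for example (PID is a placeholder):
#   strace -f -T -e trace=fsync,fdatasync -o vim.trace -p PID
# then summarize it:
#   summarize_syncs vim.trace
```

Comparing the summary with and without the big dd write running should
show whether the pauses line up with slow fsync calls.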

Re: I/O Scheduling results in poor responsiveness

by Bruno Wolff III

On Thu, Mar 06, 2008 at 11:18:54 -0500,
  Chris Snook <csnook@...> wrote:
>
> At the grub screen, hit 'a' to append kernel arguments, and add
> elevator=deadline to the list of parameters.  If you like the results,
> you can add it permanently by editing /boot/grub/grub.conf.

I gave this a try and my perception is that it helped a bit when switching
windows under high disk IO. It didn't seem to help much with programs
taking a long time to exit. I suspect this might be due to lots of dirty
pages stacking up and then a sync occurring. I'll try playing with the
value for how long they sit around and see if that helps some more.

Thanks for the help with setting the scheduler.

Re: I/O Scheduling results in poor responsiveness

by Nathan Grennan-2

Nathan Grennan wrote:

>
>  I figured it all out. The main issue was my use of a Firefox 3.0
> nightly. From Firefox 3.0b3 on Firefox is very fsync happy. As in ever
> time you load a page it fsync about eight times. Do things in the
> middle of a big write and performance goes all to hell. I have filed a
> bug upstream, https://bugzilla.mozilla.org/show_bug.cgi?id=421482 .
>
>  I ran across this idea by reading, http://kerneltrap.org/node/14148 .
> The first e-mail from Ingo as a reply to another one of his earlier
> e-mails mentions a case a lot like mine. Quad-Core, 4gb ram, and 30
> second pauses in vim. He mentions vim uses fsync. He mentions an
> option, but it isn't good enough. You have to also change set
> swapsync, like below.
>
> I then straced vim and found any hiccups in the output were directly
> related to when vim ran fsync. I set the options, and the problem
> seemed to go away.
>
> Finally I turned to Firefox 3.0 nightly, and straced it. I found the
> problem. I then went back to Firefox 2.0.0.12, and straced it. I found
> it didn't have the same problem. So as nice as Firefox 3.0b3 or later
> is, it is a recipe for unhappiness.
>
> set swapsync=sync
> set nofsync
>
>  But this is just a symptom of a bigger problem. As the Kernel Trap
> url above mentions, ext3 + fsync = crappiness. So my next step will be
> to talk to the right developers, learn as much as possible, and see if
> a solution can be found. Otherwise I may completely give up on ext3
> and move to another filesystem.
>
  The latest news is that this is most likely because of Firefox 3.0b3+'s
use of sqlite. There was already another bug about poor zoom performance
which relates to sqlite.

https://bugzilla.mozilla.org/show_bug.cgi?id=417732

Re: I/O Scheduling results in poor responsiveness

by Pasi Kärkkäinen

On Thu, Mar 06, 2008 at 11:36:49AM -0500, Bill Davidsen wrote:

> Nathan Grennan wrote:
> >    Why is the command below all that is needed to bring the system to
> >it's knees? Why doesn't the io scheduler, CFQ, which is supposed to be
> >all about fairness starve other processes? Example, if I open a new file
> >in vim, and hold down "i" while this is running it will pause the
> >display of new "i"s for seconds, sometimes until the dd write is
> >completely finished. Another example is applications like firefox,
> >thunderbird, xchat, and pidgin will stop refreshing for 10+ seconds.
> >
> > dd if=/dev/zero of=test-file bs=2M count=2048
> >
> > I understand the main difference between using oflag=direct or not
> >relates to if the io scheduler is used, and if the file is cached or
> >not. I can see this clearly by watching cached rise without
> >oflag=direct, stay the same with it, and go way down when I delete the
> >file after running dd without oflag=direct.
> >
> > The system in question is running Fedora 8. It is an E6600, 4gb memory,
> >and 2x300gb Seagate sata drives. The drives are setup with md raid 1,
> >and the filesystem is ext3. But I also see this with plenty of other
> >systems with more cpu, less cpu, less memory, raid, and no raid.
> >
> > I have tried various tweaks to sys.vm settings, tried changing the
> >scheduler to as or deadline. Nothing seem to get it to behave, other
> >than oflag=direct.
> >
> Known problem with the io schedulers, and discussed from time to time on
> the RAID list. The current io schedulers don't split drive access fairly
> between read and write, so when you get a huge batch of write queued
> reads suffer. In your case, the vi problem may be an issue of doing a
> write to the file and that write being at the end of the io queue.
>
> Note: the optimization is for throughput, not responsiveness, you may
> see more pleasing results with the deadline scheduler. You also may want
> to look at using NCQ and setting the queue_depth in /sys. I can't
> explain it without looking up the details, so there's something for you
> to check.
>

Hi!

Do you happen to know if it's possible to check the current queue depth
"in use"? That is, how many commands are currently queued?

-- Pasi
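
One place to look for this, assuming the kernel exposes
/sys/block/<dev>/stat, is the ninth field of that file, which the kernel
documents as the number of I/Os currently in flight (issued to the
driver but not yet completed; requests still waiting in the scheduler
queue are not counted). The sketch below reads that field. The function
name inflight and the device name sda are assumptions for illustration,
and the optional second argument exists only so an alternate file can be
supplied.

```shell
# Print the number of requests issued to the device but not yet completed.
# This is the ninth whitespace-separated field of the device's stat file.
inflight() {
    stat_file="${2:-/sys/block/$1/stat}"   # optional override, for testing
    [ -r "$stat_file" ] || { echo "cannot read $stat_file" >&2; return 1; }
    awk '{ print $9 }' "$stat_file"
}

# Sample once per second while the dd test runs (Ctrl-C to stop):
#   while sleep 1; do inflight sda; done
```

With NCQ active the value should hover at or below the configured
queue_depth during heavy writes.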