Bug 3094 - POOR I/O performance on VIA chipsets
Summary: POOR I/O performance on VIA chipsets
Status: REJECTED INSUFFICIENT_DATA
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: IDE
Hardware: i386 Linux
Importance: P2 high
Assignee: io_ide@kernel-bugs.osdl.org
URL:
Keywords:
Duplicates: 3420
Depends on:
Blocks:
 
Reported: 2004-07-18 09:45 UTC by Ulrik Mikaelsson
Modified: 2010-05-21 05:47 UTC
CC List: 6 users

See Also:
Kernel Version: seen from 2.6.0 through 2.6.8-rc1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Kernel output from /proc/kmsg while pressing Sysrq + T a bunch of times. (226.00 KB, text/plain)
2004-07-19 12:03 UTC, Ulrik Mikaelsson
Dmesg on my Machine (14.53 KB, text/plain)
2004-10-05 12:45 UTC, Ulrik Mikaelsson

Description Ulrik Mikaelsson 2004-07-18 09:45:26 UTC
Distribution: Gentoo Linux 
Hardware Environment: AMD Athlon XP/2500 
  00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400 AGP] Host Bridge 
  00:01.0 PCI bridge: VIA Technologies, Inc. VT8235 PCI Bridge 
  00:10.0 USB Controller: VIA Technologies, Inc. USB (rev 80) 
  00:10.1 USB Controller: VIA Technologies, Inc. USB (rev 80) 
  00:10.2 USB Controller: VIA Technologies, Inc. USB (rev 80) 
  00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82) 
  00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge 
  00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT8233/A/C/VT8235 PIPC Bus Master IDE (rev 06) 
  00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235 AC97 Audio Controller (rev 50) 
  00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74) 
  01:00.0 VGA compatible controller: nVidia Corporation NV31 [GeForce FX 5600] (rev a1) 
 
Software Environment:  
  glibc-2.3.3.20040420 
   
Problem Description: 
  Whenever I create some I/O load on my system, the system becomes REALLY slow 
and unresponsive. If I am playing music or watching a movie at the same time, 
the increased I/O load is enough to cause glitches in the audio and/or video. 
Watching "top", I see an I/O wait of around 99.8%. 
 
  I assume this is related to the VIA IDE chip, since the problem does not 
show up on any of my Intel-based machines, but is easily reproduced on a 
friend's VIA-based machine. 
 
Steps to reproduce: 
  Copy a large file and try to listen to music at the same time on a VIA-based 
motherboard.
Comment 1 Ulrik Mikaelsson 2004-07-19 12:03:03 UTC
Created attachment 3400 [details]
Kernel output from /proc/kmsg while pressing Sysrq + T a bunch of times.

This is kernel output with stack traces from all active tasks at various
snapshots, taken while running "du -sh /usr" and playing an mp3 using JuK
through artsd. Note the stack traces containing reiser entries. The rest is
probably completely useless.

Comment if you want other debug output.
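
For anyone collecting the same kind of traces, a rough equivalent that does not need the keyboard shortcut (a sketch, assuming the magic SysRq interface is compiled into the kernel) is:

  echo 1 > /proc/sys/kernel/sysrq     # enable magic SysRq handling (needs root)
  echo t > /proc/sysrq-trigger        # dump stack traces of all tasks, same as Sysrq + T
  dmesg > sysrq-traces.txt            # save the resulting kernel log output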
Comment 2 Diego Calleja 2004-08-27 11:31:25 UTC
Is DMA enabled while running the machine?

I have a similar IDE chipset and it works well even without any configuration; I
just let the kernel set up DMA.
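
A quick way to double-check (a sketch, assuming the disk is /dev/hda on the VIA controller):

  hdparm -d /dev/hda    # prints "using_dma = 1 (on)" if DMA is enabled
  hdparm -i /dev/hda    # identification data; the selected UDMA mode is marked with *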
Comment 3 Ulrik Mikaelsson 2004-08-27 12:31:57 UTC
Yes, DMA is turned on. One thing of interest, though: the sound application that
seems to have by far the largest issues with this is the arts server from KDE 3.2.
arts from KDE 3.3 behaves slightly better and plays with fewer dropouts, but there
are still glitches under heavy I/O. Does system I/O respect nice levels, by the way?
MPlayer seems to be the app least affected by the I/O. Perhaps differences in how
the audio application feeds ALSA/OSS make the problem more or less apparent?

The system is still really slow and unresponsive. And though arts from KDE 3.3
was an improvement, the problem is still not solved.
Comment 4 Bartlomiej Zolnierkiewicz 2004-10-05 12:40:39 UTC
dmesg is needed
Comment 5 Ulrik Mikaelsson 2004-10-05 12:45:55 UTC
Created attachment 3767 [details]
Dmesg on my Machine
Comment 6 Bartlomiej Zolnierkiewicz 2004-10-05 12:58:03 UTC
*** Bug 3420 has been marked as a duplicate of this bug. ***
Comment 7 Bartlomiej Zolnierkiewicz 2004-12-08 10:18:41 UTC
*** Bug 3420 has been marked as a duplicate of this bug. ***
Comment 8 Adrian Bunk 2006-01-13 08:31:12 UTC
Is this issue still present in kernel 2.6.15?
Comment 9 Adrian Bunk 2006-04-22 10:12:40 UTC
Please reopen this bug if it's still present in kernel 2.6.16.
Comment 10 Zenith88 2010-05-18 22:40:28 UTC
Still present in 2.6.32-21, even on an Intel chipset: 845 with a P4 3.06 HT.

Copying files from a SATA DVD-RW to an IDE HDD takes 100% of one virtual core and 30% of another. Copying files over the network creates the same kind of CPU load.

Please reopen this, as it needs to be looked into.
Comment 11 Zenith88 2010-05-18 23:37:57 UTC
Actually, I underestimated the load on the 2nd core - it jumps from 50% to 90% all the time. I can't see how this can be 'rejected insufficient data' when there are plenty of reporters willing to supply it. Please reopen.
Comment 12 Alan 2010-05-18 23:44:22 UTC
Is your box unresponsive or just loaded? The latter isn't a surprise, especially if you are using an IDE-style interface.
Comment 13 Zenith88 2010-05-19 00:47:26 UTC
It is a surprise in DMA mode.
Comment 14 Zenith88 2010-05-19 00:51:12 UTC
Re-testing with SATA destination - same 100% load.
Comment 15 Robert Hancock 2010-05-19 00:59:08 UTC
Zenith88, your problem is highly unlikely to be related to the problem this ancient bug report was about, despite some superficial resemblance in the symptoms. Please create a new bug report and provide more details (dmesg output from bootup, for starters).
Comment 16 Zenith88 2010-05-19 01:12:50 UTC
I have been on Linux since 1995 and I am sick and tired of fighting over bug reports. The recent 'shortcomings' have discredited Linux enough; you folks should get moving on resolving serious issues instead of pumping functionality that nobody needs into buggy kernels while maintaining compatibility with hardware that became obsolete before my kids were born. You have the information - run with it. I don't know why I even bother reporting bugs. Nothing I touch works! I can't burn DVDs, getting MIDI to work is a royal PITA, Firefox is 10x slower than under Windows, GRUB is chronically incapable of dual-boot, 3D is a joke... It all rhymes with 'guinea pig'.
Comment 17 Robert Hancock 2010-05-19 01:14:33 UTC
You haven't provided any useful information whatsoever that could determine what is causing your problem (which almost certainly has nothing to do with the one reported here 6 years ago). Without that, you are just whining, and this is not the place for that.
Comment 18 Zenith88 2010-05-19 03:08:16 UTC
It just shows the world that 6 years is not enough for you to fix a piss-poor I/O subsystem. Ask Microsoft for help - here's an idea.
Comment 19 Artem S. Tashkinov 2010-05-19 06:28:57 UTC
Zenith88,

Stop whining, post your `dmesg`, `lspci` and `lspci -vvv` data or even start a new bug report WITH the required information.

P.S. And I have to agree with you that the I/O subsystem in Linux needs a lot of work. "IO wait" time is just an inexcusable mess.
Comment 20 Alan 2010-05-19 11:16:44 UTC
I/O wait measures the amount of time the machine is waiting for the disk, not how much CPU it is using. Disks haven't gotten much faster in the past ten years (SSDs aside) while processors have improved dramatically. There isn't a lot that can be done about the speed of disks. Getting the kernel to schedule I/O better can help a bit, but the fundamental problem is that a disk on a good day can only really do about 200 operations per second.
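
A quick way to see that distinction on a running system (a sketch, assuming the procps and sysstat tools are installed):

  vmstat 1       # "us"/"sy" is real CPU work, "wa" is time spent waiting for the disk,
                 # and "b" is the number of processes blocked on I/O
  iostat -x 1    # per-disk view; %util shows how busy the disk itself is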
Comment 21 Artem S. Tashkinov 2010-05-19 17:17:37 UTC
(In reply to comment #20)
> I/O wait measures the amount of time the machine is waiting for the disk,
> not how much CPU it is using. Disks haven't gotten much faster in the past
> ten years (SSDs aside) while processors have improved dramatically. There
> isn't a lot that can be done about the speed of disks. Getting the kernel to
> schedule I/O better can help a bit, but the fundamental problem is that a
> disk on a good day can only really do about 200 operations per second.

What really enrages people (and me too) is that the CPU usage shown by `top` is always 100% whenever a single process is writing to or reading from spinning storage at full speed.

It is _wrong_ and it is _misleading_, because then people think that the Linux kernel sucks at I/O operations; everyone instantly recalls their Windows experience, where (e.g. on my own PC) HDD access causes at most 2% CPU usage.

Fix the kernel idle time computation, otherwise people will keep complaining indefinitely.

`cat /dev/sda > /dev/null` results in the load average rapidly climbing to 1.32 and counting on my PC; that's just insane and wrong.
Comment 22 Alan 2010-05-19 20:53:44 UTC
It is how load average is defined in the standards, and how it has been defined since the 1970s. Tools rely on that exact behaviour.

So WONTFIX.
Comment 23 Zenith88 2010-05-19 23:10:08 UTC
The worst thing I can do to linux is to shut up. Enjoy yourself, Alan. That's the only thing left.
Comment 24 Artem S. Tashkinov 2010-05-20 05:28:07 UTC
(In reply to comment #22)
> It is how load average is defined in the standards, and how it has been
> defined
> since the 1970s. Tools rely on that exact behaviour
> 

Alan, I/O wait should not consume 100% of (even imaginary) CPU time - that's _wrong_; in fact, I don't know of any other OS that treats I/O activity this way.

The CPU is _not_ 100% busy waiting for I/O operations to complete; however, people using Linux are made to believe it _is_ the case.
Comment 25 Tejun Heo 2010-05-20 09:08:51 UTC
Artem, if some tool is showing iowait as cpu time consumed, that tool needs to be fixed.  The kernel is simply exporting what it's doing.  The presentation is up to system monitoring tools.  I recall lots of tools which kind of gave the wrong signals years ago, but these days I don't see such things too often.  There is always a problem with people who are fixated on weird stuff, but I don't think it's possible to satisfy everyone.  We can't show cached memory as free for people who claim the kernel is wasting all their memory, right?  That's a very bad solution for a virtually non-existent problem.
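
What the kernel exports is just a set of per-CPU counters in /proc/stat; turning them into percentages (and deciding whether iowait gets lumped in with busy or idle time) is entirely the tool's job. A minimal sketch of the raw data:

  # Cumulative jiffies per state: user nice system idle iowait irq softirq steal ...
  grep '^cpu ' /proc/stat
  # A monitoring tool samples this twice and computes percentages from the deltas,
  # so iowait is reported separately from both idle and busy time.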

Zenith88, if Linux grinds your gears that much and makes you angry, go ahead and use or do something else.  Life is short.  Do something you like and, well, more importantly to me, don't get in the way of other people at the very least.  If you wanna actually delve into solving a technical issue, please stay technical from this point on.

Thanks.
Comment 26 Artem S. Tashkinov 2010-05-20 15:49:11 UTC
(In reply to comment #25)
> Artem, if some tool is showing iowait as cpu time consumed, that tool needs
> to be fixed.  The kernel is simply exporting what it's doing.  The
> presentation is up to system monitoring tools.  I recall lots of tools which
> kind of gave the wrong signals years ago, but these days I don't see such
> things too often.  There is always a problem with people who are fixated on
> weird stuff, but I don't think it's possible to satisfy everyone.  We can't
> show cached memory as free for people who claim the kernel is wasting all
> their memory, right?  That's a very bad solution for a virtually
> non-existent problem.
> 

Are you aware of any tools other than htop and top (from the procps package)? They both show I/O wait as CPU load.

What about the kernel itself? The load average, to my knowledge, is statistical data exported by the kernel itself, and according to your words this data can be trusted - or am I missing something here?  Why do we have this kind of situation (I need to quote myself here):

(In reply to comment #21)
> 
> `cat /dev/sda > /dev/null` results in the load average rapidly climbing to
> 1.32 and counting on my PC; that's just insane and wrong.

I've just rerun this test for 10 minutes and the load average climbed to 2.14 on my 4-core SMP system (2 real cores, 2 HT siblings). That means that intensive I/O operations consume a mind-boggling 50% of my CPU power - and mind you, I have a really fast CPU (3.4 GHz Intel Core i5).

On this issue there are two almost identical bugs filed by me, bug 14531 and bug 14648, but it seems like no one cares.
Comment 27 Tejun Heo 2010-05-20 16:14:28 UTC
(In reply to comment #26)
> Are you aware of any tools other than htop and top (from the procps
> package)?  They both show I/O wait as CPU load.

No, they don't.  They clearly distinguish between sys, user, idle and iowait.  You're confusing cpu usage and load average.

> What about the kernel itself?  The load average, to my knowledge, is
> statistical data exported by the kernel itself, and according to your words
> this data can be trusted - or am I missing something here?  Why do we have
> this kind of situation (I need to quote myself here):

It just seems that you're misunderstanding what 'load' means.

> I've just rerun this test for 10 minutes and the load average climbed to
> 2.14 on my 4-core SMP system (2 real cores, 2 HT siblings).  That means that
> intensive I/O operations consume a mind-boggling 50% of my CPU power - and
> mind you, I have a really fast CPU (3.4 GHz Intel Core i5).

Load is the average number of threads which are running or willing to run.  It doesn't have much to do with actual cpu usage.  Load average can and does go way above the number of processors.  It doesn't mean the kernel is creating virtual cpus for you.
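
Concretely, the figures come from /proc/loadavg and roughly track the number of tasks that are either runnable or stuck in uninterruptible (typically I/O) sleep. A rough way to compare the instantaneous count with the averaged value (a sketch, assuming procps' ps is available):

  cat /proc/loadavg                 # 1-, 5- and 15-minute load averages plus running/total task counts
  ps -eo stat= | grep -c '^[RD]'    # tasks currently runnable (R) or in uninterruptible sleep (D),
                                    # which is roughly what the load average tracks over time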

> On this issue there are two almost identical bugs filed by me, bug 14531 and
> bug 14648 but it seems like no one cares.

If you don't clearly understand the issue technically (which is expected for anyone who's not working in the specific area), it is usually a much better idea to report the problem you're experiencing as a user.  Sure, your conjecture or gut feeling can be helpful and it's a good idea to present it, but please always keep in mind the possibility of it being completely wrong and/or irrelevant.

Good: While a large file is being copied, the system is very sluggish.  I can hardly move the mouse pointer and a simple command like ls takes minutes to complete.  While this is happening, the load avg is such and such, and I suspect this might have something to do with the sluggishness.  My system information is such and such...

Bad: While a large file is being copied, 150% of my cpu is being used (misinterpreting load avg as cpu usage), please fix.

So, what actual problem are you seeing other than the expected high load avg?
Comment 28 Artem S. Tashkinov 2010-05-20 16:56:48 UTC
> Load is the average number of threads which are running or willing to run.
> It doesn't have much to do with actual cpu usage.  Load average can and does
> go way above the number of processors.  It doesn't mean the kernel is
> creating virtual cpus for you.

I haven't said anything about creating virtual CPUs, but I've always thought that one can estimate real system load (it may be above 100%) by dividing the load average by the number of CPUs, and for "normal" tasks like mathematical calculations this principle holds true. E.g.:

`cat < /dev/urandom > /dev/null` on all Unixes will result in a 1.0 load average. That means one CPU or CPU core is 100% busy. Run this task twice and you'll get a 2.0 load average, which on a 4-core system means half of the CPUs are busy churning random numbers.
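
(A rough sketch of that normalisation - dividing the 1-minute load by the CPU count - assuming coreutils' nproc is available:)

  awk -v cpus="$(nproc)" '{printf "normalised 1-minute load: %.2f\n", $1 / cpus}' /proc/loadavg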

Why this principle doesn't hold true for tasks which are bound by IO operations is beyond my understanding. Why IO wait time is even included in load average statistics is also beyond my understanding.

So, on behalf of almost all Linux users I have to ask this hard question: how is one supposed to understand/interpret top/htop/whatever_else_utility numbers when IDLE time is shown as _0%_ for _all_ I/O-intensive operations? In the OS everyone hates but most people still use, I/O wait time is shown as kernel time but otherwise the CPU is mostly _idle_ (90% or more, depending on chipset, I/O drivers, etc.) - in Linux the CPU is _100% busy_ - and it makes no sense, e.g.

`cat /dev/sda > /dev/null`

will result in top showing:

Cpu(s):  4.3%us,  4.0%sy,  0.0%ni, 75.4%id, 15.8%wa,  0.0%hi,  0.5%si,  0.0%st

As you can clearly see, my 4-CPU system is approximately 25% busy, i.e. one CPU is 100% busy doing something.

So, _please_, fix the kernel, fix top/htop/load_average, fix whatever you want, but, please, stop confusing people with numbers that make no sense.
Comment 29 Tejun Heo 2010-05-20 17:07:09 UTC
You're mixing up load average and cpu usage.  The two are somewhat related but not the same thing.  If you don't understand that, just stay away from system monitoring utilities which distinguish idle and iowait.  There are quite a few pretty graphical monitoring widgets which don't draw attention to iowait (they either draw it the same as idle or in a subtle tone).  Your request to change the meaning of load avg is no different from requesting that the kernel add cached/buffered memory to free memory because some people can't wrap their heads around what those numbers actually mean.

I don't think this is going anywhere and am fairly sure that further bug reports on the subject won't lead anywhere.

Let's meet again in a more productive discussion.  Thanks.
Comment 30 Artem S. Tashkinov 2010-05-20 17:42:59 UTC
In the second part of my message I referred to top/htop output, which clearly shows that I/O wait is treated as if it were _real_ CPU usage, but you just ignored it, and now

(In reply to comment #29)
> I don't think this is going anywhere and am fairly sure that further bug
> reports on the subject won't lead anywhere.
> 
> Let's meet again in a more productive discussion.  Thanks.

you sound like "even though most sane people trust their eyes, in Linux you should not trust your eyes, everything is so complicated we don't even have utilities which can show real CPU usage".

And until we have utilities and metrics which allow one to instantly _see_, rather than deduce, real CPU usage, people _will_ keep on bitching that "Linux is broken with my motherboard/chipset/HDD because of I/O wait" or "Linux gobbles down my precious CPU cycles".

Just run Windows and see how much easier it is to _see_ whether the system is busy. In Linux everything is done in such a way that no layman (except kernel developers) can really understand it.

If you insist that everything is just fine and works correctly as intended, I will have to resign myself, keep silent from now on, and accept that Linux is only meant for über geeks.
Comment 31 Alan 2010-05-20 18:51:27 UTC
It's defined by a standard. It has been the expected behaviour since the 1970s. "I don't like it" doesn't trump standards, especially given most users *expect* the current behaviour.

The kernel provides a lot of information, including per-CPU loading, I/O load and the like. It's up to applications how they display it.
Comment 32 Tejun Heo 2010-05-20 19:02:17 UTC
Artem, just use gnome-system-monitor, which is as easy as Windows.  Don't think about load avg.  Just forget about it.
Comment 33 Zenith88 2010-05-20 23:45:21 UTC
This would have been credible if Linux were not so painfully slow at the same operations.
Comment 34 Artem S. Tashkinov 2010-05-21 05:47:53 UTC
(In reply to comment #32)
> Artem, just use gnome-system-monitor, which is as easy as Windows.  Don't
> think about load avg.  Just forget about it.

That's a relief, thank you. And I'm terribly sorry for adding htop into the mix; it does indeed show CPU usage correctly.

(In reply to comment #31)
> It's defined by a standard. It has been the expected behaviour since the
> 1970s. "I don't like it" doesn't trump standards, especially given most
> users *expect* the current behaviour.

If load average includes IO wait time, let it be. Thank you for the information.
