Bug 3094
| Summary: | POOR I/O performance on VIA chipsets | | |
|---|---|---|---|
| Product: | IO/Storage | Reporter: | Ulrik Mikaelsson (rawler) |
| Component: | IDE | Assignee: | io_ide (io_ide) |
| Status: | REJECTED INSUFFICIENT_DATA | | |
| Severity: | high | CC: | alan, aros, bunk, diegocg, tj, zenith22.22.22 |
| Priority: | P2 | | |
| Hardware: | i386 | | |
| OS: | Linux | | |
| Kernel Version: | seen from 2.6.0 through 2.6.8-rc1 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |

Attachments:

- Kernel output from /proc/kmsg while pressing Sysrq + T a bunch of times.
- Dmesg on my Machine
Description
Ulrik Mikaelsson
2004-07-18 09:45:26 UTC
Created attachment 3400 [details]
Kernel output from /proc/kmsg while pressing Sysrq + T a bunch of times.
This is kernel output with stack traces from all active tasks at various snapshots, taken while running "du -sh /usr" and playing an mp3 with Juk through artsd. Observe the stack traces containing reiser entries; the rest is probably completely useless.
Comment if you want other debug output.
Is DMA enabled while running the machine? I have a similar IDE chipset and it works well even without any configuration; I just let the kernel set up the DMA.

Yes, DMA is turned on. One thing of interest, though: the sound application that seems to have by far the largest issues with this is the aRts server from KDE 3.2. aRts from KDE 3.3 behaves slightly better and plays somewhat better, but there are still glitches under heavy I/O. Does system I/O respect nice levels, by the way? MPlayer seems to be the app least affected by the I/O. Perhaps differences in how the audio application feeds ALSA/OSS make the problem more or less apparent?

The system is still really slow and unresponsive, and though aRts from KDE 3.3 was an improvement, the problem is still not solved.

dmesg is needed.

Created attachment 3767 [details]
Dmesg on my Machine
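For anyone retracing the DMA question above, a rough sketch of how DMA status on an old-style IDE drive is usually checked and toggled (the device node is only an example, and the commands need root):

```sh
hdparm -d /dev/hda      # "using_dma = 1 (on)" means DMA is active
hdparm -i /dev/hda      # identify data; the selected (u)dma mode is marked with '*'
hdparm -d1 /dev/hda     # try to enable DMA if it is off (some chipsets refuse)
```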
*** Bug 3420 has been marked as a duplicate of this bug. ***

Is this issue still present in kernel 2.6.15?

Please reopen this bug if it's still present in kernel 2.6.16.

Still present in 2.6.32-21, even on an Intel chipset: 845 with a P4 3.06 HT. Copying files from a SATA DVD-RW to an IDE HDD takes 100% of one virtual core and 30% of another. Copying files over the network creates the same kind of CPU load. Reopen this, as it needs to be looked after.

Actually, I underestimated the load on the second core - it jumps from 50% to 90% all the time.

I can't see how this can be 'rejected insufficient data' when there are plenty of reporters willing to supply it. Please reopen.

Is your box unresponsive or just loaded? The latter isn't a surprise, especially if using an IDE-style interface.

It is a surprise in DMA mode. Re-testing with a SATA destination - same 100% load.

Zenith88, your problem is highly unlikely to be related to the problem this ancient bug report was about, despite some superficial resemblance in the symptoms. Please create a new bug report and provide more details (dmesg output from bootup, for starters).

I have been on Linux since 1995 and I am sick and tired of fighting over bug reports. The recent 'shortcomings' have discredited Linux enough for you folks to get moving on resolving serious issues instead of pumping functionality that nobody needs into buggy kernels while maintaining compatibility with hardware which became obsolete before my kids were born. You have the information - run with it. I don't know why I even bother reporting bugs. Everything I touch does not work! Can't burn DVDs, getting MIDI to work is a royal PITA, Firefox is 10x slower than under Windows, GRUB is chronically incapable of dual-boot, 3D is a joke... It all rhymes with 'guinea pig'.

You haven't provided any useful information whatsoever that could determine what is causing your problem (which is almost certainly nothing to do with the one reported here 6 years ago). Without that, you are just whining, and this is not the place for that.

It just shows the world that 6 years is not enough for you to fix a piss-poor I/O subsystem. Ask Microsoft for help - here's the idea.

Zenith88, stop whining and post your `dmesg`, `lspci` and `lspci -vvv` data, or even start a new bug report WITH the required information. P.S. I have to agree with you that the I/O subsystem in Linux needs a lot of work. "I/O wait" time is just an inexcusable mess.

I/O wait is measuring the amount of time the machine is waiting for the disk, not how much it is using the CPU. Disks haven't gotten much faster in the past ten years (SSDs aside) while processors dramatically have. There isn't a lot that can be done about the speed of disks. Getting the kernel to schedule I/O better can help a bit, but the fundamental problem is that a disk on a good day can only really do about 200 operations per second.
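As an aside to the point above, a rough sketch of how to watch the request rate a disk is actually sustaining, using `iostat` from the sysstat package ("sda" is only an example device); this makes the roughly 200 operations-per-second ceiling of a spinning disk directly visible:

```sh
iostat -x sda 1
# r/s + w/s  -> effective IOPS the drive is servicing
# await      -> average time (ms) a request spends waiting plus being served
# %util      -> device saturation; near 100% means the disk, not the CPU, is the bottleneck
```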
(In reply to comment #20)
> I/O wait is measuring the amount of time the machine is waiting for the disk,
> not how much it is using the CPU. Disks haven't gotten much faster in the past
> ten years (SSDs aside) while processors dramatically have. There isn't a lot
> that can be done about the speed of disks. Getting the kernel to schedule I/O
> better can help a bit, but the fundamental problem is that a disk on a good
> day can only really do about 200 operations per second.

What really enrages people (me included) is that under all circumstances the CPU usage shown by `top` is always 100% whenever a single process is writing to or reading from spinning storage at full speed. It is _wrong_ and it is _misleading_, because people then think that the Linux kernel sucks at I/O operations, and everyone instantly recalls their Windows experience where (e.g. on my own PC) HDD access causes at most 2% CPU usage. Fix the kernel idle time computation, otherwise people will keep complaining indefinitely. `cat /dev/sda > /dev/null` results in the load average rapidly climbing to 1.32 and counting on my PC; that's just insane and wrong.

It is how load average is defined in the standards, and how it has been defined since the 1970s. Tools rely on that exact behaviour. So WONTFIX.

The worst thing I can do to Linux is to shut up. Enjoy yourself, Alan. That's the only thing left.

(In reply to comment #22)
> It is how load average is defined in the standards, and how it has been
> defined since the 1970s. Tools rely on that exact behaviour.

Alan, I/O wait should not consume 100% of (even imaginary) CPU time - that's _wrong_. In fact, I don't know of any other OS that treats I/O activity this way. The CPU is _not_ 100% busy waiting for I/O operations to complete, yet people using Linux are made to believe it _is_ the case.

Artem, if some tool is showing iowait as CPU time consumed, that tool needs to be fixed. The kernel is simply exporting what it's doing; the presentation is up to system monitoring tools. I recall lots of tools which kind of gave the wrong signals years ago, but these days I don't recall seeing such things too often. There is always a problem with people who are fixated on weird stuff, but I don't think it's possible to satisfy everyone. We can't show cached memory as free memory for the people who claim the kernel is wasting all their memory, right? That would be a very bad solution for a virtually non-existing problem. Zenith88, if Linux grinds your gears that much and makes you angry, go ahead and use or do something else. Life is short. Do something you like and, well, more importantly to me, don't get in the way of other people at the very least. If you want to actually delve into solving a technical issue, please stay technical from this point on. Thanks.

(In reply to comment #25)
> Artem, if some tool is showing iowait as cpu time consumed, that tool needs
> to be fixed. The kernel is simply exporting what it's doing. The presentation
> is up to system monitoring tools. I recall lots of tools which kind of gave
> the wrong signals years ago but these days I don't recall seeing such things
> too often. There always is a problem with people who are fixated on weird
> stuff but I don't think it's possible to satisfy everyone. We can't show
> cached as free memory for people who claim the kernel is wasting all memory,
> right? That's a very bad solution for a virtually non-existing problem.

Are you aware of any other tools besides htop and top (from the procps package)? They both show I/O wait as CPU load. What about the kernel itself? Load average, to my knowledge, is statistical data exported by the kernel itself, and according to your words this data can be trusted - or am I missing something here? Why do we have this kind of situation (I need to quote myself here):

(In reply to comment #21)
> `cat /dev/sda > /dev/null` results in load average rapidly climbing to 1.32
> and counting on my PC, that's just insane and wrong.
I've just rerun this test for 10 minutes and the load average climbed to 2.14 on my 4-core SMP system (2 real cores, 2 HT siblings). That means that intensive I/O operations consume a mind-boggling 50% of my CPU power, and mind that I have a really fast CPU (3.4 GHz Intel Core i5). There are two almost identical bugs on this issue filed by me, bug 14531 and bug 14648, but it seems like no one cares.

(In reply to comment #26)
> Are you aware of any other tools except htop and top (from procps package)?
> They both show IO wait as CPU load.

No, they don't. They clearly distinguish between sys, user, idle and iowait. You're confusing CPU usage and load average.

> What about kernel itself? load average to my knowledge is statistical data
> exported by kernel itself and according to your words this data can be
> trusted, or I am missing something here?

It just seems that you're misunderstanding what 'load' means.

> I've just rerun this test for 10 minutes and load average climbed to 2.14 on
> my 4 cores SMP system (2 real cores, 2 HT's). That means that intensive I/O
> operations consume a mind boggling 50% of my CPU power and mind that I have a
> really fast CPU (3.4GHz Intel Core i5).

Load is the average number of threads which are running or willing to run. It doesn't have much to do with actual CPU usage. Load average can and does go way above the number of processors. It doesn't mean the kernel is creating virtual CPUs for you.

> On this issue there are two almost identical bugs filed by me, bug 14531 and
> bug 14648 but it seems like no one cares.

If you don't clearly understand the issue technically (which is expected for anyone who's not working on the specific area), it usually is a much better idea to report what problem you're experiencing as a user. Sure, your conjecture or gut feeling can be helpful and it's a good idea to present them, but please always keep in mind the possibility that they are completely wrong and/or irrelevant.

Good: "While a large file is being copied, the system is very sluggish. I can hardly move the mouse pointer and a simple command like ls takes minutes to complete. While this is happening, the load avg is such and such, and I suspect this might have something to do with the sluggishness. My system information is such and such..."

Bad: "While a large file is being copied, 150% of my cpu is being used (misinterpreting load avg as cpu usage), please fix."

So, what actual problem are you seeing other than the expected high load avg?

> Load is the average number of threads which are running or willing to run. It
> doesn't have much to do with actual cpu usage. Load average can and does go
> way above the number of processors. It doesn't mean the kernel is creating
> virtual cpus for you.
I haven't said anything about the creation of virtual CPUs, but I've always thought that one can estimate real system load (it may be above 100%) by dividing the load average by the number of CPUs, and for "normal" tasks like mathematical calculations this principle holds true. E.g.:

`cat < /dev/urandom > /dev/null` on all Unixes will result in a load average of 1.0. That means that one CPU or CPU core is 100% busy. Run this task twice and you'll get a load average of 2.0, which on a 4-core system means that half of the CPUs are busy churning random numbers.

Why this principle doesn't hold true for tasks which are bound by I/O operations is beyond my understanding. Why I/O wait time is even included in load average statistics is also beyond my understanding.

So, on behalf of almost all Linux users I have to ask this hard question: how is one supposed to interpret the numbers from top/htop/whatever else when IDLE time is shown as _0%_ for _all_ I/O-intensive operations? In the OS everyone hates but most people still use, I/O wait time is shown as kernel time, but otherwise the CPU is mostly _idle_ (90% or more, depending on chipset, I/O drivers, etc.); in Linux the CPU is _100% busy_, and it makes no sense. E.g.

`cat /dev/sda > /dev/null`

will result in top showing:

Cpu(s): 4.3%us, 4.0%sy, 0.0%ni, 75.4%id, 15.8%wa, 0.0%hi, 0.5%si, 0.0%st

As you can clearly see, my 4-CPU system is approximately 25% busy, i.e. one CPU is 100% busy doing something.

So, _please_, fix the kernel, fix top/htop/load_average, fix whatever you want, but please stop confusing people with numbers that make no sense.
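To make the disagreement above concrete, here is a rough sketch (device names, timings and commands are illustrative examples, not taken from the thread, and reading the raw device needs root). On Linux the load average counts runnable tasks plus tasks in uninterruptible (D-state) disk sleep, while the `wa` figure in top/vmstat is time the CPU sat idle with I/O outstanding, so the two can be observed side by side:

```sh
# Illustrative only: one purely I/O-bound reader raises the load average
# toward 1.0 while actual CPU usage (us+sy+ni+hi+si) stays small.
cat /dev/sda > /dev/null &        # device is an example; needs read access to it
sleep 30
cat /proc/loadavg                 # load creeps up: the reader counts as "willing to run"
ps -o stat,comm -C cat            # mostly 'D' = uninterruptible disk sleep
vmstat 1 5                        # 'wa' is high, 'us'+'sy' stay low, 'id' takes the rest
kill $!

# "Real" CPU busy percentage from two /proc/stat samples, counting iowait
# together with idle (the CPU executes nothing during either):
s1=($(grep '^cpu ' /proc/stat)); sleep 1; s2=($(grep '^cpu ' /proc/stat))
busy=0; total=0
for i in 1 2 3 6 7 8; do busy=$((busy + s2[i] - s1[i])); done      # user nice system irq softirq steal
for i in 1 2 3 4 5 6 7 8; do total=$((total + s2[i] - s1[i])); done
echo "CPU busy: $((100 * busy / total))% (idle + iowait account for the rest)"
```

The second half restates the point being argued in numbers: whether `wa` is grouped with idle time or with busy time is exactly the difference between "the CPU is waiting" and "the CPU is working".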
You're mixing up load average and CPU usage. The two are somewhat related but not the same thing. If you don't understand that, just stay away from system monitoring utilities which distinguish idle and iowait. There are quite a few pretty graphical monitoring widgets which don't draw attention to iowait (they either draw it the same as idle or in a subtle tone). Your request to change the meaning of load avg is no different from requesting that the kernel add cached/buffered memory to free memory because some people can't wrap their heads around what those numbers actually mean. I don't think this is going anywhere and am fairly sure that further bug reports on the subject won't lead anywhere. Let's meet again in a more productive discussion. Thanks.

In the second part of my message I referred to top/htop output, which clearly shows that I/O wait is treated as if it were _real_ CPU usage, but you just ignored it, and now

(In reply to comment #29)
> I don't think this is going anywhere and am fairly sure that further bug
> reports on the subject won't lead anywhere.
>
> Let's meet again in a more productive discussion. Thanks.

you sound like "even though most sane people trust their eyes, in Linux you should not trust your eyes; everything is so complicated we don't even have utilities which can show real CPU usage". And until we have such utilities and metrics which let one instantly _see_, and not deduce, real CPU usage, people _will_ keep on bitching that Linux is broken with their motherboard/chipset/HDD because of I/O wait, or that Linux gobbles down their precious CPU cycles. Just run Windows and see how much easier it is to _see_ whether the system is busy or occupied. In Linux everything is done in such a way that no layman (except kernel developers) can really understand it. If you insist that everything is just fine and works correctly as intended, I have to resign, keep silent from now on, and accept that Linux is only meant for über geeks.

It's defined by a standard. It's been expected behaviour since the 1970s. "I don't like it" doesn't trump standards, especially given most users *expect* the current behaviour. The kernel provides a lot of information, including CPU loading per CPU, I/O loadings and the like. It's up to applications how they display it.

Artem, just use gnome-system-monitor, which is as easy as Windows. Don't think about load avg. Just forget about it.

This would have been credible if Linux was not so painfully slow in the same operations.

(In reply to comment #32)
> Artem, just use gnome-system-monitor which is as easy as windows. Don't think
> about load avg. Just forget about it.

That's a relief, thank you. And I'm terribly sorry for adding htop into the mix; it indeed correctly shows CPU usage.

(In reply to comment #31)
> It's defined by a standard. It's expected behaviour since the 1970s. "I don't
> like it" doesn't trump standards, especially given most users *expect* the
> current behaviour.

If load average includes I/O wait time, let it be. Thank you for the information.
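For reference, a rough sketch of where the kernel-exported numbers discussed above actually live (standard procfs paths; top, vmstat, gnome-system-monitor and similar tools merely present them differently):

```sh
grep '^cpu' /proc/stat    # per-CPU tick counters: user nice system idle iowait irq softirq ...
cat /proc/loadavg         # 1/5/15-minute load averages plus runnable/total task counts
cat /proc/diskstats       # per-device I/O counters that iostat and friends read
```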