Latest working kernel version: 2.6.25
Earliest failing kernel version: 2.6.24
Distribution: Slackware 10-12

First machine:
CPU - AMD Athlon 3600+ (2 GHz)
Chipset - nForce 6150 (MCP51)
RAM - 3G DDR2
Video - internal GeForce6150
Kernel - 2.6.25.4 (own build)
Copy speed - 1.7GByte/s

on another kernel:
Kernel - 2.6.23.5 (own build)
Copy speed - 43.5GByte/s
--------------------------------------------
Second machine:
CPU - PII-350
MB i440BX
RAM - 128M SDRAM
Video - 3DFX Voodoo3
Kernel - 2.6.21.5 (vanilla, from the Slackware distribution)
Copy speed - 11.3GByte/s

Steps to reproduce:
dd if=/dev/zero of=/dev/null bs=16M count=10000
Reply-To: akpm@linux-foundation.org

(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface).

On Thu, 24 Jul 2008 10:57:42 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=11156
>
>            Summary: Old kernels copy memory faster than new
>            Product: IO/Storage
>            Version: 2.5
>      KernelVersion: 2.6.24, 2.6.25
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Block Layer
>         AssignedTo: axboe@kernel.dk
>         ReportedBy: smal.root@gmail.com
>
>
> Latest working kernel version: 2.6.25
> Earliest failing kernel version: 2.6.24
> Distribution: Slackware 10-12
>
> First machine:
> CPU - AMD Athlon3600+(2 GHz)
> Chipset - nForce 6150(MCP51)
> RAM - 3G DDR2
> Video - internal GeForce6150
> Kernel - 2.6.25.4(own built)
> Copy speed - 1.7GByte/s
>
> on another kernel:
> Kernel - 2.6.23.5(own built)
> Copy speed - 43.5GByte/s
> --------------------------------------------
> Second machine:
> CPU - PII-350
> MB i440BX
> RAM - 128M SDRAM
> Video - 3DFX Voodoo3
> Kernel - 2.6.21.5(Vanila from slackware distribution)
> Copy speed - 11.3GByte/s
>
> Steps to reproduce:
> dd if=/dev/zero of=/dev/null bs=16M count=10000
>

lol. OK, who did that?

Perhaps ZERO_PAGE changes?
>>(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface).

i don't like mail list. it not humanoidal.

>>lol. OK, who did that?

court tester.

>>Perhaps ZERO_PAGE changes?

am not kernel developer. i will trace program tomorrow
(In reply to comment #2)
> >>(switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> i don't like mail list. it not humanoidal.
>
> >>lol. OK, who did that?
> court tester.
>
> >>Perhaps ZERO_PAGE changes?
> am not kernel developer. i will trace program tomorrow

It probably is the ZERO_PAGE changes.

Thank you for noticing this and reporting it, but I think the issue is that we just changed the way /dev/zero (and a few other things) works. It is known that it will be slower in benchmarks like this, but it should be OK for more useful work.

Rather than using /dev/zero as the input, maybe you could use some malloc-allocated memory or a tmpfs file, for example. If the slowdown disappears, that would confirm it is the /dev/zero changes.
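The comparison suggested above can be sketched in user space roughly as follows. This is a minimal sketch, not a calibrated benchmark: the helper names `dd_rate` and `make_zero_file` are hypothetical, and it only assumes a Linux-like system where /dev/zero and /dev/null exist.

```python
import os
import time


def dd_rate(src, dst, block_size, count):
    """Copy up to `count` blocks of `block_size` bytes from src to dst,
    roughly like dd, and return the observed rate in bytes per second."""
    buf = bytearray(block_size)   # one reusable user-space buffer
    view = memoryview(buf)
    copied = 0
    with open(src, "rb", buffering=0) as fin, open(dst, "wb", buffering=0) as fout:
        start = time.perf_counter()
        for _ in range(count):
            n = fin.readinto(buf)  # a single read() into the buffer
            if not n:              # end of file on a regular-file source
                break
            fout.write(view[:n])
            copied += n
        elapsed = time.perf_counter() - start
    return copied / elapsed if elapsed > 0 else float("inf")


def make_zero_file(directory, size):
    """Create a file of `size` zero bytes in `directory` and return its path.
    Whether `directory` is on tmpfs is up to the caller (an assumption)."""
    path = os.path.join(directory, "zero_input.bin")
    with open(path, "wb") as f:
        f.write(bytes(size))
    return path
```

For example, one could compare `dd_rate("/dev/zero", "/dev/null", 1 << 20, 64)` with `dd_rate(make_zero_file("/dev/shm", 64 << 20), "/dev/null", 1 << 20, 64)`. That /dev/shm is a tmpfs mount is an assumption about the system. If the second rate does not show the same drop between kernels, that points at the /dev/zero path.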
Mmm

>>extern unsigned long empty_zero_page[PAGE_SIZE/sizeof(unsigned long)];

a division on every call? but for a 32-bit x86 cpu unsigned long is 32 bits.

PS. have no time to test now. i want to try building a kernel with the old ZERO_PAGE and look at the result... i think tomorrow
>>or a tmpfs file for example

on /dev/ram the fastest PC shows a better result.
The issue is that the old code to read from /dev/zero would do tricks to make it very fast, but it was a significant complication to the virtual memory manager, which was deemed not useful for real world applications. Performance critical code would not be reading swaths of zeroes like this (because it is just useless work, it is already known to be a zero result). Of course there may be some corner cases where some real workload suffers, but we have not run into one yet, and this does not look like one either. But thanks for reporting. It is very good to know people are keeping an eye on things like this, so it is very helpful. Can we close this bug?
On Thu, 24 Jul 2008, Andrew Morton wrote:
> > http://bugzilla.kernel.org/show_bug.cgi?id=11156
> >
> > Kernel - 2.6.25.4(own built)
> > Copy speed - 1.7GByte/s
> >
> > Kernel - 2.6.23.5(own built)
> > Copy speed - 43.5GByte/s
> >
> > Steps to reproduce:
> > dd if=/dev/zero of=/dev/null bs=16M count=10000
>
> lol. OK, who did that?
>
> Perhaps ZERO_PAGE changes?

Yes, the ZERO_PAGE changes: readprofile clearly shows lots of time in clear_user() on 2.6.24 onwards, clearing each page instead of using the ZERO_PAGE.

I see Nick has already answered this, and the bug is now closed (guess he's on 2.6.23 whereas I'm on later ;). I agree with him, copying from /dev/zero to /dev/null is not an operation which deserves VM tricks to optimize; but I wanted to add one point.

The particular awfulness of those dd rates (on machines I've tried I see new kernels as 10 to 30 times worse than old kernels at that test) owes a lot to the large blocksize (16M) being used. That blocksize will not fit in the processor's memory cache, so repeatedly clearing the pages is very slow.

Bring the blocksize down to something that easily fits in the L2 cache, perhaps 1M or 256k, and new kernels then appear only twice(ish) as bad as old. Nothing to be proud of, but not nearly so bad as the bs=16M case.

Hugh
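The blocksize effect described above can be checked with a small sweep over read sizes. The following is a rough sketch under stated assumptions: `zero_read_rate` and `sweep` are hypothetical helper names, the source defaults to /dev/zero, and interpreter overhead means the absolute numbers will not match dd.

```python
import time


def zero_read_rate(block_size, total_bytes, src="/dev/zero"):
    """Read about `total_bytes` from `src` in `block_size` chunks into one
    reusable buffer and return the observed rate in bytes per second."""
    buf = bytearray(block_size)
    count = max(1, total_bytes // block_size)
    done = 0
    with open(src, "rb", buffering=0) as fin:
        start = time.perf_counter()
        for _ in range(count):
            done += fin.readinto(buf)  # kernel fills the buffer with zeroes
        elapsed = time.perf_counter() - start
    return done / elapsed if elapsed > 0 else float("inf")


def sweep(block_sizes, total_bytes):
    """Return a {block_size: bytes_per_second} mapping for each size given."""
    return {bs: zero_read_rate(bs, total_bytes) for bs in block_sizes}
```

Calling `sweep([256 << 10, 1 << 20, 16 << 20], 1 << 30)` covers the three sizes mentioned in the thread (256k, 1M, 16M). On kernels that clear each page, the 16M case would be expected to come out slowest once the buffer no longer fits in cache, though the exact ratios depend on the machine.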