Most recent kernel where this bug did *NOT* occur: 2.6.16 for PS3
Distribution: System slowdown with the new mainlined version of the PS3 kernel. The old 2.6.16 kernel, which was released with "Linux Distributor Starter's Kit v1.1", does not have this problem.
Hardware Environment: I used a Japanese PS3 60GB model to reproduce the problem.
Software Environment: linux/kernel/git/geoff/ps3-linux.git tree d2bada44de47b2a7bf7a7d2a11ba28b82589849a
Problem Description:

Steps to reproduce:

1. Run the following script:

    #! /bin/sh -x
    for sz in 512; do
        sync;sync;
        dd if=/dev/zero of=/var/tmp/${sz}mb bs=1M count=$sz
        sync;sync;
        for i in 1 2 3 4 5; do
            dd of=/dev/null if=/var/tmp/${sz}mb bs=4K
            sync;sync;
            time md5sum /var/tmp/${sz}mb
            sync;sync;
        done
    done

2. Compare the results between the 2.6.16 kernel and the 2.6.21-rc4 kernel.

With 2.6.16, I got:

    + dd of=/dev/null if=/var/tmp/512mb bs=4K
    131072+0 records in
    131072+0 records out
    536870912 bytes (537 MB) copied, 18.464 seconds, 29.1 MB/s
    + sync
    + sync
    + md5sum /var/tmp/512mb
    aa559b4e3523a6c931f08f4df52d58f2  /var/tmp/512mb

    real    0m18.476s
    user    0m2.912s
    sys     0m1.188s

However, with 2.6.21-rc4, dd runs at only about half that speed:

    + dd of=/dev/null if=/var/tmp/512mb bs=4K
    131072+0 records in
    131072+0 records out
    536870912 bytes (537 MB) copied, 34.2244 seconds, 15.7 MB/s
    + sync
    + sync
    + md5sum /var/tmp/512mb
    aa559b4e3523a6c931f08f4df52d58f2  /var/tmp/512mb

    real    0m23.794s
    user    0m4.384s
    sys     0m1.727s
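To put numbers on the regression, the dd figures quoted above can be recomputed from the byte count and elapsed time. The following is only a sketch: `throughput_mbs` and `slowdown` are hypothetical helper names that are not part of the original report, and the calculation assumes dd's convention of 1 MB = 10^6 bytes.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original report) for comparing dd runs.

# Print throughput in MB/s as dd reports it (1 MB = 10^6 bytes).
# $1 = bytes copied, $2 = elapsed seconds
throughput_mbs() {
    LC_ALL=C awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}

# Print the slowdown factor of a new run relative to a baseline run.
# $1 = baseline seconds, $2 = new seconds
slowdown() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", b / a }'
}
```

With the values from the two runs, `throughput_mbs 536870912 18.464` gives 29.1 and `throughput_mbs 536870912 34.2244` gives 15.7, matching dd's own output. `slowdown 18.464 34.2244` gives 1.85 for the dd read, while `slowdown 18.476 23.794` gives 1.29 for the md5sum wall-clock time, so the two workloads are not slowed by the same factor.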
Created attachment 10939: a script to reproduce the problem
Created attachment 10940: result with 2.6.16 kernel
Created attachment 10941: result with 2.6.21-rc4 kernel
- We started to see the problem in 2.6.19 after enabling the 2nd PPE thread.
- Before that, we used only one PPE thread on 2.6.19 and didn't notice any interactive sluggishness (the storage driver hadn't been ported to 2.6.19 at that point, so there is no storage data for that configuration).
- If interrupts are routed to the 2nd PPE thread, the system is much slower.
- 2.6.19 distributed the interrupts randomly over the 2 PPE threads. Depending on which interrupt(s) got routed to the 2nd PPE thread, the system was slower or faster. E.g. if the gelic (Ethernet) interrupt is routed to the 2nd PPE thread and you use NFS root, it's very slow.
- The current kernel routes all interrupts to the 1st PPE thread (cf. `if (cpu == PS3_BINDING_CPU_ANY) cpu = 0;'), but it's still slower than 2.6.16.
- You can easily test the additional slowdown by changing the default to `cpu = 1;'.
- 2.6.16 distributed the interrupts randomly over the 2 PPE threads, too. So just routing interrupts to the 2nd PPE thread is not the sole cause of the problem.
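One way to see which PPE thread is actually servicing interrupts on a given kernel is to total the per-CPU columns of /proc/interrupts. The sketch below assumes the usual layout (a header row of CPU names, then one row per interrupt with one count column per CPU); `irq_totals` is a hypothetical helper name, not something from the PS3 tree.

```shell
#!/bin/sh
# Hypothetical helper: sum the per-CPU interrupt counts from input in
# /proc/interrupts format (read from stdin) and print one total per CPU.
irq_totals() {
    awk '
        # Header row: remember the CPU column names (CPU0, CPU1, ...).
        NR == 1 { n = NF; for (i = 1; i <= n; i++) name[i] = $i; next }
        # Data rows: field 1 is the IRQ label, fields 2..n+1 are the counts.
        # Non-numeric fields (short rows such as ERR) are skipped.
        {
            for (i = 1; i <= n; i++)
                if ($(i + 1) ~ /^[0-9]+$/) sum[i] += $(i + 1)
        }
        END { for (i = 1; i <= n; i++) printf "%s %d\n", name[i], sum[i] }
    '
}
```

Running `irq_totals < /proc/interrupts` before and after the benchmark shows how many interrupts each thread handled in between. With the `cpu = 0;' default the second column should barely move; with `cpu = 1;' it should take most of the load.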
This bug comes from hypervisor-side behavior. Geoff has already pushed the fixes to kernel.org:

commit ps3-linux-patches-709611b712bd6f036ed363fe636a25220a2942e1
commit ps3-linux-dc41a7414189b43747100a50c5155d82846227bb

The relevant fixes are listed below.

ps3-wip/ps3-fix-slowdown-bug.diff
ps3-wip/ps3snd-kill-iopte-failure.diff
ps3-wip/ps3snd-fix-noise.diff
set CONFIG_DEBUG_SLAB=n
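Before re-running the benchmark, it may be worth confirming that the kernel under test really was built with CONFIG_DEBUG_SLAB=n. A minimal sketch, assuming you have the .config used for the build; `debug_slab_state` is a hypothetical helper name. Note that a disabled option appears in .config as a "# CONFIG_DEBUG_SLAB is not set" comment rather than as "=n".

```shell
#!/bin/sh
# Hypothetical helper: report whether CONFIG_DEBUG_SLAB is enabled in a
# kernel config file. Anything other than an explicit "=y" line (including
# the "# CONFIG_DEBUG_SLAB is not set" comment) counts as disabled.
# $1 = path to a kernel .config
debug_slab_state() {
    if grep -q '^CONFIG_DEBUG_SLAB=y' "$1"; then
        echo enabled
    else
        echo disabled
    fi
}
```

For example, `debug_slab_state .config` run in the build directory should print "disabled" for a kernel configured as described above.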