Bug 8260

Summary: System slow down with ps3-linux git tree kernel
Product: Platform Specific/Hardware
Reporter: Hiroyuki Machida (Hiroyuki.Mach)
Component: PS3
Assignee: Geoff Levand (geoffrey.levand)
Status: CLOSED PATCH_ALREADY_AVAILABLE    
Severity: high    
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.21-rc4
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: a script to reproduce the problem
Result with 2.6.16 kernel
result with 2.6.21-rc4

Description Hiroyuki Machida 2007-03-25 17:13:44 UTC
Most recent kernel where this bug did *NOT* occur: 2.6.16 for PS3

Distribution:
The system slows down with the new mainlined version of the PS3 kernel.
The old 2.6.16 kernel, which was released with the
"Linux Distributor Starter's Kit v1.1", doesn't have this problem.

Hardware Environment:

I used a Japanese PS3 60GB model to reproduce the problem.

Software Environment:

linux/kernel/git/geoff/ps3-linux.git
tree	d2bada44de47b2a7bf7a7d2a11ba28b82589849a


Problem Description:

Steps to reproduce:

1. Run the following script

#! /bin/sh -x

for sz in 512; do 
	sync;sync;
	dd if=/dev/zero of=/var/tmp/${sz}mb bs=1M count=$sz
	sync;sync;

	for i in  1 2 3 4 5; do
		dd of=/dev/null if=/var/tmp/${sz}mb bs=4K
		sync;sync;
		time md5sum /var/tmp/${sz}mb
		sync;sync;
	done
done

2. Compare the results between the 2.6.16 kernel and the 2.6.21-rc4 kernel

With 2.6.16, I got:

+ dd of=/dev/null if=/var/tmp/512mb bs=4K
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 18.464 seconds, 29.1 MB/s
+ sync
+ sync
+ md5sum /var/tmp/512mb
aa559b4e3523a6c931f08f4df52d58f2  /var/tmp/512mb

real    0m18.476s
user    0m2.912s
sys     0m1.188s


However, with 2.6.21-rc4, dd reaches only about half the throughput
(15.7 MB/s versus 29.1 MB/s), and md5sum takes longer as well:

+ dd of=/dev/null if=/var/tmp/512mb bs=4K
131072+0 records in
131072+0 records out
536870912 bytes (537 MB) copied, 34.2244 seconds, 15.7 MB/s
+ sync
+ sync
+ md5sum /var/tmp/512mb
aa559b4e3523a6c931f08f4df52d58f2  /var/tmp/512mb

real    0m23.794s
user    0m4.384s
sys     0m1.727s
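
To put the two runs side by side, the throughput and the slowdown factor can
be recomputed from the byte count and elapsed times printed above. This is
only a sketch: the function and constant names are made up for illustration,
and it assumes that dd's "MB" means 10^6 bytes, which is consistent with the
"537 MB" it reports for 536870912 bytes.

```python
def throughput_mb_s(nbytes, seconds):
    """Return throughput in MB/s, taking 1 MB as 10**6 bytes (assumed dd convention)."""
    return nbytes / seconds / 1e6


def slowdown(old_seconds, new_seconds):
    """Return how many times longer the new run took compared with the old run."""
    return new_seconds / old_seconds


NBYTES = 536870912  # size of the test file, from the dd output

# Elapsed times copied from the two runs quoted above (seconds).
DD_OLD, DD_NEW = 18.464, 34.2244      # dd read: 2.6.16, 2.6.21-rc4
MD5_OLD, MD5_NEW = 18.476, 23.794     # md5sum "real": 2.6.16, 2.6.21-rc4

if __name__ == "__main__":
    print("dd  2.6.16:     %.1f MB/s" % throughput_mb_s(NBYTES, DD_OLD))
    print("dd  2.6.21-rc4: %.1f MB/s" % throughput_mb_s(NBYTES, DD_NEW))
    print("dd  slowdown:   %.2fx" % slowdown(DD_OLD, DD_NEW))
    print("md5 slowdown:   %.2fx" % slowdown(MD5_OLD, MD5_NEW))
```

With these inputs the sketch reproduces dd's own 29.1 MB/s and 15.7 MB/s
figures and gives a slowdown of about 1.85x for dd and about 1.29x for md5sum.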
Comment 1 Hiroyuki Machida 2007-03-25 17:14:59 UTC
Created attachment 10939 [details]
a script to reproduce the problem
Comment 2 Hiroyuki Machida 2007-03-25 17:15:36 UTC
Created attachment 10940 [details]
Result with 2.6.16 kernel
Comment 3 Hiroyuki Machida 2007-03-25 17:16:04 UTC
Created attachment 10941 [details]
result with 2.6.21-rc4
Comment 4 Geert Uytterhoeven 2007-03-28 08:10:42 UTC
  - We started to see the problem in 2.6.19 after enabling the 2nd PPE thread.
  - Before that, we used only one PPE thread on 2.6.19 and didn't notice any
    interactive sluggishness (the storage driver had not been ported to 2.6.19
    at that point, so we have no data about storage).
  - If interrupts are routed to the 2nd PPE thread, the system is much slower.
  - 2.6.19 distributed the interrupts randomly over the 2 PPEs. Depending on
    which interrupt(s) got routed to the 2nd PPE thread, the system was slower
    or faster. E.g. if the gelic (Ethernet) interrupt is routed to the 2nd PPE
    thread and you use NFS root, it's very slow.
  - The current kernel routes all interrupts to the 1st PPE thread (cf.
    `if (cpu == PS3_BINDING_CPU_ANY) cpu = 0;'), but it's still slower than
    2.6.16.
  - You can easily test the additional slowdown by changing the default to
    `cpu = 1;'.
  - 2.6.16 distributed the interrupts randomly over the 2 PPEs, too. So just
    routing interrupts to the 2nd PPE thread is not the sole cause of the
    problem.
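
To see which interrupts actually land on the 2nd PPE thread in a given boot,
the per-CPU counters in /proc/interrupts can be inspected. The following is a
minimal sketch, not part of any fix: the helper names are hypothetical, and
the sample rows (IRQ numbers, counts, device names) are invented purely to
show the column layout.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {label: (per_cpu_counts, description)}."""
    lines = text.strip("\n").splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        for field in fields[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        if len(counts) != ncpus:
            continue  # skip summary rows that carry fewer columns than CPUs
        table[label.strip()] = (counts, " ".join(fields[ncpus:]))
    return table


def irqs_on_cpu(table, cpu):
    """Return the labels of IRQs that fired at least once on the given CPU index."""
    return sorted(label for label, (counts, _) in table.items() if counts[cpu] > 0)


# Invented sample; real rows on a PS3 will differ.
SAMPLE = """\
           CPU0       CPU1
 16:       1200          0   PS3   gelic
 17:          0        350   PS3   storage
BAD:          0
"""
```

Here irqs_on_cpu(parse_interrupts(SAMPLE), 1) lists the IRQs serviced by the
second thread. On a live system the text would come from reading
/proc/interrupts before and after a test run and comparing the counters.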
Comment 5 Hiroyuki Machida 2007-04-22 22:51:59 UTC
This bug comes from hypervisor-side behavior.
Geoff already pushed the fixes to kernel.org:
  commit ps3-linux-patches-709611b712bd6f036ed363fe636a25220a2942e1
  commit ps3-linux-dc41a7414189b43747100a50c5155d82846227bb

The relevant fixes are listed below.
  ps3-wip/ps3-fix-slowdown-bug.diff
  ps3-wip/ps3snd-kill-iopte-failure.diff
  ps3-wip/ps3snd-fix-noise.diff
  set CONFIG_DEBUG_SLAB=n
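
To confirm that a given kernel configuration really has CONFIG_DEBUG_SLAB
disabled, its .config can be checked. This is a small sketch with a
hypothetical helper name; it relies on the usual .config convention that a
disabled option is written as a "# CONFIG_... is not set" comment line.

```python
def config_option_state(config_text, name):
    """Return the value of an option in .config-style text.

    Returns "n" for the "# CONFIG_X is not set" form used for disabled
    options, the text after "=" for set options, and None if the option
    does not appear at all.
    """
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == "# %s is not set" % name:
            return "n"
        if line.startswith(name + "="):
            return line.split("=", 1)[1]
    return None
```

For example, config_option_state(open(".config").read(), "CONFIG_DEBUG_SLAB")
should return "n" (or None, if the option is absent) for a kernel built as
recommended above.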