Bug 15341

Summary: Load goes up very high when we traverse the large tree
Product: IO/Storage Reporter: Frederic MASSOT (frederic)
Component: SCSIAssignee: linux-scsi (linux-scsi)
Status: CLOSED CODE_FIX    
Severity: normal CC: akpm, axboe, florian, rjw
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.32 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 14230    
Attachments: dmesg of kernel 2.6.32
Config file of kernel 2.6.32
dmesg after "echo w > /proc/sysrq-trigger"
config file for kernel 2.6.33 with CFQ
dmesg for kernel 2.6.33 with CFQ

Description Frederic MASSOT 2010-02-17 11:43:25 UTC
Created attachment 25088 [details]
dmesg of kernel 2.6.32

Hi,

When I do a "find" or a "du" on a large tree the load goes up very high (over 160) to hang the system. A ctrl-C can kill the command and load back down quietly. I not had this problem with version 2.6.31.

There is no error message in the logs.

Regards.
Comment 1 Frederic MASSOT 2010-02-17 11:44:11 UTC
Created attachment 25089 [details]
Config file of kernel 2.6.32
Comment 2 Andrew Morton 2010-02-18 03:18:44 UTC
Please do a `
Comment 3 Andrew Morton 2010-02-18 03:19:23 UTC
Please do a `ps aux' and look for lots of tasks stuck in "D" state.  Tell us what they are.
Comment 4 Frederic MASSOT 2010-02-19 13:22:55 UTC
Hi,

There are tons of process in "D" state, over there and load up.

Below is the list of processes when the load rises during an rsync backup

Regards.

$ ps auxf |grep D
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       606  0.0  0.0      0     0 ?        D    Feb15   0:13  \_ [kjournald]
root      1306  0.0  0.0      0     0 ?        D    Feb15   0:02  \_ [flush-8:0]
root      1706  0.0  0.0   3164   980 ?        Ds   Feb15   0:11 /sbin/syslogd
backuppc  1635  0.0  0.0   1824   412 ?        Ss   05:38   0:00  |       \_ dash -c nice -n 19 /usr/bin/sudo /usr/bin/rsync --server --sender --numeric-ids --perms --owner --group -D --links --hard-links --times --block-size=2048 --recursive --ignore-times . /
root      1636  3.3  2.0  71600 64240 ?        D    05:38  10:18  |           \_ /usr/bin/rsync --server --sender --numeric-ids --perms --owner --group -D --links --hard-links --times --block-size=2048 --recursive --ignore-times . /
fredo    14179  0.0  0.0   3604   648 pts/0    D+   10:48   0:00              \_ grep D
root     14168  3.3  0.0      0     0 ?        D    10:48   0:00          \_ [php]
root     14171  0.0  0.0      0     0 ?        D    10:48   0:00  \_ [watchdog]
www-data 13817  0.0  0.6  80096 20424 ?        D    10:40   0:00  \_ /usr/sbin/apache2 -k start
www-data 13821  0.2  1.1  91192 36468 ?        D    10:40   0:01  \_ /usr/sbin/apache2 -k start
www-data 13829  0.2  1.3  86168 42340 ?        D    10:40   0:01  \_ /usr/sbin/apache2 -k start
www-data 13842  0.1  0.8  82024 27228 ?        D    10:40   0:00  \_ /usr/sbin/apache2 -k start
www-data 13852  0.0  0.6  80140 19396 ?        D    10:40   0:00  \_ /usr/sbin/apache2 -k start
www-data 13903  0.6  1.3  89280 41788 ?        D    10:42   0:02  \_ /usr/sbin/apache2 -k start
www-data 13920  0.5  1.4  92652 45708 ?        D    10:43   0:01  \_ /usr/sbin/apache2 -k start
www-data 13939  0.3  1.2  90192 39556 ?        D    10:43   0:01  \_ /usr/sbin/apache2 -k start
www-data 13944  0.1  0.7  80024 23992 ?        D    10:43   0:00  \_ /usr/sbin/apache2 -k start
www-data 13948  0.1  0.7  83484 22368 ?        D    10:43   0:00  \_ /usr/sbin/apache2 -k start
www-data 13965  0.0  0.6  81632 18832 ?        D    10:44   0:00  \_ /usr/sbin/apache2 -k start
www-data 13977  0.0  0.5  79876 18608 ?        D    10:44   0:00  \_ /usr/sbin/apache2 -k start
www-data 13983  0.0  0.5  79984 18132 ?        D    10:44   0:00  \_ /usr/sbin/apache2 -k start
www-data 13997  0.2  0.6  81928 20724 ?        D    10:44   0:00  \_ /usr/sbin/apache2 -k start
www-data 13999  0.1  0.5  81708 18588 ?        D    10:44   0:00  \_ /usr/sbin/apache2 -k start
www-data 14025  0.0  0.5  79928 17952 ?        D    10:45   0:00  \_ /usr/sbin/apache2 -k start
www-data 14028  0.1  0.7  83372 22508 ?        D    10:45   0:00  \_ /usr/sbin/apache2 -k start
www-data 14035  0.1  0.6  81596 20956 ?        D    10:45   0:00  \_ /usr/sbin/apache2 -k start
www-data 14038  0.0  0.5  81644 17464 ?        D    10:45   0:00  \_ /usr/sbin/apache2 -k start
www-data 14040  0.0  0.6  80016 19324 ?        D    10:45   0:00  \_ /usr/sbin/apache2 -k start
www-data 14060  0.0  0.6  82048 19060 ?        D    10:45   0:00  \_ /usr/sbin/apache2 -k start
www-data 14063  1.3  1.5  93604 46796 ?        D    10:45   0:02  \_ /usr/sbin/apache2 -k start
www-data 14072  0.0  0.5  79664 16804 ?        D    10:45   0:00  \_ /usr/sbin/apache2 -k start
www-data 14085  0.0  0.5  79388 16976 ?        D    10:46   0:00  \_ /usr/sbin/apache2 -k start
www-data 14093  0.1  0.5  79872 18176 ?        D    10:46   0:00  \_ /usr/sbin/apache2 -k start
www-data 14095  0.0  0.5  79636 15648 ?        D    10:46   0:00  \_ /usr/sbin/apache2 -k start
www-data 14098  0.0  0.5  79768 18396 ?        D    10:46   0:00  \_ /usr/sbin/apache2 -k start
www-data 14105  0.0  0.5  79404 18332 ?        D    10:46   0:00  \_ /usr/sbin/apache2 -k start
www-data 14106  0.1  0.5  81876 18008 ?        D    10:46   0:00  \_ /usr/sbin/apache2 -k start
www-data 14108  0.0  0.5  81520 16980 ?        D    10:46   0:00  \_ /usr/sbin/apache2 -k start
www-data 14109  0.1  0.5  81256 17484 ?        D    10:46   0:00  \_ /usr/sbin/apache2 -k start
www-data 14110  0.0  0.5  79848 17104 ?        D    10:46   0:00  \_ /usr/sbin/apache2 -k start
www-data 14112  0.0  0.5  81688 17316 ?        D    10:46   0:00  \_ /usr/sbin/apache2 -k start
www-data 14127  0.0  0.5  79520 15980 ?        D    10:47   0:00  \_ /usr/sbin/apache2 -k start
www-data 14128  0.0  0.4  79144 13084 ?        D    10:47   0:00  \_ /usr/sbin/apache2 -k start
www-data 14132  0.0  0.4  79484 13220 ?        D    10:47   0:00  \_ /usr/sbin/apache2 -k start
www-data 14135  0.0  0.4  79504 13292 ?        D    10:47   0:00  \_ /usr/sbin/apache2 -k start
www-data 14136  0.0  0.4  79144 12748 ?        D    10:47   0:00  \_ /usr/sbin/apache2 -k start
www-data 14137  0.0  0.4  79144 12748 ?        D    10:47   0:00  \_ /usr/sbin/apache2 -k start
www-data 14138  0.0  0.4  79144 12748 ?        D    10:47   0:00  \_ /usr/sbin/apache2 -k start
www-data 14139  0.0  0.4  79504 13292 ?        D    10:47   0:00  \_ /usr/sbin/apache2 -k start
www-data 14140  0.0  0.4  79504 13296 ?        D    10:47   0:00  \_ /usr/sbin/apache2 -k start
www-data 14142  0.0  0.4  79504 13292 ?        D    10:47   0:00  \_ /usr/sbin/apache2 -k start
www-data 14149  0.0  0.4  79172 12944 ?        D    10:47   0:00  \_ /usr/sbin/apache2 -k start
www-data 14151  0.0  0.5  79840 15740 ?        D    10:47   0:00  \_ /usr/sbin/apache2 -k start
www-data 14152  5.5  1.2  89856 39336 ?        D    10:47   0:01  \_ /usr/sbin/apache2 -k start
www-data 14154  0.0  0.4  79172 12944 ?        D    10:47   0:00  \_ /usr/sbin/apache2 -k start
www-data 14155  0.0  0.4  79180 12944 ?        D    10:47   0:00  \_ /usr/sbin/apache2 -k start
www-data 14156  0.0  0.5  79840 15728 ?        D    10:47   0:00  \_ /usr/sbin/apache2 -k start
www-data 14157  0.0  0.4  79648 13336 ?        D    10:47   0:00  \_ /usr/sbin/apache2 -k start
www-data 14159  0.0  0.3  79344 12132 ?        D    10:47   0:00  \_ /usr/sbin/apache2 -k start
www-data 14160  0.4  0.5  81264 17420 ?        D    10:47   0:00  \_ /usr/sbin/apache2 -k start
www-data 14163  0.1  0.5  79312 16396 ?        D    10:47   0:00  \_ /usr/sbin/apache2 -k start
www-data 14164  0.0  0.4  79144 13016 ?        D    10:47   0:00  \_ /usr/sbin/apache2 -k start
www-data 14170  0.0  0.4  79504 13268 ?        D    10:48   0:00  \_ /usr/sbin/apache2 -k start
www-data 14173  0.0  0.3  79120 12048 ?        D    10:48   0:00  \_ /usr/sbin/apache2 -k start
www-data 14177  0.0  0.4  79504 13268 ?        D    10:48   0:00  \_ /usr/sbin/apache2 -k start
Comment 5 Andrew Morton 2010-02-19 17:01:21 UTC
Please do an `echo w > /proc/sysrq-trigger' then run `dmesg' so we can see where these D-state tasks are stuck.

Please use bugzilla attachments, as copy-n-paste wordwraps the text which makes a real mess these dumps.

Thanks.
Comment 6 Frederic MASSOT 2010-02-19 17:32:52 UTC
Hi,

In attachments the dmesg output.

Sorry for the dirty copy-n-paste.

Regards.
Comment 7 Frederic MASSOT 2010-02-19 17:35:06 UTC
Created attachment 25119 [details]
dmesg after "echo w > /proc/sysrq-trigger"
Comment 8 Andrew Morton 2010-02-19 17:54:48 UTC
OK, thanks.  Looks like pretty widespread disk starvation caused by the single seeky reader process.  It's unlikely to be scsi-specific, more likely due to iosched changes.

I set the regression flag (2.6.31 -> 2.6.32) and cc'ed Jens.
Comment 9 Jens Axboe 2010-02-23 12:38:12 UTC
That looks pretty funky, most processes are stuck waiting for IO or waiting for others to finish their IO (on inode->i_mutex). If you switch to another IO scheduler, does the problem go away? To do so, do:

# echo deadline > /sys/block/sda/queue/scheduler

(or use sdb, if that is the device in question) and repeat your find/du run.
Comment 10 Frederic MASSOT 2010-02-23 18:37:47 UTC
Hi,

With deadline IO scheduler, I was able to run "find" on a large tree and even compile a kernel 2.6.32, the load has not exceeded 3.  :o)

Regards.
Comment 11 Jens Axboe 2010-02-23 22:24:32 UTC
Would it be possible for you to test 2.6.33-rc8?

And in the interest of fixing 2.6.32 as well, are you familiar with using git and git bisect in particular? It would be a very restricted set of commits, since we are dealing with a single file only.
Comment 12 Frederic MASSOT 2010-02-25 12:04:29 UTC
Hi,

I test the new kernel 2.6.33, I was able to run "find" on a large tree, the load has not exceeded 3.  :o)

Regards.
Comment 13 Frederic MASSOT 2010-02-25 12:06:09 UTC
Created attachment 25212 [details]
config file for kernel 2.6.33 with CFQ
Comment 14 Frederic MASSOT 2010-02-25 12:06:48 UTC
Created attachment 25213 [details]
dmesg for kernel 2.6.33 with CFQ
Comment 15 Jens Axboe 2010-02-25 13:34:23 UTC
OK, that's good at least. We need to do something about 2.6.32, though. If you are willing to try and bisect the problem, it would be much appreciated. Checking revisions between 2.6.31 and 2.6.32 for CFQ, there are only 12 of them. So you would not need to compile and check more than 4 kernels.

The way you would do this, is git clone Linus -git tree:

$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
$ git checkout v2.6.32

Start the bisect testing:

$ git bisect start block/cfq-iosched.c
$ git bisect good v2.6.31
$ git bisect bad v2.6.32

Now copy in your .config and run make oldconfig and compile the kernel+modules like you always do. Boot and test the kernel. Lets assume this first test is good, you'd then cd back into your source directory and do:

$ git bisect good

which would hand you a new revision to build, boot, and test. Likewise, if the resulting kernel image didn't work well, you would run

$ git bisect bad

instead. You repeat this until git tells you what the bad revision was. And then you post that here :-)
Comment 16 Frederic MASSOT 2010-03-05 18:48:08 UTC
Hi,

The first test is very bad. Whaoo !

I waited several tens of seconds after typing the password for the shell, a top allowed me to see that the charge had risen to over 90. The HDD led was only blinking. The server took several minutes to reboot, finally, I pressed the reset button.

The kernel version is : 2.6.32-rc2-00036-g8e29675

Regards.
Comment 17 Frederic MASSOT 2010-04-09 19:30:59 UTC
Hi,

The kernel 2.6.33 seemed to be better than the 2.6.32 until I reboot with a fsck.

The verification of 80GB took over 2 hours, unbelievable. With kernel 2.6.31 and earlier it did not take more than 10 or 15 minutes.

Regards.
Comment 18 Florian Mickler 2010-10-02 20:59:44 UTC
Were you able to finish the bisection?
Comment 19 Florian Mickler 2011-02-02 20:55:11 UTC
I'm closing this now as it is fixed in the mainline kernel. Please post a note, if you investigated this further...