Latest working kernel version: -
Earliest failing kernel version: -
Distribution: Debian testing
Hardware Environment: Lenovo ThinkPad T61, 2.5 GHz Core 2 Duo T9300, Intel chipset, SATA disk, 2 GB RAM, NVIDIA video
Software Environment: GNOME / console

Problem Description:
All tasks accessing dm-crypt'ed disk space become unresponsive for long periods of time when one I/O-intensive (linear access) task is running on dm-crypt.

Steps to reproduce:
(obviously replace sda9 with a partition where you don't have any valuable data)

# cryptsetup create sda9_crypt /dev/sda9
# time nice nice cat /dev/zero >/dev/mapper/sda9_crypt

or:

# cryptsetup remove sda9_crypt   # if necessary
# cryptsetup luksFormat /dev/sda9
# cryptsetup luksOpen /dev/sda9 sda9_crypt
# time nice nice cat /dev/zero >/dev/mapper/sda9_crypt

Then wait 10 seconds or so (until most binaries have been dropped from the disk caches) and try to start any program. For example, "killall -STOP cat" will take 3-6 minutes.

More complete/wordy description follows:

I installed the system with the root fs (reiserfs) and swap on LVM on dm-crypt, using the Debian installer's ability to do so. Quite soon I discovered a problem when I ran a compilation with make -j3 of a software package that requires hundreds of MB of RAM per gcc instance and therefore touches swap during the compilation. During that build, Xorg (then using the open "nv" driver) almost froze. The vesa driver behaved far better, so I reported that as a bug against the nv driver here (but soon found out that the closed-source nvidia driver behaved the same):
http://bugs.freedesktop.org/show_bug.cgi?id=15716

But the longer I use this machine, the more I suspect that something bad is really going on in the I/O layer (probably dm-crypt related), and that it's not really the fault of Xorg at all.
One thing I noticed some time ago is that ionice -c 3 doesn't help at all in reducing the impact of a "cat /dev/sdaX" run on the responsiveness of the machine. Experimenting with the different I/O schedulers didn't seem to help either. I've also felt the need to set up resource limits with ulimit -v. Without them, casual runaway processes (I'm a user-space developer) start swapping and it takes me minutes each time to get back control.

Currently I'm running a pristine 2.6.22.19 from kernel.org (in 64-bit mode; I haven't tried 32-bit kernels so far).

Today I noticed what happens when running the above tests (writing zeroes to sda9_crypt). cat runs merrily along. The GNOME "System Monitor" panel applet shows it using about half the CPU power of one core (bright blue), and shows the rest as wait time (dark blue), which I think is expected. Some read benchmarks with cat from the root partition device also showed about 50% usage of one core at about 40 MByte/sec throughput, which is actually the native disk throughput.

*But* if I try to open, for example, a new gnome-terminal, or even just run "killall -STOP cat" (even at the console, via Ctrl-Alt-F1 to an existing root login), this takes ages: about 3-6 minutes. If I just hit Ctrl-Z in the gnome-terminal where I started the above cat instance, cat stops more or less instantly (which is rather expected, as the shell shouldn't have to access the disk for that). All the other pending actions then run immediately.

So my impression is that any 'fairness' in I/O scheduling is completely broken when using dm-crypt. I suspect there might be a problem with multiple I/O jobs going on at once *all using dm-crypt*, as if dm-crypt had its own purely FIFO scheduler (with a huge backlog) or something. This is consistent with reports from people whose root partition is not on dm-crypt: they tell me they don't see the problem when trying the above cat test.
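The "takes ages" observation could be turned into numbers with a small probe that times a command repeatedly in a second terminal while the cat writer is running. What follows is only a minimal Python sketch and was not part of the original testing. The helper names `time_command` and `probe` are hypothetical, and so is the choice of target command (starting a fresh Python interpreter). A command whose binaries are still in the page cache may not show the stall, since the report describes the delay appearing once binaries have been evicted.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion and return its wall-clock duration in seconds."""
    start = time.monotonic()
    subprocess.run(argv, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.monotonic() - start


def probe(argv, samples, interval):
    """Time argv `samples` times, sleeping `interval` seconds between runs.

    Returns every duration so the worst case can be inspected.
    """
    durations = []
    for i in range(samples):
        durations.append(time_command(argv))
        if i + 1 < samples:
            time.sleep(interval)
    return durations


if __name__ == "__main__":
    # Hypothetical probe target: a fresh interpreter must load its
    # executable and libraries from disk if they are no longer cached.
    results = probe([sys.executable, "-c", "pass"], samples=3, interval=1.0)
    for n, d in enumerate(results, 1):
        print(f"run {n}: {d:.2f} s")
    print(f"worst: {max(results):.2f} s")
```

Run it once before starting the writer and once about 10 seconds after. That gives a baseline and a loaded figure to compare across kernels or configurations.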
I've tried renicing the kcryptd processes to priorities 0, 10 and 19 (the default is -5), but only priority 10 seemed to make it any better, if at all. Switching off the second core didn't help in this case either. (I'm now considering moving everything off dm-crypt to get decent system behaviour.)

Thanks,
Christian.

Some further data:

- Someone asked whether I have DMA enabled, and whether my SATA disk is in AHCI or compat mode. I don't know how to enable DMA on SATA disks, but I thought there was no need to do this manually. I've looked at the kernel logs, which say:

May 10 04:50:09 novo kernel: scsi0 : ahci
May 10 04:50:09 novo kernel: ata1: SATA max UDMA/133 cmd 0xffffc20000068100 ctl 0x0000000000000000 bmdma 0x0000000000000000 irq 313

and

May 10 04:50:09 novo kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
May 10 04:50:09 novo kernel: ata1.00: ATA-8: FUJITSU MHY2250BH, 0084000D, max UDMA/100
May 10 04:50:09 novo kernel: ata1.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 31/32)
May 10 04:50:09 novo kernel: ata1.00: configured for UDMA/100
May 10 04:50:09 novo kernel: scsi 0:0:0:0: Direct-Access ATA FUJITSU MHY2250B 0084 PQ: 0 ANSI: 5

- I've also been asked for vmstat data:

$ vmstat 1
 1  0 643308 744080   5756  95944    0    0     0      0   404   227  0  1 99  0
 1  4 643308 350156 202312  95968    0    0    64 118212   797 86879  0 44 39 17
 0  3 643308 256780 229636  95936    0    0     0 105692  1098   316  0 42  0 58
 1  3 643308 205284 266660  95964    0    0     0  36976   891   259  0 35  0 65
 0  4 643308 167772 301256  95964    0    0     0  34596   948   318  0 34  0 66

(the second row is after I started the cat), then:

 0  3 643308 134736 346312  96008    0    0     0  12288  1018   359  0 31  0 69
 1  2 643308 148028 346312  96008    0    0     0      0   928   251  0 30  0 70
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi     bo    in    cs us sy id wa
 2  2 643308 182980 346312  96008    0    0     0      0  1049   301  1 32  0 68
 0  3 643308 200612 346312  96008    0    0     0      0   948   272  1 32  0 67
 0  0 643308 330444 346312  96008    0    0     0     64   895   301  1 16 40 44
 0  0 643308 330476 346312  96008    0    0     0      0   398   195  1  1 97  0

The second row there is after I stopped it again. This is without triggering the load of another program (which would again make me wait minutes to get back control).

- I've run:

# smartctl -t short /dev/sda
# smartctl -a /dev/sda 2>&1|less
..
Device Model: FUJITSU MHY2250BH
..
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       725         -
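The vmstat samples in this comment are easier to compare across runs if they are reduced to a couple of numbers, such as the mean blocks written out per second (`bo`) and the mean I/O-wait percentage (`wa`). Below is a minimal Python sketch of that reduction. It assumes the 16-column layout shown above. The function names `parse_vmstat` and `summarize` are hypothetical, and the sketch simply skips vmstat's repeated header lines.

```python
VMSTAT_FIELDS = ("r", "b", "swpd", "free", "buff", "cache", "si", "so",
                 "bi", "bo", "in", "cs", "us", "sy", "id", "wa")


def parse_vmstat(text):
    """Parse `vmstat` output into a list of dicts, one per sample row.

    Assumes the 16-column layout used in this report. Lines that are not
    exactly 16 integers (the 'procs ---' and 'r b swpd ...' headers,
    blank lines, notes) are skipped.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != len(VMSTAT_FIELDS):
            continue
        try:
            values = [int(p) for p in parts]
        except ValueError:
            continue  # the column-name header also has 16 tokens
        rows.append(dict(zip(VMSTAT_FIELDS, values)))
    return rows


def summarize(rows):
    """Return (mean bo, mean wa) over the given sample rows."""
    if not rows:
        raise ValueError("no vmstat rows to summarize")
    n = len(rows)
    mean_bo = sum(r["bo"] for r in rows) / n
    mean_wa = sum(r["wa"] for r in rows) / n
    return mean_bo, mean_wa
```

For the second through fifth rows of the first chunk above (taken while cat was writing), this gives a mean `bo` of 73869.0 and a mean `wa` of 51.5.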
Could this be the same problem as the one in http://bugzilla.kernel.org/show_bug.cgi?id=10378, and/or the one discussed in the thread starting at http://lkml.org/lkml/2008/2/28/150 ? I can't test right now, but I will as soon as possible if that makes sense.

Christian.
This might be a problem [in combination] with CONFIG_USER_SCHED. I'm now running a new kernel with this change:

@@ -1,7 +1,7 @@
 #
 # Automatically generated make config: don't edit
 # Linux kernel version: 2.6.27.7
-# Wed Dec 3 19:24:49 2008
+# Wed Dec 3 19:29:40 2008
 #
 CONFIG_64BIT=y
 # CONFIG_X86_32 is not set
@@ -81,10 +81,8 @@ CONFIG_IKCONFIG_PROC=y
 CONFIG_LOG_BUF_SHIFT=16
 # CONFIG_CGROUPS is not set
 CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
-CONFIG_GROUP_SCHED=y
-CONFIG_FAIR_GROUP_SCHED=y
-# CONFIG_RT_GROUP_SCHED is not set
-CONFIG_USER_SCHED=y
+# CONFIG_GROUP_SCHED is not set
+# CONFIG_USER_SCHED is not set
 # CONFIG_CGROUP_SCHED is not set
 CONFIG_SYSFS_DEPRECATED=y
 CONFIG_SYSFS_DEPRECATED_V2=y

and things seem to be much better. I haven't run the above tests again yet, though, since I've used up all my disk partitions at the moment and it's late at night.
However, the kernel I was running back then didn't seem to have CONFIG_USER_SCHED enabled; see https://bugs.freedesktop.org/show_bug.cgi?id=15716. Could it be that config-2.6.22.19 was using USER_SCHED without it being configured? Or did something else change? Or are multiple problems affecting the whole thing?
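One way to answer "was USER_SCHED actually set?" for each /boot/config-* file is to read the options out of the files directly instead of eyeballing diffs. The following minimal Python sketch does that. The function names are hypothetical. The sketch assumes the usual .config line forms "CONFIG_FOO=value" and "# CONFIG_FOO is not set", and it can also read a gzip'ed /proc/config.gz (available when CONFIG_IKCONFIG_PROC=y, as in the diff above).

```python
import gzip
import re

# "CONFIG_FOO=y", "CONFIG_FOO=m", "CONFIG_FOO=16", 'CONFIG_FOO="text"'
_SET = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
# "# CONFIG_FOO is not set"
_UNSET = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_kconfig(text):
    """Map option name -> value ('y', 'm', a literal, or 'n' if 'is not set')."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def load_kconfig(path):
    """Read a /boot/config-* or .config file, or a gzip'ed /proc/config.gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as f:
        return parse_kconfig(f.read())


def compare(old, new, names):
    """Return {name: (old_value, new_value)} for the names whose values differ.

    An option that does not appear at all is reported as None, kept distinct
    from an explicit 'n', because it may simply not exist in that kernel
    version.
    """
    return {name: (old.get(name), new.get(name))
            for name in names
            if old.get(name) != new.get(name)}
```

A possible invocation would be `compare(load_kconfig("/boot/config-2.6.22.19"), load_kconfig("/boot/config-2.6.27.7"), ["CONFIG_GROUP_SCHED", "CONFIG_FAIR_GROUP_SCHED", "CONFIG_USER_SCHED"])`. A None on the old side would indicate that the option is not present in that file at all.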
bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=10781
>
> ------- Comment #3 from christian-bko@jaeger.mine.nu 2008-12-03 19:48
> -------
> Although the kernel I was running then didn't seem to have CONFIG_USER_SCHED
> enabled: see https://bugs.freedesktop.org/show_bug.cgi?id=15716
> Could it be that config-2.6.22.19 was using USER_SCHED without configuring
> it?
> Or something else changed? Or it's multiple problems effecting the whole
> thing.

Too many things have changed in the kernel since 2.6.22; please use a more recent kernel if possible.

If you want to experiment, start by applying this one-line patch:
http://www2.kernel.org/pub/linux/kernel/people/agk/patches/2.6/2.6.25/dm-crypt-add-cond_resched.patch

You will probably need to modify it for kernels older than 2.6.25 so that it applies correctly:

Index: home/data/linux-2.6.24.y/drivers/md/dm-crypt.c
===================================================================
--- home.orig/data/linux-2.6.24.y/drivers/md/dm-crypt.c
+++ home/data/linux-2.6.24.y/drivers/md/dm-crypt.c
@@ -374,6 +374,7 @@ static int crypt_convert(struct crypt_co
 			break;
 
 		ctx->sector++;
+		cond_resched();
 	}
 
 	return r;
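The one-line patch adds a call to cond_resched() inside crypt_convert()'s per-sector loop. cond_resched() is a voluntary preemption point. If the scheduler has flagged that another task should run, the thread gives up the CPU there instead of converting the whole request in one go. This mainly matters on kernels that are not fully preemptible. The Python sketch below is only a toy model of that effect and not dm-crypt code. Generators stand in for kernel threads and a round-robin loop stands in for the scheduler. Unlike the real cond_resched(), the toy yields unconditionally. All names in it are hypothetical.

```python
from collections import deque


def crypt_convert(n_sectors, log, resched_points):
    """Toy stand-in for dm-crypt's per-sector loop: 'encrypts' n_sectors."""
    for sector in range(n_sectors):
        log.append(f"crypt:{sector}")
        if resched_points:
            # Analogue of cond_resched(): let the scheduler run someone else.
            yield


def interactive_task(log):
    """A short task, e.g. a shell that wants to run 'killall -STOP cat'."""
    log.append("shell")
    yield


def run_round_robin(tasks):
    """Run generator tasks to completion; a task loses the CPU only when it yields."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished; drop it
        queue.append(task)


def position_of_shell(resched_points, n_sectors=5):
    """Index in the event log at which the interactive task got to run."""
    log = []
    run_round_robin([crypt_convert(n_sectors, log, resched_points),
                     interactive_task(log)])
    return log.index("shell")


if __name__ == "__main__":
    print("without cond_resched analogue:", position_of_shell(False))
    print("with cond_resched analogue:   ", position_of_shell(True))
```

Without the yield points, the interactive task only runs after all 5 sectors have been processed (position 5). With them, it runs right after the first sector (position 1). That is the kind of latency difference the patch is meant to address.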
> please use more recent kernel if possible.

As you can see from the diff, I'm now using "Linux kernel version: 2.6.27.7". As shown at the above-mentioned URL https://bugs.freedesktop.org/show_bug.cgi?id=15716, I upgraded to a newer kernel long ago. I've been using various kernels since then:

chris@novo:/boot$ l config*
-rw-r--r-- 1 root root 63490 2008-04-27 22:10 config-2.6.22.19
-rw-r--r-- 1 root root 72175 2008-06-11 06:50 config-2.6.25.6
-rw-r--r-- 1 root root 73063 2008-06-22 21:44 config-2.6.25.8
-rw-r--r-- 1 root root 73064 2008-07-03 22:17 config-2.6.25.10
-rw-r--r-- 1 root root 75183 2008-07-19 20:32 config-2.6.26
-rw-r--r-- 1 root root 75352 2008-09-11 23:03 config-2.6.26.3
-rw-r--r-- 1 root root 75352 2008-09-12 00:23 config-2.6.26.5
-rw-r--r-- 1 root root 77281 2008-10-16 16:47 config-2.6.27.1
-rw-r--r-- 1 root root 75352 2008-10-30 13:33 config-2.6.26.7.old
-rw-r--r-- 1 root root 75352 2008-10-30 14:04 config-2.6.26.6
-rw-r--r-- 1 root root 75352 2008-11-08 11:49 config-2.6.26.7
-rw-r--r-- 1 root root 77293 2008-11-08 12:55 config-2.6.27.5.old
-rw-r--r-- 1 root root 77290 2008-11-08 13:12 config.old
-rw-r--r-- 1 root root 77290 2008-11-08 13:12 config-2.6.27.5
-rw-r--r-- 1 root root 77216 2008-12-03 20:39 config-2.6.27.7
-rw-r--r-- 1 root root 77216 2008-12-03 20:39 config

I've never seen much improvement from moving to a newer kernel, maybe only a little. I moved my swap from the logical volume on dm-crypt to a logical volume in a volume group backed by plain (unencrypted) storage. I also moved most of my root filesystem to an unencrypted logical volume. Together, these changes seem to have mitigated the issue a little. But the most noticeable improvement, aside from switching off one core, seems to come from switching off CONFIG_USER_SCHED in the newest kernel. Again, forgive me that I can't run the above test case right now. I'll do so, and if the problem persists I'll also try your patch. Thanks.
BTW, one thing I also tested recently was whether I could get a faster disk or swap by using 3 USB sticks in a RAID-0 setup (with 16k chunks). This RAID device is generally quite a bit faster than my internal (crappy laptop) disk:
- Linear reading is about twice as fast (~60 MB/sec). My laptop has 3 USB ports, but two of them turn out to be on the same USB bus.
- Linear writing is about the same (~30 MB/sec).
- Random reading (find -type f on reiserfs with a cold cache) is about 4-5 times faster.

Using dm-crypt on top of that RAID device didn't slow those speeds down. I actually noticed that writing to and reading from that device didn't slow my desktop down at all! I was already starting to suspect that my internal laptop disk was somehow just broken. So I made a cleartext RAID-0 setup again and tried this chain:
1. Create reiserfs on the RAID device.
2. Create a non-sparse 4G file on that filesystem.
3. Attach the file to a loop device with losetup.
4. Run cryptsetup luksFormat and luksOpen on the loop device.
5. Run mkswap and swapon on the result (and swapoff on my old swap).

But that swap was very bad: it brought my desktop right back to >20-second reaction times. Of course, I don't know which point in the chain is the exact culprit. This was with 2.6.27.5 with CONFIG_USER_SCHED=y; I may try again now with the 2.6.27.7 kernel without USER_SCHED.