Bug 7388 - Oops in kcryptd
Summary: Oops in kcryptd
Status: CLOSED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: LVM2/DM
Hardware: i386 Linux
Importance: P2 blocking
Assignee: Alasdair G Kergon
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-10-19 17:07 UTC by Bjoern B. Brandenburg
Modified: 2007-12-04 23:05 UTC

See Also:
Kernel Version: 2.6.18.1
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
Kernel config for my case (39.66 KB, text/plain)
2007-06-29 02:30 UTC, Tommi

Description Bjoern B. Brandenburg 2006-10-19 17:07:29 UTC
Most recent kernel where this bug did not occur: Don't know, never seen before.
So I'd have to say: 2.6.18
Distribution: Arch Linux
Hardware Environment: Toshiba S2410
Software Environment: 
The command 

  cryptsetup luksOpen /dev/loop0 otp

triggered the Oops. The process is now defunct and unkillable. I've been using
LUKS for about two months and have never witnessed this. System was under
moderate load at the same time (KDE starting up, after a cold boot, thus lots of
hdd activity). Underlying file system XFS.

Problem Description:

Process cryptsetup got stuck in kernel space, cannot be killed. Kernel dumps
Oops to console:

Oct 20 01:53:35 DS-12 device-mapper: ioctl: 4.7.0-ioctl (2006-06-24)
initialised: dm-devel@redhat.com
Oct 20 01:53:40 DS-12 BUG: unable to handle kernel paging request at virtual
address 90eddaf9
Oct 20 01:53:40 DS-12 printing eip:
Oct 20 01:53:40 DS-12 c015022e
Oct 20 01:53:40 DS-12 *pde = 00000000
Oct 20 01:53:40 DS-12 Oops: 0002 [#1]
Oct 20 01:53:40 DS-12 PREEMPT SMP 
Oct 20 01:53:40 DS-12 Modules linked in: sha256 aes dm_crypt dm_mod rfcomm hidp
l2cap bluetooth xfs ext2 prism54 ppdev pcmcia eepro100 lp snd_seq_oss
snd_seq_midi_event snd_seq snd_seq_device e100 mii yenta_socket rsrc_nonstatic
pcmcia_core snd_pcm_oss snd_mixer_oss tsdev joydev ohci1394 ieee1394 smsc_ircc2
irda crc_ccitt parport_pc parport snd_intel8x0 snd_intel8x0m snd_ac97_codec
snd_ac97_bus snd_pcm snd_timer rtc psmouse serio_raw snd soundcore intel_agp
uhci_hcd snd_page_alloc agpgart shpchp pci_hotplug pcspkr evdev usbcore ext3 jbd
mbcache ide_cd cdrom ide_disk generic piix ide_core
Oct 20 01:53:40 DS-12 CPU:    0
Oct 20 01:53:40 DS-12 EIP:    0060:[<c015022e>]    Not tainted VLI
Oct 20 01:53:40 DS-12 EFLAGS: 00010093   (2.6.18-ARCH #1) 
Oct 20 01:53:40 DS-12 EIP is at add_element+0xe/0x30
Oct 20 01:53:40 DS-12 eax: 00302d6d   ebx: da2cb80e   ecx: 642f6b63   edx: d9520718
Oct 20 01:53:40 DS-12 esi: 00000202   edi: d9520718   ebp: dc1f0240   esp: d9f0def8
Oct 20 01:53:40 DS-12 ds: 007b   es: 007b   ss: 0068
Oct 20 01:53:40 DS-12 Process kcryptd/0 (pid: 3000, ti=d9f0c000 task=dff60570
task.ti=d9f0c000)
Oct 20 01:53:40 DS-12 Stack: da2cb80e c015038b dc1f0240 e0ff3080 d9520718
d9f0df24 e108c748 c176c600 
Oct 20 01:53:40 DS-12 0000007c 00000000 00000000 c176c600 c176c600 00000000
00000000 00000001 
Oct 20 01:53:40 DS-12 00000001 0000007d 00000000 00000000 d9520724 d9520728
dd7604c0 d9520718 
Oct 20 01:53:40 DS-12 Call Trace:
Oct 20 01:53:40 DS-12 [<c015038b>] mempool_free+0x5b/0x90
Oct 20 01:53:40 DS-12 [<e108c748>] kcryptd_do_work+0x58/0x70 [dm_crypt]
Oct 20 01:53:40 DS-12 [<c0135ffb>] run_workqueue+0x7b/0xf0
Oct 20 01:53:40 DS-12 [<e108c6f0>] kcryptd_do_work+0x0/0x70 [dm_crypt]
Oct 20 01:53:40 DS-12 [<c0136be7>] worker_thread+0x117/0x140
Oct 20 01:53:40 DS-12 [<c011de00>] default_wake_function+0x0/0x10
Oct 20 01:53:40 DS-12 [<c0136ad0>] worker_thread+0x0/0x140
Oct 20 01:53:40 DS-12 [<c0139937>] kthread+0xf7/0x100
Oct 20 01:53:40 DS-12 [<c0139840>] kthread+0x0/0x100
Oct 20 01:53:40 DS-12 [<c0101005>] kernel_thread_helper+0x5/0x10
Oct 20 01:53:40 DS-12 Code: 40 0c 85 c0 7e 0b 8b 51 10 48 89 41 0c 8b 04 82 c3
0f 0b 1a 00 9c f7 34 c0 eb eb 89 f6 53 89 c3 8b 48 0c 3b 48 08 7d 0e 8b 43 10
<89> 14 88 8d 41 01 89 43 0c 5b c3 0f 0b 14 00 9c f7 34 c0 eb e8 
Oct 20 01:53:40 DS-12 EIP: [<c015022e>] add_element+0xe/0x30 SS:ESP 0068:d9f0def8
Oct 20 01:53:40 DS-12 <6>note: kcryptd/0[3000] exited with preempt_count 1


Steps to reproduce:

Don't know, never witnessed before.
Comment 1 Bjoern B. Brandenburg 2007-05-09 07:54:46 UTC
The crash has been occurring about one out of ten mounts with every kernel
version I have run since then. My current kernel version is 2.6.21.1 and the
kernel got stuck again yesterday. This is really quite annoying... I am quite
willing to test patches.

May  8 08:13:19 DS-12 device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: dm-devel@redhat.com
May  8 08:13:23 DS-12 BUG: unable to handle kernel paging request at virtual
address e112d1a0
May  8 08:13:23 DS-12 printing eip:
May  8 08:13:23 DS-12 c0158d48
May  8 08:13:23 DS-12 *pde = 1b25e067
May  8 08:13:23 DS-12 *pte = 00000000
May  8 08:13:23 DS-12 Oops: 0000 [#1]
May  8 08:13:23 DS-12 PREEMPT SMP
May  8 08:13:23 DS-12 Modules linked in: sha256 aes dm_crypt dm_mod ipv6 rfcomm hidp l2cap xfs ext2 usbhid hid ff_memless hci_usb bluetooth prism54 ppdev joydev pcmcia snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device smsc_ircc2 lp irda crc_ccitt snd_pcm_oss snd_mixer_oss eepro100 e100 mii rtc_sysfs parport_pc parport rtc_proc rtc_dev ohci1394 ieee1394 yenta_socket rtc_cmos rtc_core rtc_lib psmouse serio_raw rsrc_nonstatic pcmcia_core intel_agp agpgart snd_intel8x0 snd_intel8x0m snd_ac97_codec ac97_bus snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr sg uhci_hcd tsdev shpchp pci_hotplug evdev thermal processor fan button battery ac usbcore ext3 jbd mbcache sr_mod cdrom sd_mod generic piix ide_core ata_piix libata
May  8 08:13:23 DS-12 CPU:    0
May  8 08:13:23 DS-12 EIP:    0060:[<c0158d48>]    Not tainted VLI
May  8 08:13:23 DS-12 EFLAGS: 00010286   (2.6.21-ARCH #1)
May  8 08:13:23 DS-12 EIP is at mempool_free+0x18/0x90
May  8 08:13:23 DS-12 eax: d31a7d44   ebx: e112d194   ecx: db8fe240   edx: e112d194
May  8 08:13:23 DS-12 esi: c8f31f20   edi: d31a7d44   ebp: 00000202   esp: c8f31ecc
May  8 08:13:23 DS-12 ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
May  8 08:13:23 DS-12 Process kcryptd/0 (pid: 5897, ti=c8f30000 task=cefa5550
task.ti=c8f30000)
May  8 08:13:23 DS-12 Stack: 00000202 e109a080 c8f31f20 c8c65480 e107ec7e c867f200 0000007c 00000000
May  8 08:13:23 DS-12 00000000 c867f200 d31a7d44 00000000 00000000 c033f041 c01207d1 00000001
May  8 08:13:23 DS-12 c1405aa0 dfc06c30 00000001 00000003 c8c4c6f0 c867f200 c867f200 00000000
May  8 08:13:23 DS-12 Call Trace:
May  8 08:13:23 DS-12 [<e107ec7e>] kcryptd_do_work+0x31e/0x3d0 [dm_crypt]
May  8 08:13:23 DS-12 [<c033f041>] __sched_text_start+0x321/0x950
May  8 08:13:23 DS-12 [<c01207d1>] __activate_task+0x21/0x40
May  8 08:13:23 DS-12 [<c0138f24>] run_workqueue+0x94/0x140
May  8 08:13:23 DS-12 [<e107e960>] kcryptd_do_work+0x0/0x3d0 [dm_crypt]
May  8 08:13:23 DS-12 [<c0139a57>] worker_thread+0x147/0x170
May  8 08:13:23 DS-12 [<c0123100>] default_wake_function+0x0/0x10
May  8 08:13:23 DS-12 [<c0139910>] worker_thread+0x0/0x170
May  8 08:13:23 DS-12 [<c013c8ca>] kthread+0xba/0xf0
May  8 08:13:23 DS-12 [<c013c810>] kthread+0x0/0xf0
May  8 08:13:23 DS-12 [<c0104e83>] kernel_thread_helper+0x7/0x14
May  8 08:13:23 DS-12 =======================
May  8 08:13:23 DS-12 Code: 74 26 00 89 d1 89 c2 89 c8 e9 85 85 01 00 90 8d 74 26 00 83 ec 10 89 5c 24 04 89 d3 89 7c 24 0c 89 c7 89 74 24 08 0f ae f0 89 f6 <8b> 42 0c 3b 42 08 7d 1a 89 d0 e8 f9 88 1e 00 89 c6 8b 43 0c 3b
May  8 08:13:23 DS-12 EIP: [<c0158d48>] mempool_free+0x18/0x90 SS:ESP 0068:c8f31ecc
Comment 2 Tommi 2007-06-29 02:29:17 UTC
The same thing happened to me. I can't tell which kernel version didn't have this, as I've only recently begun using dm-crypt.
Kernel: 2.6.21-gentoo
I did luksOpen on the encrypted device's partition, and this happened after I ran 'file -s /dev/mapped/test' (the 'file' process also got stuck and can't be killed).
The filesystem is ext2 with journal, though I don't think that matters in this situation.
I'll add my kernel config as an attachment, if I manage :)
[   20.840224] device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: dm-devel@redhat.com
...
[   26.969926] Adding 2008116k swap on /dev/mapper/crypt-swap.  Priority:-1 extents:1 across:2008116k
...
[ 2556.221092] BUG: unable to handle kernel paging request at virtual address f8875196
[ 2556.221096]  printing eip:
[ 2556.221098] c01476c3
[ 2556.221100] *pde = 018e5067
[ 2556.221102] Oops: 0000 [#1]
[ 2556.221104] PREEMPT 
[ 2556.221106] Modules linked in: nvidia(P)
[ 2556.221110] CPU:    0
[ 2556.221111] EIP:    0060:[<c01476c3>]    Tainted: P       VLI
[ 2556.221112] EFLAGS: 00010286   (2.6.21-gentoo #22)
[ 2556.221118] EIP is at mempool_free+0x13/0xc0
[ 2556.221121] eax: f785bfc4   ebx: c1940f28   ecx: c1a7c440   edx: f8875196
[ 2556.221123] esi: f8875196   edi: f785bfc4   ebp: c02b4810   esp: c1940ee4
[ 2556.221126] ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
[ 2556.221129] Process kcryptd/0 (pid: 385, ti=c1940000 task=f7c25a30 task.ti=c1940000)
[ 2556.221131] Stack: c19179c0 c1940f28 f175f7c0 c19179c0 c02b4aea f020c680 0000007c 00000000 
[ 2556.221136]        f020c680 f785bfc4 f785bd10 00000006 00000000 00000012 fffe9be1 f7c25a30 
[ 2556.221141]        f7fafac0 f020c680 f020c680 00000000 00000000 00000001 00000001 0000007d 
[ 2556.221146] Call Trace:
[ 2556.221148]  [<c02b4aea>] kcryptd_do_work+0x2da/0x380
[ 2556.221153]  [<c02b4810>] kcryptd_do_work+0x0/0x380
[ 2556.221156]  [<c012f790>] run_workqueue+0xa0/0x180
[ 2556.221160]  [<c012fd20>] worker_thread+0x0/0x170
[ 2556.221163]  [<c012fe67>] worker_thread+0x147/0x170
[ 2556.221166]  [<c011b879>] __wake_up_common+0x39/0x60
[ 2556.221169]  [<c011c7f0>] default_wake_function+0x0/0x10
[ 2556.221172]  [<c012fd20>] worker_thread+0x0/0x170
[ 2556.221175]  [<c0132c88>] kthread+0xa8/0xe0
[ 2556.221178]  [<c0132be0>] kthread+0x0/0xe0
[ 2556.221181]  [<c0104b63>] kernel_thread_helper+0x7/0x14
[ 2556.221184]  =======================
[ 2556.221186] Code: 34 1e 21 00 8d 74 26 00 eb d2 31 db e9 28 ff ff ff 8d b4 26 00 00 00 00 83 ec 10 89 74 24 08 89 7c 24 0c 89 d6 89 5c 24 04 89 c7 <8b> 02 39 42 04 7d 27 9c 5b fa 89 e0 25 00 f0 ff ff ff 40 14 8b 
[ 2556.221205] EIP: [<c01476c3>] mempool_free+0x13/0xc0 SS:ESP 0068:c1940ee4
Comment 3 Tommi 2007-06-29 02:30:48 UTC
Created attachment 11903
Kernel config for my case
Comment 4 Daniel Drake 2007-07-01 12:58:57 UTC
2 more reports of this:
http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/2066
https://bugs.gentoo.org/show_bug.cgi?id=183787

Both are tainted, unfortunately, but the descriptions and traces are nearly identical.
Comment 5 Milan Broz 2007-07-01 13:22:13 UTC
Is this reproducible with 2.6.22-rc? Some dm-crypt issues have been fixed there.

If you want patches for 2.6.21, try these:
http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/2.6.21/
Probably the relevant ones are:
dm-crypt-fix-call-to-clone_init.patch
dm-crypt-fix-avoid-cloned-bio-ref-after-free.patch
Comment 6 Daniel Drake 2007-07-04 09:44:51 UTC
The user on the Gentoo bug is unable to reproduce the problem even on the kernel where the crash was witnessed.

I emailed Jan from the gmane thread asking him to test 2.6.22-rc7 and post results here.
Comment 7 Laurent Parenteau 2007-07-16 13:35:43 UTC
I've seen this behavior with different kernels:
	2.6.21.6, 2.6.22-rc4, 2.6.22, 2.6.22.1

Also, the following patches were applied to the kernels, when they weren't already:
	dm-crypt-disable-barriers.patch
	dm-crypt-fix-avoid-cloned-bio-ref-after-free.patch
	dm-crypt-fix-call-to-clone_init.patch
	dm-crypt-fix-remove-first_clone.patch
	dm-crypt-use-smaller-bvecs-in-clones.patch
	dm-io-fix-panic-on-large-request.patch
	dm-merge-max_hw_sector.patch
	dm-use-singlethread-workqueues.patch

Here are two different stack traces we got.  The first one was produced with kernel 2.6.22.1 with dm-io-fix-panic-on-large-request.patch, dm-merge-max_hw_sector.patch and dm-use-singlethread-workqueues.patch applied.

===============================================================================

BUG: unable to handle kernel NULL pointer dereference at virtual address
0000000c
 printing eip:
c01404c8
*pdpt = 0000000002198001
*pde = 0000000000000000
Oops: 0000 [#1]
PREEMPT SMP
CPU:    1
EIP:    0060:[<c01404c8>] Not tainted VLI
EFLAGS: 00010282 (2.6.22.1-cp6012 #1)
EIP is at mempool_free+0x18/0x90
eax: f7fc3c3c  ebx: 00000000  ecx: 00000000  edx: 00000000
esi: f7db4c80  edi: f7fc3c3c  ebp: 00000000  esp: f7dcff10
ds: 007b es: 007b fs: 00d8 gs: 0000 ss:0068
Process kcryptd/0 (pid: 890, ti=f7dce000 task=dfed3a50 task.ti=f7dce000)
Stack: 00000000 f7dcff58 f7db4c80 00000000 c03496a2 f784ed80 0000007c
00000000
       f7fc3c3c 00000000 dfed3a50 c04792e0 c03cf839 f7dcffd0 c03cf860
00000001
       11b88896 00000006 f784ed80 f784ed80 00000000 00000000 00000001
00000001
Call trace:
 [<c03496a2>] kcryptd_do_work+0x262/0x310
 [<c03cf839>] schedule+0x2e9/0x970
 [<c03cf860>] schedule+0x310/0x970
 [<c0349440>] kcryptd_do_work+0x0/0x310
 [<c012b8ea>] run_workqueue+0x7a/0x100
 [<c012eed0>] autoremove_wake_function+0x0/0x50
 [<c012c23c>] worker_thread+0x9c/0x100
 [<c012eed0>] autoremove_wake_function+0x0/0x50
 [<c012c1a0>] worker_thread+0x0/0x100
 [<c012ebf2>] kthread+0x42/0x70
 [<c012ebb0>] kthread+0x0/0x70
 [<c01036c3>] kernel_thread_helper+0x7/0x14
 =========================
Code: 74 26 00 89 d1 89 c2 89 c8 c9 95 8e 01 00 90 8d 74 26 00 83 ec 10
89 5c 24 04 89 d3 89 7c 24 0c 89 c7 89 74 24 08 0f
ae f0 89 f6 <8b> 42 0c 3b 42 08 7d 1a 89 d0 e8 f9 4a 28 00 89 c6 8b 43
0c 3b
EIP: [<c01404c8>] mempool_free+0x18/0x90 SS:ESP 0068:f7dcff10

===============================================================================



===============================================================================

BUG: unable to handle kernel paging request at virtual address f990c19e
 printing eip:
c013f368
*pde = 00000000
Oops: 0000 [#1]
PREEMPT SMP
CPU:    0
EIP:    0060:[<c013f368>] Not tainted VLI
EFLAGS: 00010282 (2.6.21.6-cp6012 #1)
EIP is at mempool_free+0x18/0x90
eax: f7d94c3c  ebx: f880c192  ecx: c21b4940  edx: f880c192
esi: c212f3c0  edi: f7d94c3c  ebp: 00000000  esp: dfdf5edc
ds: 007b es: 007b fs: 00d8 gs: 0000 ss:0068
Process kcryptd/0 (pid: 905, ti=dfdf4000 task=dfc070b0 task.ti=dfdf4000)
Stack: 00000000 dfdf5f24 c212f3c0 00000000 c0345822 dfc61ac0 0000007c
00000000
       f7d94c3c dfdf5f6c dfdf5f68 00000000 c0120a62 c04673a0 00000001
00000003
       c21f8e70 00000001 dfc61ac0 dfc61ac0 00000000 00000000 00000001
00000001
Call trace:
 [<c0345822>] kcryptd_do_work+0x262/0x310
 [<c0120a62>] __do_softirq+0x82/0xf0
 [<c012bae3>] run_workqueue+0x93/0x140
 [<c03455c0>] kcryptd_do_work+0x0/0x310
 [<c012c337>] worker_thread+0x147/0x170
 [<c01175c0>] default_wake_function+0x0/0x10
 [<c012c1f0>] worker_thread+0x0/0x170
 [<c012ee4b>] kthread+0xbb/0xf0
 [<c012ed90>] kthread+0x0/0xf0
 [<c0103783>] kernel_thread_helper+0x7/0x14
 =========================
Code: 74 26 00 89 d1 89 c2 89 c8 c9 95 8e 01 00 90 8d 74 26 00 83 ec 10
89 5c 24 04 89 d3 89 7c 24 0c 89 c7 89 74 24 09 0f
ae f0 89 f6 <8b> 42 0c 3b 42 08 7d 1a 89 d0 e8 f9 4a 28 00 89 c6 8b 43
0c 3b
EIP: [<c013f368>] mempool_free+0x18/0x90 SS:ESP 0068:dfdf5edc

===============================================================================
Comment 8 Jan C. Nordholz 2007-07-16 15:13:23 UTC
(In reply to comment #6)
> The user on the Gentoo bug is unable to reproduce the problem even on the
> kernel where the crash was witnessed.
> 
> I emailed Jan from the gmane thread asking him to test 2.6.22-rc7 and post
> results here.
> 

Hi, better late than never... I just managed to reproduce it again. I've been experiencing this for I don't know how long (though I believe I guessed at the kernel version where it started in my gmane message). The backtrace has always been the same on my machine, but it differs slightly from the ones I see here... anyway, here it is. If I can help debug this, let me know; I'll do almost anything to my kernel. ;)

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000004
 printing eip:
c01456e3
*pde = 00000000
Oops: 0000 [#1]
PREEMPT 
Modules linked in: dm_crypt xt_NFQUEUE xt_tcpudp xt_state xt_limit xt_CONNMARK xt_connmark xt_multiport ipt_REDIRECT ipt_MASQUERADE ipt_LOG nfnetlink_queue nfnetlink iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_filter ip_tables x_tables ipv6 xfrm4_mode_transport esp4 deflate aes des cbc sha256 md5 hmac crypto_hash af_key ebtable_broute bridge llc ebtable_nat ebtable_filter ebtables dm_mod snd_virmidi snd_seq_virmidi snd_ca0106 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm ac97_bus snd_page_alloc snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd soundcore usb_storage sd_mod scsi_mod nls_utf8 nls_cp850 nls_iso8859_1 hisax_fcpcipnp hisax_isac hisax via686a i2c_isa w83781d hwmon_vid hwmon i2c_viapro i2c_core msr isdn_bsdcomp isdn cpuid rtc
CPU:    0
EIP:    0060:[<c01456e3>]    Not tainted VLI
EFLAGS: 00010286   (2.6.22-rc7 #1)
EIP is at mempool_free+0x13/0xb0
eax: ca35ac3c   ebx: ca35ac3c   ecx: 00000001   edx: 00000000
esi: 00000000   edi: ca35ac3c   ebp: d04dbef4   esp: d04dbee4
ds: 007b   es: 007b   fs: 0000  gs: 0000  ss: 0068
Process kcryptd/0 (pid: 7690, ti=d04da000 task=c34f50b0 task.ti=d04da000)
Stack: 00000200 ca35ac3c c6f21e80 d7e82660 d04dbf04 e0a4575a c6f21e80 d04dbf4c 
       d04dbf78 e0a45bde d15f2660 0000007c 00000000 0000b087 d15f2660 ca35ac3c 
       d15f26a0 d8890000 f0f5bfe8 c039f280 c039f430 c039f280 c34f50b0 d8fa3860 
Call Trace:
 [<c0104eca>] show_trace_log_lvl+0x1a/0x30
 [<c0104f89>] show_stack_log_lvl+0xa9/0xd0
 [<c0105180>] show_registers+0x1d0/0x310
 [<c01053c4>] die+0x104/0x230
 [<c0116848>] do_page_fault+0x178/0x5c0
 [<c0318d6a>] error_code+0x6a/0x70
 [<e0a4575a>] dec_pending+0x3a/0x50 [dm_crypt]
 [<e0a45bde>] kcryptd_do_work+0x2de/0x330 [dm_crypt]
 [<c012aa4f>] run_workqueue+0x8f/0x140
 [<c012b18e>] worker_thread+0x7e/0xe0
 [<c012de42>] kthread+0x42/0x70
 [<c0104b1f>] kernel_thread_helper+0x7/0x18
 =======================
Code: 04 1c 1d 00 8d 74 26 00 eb d8 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 55 89 e5 83 ec 10 89 75 f8 89 7d fc 89 d6 89 5d f4 89 c7 <8b> 42 04 3b 02 7d 27 9c 5b fa 89 e0 25 00 e0 ff ff ff 40 14 8b 
EIP: [<c01456e3>] mempool_free+0x13/0xb0 SS:ESP 0068:d04dbee4

Kernel is an unpatched vanilla as obtained from kernel.org - config is available if anyone wants to look into it.
Comment 9 Rafal Olearski 2007-07-17 01:54:12 UTC
Hi, I have the same problem, which is very annoying, in 2.6.21.5 and 2.6.22.1.
On another computer I have 2.6.19.2 and a smaller partition (60GB instead of 320GB), and it works flawlessly.

Kernel is an unpatched vanilla.


BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c013ee29
*pde = 00000000
Oops: 0000 [#1]
PREEMPT 
Modules linked in: wp512 dummy snd_pcm_oss snd_mixer_oss sha256 blowfish pl2303 8139cp emu10k1_gp gameport ohci1394 ieee1394 nvidia_agp tsdev joydev option usbserial ne2k_pci 8390
CPU:    0
EIP:    0060:[<c013ee29>]    Not tainted VLI
EFLAGS: 00010282   (2.6.22.1.Raf #1)
EIP is at mempool_free+0x8/0x84
eax: deb3bc3c   ebx: deb3bc3c   ecx: c1237360   edx: 00000000
esi: 00000000   edi: deb3bc3c   ebp: c03b7e30   esp: c15e9f18
ds: 007b   es: 007b   fs: 0000  gs: 0000  ss: 0068
Process kcryptd/0 (pid: 984, ti=c15e8000 task=dffc4050 task.ti=c15e8000)
Stack: de0149c0 deb3bc3c c15e9f38 de0149c0 c03b7e29 d2952520 000000f9 00000000 
       d2952520 d2952520 00000000 00000000 00000001 00000001 000000fa 00000000 
       deb3bc48 deb3bc44 dfcc4da0 c0124eff 000040b1 1a02967a 000001cd d282a560 
Call Trace:
 [<c03b7e29>] process_read_endio+0x43/0x4a
 [<c0124eff>] run_workqueue+0x8c/0x126
 [<c0124f99>] worker_thread+0x0/0xd3
 [<c0125062>] worker_thread+0xc9/0xd3
 [<c0127e72>] autoremove_wake_function+0x0/0x33
 [<c0127e72>] autoremove_wake_function+0x0/0x33
 [<c0124f99>] worker_thread+0x0/0xd3
 [<c01279f9>] kthread+0x33/0x54
 [<c01279c6>] kthread+0x0/0x54
 [<c0104723>] kernel_thread_helper+0x7/0x10
 =======================
Code: 0a b8 88 13 00 00 e8 10 8a 34 00 8d 54 24 00 89 d8 e8 02 90 fe ff e9 44 ff ff ff 83 c4 20 5b 5e 5f 5d c3 57 56 89 d6 53 57 89 c7 <8b> 02 39 42 04 7d 68 9c 5b fa 89 e0 25 00 e0 ff ff ff 40 14 8b 
EIP: [<c013ee29>] mempool_free+0x8/0x84 SS:ESP 0068:c15e9f18
Comment 10 Laurent Parenteau 2007-07-17 08:22:02 UTC
Hi,

I've easily reproduced the problem using vanilla 2.6.18.8, and also using debian stable kernel (linux-2.6_2.6.18.dfsg.1, with patch linux-2.6_2.6.18.dfsg.1-12etch2.diff).

I'm always testing on the same machine, which has an Intel Core2Duo CPU.
Comment 11 Milan Broz 2007-07-19 03:36:50 UTC
Please try this patch.

Milan
--
mbroz@redhat.com

---

Flush workqueue before releasing mempool in dm-crypt.
A request may be finished but not yet released.

Call chain:
  run workqueue
    dec_pending
      bio_endio(...);
      <remove device request - remove mempool>
      mempool_free(io, cc->io_pool);

This usually happens when cryptsetup creates a temporary
LUKS mapping at the beginning of crypt device activation.

When dm-core calls the destructor crypt_dtr, no new requests
are possible.

Signed-off-by: Milan Broz <mbroz@redhat.com>

---
 drivers/md/dm-crypt.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6.22/drivers/md/dm-crypt.c
===================================================================
--- linux-2.6.22.orig/drivers/md/dm-crypt.c	2007-07-17 21:56:36.000000000 +0200
+++ linux-2.6.22/drivers/md/dm-crypt.c	2007-07-19 11:55:13.000000000 +0200
@@ -920,6 +920,8 @@ static void crypt_dtr(struct dm_target *
 {
 	struct crypt_config *cc = (struct crypt_config *) ti->private;
 
+	flush_workqueue(_kcryptd_workqueue);
+
 	bioset_free(cc->bs);
 	mempool_destroy(cc->page_pool);
 	mempool_destroy(cc->io_pool);
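
The ordering problem described in this patch can be replayed in user space. What follows is only a minimal sketch under stated assumptions, not kernel code and not part of the patch: a POSIX thread stands in for the kcryptd worker, two semaphores force the interleaving from the call chain, and every `toy_*` name is hypothetical. The pool is just a flag and a counter, so a late free is counted instead of crashing.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>

/* Hypothetical stand-in for cc->io_pool: records frees after destroy. */
struct toy_pool {
    bool destroyed;   /* set where mempool_destroy() would run */
    int  late_frees;  /* frees that arrived after the pool was destroyed */
};

struct toy_ctx {
    struct toy_pool pool;
    sem_t completed;  /* posted by the worker: models bio_endio() */
    sem_t resume;     /* posted by teardown: ends the worker's delay */
};

/* Models mempool_free(io, cc->io_pool). */
static void toy_pool_free(struct toy_pool *p)
{
    if (p->destroyed)
        p->late_frees++;
}

/* Models dec_pending(): report completion first, release the io later. */
static void *toy_worker(void *arg)
{
    struct toy_ctx *ctx = arg;

    sem_post(&ctx->completed);   /* request now looks finished */
    sem_wait(&ctx->resume);      /* worker is delayed at this point */
    toy_pool_free(&ctx->pool);
    return NULL;
}

/*
 * Models crypt_dtr(). With flush_first set, the worker is allowed to
 * finish and is joined before the pool is destroyed, which is the role
 * flush_workqueue() plays in the patch. Returns the number of late
 * frees, or -1 on setup failure.
 */
int toy_teardown(bool flush_first)
{
    struct toy_ctx ctx = { .pool = { false, 0 } };
    pthread_t worker;

    if (sem_init(&ctx.completed, 0, 0) != 0)
        return -1;
    if (sem_init(&ctx.resume, 0, 0) != 0) {
        sem_destroy(&ctx.completed);
        return -1;
    }
    if (pthread_create(&worker, NULL, toy_worker, &ctx) != 0) {
        sem_destroy(&ctx.resume);
        sem_destroy(&ctx.completed);
        return -1;
    }

    sem_wait(&ctx.completed);        /* device appears idle from here on */

    if (flush_first) {
        sem_post(&ctx.resume);
        pthread_join(worker, NULL);  /* wait for outstanding work */
        ctx.pool.destroyed = true;   /* safe: nothing is in flight */
    } else {
        ctx.pool.destroyed = true;   /* pool torn down too early */
        sem_post(&ctx.resume);
        pthread_join(worker, NULL);
    }

    sem_destroy(&ctx.resume);
    sem_destroy(&ctx.completed);
    assert(ctx.pool.late_frees <= 1); /* only one request was in flight */
    return ctx.pool.late_frees;
}
```

Built on Linux with `cc -pthread`, `toy_teardown(true)` reports no late free, while `toy_teardown(false)` reports one, which corresponds to the `mempool_free` call that faults in the traces above. In the real driver the delay is a scheduling accident rather than a semaphore, which would explain why the oops only shows up in a fraction of attempts.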
Comment 12 Laurent Parenteau 2007-07-19 06:37:05 UTC
Hi,

I've tried this patch, on a kernel 2.6.22.1, with those additional patches:
dm-io-fix-panic-on-large-request.patch
dm-merge-max_hw_sector.patch
dm-use-singlethread-workqueues.patch

I was not able to reproduce the bug.  Normally it happens at least 1 time out of 20, and often more frequently than that.  I've tried 70 times and it didn't happen.

It would be great if the other people who have experienced this issue could try it to confirm whether it fixes the bug for them too.

So thanks, this patch (seems to have) fixed the problem!
Comment 13 Rafal Olearski 2007-07-19 13:35:09 UTC
Hi, I can confirm, I'm unable to reproduce the bug with 2.6.22.1 and this patch.

Thank you very much!
Comment 14 Andrew Morton 2007-08-16 09:51:52 UTC
thanks, guys.
Comment 15 Torsten Landschoff 2007-08-22 00:27:19 UTC
Just for the sake of completeness: I've seen this bug as well for quite a while. After getting annoyed about it, I found this patch. Since then, no problems anymore. 

BTW: How does one check if the patch has been applied to vanilla?
Comment 16 Filipus Klutiero 2007-12-04 23:05:48 UTC
Torsten: by checking git.
This was fixed in 2.6.23.
