Bug 13761 - kmemleak + BUG: unable to handle kernel paging request
Status: CLOSED CODE_FIX
Alias: None
Product: Other
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: other_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-07-11 06:49 UTC by Márton Németh
Modified: 2012-06-13 13:21 UTC

See Also:
Kernel Version: 2.6.31-rc2
Subsystem:
Regression: No
Bisected commit-id:


Attachments
kernel configuration (64.34 KB, application/octet-stream)
2009-07-11 06:51 UTC, Márton Németh
kmemleak: Do not acquire scan_mutex in kmemleak_open() (3.60 KB, patch)
2009-07-12 11:14 UTC, Catalin Marinas
kmemleak: Protect the seq start/next/stop sequence by rcu_read_lock() (1.40 KB, patch)
2009-07-12 11:15 UTC, Catalin Marinas

Description Márton Németh 2009-07-11 06:49:34 UTC
First I mounted debugfs with "mount -t debugfs nodev /sys/kernel/debug/". Then I executed "echo scan >/sys/kernel/debug/kmemleak" and "cat /sys/kernel/debug/kmemleak" a few times. I got the following message:


[ 1649.863693] BUG: unable to handle kernel paging request at 6b6b6b6b
[ 1649.863706] IP: [<c01b2f95>] kmemleak_seq_next+0x75/0x120
[ 1649.863718] *pde = 00000000 
[ 1649.863723] Oops: 0000 [#1] PREEMPT 
[ 1649.863728] last sysfs file: /sys/devices/system/cpu/cpu0/cpuidle/state2/time
[ 1649.863733] Modules linked in: gspca_sunplus gspca_main videodev v4l1_compat via drm agpgart ppdev lp ipv6 powernow_k8 cpufreq_powersave cpufreq_ondemand cpufreq_userspace cpufreq_conservative leds_clevo_mail led_class snd_via82xx snd_via82xx_modem snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss usbhid snd_pcm snd_mpu401_uart mousedev snd_seq_oss snd_seq_midi pcmcia snd_rawmidi firmware_class snd_seq_midi_event snd_seq snd_timer snd_seq_device serio_raw snd i2c_viapro ide_cd_mod ehci_hcd pcspkr uhci_hcd psmouse k8temp hwmon video backlight 8139too snd_page_alloc cdrom soundcore yenta_socket rsrc_nonstatic usbcore i2c_core nls_base pcmcia_core mii output battery 8250_pnp 8250 rtc_cmos parport_pc serial_core parport rtc_core rtc_lib button thermal processor ac evdev
[ 1649.863820] 
[ 1649.863825] Pid: 4464, comm: cat Not tainted (2.6.31-rc2 #1) K8N800
[ 1649.863830] EIP: 0060:[<c01b2f95>] EFLAGS: 00210292 CPU: 0
[ 1649.863834] EIP is at kmemleak_seq_next+0x75/0x120
[ 1649.863838] EAX: c7241df0 EBX: 00000000 ECX: 00000000 EDX: 6b6b6b6b
[ 1649.863842] ESI: e7ea1c70 EDI: c039a168 EBP: db191f1c ESP: db191ef4
[ 1649.863846]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[ 1649.863850] Process cat (pid: 4464, ti=db190000 task=e4190620 task.ti=db190000)
[ 1649.863853] Stack:
[ 1649.863856]  00000002 00000001 00000000 c01b2f58 00000000 00000002 c7241df0 00000000
[ 1649.863866] <0> e7ea1c70 c039a168 db191f64 c01d01a3 ee911008 00000cc0 00001000 08051000
[ 1649.863877] <0> e7e8bc78 e7ea1c98 00000000 00000000 ee911008 00000002 ee911008 0003944e
[ 1649.863888] Call Trace:
[ 1649.863895]  [<c01b2f58>] ? kmemleak_seq_next+0x38/0x120
[ 1649.863902]  [<c01d01a3>] ? seq_read+0x323/0x3d0
[ 1649.863908]  [<c01b6fe9>] ? vfs_read+0x99/0x160
[ 1649.863913]  [<c01cfe80>] ? seq_read+0x0/0x3d0
[ 1649.863919]  [<c01b716d>] ? sys_read+0x3d/0x70
[ 1649.863925]  [<c0102e44>] ? sysenter_do_call+0x12/0x32
[ 1649.863928] Code: c0 c7 44 24 0c 58 2f 1b c0 c7 44 24 08 00 00 00 00 c7 44 24 04 01 00 00 00 c7 04 24 02 00 00 00 e8 d1 2e fa ff 8b 45 f0 8b 50 20 <8b> 02 0f 18 00 90 81 fa 60 ad 49 c0 89 d6 c7 45 e8 00 00 00 00 
[ 1649.863989] EIP: [<c01b2f95>] kmemleak_seq_next+0x75/0x120 SS:ESP 0068:db191ef4
[ 1649.863996] CR2: 000000006b6b6b6b
[ 1649.864034] ---[ end trace 1a6e31a050e7af6e ]---
[ 1649.864040] note: cat[4464] exited with preempt_count 1
[ 1649.864570] Slab corruption: kmemleak_object start=c7241df0, len=192
[ 1649.864576] Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
[ 1649.864579] Last user: [<c01b30a2>](free_object_rcu+0x62/0x70)
[ 1649.864589] 000: 01 00 00 00 6b 6b 6b 6b ff ff ff ff ff ff ff ff
[ 1649.864607] 040: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6c 6b 6b 6b
[ 1649.864626] Prev obj: start=c7241d18, len=192
[ 1649.864630] Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
[ 1649.864633] Last user: [<c01b4154>](kmemleak_alloc+0x74/0x290)
[ 1649.864639] 000: 01 00 00 00 ad 4e ad de ff ff ff ff ff ff ff ff
[ 1649.864656] 010: cc 3a a2 c0 00 00 00 00 31 02 43 c0 01 00 00 00
[ 1649.864673] Next obj: start=c7241ec8, len=192
[ 1649.864676] Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
[ 1649.864679] Last user: [<c01b4154>](kmemleak_alloc+0x74/0x290)
[ 1649.864685] 000: 01 00 00 00 ad 4e ad de ff ff ff ff ff ff ff ff
[ 1649.864702] 010: cc 3a a2 c0 00 00 00 00 31 02 43 c0 01 00 00 00
Comment 1 Márton Németh 2009-07-11 06:51:09 UTC
Created attachment 22307
kernel configuration
Comment 2 Catalin Marinas 2009-07-11 17:55:03 UTC
I haven't managed to trigger this bug yet (I'll try a bit more) but I have a patch in my branch which sorts out mutex locking in the seq functions.

http://www.linux-arm.org/git?p=linux-2.6.git;a=commitdiff_plain;h=b87324d082d9d898e3c06b2a07a2b94b2430b8ba

Do you still get this bug with the above patch (the full branch is at http://www.linux-arm.org/git?p=linux-2.6.git;a=shortlog;h=kmemleak)?
Comment 3 Catalin Marinas 2009-07-11 21:45:00 UTC
Even though I haven't reproduced it, it seems possible for a kmemleak_object's object_list.next to point to a freed object (filled with the 0x6b poison pattern). I'll have a look and fix this.

Thanks.
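
(Editorial illustration, not part of the original comment.) The 0x6b pattern mentioned above is the byte that the kernel's slab debugging writes over freed objects (POISON_FREE), so a pointer loaded from a freed object reads as 0x6b6b6b6b on a 32-bit machine. That matches both the faulting address and EDX in the oops. A minimal user-space sketch of the effect, assuming a made-up `fake_object` layout and hypothetical helper names (`poison_object`, `read_next_after_free`) that do not exist in the kernel:

```c
#include <stdint.h>
#include <string.h>

#define POISON_FREE 0x6b /* byte the kernel's slab debugging writes on free */

/* Hypothetical stand-in for an object carrying a 32-bit "next" link. */
struct fake_object {
	uint32_t flags;
	uint32_t next; /* would be a pointer on a 32-bit kernel */
};

/* Simulate a debug free: overwrite the whole object with the poison byte. */
void poison_object(struct fake_object *obj)
{
	memset(obj, POISON_FREE, sizeof(*obj));
}

/* Return what a stale reader would load from "next" after the free. */
uint32_t read_next_after_free(void)
{
	struct fake_object obj = { .flags = 1, .next = 0xc7241d18u };

	poison_object(&obj);
	return obj.next; /* every byte is now 0x6b */
}
```

Dereferencing such a value as an address is what produces the "unable to handle kernel paging request at 6b6b6b6b" message.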
Comment 4 Catalin Marinas 2009-07-11 22:20:33 UTC
The kmemleak_seq_next() function traverses a list whose elements can be freed, but rcu_read_lock() only protects a small loop. My understanding of seq_file is that locks can be held between start() and stop() because the seq code doesn't sleep in between. With the patch below, rcu_read_lock() is called in start() and rcu_read_unlock() in stop():

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index b2426af..7f33ca9 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -1213,7 +1213,6 @@ static void *kmemleak_seq_start(struct seq_file *seq, loff_t *pos)
 	}
 	object = NULL;
 out:
-	rcu_read_unlock();
 	return object;
 }
 
@@ -1229,13 +1228,11 @@ static void *kmemleak_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 
 	++(*pos);
 
-	rcu_read_lock();
 	list_for_each_continue_rcu(n, &object_list) {
 		next_obj = list_entry(n, struct kmemleak_object, object_list);
 		if (get_object(next_obj))
 			break;
 	}
-	rcu_read_unlock();
 
 	put_object(prev_obj);
 	return next_obj;
@@ -1251,6 +1248,7 @@ static void kmemleak_seq_stop(struct seq_file *seq, void *v)
 		 * kmemleak_seq_start may return ERR_PTR if the scan_mutex
 		 * waiting was interrupted, so only release it if !IS_ERR.
 		 */
+		rcu_read_unlock();
 		mutex_unlock(&scan_mutex);
 		if (v)
 			put_object(v);
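
(Editorial illustration, not part of the original comment.) The idea behind the diff can be sketched with a toy, single-threaded user-space model: while a reader is inside the read-side section, a requested free is deferred, so holding the section from start() to stop() keeps every node visited by next() valid. Everything here is an assumption made for illustration: the `model_*` names are hypothetical, a plain counter stands in for RCU, and list unlinking is not modeled at all.

```c
#include <stddef.h>

/* Toy model only; this is not kernel code. */
struct node {
	int value;
	int freed;		/* set once the deferred free has really run */
	struct node *next;
};

static int readers;		/* models rcu_read_lock() nesting depth */
static struct node *pending;	/* node whose free waits for readers to leave */

void model_read_lock(void) { readers++; }

void model_read_unlock(void)
{
	if (--readers == 0 && pending) {	/* "grace period" ends here */
		pending->freed = 1;
		pending = NULL;
	}
}

/* Models a deferred free: it runs only once no reader is inside. */
void model_free_deferred(struct node *n)
{
	if (readers == 0)
		n->freed = 1;
	else
		pending = n;
}

/* start/next/stop in the shape of the patched seq callbacks. */
struct node *model_start(struct node *head) { model_read_lock(); return head; }
struct node *model_next(struct node *cur) { return cur->next; }
void model_stop(void) { model_read_unlock(); }

/*
 * Walk a three-node list while a free of the second node is requested
 * mid-walk. Returns the number of already-freed nodes the walk touched,
 * or -1 if the deferred free never ran after stop().
 */
int walk_with_concurrent_free(void)
{
	struct node c = { 3, 0, NULL }, b = { 2, 0, &c }, a = { 1, 0, &b };
	struct node *n;
	int stale = 0;

	for (n = model_start(&a); n; n = model_next(n)) {
		if (n == &a)
			model_free_deferred(&b);	/* "another CPU" frees b */
		stale += n->freed;
	}
	model_stop();
	return b.freed ? stale : -1;
}
```

In this model the walk never touches a freed node because the free of the second node only takes effect in model_stop(). If the read-side section covered only the inner loop of next(), the free could take effect between two next() calls, which is the situation the oops suggests.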
Comment 5 Márton Németh 2009-07-12 05:06:13 UTC
It was not easy for me to reproduce this problem either. Here is how I can trigger it:

1. mount -t debugfs nodev /sys/kernel/debug/
2. Open two xterm windows
3. On the first xterm window execute the command: 
   "while true; do echo -n .; echo scan >/sys/kernel/debug/kmemleak; done"
4. On the second xterm window execute the command:
   "while true; do clear; cat /sys/kernel/debug/kmemleak; done"
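
(Editorial illustration, not part of the original comment.) The two shell loops above can also be driven from a single C program, which may be handy for automated testing. This is only a sketch under stated assumptions: the helper names `write_scan`, `read_all` and `stress` are hypothetical, the path is passed in by the caller, and running it against the real kmemleak file needs root, a mounted debugfs, and a kernel built with kmemleak.

```c
#include <fcntl.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Equivalent of "echo scan > path": returns 0 on success, -1 on error. */
int write_scan(const char *path)
{
	static const char cmd[] = "scan\n";
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, cmd, sizeof(cmd) - 1);
	close(fd);
	return n == (ssize_t)(sizeof(cmd) - 1) ? 0 : -1;
}

/* Equivalent of "cat path": returns the number of bytes read, -1 on error. */
long read_all(const char *path)
{
	char buf[4096];
	long total = 0;
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		total += n;
	close(fd);
	return n < 0 ? -1 : total;
}

/* Child process writes "scan" forever while the parent reads `reads` times. */
int stress(const char *path, int reads)
{
	pid_t child = fork();
	int i;

	if (child < 0)
		return -1;
	if (child == 0) {
		for (;;)
			write_scan(path);
	}
	for (i = 0; i < reads; i++)
		read_all(path);
	kill(child, SIGKILL);	/* stop the writer once the reads are done */
	waitpid(child, NULL, 0);
	return 0;
}
```

A call such as stress("/sys/kernel/debug/kmemleak", 1000) would correspond to the two xterm loops; the iteration count is arbitrary.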
Comment 6 Márton Németh 2009-07-12 05:15:07 UTC
(In reply to comment #4)
> diff --git a/mm/kmemleak.c b/mm/kmemleak.c
> index b2426af..7f33ca9 100644
> --- a/mm/kmemleak.c
> +++ b/mm/kmemleak.c

Could you please also attach the patch to this bug? It seems that bugzilla converted all the TABs to SPACEs.

Exactly which patch should I apply? The one you mentioned in comment #2 or the one at comment #4 or both?
Comment 7 Catalin Marinas 2009-07-12 11:14:16 UTC
Created attachment 22319
kmemleak: Do not acquire scan_mutex in kmemleak_open()
Comment 8 Catalin Marinas 2009-07-12 11:15:19 UTC
Created attachment 22320
kmemleak: Protect the seq start/next/stop sequence by rcu_read_lock()
Comment 9 Catalin Marinas 2009-07-12 11:17:57 UTC
Both should be applied in the attached order. I attached kmemleak-scan-mutex.patch and kmemleak-seq-lock.patch.

Thanks.
Comment 10 Catalin Marinas 2009-07-13 16:24:27 UTC
The first patch was just merged into mainline, so only the second is needed. Please let me know if it solves the issue for you. Thanks.
Comment 11 Catalin Marinas 2009-07-31 12:59:15 UTC
Both patches are now merged into mainline, so I'd like to close this bug. Please re-open if you think the problem still exists. Thanks.