Bug 9264
| Summary: | leds: ledtrig-timer calls sleeping function from invalid context | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | Márton Németh (nm127) |
| Component: | Other | Assignee: | Richard Purdie (rpurdie) |
| Status: | RESOLVED CODE_FIX | | |
| Severity: | normal | CC: | akpm, greg, htejun, mingo, rpurdie |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.24-rc1 | Subsystem: | |
| Regression: | --- | Bisected commit-id: | |
| Bug Depends on: | | | |
| Bug Blocks: | 9243 | | |

Attachments:
.config of 2.6.24-rc1
Kernel-space test for DEBUG_SPINLOCK_SLEEP and kzalloc()
change GFP_KERNEL to GFP_ATOMIC
Proposed fix
Kernel-space test for DEBUG_SPINLOCK_SLEEP and kzalloc()
Description
Márton Németh
2007-10-30 22:31:05 UTC
Created attachment 13357
.config of 2.6.24-rc1
Did this happen in earlier kernels, or is it new behaviour in 2.6.24-rc1?
Looks like a bug in the led driver. led_trigger_store() calls led_trigger_set() with led_cdev->trigger_lock write locked. led_trigger_set() in turn calls timer_trig_activate(), which calls kzalloc() with GFP_KERNEL and then device_create_file(), which eventually calls sysfs_new_dirent(), which triggers the BUG as it tries a GFP_KERNEL allocation while a rwlock_t is held. The question is why the kzalloc() call from led_trigger_set() didn't cause the BUG first. Anyway, that lock should either be a rwsem, or led_trigger_set() shouldn't try to allocate using GFP_KERNEL or call device_create_file().
Created attachment 13491
Kernel-space test for DEBUG_SPINLOCK_SLEEP and kzalloc()
The attached test tries to trigger the BUG message when kzalloc() is used badly.
TestCase1: first calls write_lock() then calls kzalloc(..., GFP_ATOMIC) and finally write_unlock(). The BUG message should not appear.
TestCase2: calls write_lock() then calls kzalloc(..., GFP_KERNEL) and finally write_unlock(). The BUG message should appear.
Test loaded
TestCase1: Try to kzalloc(..., GFP_ATOMIC) while a write lock is held
TestCase1: in_atomic():1, irqs_disabled():0
TestCase1: finished
TestCase2: Try to kzalloc(..., GFP_KERNEL) while a write lock is held
TestCase2: in_atomic():1, irqs_disabled():0
BUG: sleeping function called from invalid context at mm/slab.c:3050
in_atomic():1, irqs_disabled():0
1 lock held by insmod/10900:
#0: (&test_lock#2){--..}, at: [<f8cca0b8>] test_init+0xb8/0x11a [spinlock_sleep_test]
[<c010533a>] show_trace_log_lvl+0x1a/0x30
[<c0105d32>] show_trace+0x12/0x20
[<c0105e85>] dump_stack+0x15/0x20
[<c011a478>] __might_sleep+0xb8/0xd0
[<c01749cf>] kmem_cache_alloc+0xcf/0x120
[<f8cca0fa>] test_init+0xfa/0x11a [spinlock_sleep_test]
[<c01470e1>] sys_init_module+0x131/0x15a0
[<c01042d2>] sysenter_past_esp+0x5f/0xa5
=======================
TestCase2: finished
Test unloading
It seems that the BUG detection works correctly.
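The rule the test cases exercise can be sketched as a small userspace model. This is only an illustration, not the attached kernel module: every `model_*` name is hypothetical, the preempt count is reduced to a plain counter, and the "BUG" is reduced to incrementing a violation counter. It assumes the simplified view that a spinning rwlock puts the holder in atomic context, while a sleeping lock such as the rwsem suggested earlier in this report does not.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's GFP_ATOMIC and GFP_KERNEL. */
enum model_gfp { MODEL_GFP_ATOMIC, MODEL_GFP_KERNEL };

/* Models the preempt count: non-zero means "atomic context". */
static int model_preempt_count;
/* Counts sleeping calls made while in atomic context. */
static int model_violations;

static bool model_in_atomic(void) { return model_preempt_count != 0; }

/* A spinning lock (rwlock_t in the kernel) makes the holder atomic. */
static void model_write_lock(void)   { model_preempt_count++; }
static void model_write_unlock(void) { model_preempt_count--; }

/* A sleeping lock (rw_semaphore in the kernel) leaves the count alone. */
static void model_down_write(void) { }
static void model_up_write(void)   { }

/* Models kzalloc(): a MODEL_GFP_KERNEL request may sleep, so it is checked. */
static void *model_kzalloc(size_t size, enum model_gfp flags)
{
    if (flags == MODEL_GFP_KERNEL && model_in_atomic())
        model_violations++; /* the kernel would print the BUG message here */
    return calloc(1, size);
}

/* Runs one lock/allocate/unlock cycle and returns the violations it caused. */
static int model_testcase(bool use_rwsem, enum model_gfp flags)
{
    int before = model_violations;
    void *p;

    if (use_rwsem)
        model_down_write();
    else
        model_write_lock();

    p = model_kzalloc(16, flags);

    if (use_rwsem)
        model_up_write();
    else
        model_write_unlock();

    free(p);
    return model_violations - before;
}
```

In this model, `model_testcase(false, MODEL_GFP_ATOMIC)` corresponds to TestCase1 and reports no violation, while `model_testcase(false, MODEL_GFP_KERNEL)` corresponds to TestCase2 and reports one. Passing `true` as the first argument models holding a sleeping lock instead, where a `MODEL_GFP_KERNEL` allocation is not flagged.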
Created attachment 13492
change GFP_KERNEL to GFP_ATOMIC
I'm afraid I was not careful enough to report this issue using an unmodified kernel. I found that I had already changed GFP_KERNEL to GFP_ATOMIC in drivers/leds/ledtrig-timer.c, sorry about that.
However, this change did not remove the BUG message in dmesg.
Looking at the code, two of the locks should be rwsems. I'll write a patch...
Created attachment 13493
Proposed fix
The attached patch should fix the problem, can you test and confirm please?
Created attachment 13495
Kernel-space test for DEBUG_SPINLOCK_SLEEP and kzalloc()
TestCase3 added to test init_rwsem()/down_write()/up_write() concept. It seems to be working:
TestCase3: Try to kzalloc(..., GFP_KERNEL) while a write semaphore is held
TestCase3: in_atomic():0, irqs_disabled():0
TestCase3: finished
I'll try the patch from comment #7 soon.
The patch from comment #7 gets rid of the BUG messages, thanks.
Ping: the patch in the bugzilla solved the regression but the fix is still not upstream it appears, as of 2.6.24-rc4.
The patch has been in the LED tree and -mm, I'll make sure it gets to Linus.