Bug 35282
Summary: | BUG() in 2.6.38.6 | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Maciej Rutecki (maciej.rutecki) |
Component: | SCSI | Assignee: | linux-scsi (linux-scsi) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | crazy-ivanovic, fabio.coatti, greg, maciej.rutecki, rjw |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.38.6 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 32012 |
Description
Maciej Rutecki
2011-05-17 19:06:10 UTC
Reply-To: James.Bottomley@HansenPartnership.com On Tue, 2011-05-17 at 19:17 +0000, bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=35282 > May 15 16:09:32 192.168.42.2 Call Trace: > May 15 16:09:32 192.168.42.2 <IRQ> > May 15 16:09:32 192.168.42.2 > May 15 16:09:32 192.168.42.2 [<ffffffff812160eb>] ? __blk_run_queue+0x24/0x7e > May 15 16:09:32 192.168.42.2 [<ffffffff81216167>] ? blk_run_queue+0x22/0x34 > May 15 16:09:32 192.168.42.2 [<ffffffff812b2fde>] ? > scsi_run_queue+0x2ce/0x36d > May 15 16:09:32 192.168.42.2 [<ffffffff812b39ef>] ? > scsi_next_command+0x2d/0x39 > May 15 16:09:32 192.168.42.2 [<ffffffff812b4289>] ? > scsi_io_completion+0x1ca/0x3fd This looks like it might be related to the queue guard in __elv_next_request() again, could you try this patch: http://marc.info/?l=linux-scsi&m=130348673628282 Thanks, James Thanks for your answer :) Just tried (patch applied, compiled with make mrproper beforehand), but the result is pretty much the same. Interesting enough, If I reboot after using 2.6.38.5 for a while, the first boot on .6 is successful; following boots fails as shown so I guess that some hardware status can have a role into this, even if I don't know what happens. Here you can find the last capture with netconsole, with patch applied. (btw, how I can 100% sure to have applied it properly, besides checking the kernel date?) May 19 21:33:12 192.168.42.2 ACPI: PCI Interrupt Link [AE1B] (IRQs 16) *0, disabled. May 19 21:33:12 192.168.42.2 ACPI: PCI Interrupt Link [AE1C] (IRQs 16) *0, disabled. May 19 21:33:12 192.168.42.2 ACPI: PCI Interrupt Link [AE1D] (IRQs 16) *0 May 19 21:33:12 192.168.42.2 ACPI: PCI Interrupt Link [AE2A] (IRQs 16) *0, disabled. May 19 21:33:13 192.168.42.2 BUG: unable to handle kernel May 19 21:33:13 192.168.42.2 paging request May 19 21:33:13 192.168.42.2 at 0000000001000000 May 19 21:33:13 192.168.42.2 sd 9:0:0:0: [sde] No Caching mode page present May 19 21:33:13 192.168.42.2 sd 9:0:0:0: [sde] Assuming drive cache: write through May 19 21:33:13 192.168.42.2 sd 9:0:0:0: [sde] No Caching mode page present May 19 21:33:13 192.168.42.2 sd 9:0:0:0: [sde] Assuming drive cache: write through May 19 21:33:13 192.168.42.2 IP: May 19 21:33:13 192.168.42.2 [<0000000001000000>] 0x1000000 May 19 21:33:13 192.168.42.2 PGD 22e527067 May 19 21:33:13 192.168.42.2 PUD 22e528067 May 19 21:33:13 192.168.42.2 PMD 0 May 19 21:33:13 192.168.42.2 May 19 21:33:13 192.168.42.2 Oops: 0010 [#1] May 19 21:33:13 192.168.42.2 PREEMPT May 19 21:33:13 192.168.42.2 SMP May 19 21:33:13 192.168.42.2 May 19 21:33:13 192.168.42.2 sde: sde1 May 19 21:33:13 192.168.42.2 last sysfs file: /sys/devices/virtual/bdi/8:64/uevent May 19 21:33:13 192.168.42.2 CPU 1 May 19 21:33:13 192.168.42.2 May 19 21:33:13 192.168.42.2 Modules linked in: May 19 21:33:13 192.168.42.2 snd_hda_codec_hdmi May 19 21:33:13 192.168.42.2 snd_hda_codec_analog May 19 21:33:13 192.168.42.2 usbhid May 19 21:33:13 192.168.42.2 usb_storage May 19 21:33:13 192.168.42.2 usb_libusual May 19 21:33:13 192.168.42.2 uas May 19 21:33:13 192.168.42.2 snd_hda_intel May 19 21:33:13 192.168.42.2 snd_hda_codec May 19 21:33:13 192.168.42.2 snd_hwdep May 19 21:33:13 192.168.42.2 snd_pcm May 19 21:33:13 192.168.42.2 i2c_nforce2 May 19 21:33:13 192.168.42.2 snd_timer May 19 21:33:13 192.168.42.2 asus_atk0110 May 19 21:33:13 192.168.42.2 snd May 19 21:33:13 192.168.42.2 soundcore May 19 21:33:13 192.168.42.2 snd_page_alloc May 19 21:33:13 192.168.42.2 k10temp May 19 21:33:13 192.168.42.2 ohci_hcd May 19 21:33:13 192.168.42.2 ehci_hcd May 19 21:33:13 192.168.42.2 usbcore May 19 21:33:13 192.168.42.2 May 19 21:33:13 192.168.42.2 May 19 21:33:13 192.168.42.2 Pid: 9, comm: ksoftirqd/1 Not tainted 2.6.38.6 #2 May 19 21:33:13 192.168.42.2 May 19 21:33:13 192.168.42.2 System manufacturer System Product Name May 19 21:33:13 192.168.42.2 / May 19 21:33:13 192.168.42.2 M3N-HT DELUXE May 19 21:33:13 192.168.42.2 May 19 21:33:13 192.168.42.2 RIP: 0010:[<0000000001000000>] May 19 21:33:13 192.168.42.2 [<0000000001000000>] 0x1000000 May 19 21:33:13 192.168.42.2 RSP: 0018:ffff8800afc83df0 EFLAGS: 00010006 May 19 21:33:13 192.168.42.2 RAX: 0000000000000001 RBX: ffff88022e228c20 RCX: ffff8802288b5848 May 19 21:33:13 192.168.42.2 RDX: 0000000001000000 RSI: 0000000000000000 RDI: ffff88022e228c20 May 19 21:33:13 192.168.42.2 RBP: 0000000000000200 R08: 0000000000000000 R09: ffff8802288b5848 May 19 21:33:13 192.168.42.2 R10: ffff88022e228c20 R11: ffff88022fd63938 R12: ffff88022d75b040 May 19 21:33:13 192.168.42.2 R13: ffff88022e228c20 R14: 0000000000000000 R15: 0000000000000246 May 19 21:33:13 192.168.42.2 FS: 00007fafe7466700(0000) GS:ffff8800afc80000(0000) knlGS:0000000000000000 May 19 21:33:13 192.168.42.2 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b May 19 21:33:13 192.168.42.2 CR2: 0000000001000000 CR3: 000000022e525000 CR4: 00000000000006e0 May 19 21:33:13 192.168.42.2 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 May 19 21:33:13 192.168.42.2 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 May 19 21:33:13 192.168.42.2 Process ksoftirqd/1 (pid: 9, threadinfo ffff88022fcf4000, task ffff88022fccb100) May 19 21:33:13 192.168.42.2 Stack: May 19 21:33:13 192.168.42.2 ffffffff812160eb May 19 21:33:13 192.168.42.2 ffff88022e228c20 May 19 21:33:14 192.168.42.2 ffff88022e228c20 May 19 21:33:14 192.168.42.2 0000000000000296 May 19 21:33:14 192.168.42.2 May 19 21:33:14 192.168.42.2 ffffffff81216167 May 19 21:33:14 192.168.42.2 ffff88022fd63938 May 19 21:33:14 192.168.42.2 ffff88022d75b000 May 19 21:33:14 192.168.42.2 ffff8800afc83e58 May 19 21:33:14 192.168.42.2 May 19 21:33:14 192.168.42.2 ffffffff812b2ffe May 19 21:33:14 192.168.42.2 ffff88023f8026c0 May 19 21:33:14 192.168.42.2 ffff8800afc83e58 May 19 21:33:14 192.168.42.2 ffff88022de0c800 May 19 21:33:14 192.168.42.2 May 19 21:33:14 192.168.42.2 Call Trace: May 19 21:33:14 192.168.42.2 <IRQ> May 19 21:33:14 192.168.42.2 May 19 21:33:14 192.168.42.2 [<ffffffff812160eb>] ? __blk_run_queue+0x24/0x7e May 19 21:33:14 192.168.42.2 [<ffffffff81216167>] ? blk_run_queue+0x22/0x34 May 19 21:33:14 192.168.42.2 [<ffffffff812b2ffe>] ? scsi_run_queue+0x2ce/0x36d May 19 21:33:14 192.168.42.2 [<ffffffff812b3a0f>] ? scsi_next_command+0x2d/0x39 May 19 21:33:14 192.168.42.2 [<ffffffff812b42a9>] ? scsi_io_completion+0x1ca/0x3fd May 19 21:33:14 192.168.42.2 [<ffffffff813cb268>] ? _raw_spin_unlock+0x1e/0x2b May 19 21:33:14 192.168.42.2 [<ffffffff813cb295>] ? _raw_spin_unlock_irqrestore+0x20/0x2e May 19 21:33:14 192.168.42.2 [<ffffffff8121a83b>] ? blk_done_softirq+0x5f/0x6c May 19 21:33:14 192.168.42.2 [<ffffffff810326f9>] ? __do_softirq+0x7b/0x11a May 19 21:33:14 192.168.42.2 [<ffffffff81002f4c>] ? call_softirq+0x1c/0x28 May 19 21:33:14 192.168.42.2 <EOI> May 19 21:33:14 192.168.42.2 May 19 21:33:14 192.168.42.2 [<ffffffff81004761>] ? do_softirq+0x31/0x63 May 19 21:33:14 192.168.42.2 [<ffffffff810323c2>] ? run_ksoftirqd+0x9c/0x180 May 19 21:33:14 192.168.42.2 [<ffffffff81032326>] ? run_ksoftirqd+0x0/0x180 May 19 21:33:14 192.168.42.2 [<ffffffff81042a95>] ? kthread+0x7a/0x82 May 19 21:33:14 192.168.42.2 [<ffffffff81002e54>] ? kernel_thread_helper+0x4/0x10 May 19 21:33:14 192.168.42.2 [<ffffffff81042a1b>] ? kthread+0x0/0x82 May 19 21:33:14 192.168.42.2 [<ffffffff81002e50>] ? kernel_thread_helper+0x0/0x10 May 19 21:33:14 192.168.42.2 Code: May 19 21:33:14 192.168.42.2 Bad RIP value. May 19 21:33:14 192.168.42.2 May 19 21:33:14 192.168.42.2 RIP May 19 21:33:14 192.168.42.2 [<0000000001000000>] 0x1000000 May 19 21:33:14 192.168.42.2 RSP <ffff8800afc83df0> May 19 21:33:14 192.168.42.2 CR2: 0000000001000000 May 19 21:33:14 192.168.42.2 ---[ end trace e611b077bd16448b ]--- May 19 21:33:14 192.168.42.2 Kernel panic - not syncing: Fatal exception in interrupt May 19 21:33:14 192.168.42.2 Pid: 9, comm: ksoftirqd/1 Tainted: G D 2.6.38.6 #2 May 19 21:33:14 192.168.42.2 Call Trace: May 19 21:33:14 192.168.42.2 <IRQ> May 19 21:33:14 192.168.42.2 [<ffffffff813c94ba>] ? panic+0xa1/0x199 May 19 21:33:14 192.168.42.2 [<ffffffff8102e12c>] ? kmsg_dump+0xde/0xed May 19 21:33:14 192.168.42.2 [<ffffffff8100586a>] ? oops_end+0x7e/0x8b May 19 21:33:14 192.168.42.2 [<ffffffff8101b236>] ? no_context+0x1f4/0x201 May 19 21:33:14 192.168.42.2 [<ffffffff8101b718>] ? do_page_fault+0x150/0x2e8 May 19 21:33:14 192.168.42.2 [<ffffffff81022def>] ? check_preempt_curr+0x36/0x68 May 19 21:33:14 192.168.42.2 [<ffffffff810294e1>] ? ttwu_post_activation+0x18/0xa5 May 19 21:33:14 192.168.42.2 [<ffffffff813cb295>] ? _raw_spin_unlock_irqrestore+0x20/0x2e May 19 21:33:14 192.168.42.2 [<ffffffff8102b101>] ? try_to_wake_up+0x197/0x1b2 May 19 21:33:14 192.168.42.2 [<ffffffff813cba0f>] ? page_fault+0x1f/0x30 May 19 21:33:14 192.168.42.2 [<ffffffff812160eb>] ? __blk_run_queue+0x24/0x7e May 19 21:33:14 192.168.42.2 [<ffffffff81216167>] ? blk_run_queue+0x22/0x34 May 19 21:33:14 192.168.42.2 [<ffffffff812b2ffe>] ? scsi_run_queue+0x2ce/0x36d May 19 21:33:14 192.168.42.2 [<ffffffff812b3a0f>] ? scsi_next_command+0x2d/0x39 May 19 21:33:14 192.168.42.2 [<ffffffff812b42a9>] ? scsi_io_completion+0x1ca/0x3fd May 19 21:33:14 192.168.42.2 [<ffffffff813cb268>] ? _raw_spin_unlock+0x1e/0x2b May 19 21:33:14 192.168.42.2 [<ffffffff813cb295>] ? _raw_spin_unlock_irqrestore+0x20/0x2e May 19 21:33:14 192.168.42.2 [<ffffffff8121a83b>] ? blk_done_softirq+0x5f/0x6c May 19 21:33:14 192.168.42.2 [<ffffffff810326f9>] ? __do_softirq+0x7b/0x11a May 19 21:33:14 192.168.42.2 [<ffffffff81002f4c>] ? call_softirq+0x1c/0x28 May 19 21:33:14 192.168.42.2 <EOI> May 19 21:33:14 192.168.42.2 [<ffffffff81004761>] ? do_softirq+0x31/0x63 May 19 21:33:14 192.168.42.2 [<ffffffff810323c2>] ? run_ksoftirqd+0x9c/0x180 May 19 21:33:14 192.168.42.2 [<ffffffff81032326>] ? run_ksoftirqd+0x0/0x180 May 19 21:33:14 192.168.42.2 [<ffffffff81042a95>] ? kthread+0x7a/0x82 May 19 21:33:14 192.168.42.2 [<ffffffff81002e54>] ? kernel_thread_helper+0x4/0x10 May 19 21:33:14 192.168.42.2 [<ffffffff81042a1b>] ? kthread+0x0/0x82 May 19 21:33:14 192.168.42.2 [<ffffffff81002e50>] ? kernel_thread_helper+0x0/0x10 Reply-To: James.Bottomley@HansenPartnership.com On Thu, 2011-05-19 at 19:44 +0000, bugzilla-daemon@bugzilla.kernel.org wrote: > --- Comment #2 from Fabio Coatti <fabio.coatti@gmail.com> 2011-05-19 > 19:44:30 --- > Thanks for your answer :) > > Just tried (patch applied, compiled with make mrproper beforehand), but the > result is pretty much the same. > Interesting enough, If I reboot after using 2.6.38.5 for a while, the first > boot on .6 is successful; following boots fails as shown so I guess that some > hardware status can have a role into this, even if I don't know what happens. > Here you can find the last capture with netconsole, with patch applied. (btw, > how I can 100% sure to have applied it properly, besides checking the kernel > date?) Well, the first patch was just a guess. It looks like another queue guard problem. However, I think this one is that the code thinks a dead queue isn't stopped. Could you try this patch instead? Thanks, James --- diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 2ad95fa..8166845 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -473,7 +473,9 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q) } #define blk_queue_tagged(q) test_bit(QUEUE_FLAG_QUEUED, &(q)->queue_flags) -#define blk_queue_stopped(q) test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags) +#define blk_queue_stopped(q) \ + (test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags) || \ + test_bit(QUEUE_FLAG_DEAD, &(q)->queue_flags)) #define blk_queue_nomerges(q) test_bit(QUEUE_FLAG_NOMERGES, &(q)->queue_flags) #define blk_queue_noxmerges(q) \ test_bit(QUEUE_FLAG_NOXMERGES, &(q)->queue_flags) Ok, just tried (I've had to apply the patch manually, as my include/linux/blkdev.h is slightly different and patch complained a bit, see below). Now it works, the kernel is able to boot correctly and I'm writing from 2.6.38.6. Many thanks for your time! PS: actual blkdev.h on my machine: static inline void queue_flag_clear(unsigned int flag, struct request_queue *q) { WARN_ON_ONCE(!queue_is_locked(q)); __clear_bit(flag, &q->queue_flags); (473)} #define blk_queue_plugged(q) test_bit(QUEUE_FLAG_PLUGGED, &(q)->queue_flags) #define blk_queue_tagged(q) test_bit(QUEUE_FLAG_QUEUED, &(q)->queue_flags) #define blk_queue_stopped(q) \ (test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags) || \ test_bit(QUEUE_FLAG_DEAD, &(q)->queue_flags)) #define blk_queue_nomerges(q) test_bit(QUEUE_FLAG_NOMERGES, &(q)->queue_flags) #define blk_queue_noxmerges(q) \ I had some crashes over here, especially with 2.6.39, but also with some 2.6.38.*, kernel versions, too. Applied this patch about 24h ago and so far there seem to be no further kernel panics. Since I know no way to actually get the logs from the crashes, I just photographed them using my mobile phone: http://imagebin.org/index.php?mode=image&id=155275 http://imagebin.org/index.php?mode=image&id=155274 As you can probably see they are not complete, some output is missing. My general hardware/setup: * AMD 64 vanilla kernel (2.6.39) * Phenom II X6 1055T * 8GB Ram * Gigabyte GA-870A-UD3 (rev. 2.1) Reply-To: James.Bottomley@HansenPartnership.com > --- Comment #5 from Nils Kneuper <crazy-ivanovic@gmx.net> 2011-05-27 > 09:53:06 --- > I had some crashes over here, especially with 2.6.39, but also with some > 2.6.38.*, kernel versions, too. Applied this patch about 24h ago and so far > there seem to be no further kernel panics. Since I know no way to actually > get > the logs from the crashes, I just photographed them using my mobile phone: > http://imagebin.org/index.php?mode=image&id=155275 > http://imagebin.org/index.php?mode=image&id=155274 That looks like this: http://marc.info/?l=linux-scsi&m=130635674521428 It depends on a couple of other block fixes, though. Everything will probably be upstream and back to stable in the next few days. James Fixed by commit e73e079bf128d68284efedeba1fbbc18d78610f9 . |