Bug 11990
Summary: | Kernel hang in spin_unlock_irq from scsi_request_fn from do_IRQ | | |
---|---|---|---|
Product: | IO/Storage | Reporter: | Petr Vandrovec (vandrove) |
Component: | SCSI | Assignee: | linux-scsi (linux-scsi) |
Status: | CLOSED DUPLICATE | | |
Severity: | normal | CC: | rjw |
Priority: | P1 | | |
Hardware: | All | | |
OS: | Linux | | |
Kernel Version: | 2.6.28-rc3 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | | | |
Bug Blocks: | 11808 | | |
Description
Petr Vandrovec
2008-11-08 19:50:18 UTC
Apparently my lower-bound test kernel did not have Jens's patch to use tagged queuing from the SCSI layer applied. I've discovered a reliable test case (write 4GB of data concurrently to every attached disk), and found that reverting all 4 SATA tagged queueing related checkins gets rid of the problem:

43a49cbdf31e812c0d8f553d433b09b421f5d52c
3070f69b66b7ab2f02d8a2500edae07039c38508
e013e13bf605b9e6b702adffbe2853cfc60e7806
2fca5ccf97d2c28bcfce44f5b07d85e74e3cd18e

Reply-To: James.Bottomley@HansenPartnership.com

On Sat, 2008-11-08 at 19:50 -0800, bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=11990
>
> Summary: Kernel hang in spin_unlock_irq from scsi_request_fn from do_IRQ
> Product: IO/Storage
> Version: 2.5
> KernelVersion: 2.6.28-rc3
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: SCSI
> AssignedTo: linux-scsi@vger.kernel.org
> ReportedBy: vandrove@vc.cvut.cz
>
>
> Latest working kernel version: commit c8d7aa after 2.6.28-rc2
> Earliest failing kernel version: commit 920da6 after 2.6.28-rc2
> Distribution: Debian
> Hardware Environment: sata_sil24, amd64, 2cpu
> Software Environment: 64bit kernel, 32bit userspace, preemptible kernel
> Problem Description:
>
> When I/O is under stress, from time to time CPU1 hangs, most probably due to
> endless stream of interrupts. Backtrace printed either by kernel's softlockup
> detection or alt-sysrq-p is below (written down; I/O is dead when this
> happens).
>
> _spin_unlock_irq + 0x30 (after sti)
> scsi_request_fn + 0x1b9 (after spin_unlock_irq(shost->host_lock) at not_ready:)
> blk_invoke_request_fn
> __blk_runqueue
> scsi_run_queue
> scsi_next_command
> scsi_end_request
> scsi_io_completion
> scsi_finish_command
> scsi_softirq_done
> blk_done_softirq
> __do_softirq
> call_softirq
> do_softirq
> irqexit
> do_IRQ
> ret_from_intr
> <EOI>
> native_safe_halt
> trace_hardirqs_on
> default_idle
> c1e_idle
> cpu_idle
> start_secondary
>
> Steps to reproduce:
>
> It seems to occur under heavy I/O (updatedb, dumping core from ~3GB app), but I
> was not able to trigger it reliably - most reliable is hard resetting box, then
> it occurs in ~80% cases when replaying journals on disks connected to
> sata_sil24 (through PMP, but problem does not seem to occur on 2.6.28-rc2 with
> Jens's PMP patches).

This looks identical to

http://bugzilla.kernel.org/show_bug.cgi?id=11898

Could you see if this refinement of the discussed patches fixes it for you?

Thanks,

James

---

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f5d3b96..e09a661 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -606,6 +606,7 @@ static void scsi_run_queue(struct request_queue *q)
 		}

 		list_del_init(&sdev->starved_entry);
+		starved_head = NULL;
 		spin_unlock(shost->host_lock);

 		spin_lock(sdev->request_queue->queue_lock);
@@ -620,6 +621,12 @@ static void scsi_run_queue(struct request_queue *q)
 		spin_unlock(sdev->request_queue->queue_lock);

 		spin_lock(shost->host_lock);
+		if (unlikely(!list_empty(&sdev->starved_entry)))
+			/*
+			 * sdev got put back on the starved list
+			 * so finish starved handling
+			 */
+			break;
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
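
For illustration, the early exit added by the second hunk of the patch can be modelled in user space. This is only a sketch under stated assumptions: `toy_dev`, `toy_host`, `toy_run_queue` and `toy_drain_starved` are hypothetical stand-ins rather than kernel APIs, the small list helpers merely imitate the kernel's `list_head`, locking is left out entirely, and the `starved_head` reset from the first hunk is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, imitating the kernel's list_head. */
struct node { struct node *next, *prev; };

static void node_init(struct node *n) { n->next = n; n->prev = n; }
static int node_empty(const struct node *n) { return n->next == n; }

static void node_add_tail(struct node *n, struct node *head)
{
    assert(node_empty(n)); /* the node must not already be linked */
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void node_del_init(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    node_init(n);
}

/* Hypothetical stand-ins for struct scsi_device and struct Scsi_Host. */
struct toy_dev {
    struct node starved_entry;
    int still_starved; /* 1: running the queue puts the device back */
    int runs;          /* how many times its queue was run */
};

struct toy_host { struct node starved_list; };

#define dev_of(ptr) \
    ((struct toy_dev *)((char *)(ptr) - offsetof(struct toy_dev, starved_entry)))

/* Stand-in for running the device's request queue. */
static void toy_run_queue(struct toy_host *h, struct toy_dev *d)
{
    d->runs++;
    if (d->still_starved)
        node_add_tail(&d->starved_entry, &h->starved_list);
}

/* Drains the starved list; returns how many device queues were run. */
static int toy_drain_starved(struct toy_host *h)
{
    int ran = 0;

    while (!node_empty(&h->starved_list)) {
        struct toy_dev *d = dev_of(h->starved_list.next);

        node_del_init(&d->starved_entry);
        toy_run_queue(h, d);
        ran++;

        /* The device is back on the list: stop instead of spinning. */
        if (!node_empty(&d->starved_entry))
            break;
    }
    return ran;
}
```

With three devices queued and only the second one marked `still_starved`, `toy_drain_starved()` runs the first two queues and then returns, leaving the third device for a later pass. Without the final check, the loop in this model would never terminate for as long as that device keeps re-queueing itself.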