Bug 12020 - scsi_times_out NULL pointer dereference
Summary: scsi_times_out NULL pointer dereference
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: SCSI Drivers
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: scsi_drivers-other
URL:
Keywords:
Depends on:
Blocks: 11808
  Show dependency tree
 
Reported: 2008-11-13 10:30 UTC by Bernd Schubert
Modified: 2008-12-07 12:21 UTC (History)
3 users (show)

See Also:
Kernel Version: 2.6.28-git20081113
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments

Description Bernd Schubert 2008-11-13 10:30:21 UTC
Latest working kernel version: 2.6.27
Earliest failing kernel version: 2.6.28-rc4
Hardware Environment: Infortrend G2430 connected to LSI22320R
Problem Description:

Hello,

first in 2.6.28-rc{1,2,3} the error handler was entirely broken - it deadlocked. In rc4 this is fixed, but now I already two times got a Null pointer dereference while doing some error handler tests. All of that looks like due to the scsi timeout commits.

Steps to reproduce: E.g. reset devices connected to LSI 53C1030 devices using lsiutil. Can be reproduced on about 20% eh activations.

(gdb) l *(scsi_times_out+0x15)
0xffffffff80460f1e is in scsi_times_out (drivers/scsi/scsi_error.c:176).
171             enum blk_eh_timer_return (*eh_timed_out)(struct scsi_cmnd *);
172             enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED;
173
174             scsi_log_completion(scmd, TIMEOUT_ERROR);
175
176             if (scmd->device->host->transportt->eh_timed_out)
177                     eh_timed_out = scmd->device->host->transportt->eh_timed_out;
178             else if (scmd->device->host->hostt->eh_timed_out)
179                     eh_timed_out = scmd->device->host->hostt->eh_timed_out;
180             else

[  143.804672] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[  143.808507] IP: [<ffffffff80460f1e>] scsi_times_out+0x15/0x71
[  143.816020] PGD f9381067 PUD f9360067 PMD 0
[  143.824018] Oops: 0000 [#1] SMP
[  143.824018] last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
[  143.832016] Dumping ftrace buffer:
[  143.832016]    (ftrace buffer empty)
[  143.832016] CPU 1
[  143.832016] Modules linked in: mptctl ib_ipoib inet_lro ib_umad rdma_ucm rdma_cm ib_cm iw_cm ib_sa ib_addr ib_uvee
[  143.832016] Pid: 246, comm: pdflush Not tainted 2.6.28-rc4-bs1 #10
[  143.832016] RIP: 0010:[<ffffffff80460f1e>]  [<ffffffff80460f1e>] scsi_times_out+0x15/0x71
[  143.832016] RSP: 0018:ffff88007f6a3df0  EFLAGS: 00010086
[  143.832016] RAX: ffff88007ebf5330 RBX: 0000000000000000 RCX: ffff8800f93804b8
[  143.832016] RDX: ffff88007ebf5948 RSI: 0000000000000246 RDI: ffff8800f9380378
[  143.832016] RBP: ffff88007f6a3e00 R08: 0000000000000000 R09: 0000000000000000
[  143.832016] R10: ffff8800f9144680 R11: ffff88007eeac240 R12: ffff88007ebf5330
[  143.832016] R13: ffff88007ebf5808 R14: ffffffff80380461 R15: 0000000000000000
[  143.832016] FS:  0000000000733860(0000) GS:ffff8800fb29ab40(0000) knlGS:0000000000000000
[  143.832016] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[  143.832016] CR2: 0000000000000000 CR3: 00000000e80ec000 CR4: 00000000000006e0
[  143.832016] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  143.832016] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  143.832016] Process pdflush (pid: 246, threadinfo ffff88007ed12000, task ffff88007ed11890)
[  143.832016] Stack:
[  143.832016]  ffff88007f6a3e00 ffff8800f9380378 ffff88007f6a3e20 ffffffff80380426
[  143.832016]  ffff88007ebf5330 ffff8800f9380378 ffff88007f6a3e70 ffffffff803804f9
[  143.832016]  ffff88007eea0000 ffff88007ebf5668 0000000000000246 ffff88007ebf5330
[  143.832016] Call Trace:
[  143.832016]  <IRQ> <0> [<ffffffff80380426>] blk_rq_timed_out+0x1b/0x56
[  143.832016]  [<ffffffff803804f9>] blk_rq_timed_out_timer+0x98/0x118
[  143.832016]  [<ffffffff80380461>] ? blk_rq_timed_out_timer+0x0/0x118
[  143.832016]  [<ffffffff802464e2>] run_timer_softirq+0x14c/0x1cc
[  143.832016]  [<ffffffff80242392>] __do_softirq+0x83/0x128
[  143.832016]  [<ffffffff8020d03c>] call_softirq+0x1c/0x28
[  143.832016]  [<ffffffff8020ea39>] do_softirq+0x49/0x90
[  143.832016]  [<ffffffff802422aa>] irq_exit+0x44/0x46
[  143.832016]  [<ffffffff8020e88b>] do_IRQ+0xba/0xcf
Comment 1 Anonymous Emailer 2008-11-13 11:03:41 UTC
Reply-To: James.Bottomley@HansenPartnership.com

On Thu, 2008-11-13 at 10:30 -0800, bugme-daemon@bugzilla.kernel.org
wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=12020
> 
>            Summary: scsi_times_out NULL pointer dereference
>            Product: SCSI Drivers
>            Version: 2.5
>      KernelVersion: 2.6.28-git20081113
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: scsi_drivers-other@kernel-bugs.osdl.org
>         ReportedBy: bs@q-leap.de
> 
> 
> Latest working kernel version: 2.6.27
> Earliest failing kernel version: 2.6.28-rc4
> Hardware Environment: Infortrend G2430 connected to LSI22320R
> Problem Description:
> 
> Hello,
> 
> first in 2.6.28-rc{1,2,3} the error handler was entirely broken - it
> deadlocked. In rc4 this is fixed, but now I already two times got a Null
> pointer dereference while doing some error handler tests. All of that looks
> like due to the scsi timeout commits.
> 
> Steps to reproduce: E.g. reset devices connected to LSI 53C1030 devices using
> lsiutil. Can be reproduced on about 20% eh activations.
> 
> (gdb) l *(scsi_times_out+0x15)
> 0xffffffff80460f1e is in scsi_times_out (drivers/scsi/scsi_error.c:176).
> 171             enum blk_eh_timer_return (*eh_timed_out)(struct scsi_cmnd *);
> 172             enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED;
> 173
> 174             scsi_log_completion(scmd, TIMEOUT_ERROR);
> 175
> 176             if (scmd->device->host->transportt->eh_timed_out)
> 177                     eh_timed_out =
> scmd->device->host->transportt->eh_timed_out;
> 178             else if (scmd->device->host->hostt->eh_timed_out)
> 179                     eh_timed_out =
> scmd->device->host->hostt->eh_timed_out;
> 180             else

Actually, I think the trace is slightly off.  I suspect this is the
problem:

	struct scsi_cmnd *scmd = req->special;

I bet req->special is NULL because the command timed out even before it
was prepared by the subsystem.

Does this fix it?

The fix is more of a bandaid than anything ... we can't really have
commands timing out in the mid-layer because we expect we have full
control of them.  With this patch, if we run out of resets, block will
complete a command we're still processing.

James

---

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 94ed262..5612c42 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -127,6 +127,13 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
 	enum blk_eh_timer_return (*eh_timed_out)(struct scsi_cmnd *);
 	enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED;
 
+	if (!scmd)
+		/*
+		 * nasty: command timed out before the mid layer
+		 * even prepared it
+		 */
+		return BLK_EH_RESET_TIMER;
+
 	scsi_log_completion(scmd, TIMEOUT_ERROR);
 
 	if (scmd->device->host->transportt->eh_timed_out)
Comment 2 Bernd Schubert 2008-11-13 12:12:16 UTC
Thanks going to test it now. 

While we are at this function, could you please check 

	if (eh_timed_out)
		rtn = eh_timed_out(scmd);
		switch (rtn) {
		case BLK_EH_NOT_HANDLED:
			break;
		default:
			return rtn;
		}


Is the indentation wrong or are there missing if-braces?


Thanks,
Bernd
Comment 3 Anonymous Emailer 2008-11-13 12:23:08 UTC
Reply-To: James.Bottomley@HansenPartnership.com

On Thu, 2008-11-13 at 12:12 -0800, bugme-daemon@bugzilla.kernel.org
wrote:
> While we are at this function, could you please check 
> 
>         if (eh_timed_out)
>                 rtn = eh_timed_out(scmd);
>                 switch (rtn) {
>                 case BLK_EH_NOT_HANDLED:
>                         break;
>                 default:
>                         return rtn;
>                 }
> 
> 
> Is the indentation wrong or are there missing if-braces?

It's not as intended, but harmless:

http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commitdiff;h=6ec39f02cf48df89c3cbab4aeef521569fec00e4

James
Comment 4 Anonymous Emailer 2008-11-13 14:47:09 UTC
Reply-To: James.Bottomley@HansenPartnership.com

On Thu, 2008-11-13 at 13:03 -0600, James Bottomley wrote:
> Actually, I think the trace is slightly off.  I suspect this is the
> problem:
> 
>       struct scsi_cmnd *scmd = req->special;
> 
> I bet req->special is NULL because the command timed out even before it
> was prepared by the subsystem.
> 
> Does this fix it?
> 
> The fix is more of a bandaid than anything ... we can't really have
> commands timing out in the mid-layer because we expect we have full
> control of them.  With this patch, if we run out of resets, block will
> complete a command we're still processing.
> 
> James
> 
> ---
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 94ed262..5612c42 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -127,6 +127,13 @@ enum blk_eh_timer_return scsi_times_out(struct request
> *req)
>       enum blk_eh_timer_return (*eh_timed_out)(struct scsi_cmnd *);
>       enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED;
>  
> +     if (!scmd)
> +             /*
> +              * nasty: command timed out before the mid layer
> +              * even prepared it
> +              */
> +             return BLK_EH_RESET_TIMER;
> +
>       scsi_log_completion(scmd, TIMEOUT_ERROR);
>  
>       if (scmd->device->host->transportt->eh_timed_out)

Mike Anderson pointed out that we have a potential window where the
timer can fire after we've unprepped the request in SCSI (so making
req->special NULL) but before we call blk_requeue_request() which stops
the timer.  We can rejig the locking to prevent this from happening, so
could you (separately) try this patch?

James

---

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f5d3b96..3475b74 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -649,8 +643,8 @@ static void scsi_requeue_command(struct request_queue *q, struct scsi_cmnd *cmd)
 	struct request *req = cmd->request;
 	unsigned long flags;
 
-	scsi_unprep_request(req);
 	spin_lock_irqsave(q->queue_lock, flags);
+	scsi_unprep_request(req);
 	blk_requeue_request(q, req);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 
Comment 5 Rafael J. Wysocki 2008-11-16 09:50:26 UTC
Patch : http://bugzilla.kernel.org/show_bug.cgi?id=12020#c4
Handled-By : James Bottomley <James.Bottomley@HansenPartnership.com>
Comment 6 Alexander Y. Fomichev 2008-11-20 07:12:21 UTC
looks very similar

[  316.336654] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[  316.339972] IP: [<ffffffff803f84d3>] scsi_times_out+0x10/0x72
[  316.339972] PGD 3e627067 PUD 3de0b067 PMD 0
[  316.339972] Oops: 0000 [#1] PREEMPT SMP
[  316.339972] last sysfs file: /sys/devices/virtual/block/md0/md/metadata_version
[  316.339972] Dumping ftrace buffer:
[  316.339972]    (ftrace buffer empty)
[  316.339972] CPU 1
[  316.339972] Modules linked in: floppy sg
[  316.339972] Pid: 0, comm: swapper Not tainted 2.6.28-rc5 #1
[  316.339972] RIP: 0010:[<ffffffff803f84d3>]  [<ffffffff803f84d3>] scsi_times_out+0x10/0x72
[  316.339972] RSP: 0018:ffff88003fb53e20  EFLAGS: 00010082
[  316.339972] RAX: ffff88003ef60000 RBX: 0000000000000000 RCX: ffff88003ef60308
[  316.339972] RDX: ffff88003ef60308 RSI: 0000000000006cb2 RDI: ffff880033dae5c0
[  316.339972] RBP: ffff88003fb53e30 R08: ffff880001019180 R09: 0000000000000010
[  316.339972] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88003ef601c8
[  316.339972] R13: ffff88003ef60308 R14: 0000000000000102 R15: 0000000000000000
[  316.339972] FS:  0000000000000000(0000) GS:ffff88003fb23b00(0000) knlGS:0000000000000000
[  316.339972] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[  316.339972] CR2: 0000000000000000 CR3: 000000003e7e8000 CR4: 00000000000006e0
[  316.339972] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  316.339972] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  316.339972] Process swapper (pid: 0, threadinfo ffff88003fb4e000, task ffff88003f863500)
[  316.339972] Stack:
[  316.339972]  ffff88000101f900 ffff880033dae5c0 ffff88003fb53e50 ffffffff803544ca
[  316.339972]  ffff88003ef60000 ffff880033dae5c0 ffff88003fb53ea0 ffffffff803545d8
[  316.339972]  ffff88003fb53ea0 ffff88003ef60000 0000000000000286 ffff88003ef60000
[  316.339972] Call Trace:
[  316.339972]  <IRQ> <0> [<ffffffff803544ca>] blk_rq_timed_out+0x16/0x5c
[  316.339972]  [<ffffffff803545d8>] blk_rq_timed_out_timer+0xc8/0x138
[  316.339972]  [<ffffffff80354510>] ? blk_rq_timed_out_timer+0x0/0x138
[  316.339972]  [<ffffffff8023f8b7>] run_timer_softirq+0x183/0x1ec
[  316.339972]  [<ffffffff80254c0c>] ? tick_dev_program_event+0x6c/0xa4
[  316.339972]  [<ffffffff8023b326>] __do_softirq+0x72/0x128
[  316.339972]  [<ffffffff8020c8cc>] call_softirq+0x1c/0x30
[  316.339972]  [<ffffffff8020df2d>] do_softirq+0x3d/0x78
[  316.339972]  [<ffffffff8023b249>] irq_exit+0x8f/0x98
[  316.339972]  [<ffffffff8021d8e4>] smp_apic_timer_interrupt+0x8a/0xd6
[  316.339972]  [<ffffffff8020c31b>] apic_timer_interrupt+0x6b/0x70
[  316.339972]  <EOI> <0> [<ffffffff80212f22>] ? mwait_idle+0x45/0x4a
[  316.339972]  [<ffffffff80209deb>] ? enter_idle+0x22/0x24
[  316.339972]  [<ffffffff8020a386>] ? cpu_idle+0x41/0x80
[  316.339972] Code: cb ff ff 85 c0 74 a0 45 31 e4 eb d2 45 31 e4 44 89 e0 5b 41 5c 41 5d 41 5e 5d c3 55 48 89 e5 53 48 83 ec 08 48 8b 9f e0 00 00 00 <48> 8b 03 48 8b 10 48 8b 82 b8 00 00 00 48 8b 80 60 01 00 00 48
[  316.339972] RIP  [<ffffffff803f84d3>] scsi_times_out+0x10/0x72
[  316.339972]  RSP <ffff88003fb53e20>
[  316.339972] CR2: 0000000000000000
[  316.339972] Kernel panic - not syncing: Fatal exception in interrupt

in my case easily be triggered by disk activity (i.g. rsync) 
on rebuilding raid

On Thu, 2008-11-13 at 13:03 -0600, James Bottomley wrote:
> Actually, I think the trace is slightly off.  I suspect this is the
> problem:
> 
>       struct scsi_cmnd *scmd = req->special;
> 
> I bet req->special is NULL because the command timed out even before it
> was prepared by the subsystem.
> 
> Does this fix it?

In my case it doesn't 'fix', but proof of concept.
With your patch [i just printk-ing the comment] 
system remains locked printk-ing from time to time: 
"nasty: command timed out before the mid layer 
even prepared it"

> The fix is more of a bandaid than anything ... we can't really have
> commands timing out in the mid-layer because we expect we have full
> control of them.  With this patch, if we run out of resets, block will
> complete a command we're still processing.

here is a dmesg:
http://sysadminday.org.ru/2.6.28-rc5-git3/scsi_times_out-NULL_pointer_dereference_dmesg
Comment 7 Anonymous Emailer 2008-11-20 11:36:42 UTC
Reply-To: andmike@linux.vnet.ibm.com

I have two systems that are hitting similar signatures in scsi_times_out.

Note: that my testing is using a distro kernel, but in this area the code
is very similar. I will work to get a reproduction on mainline.

..but..

I added some debug to scsi_times_out and noticed that the request with no
scmd set in req->special also did not have REQ_STARTED set.

I added a WARN_ON check to blk_add_timer for any requests 
that we where starting a timer for that did not have REQ_STARTED. This is
shown below. This does not look good as the elv_dequeue_request is being
called off elv_next_request for some cases.

Call Trace:
[c00000007b747580] [c00000000027808c] .blk_add_timer+0x74/0x134
(unreliable)
[c00000007b747610] [c00000000026f9b8] .elv_dequeue_request+0x78/0x8c
[c00000007b747680] [c000000000275830] .blk_do_ordered+0x8c/0x31c
[c00000007b747720] [c00000000026fc18] .elv_next_request+0x24c/0x2d4
[c00000007b7477c0] [d000000000368004] .scsi_request_fn+0xc8/0x628
[scsi_mod]
[c00000007b7478a0] [c00000000026fdf4] .elv_insert+0x154/0x38c
[c00000007b747940] [c000000000273ad0] .__make_request+0x4e4/0x568
[c00000007b7479f0] [c000000000271a68] .generic_make_request+0x3f4/0x468
[c00000007b747af0] [c000000000271bd8] .submit_bio+0xfc/0x124
[c00000007b747bb0] [c000000000160a00] .submit_bh+0x14c/0x198
[c00000007b747c40] [c0000000001630a0] .sync_dirty_buffer+0xbc/0x15c
[c00000007b747cd0] [c0000000001fcac0]
.journal_commit_transaction+0x1014/0x158c
[c00000007b747e10] [c00000000020111c] .kjournald+0x104/0x2f4
[c00000007b747f00] [c0000000000a909c] .kthread+0x78/0xc4
[c00000007b747f90] [c00000000002ae2c] .kernel_thread+0x4c/0x68

I changed the previous mentioned WARN_ON to just do a return if the request
does not have REQ_STARTED. This corrected the issue of seeing an oops in
scsi_times_out. But this is just a hack.

Hope this analysis is not flawed because of kernel deltas. It also may not
address this specific issue being seen in this bug, but does appear to
indicate a possible path to get a request on the timeout list with out a
req->special set.

I think we may need to look at some of the paths that are calling
blkdev_dequeue_request and understand how to prevent blk_add_timer from
being called if we are not really starting a SCSI cmd.

-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com
Comment 8 Bernd Schubert 2008-12-03 02:19:00 UTC
> Mike Anderson pointed out that we have a potential window where the
> timer can fire after we've unprepped the request in SCSI (so making
> req->special NULL) but before we call blk_requeue_request() which stops
> the timer.  We can rejig the locking to prevent this from happening, so
> could you (separately) try this patch?
> 
> James
> 

Hello James,

sorry for the huge delay. Unfortunately it turned out I was just 'lucky' to run into this bug the first few times. When I later on tried to reproduce this specific issue, I tried 20 times and couldn't, even without any patch :( So testing especially the 2nd patch turnes out to be a bit difficult (can't be verified by printk). 
I now also mostly only have remotely access to the test hardware and can't reset this system remotely, so testing has become a bit difficult :(
Tomorrow will be in the lab again and try to test again.

Thanks for your help and patience,
Bernd

Note You need to log in before you can comment on or make changes to this bug.