Bug 38842

Summary: panic in elv_completed_request on safe remove usb hard drive
Product: IO/Storage Reporter: leniviy (basinilya)
Component: OtherAssignee: io_other
Status: RESOLVED CODE_FIX    
Severity: high CC: alan, cereslei, florian, herton, jrnieder, maciej.rutecki, rjw, stern
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 2.6.39.2 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 32012    
Attachments: dmesg output

Description leniviy 2011-07-06 09:20:03 UTC
Created attachment 64772 [details]
dmesg output

on laptop Acer Aspire 3820, when I choose "safe remove" drive in nautilus, this leads to
"Buffer I/O error" message in 50% cases and kernel panic in 30% cases.

Archlinux x86_64

I was able to reproduce it on virtualbox running on the same laptop:
1cpu, archlinux x86_64 and 512mb flash drive. No guarantee, but you can download the exported VM and flash drive dd image.
http://www.sendspace.com/file/ryvxf6
http://www.sendspace.com/file/6as3so

console redirected to file in /tmp
login as root, no password
start Gnome with ~/runme.sh
start nautilus
plug flash drive
browse flash drive contents a little and immediately click "safe remove"



also see attached log file.

[ 900.820150] EXT3-fs (sdb2): mounted filesystem with ordered data mode
here I click "remove"
[ 908.439995] BUG: unable to handle kernel
NULL pointer dereference
at 0000000000000048
[ 908.440066] IP:
[<ffffffff811f75bf>] elv_completed_request+0x3f/0x50
[ 908.440119] PGD 0

[ 908.440139] Oops: 0000 [#1]
PREEMPT
SMP
[ 908.440177] last sysfs file: /sys/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2/power/level
[ 908.440231] CPU 1

[ 908.440246] Modules linked in:
cryptd
[ 908.440273] quiet_error: 53 callbacks suppressed
[ 908.440276] Buffer I/O error on device sdb3, logical block 0
...
[ 908.532372] Kernel panic - not syncing: Fatal exception in interrupt
[ 908.534426] Pid: 10, comm: ksoftirqd/1 Tainted: G D 2.6.39-ARCH #1
[ 908.536520] Call Trace:
[ 908.538533] [<ffffffff813dee0e>] panic+0xa0/0x1ad
...
[ 908.558075] [<ffffffff813e8985>] page_fault+0x25/0x30
[ 908.559736] [<ffffffff811f75bf>] ? elv_completed_request+0x3f/0x50
[ 908.561393] [<ffffffff811fd65c>] __blk_put_request+0x3c/0xe0
Comment 1 leniviy 2011-07-06 09:22:09 UTC
downgraded to 2.6.38.8 , works stable
Comment 2 Florian Mickler 2011-07-08 14:19:40 UTC
This could already be fixed in 3.0-rc6 ... can you test that?
Comment 3 Jonathan Nieder 2011-07-11 02:40:04 UTC
At <http://bugs.debian.org/631187>, Alexander Kurtz writes:

> I just tested 2.6.39-3 from sid and 3.0.0~rc6-1~experimental.1 from
> experimental. Unfortunately both reliably panic when safely removing my
> external hard drive. 2.6.38-5 (still) works fine. Seems like it's time
> for me to do a git bisect, or do you any other ideas?

So alas, this doesn't seem to be fixed yet. Based on an oops there, it seems e->ops is NULL (thanks to bwh for analyzing it).
Comment 4 Alan Stern 2011-07-11 14:57:48 UTC
There is a proposed patch here:

   http://marc.info/?l=linux-scsi&m=131007155700831&w=2

I don't know if it has gotten into Linus's tree yet.  Also, this patch may not be a complete fix.
Comment 5 Herton Ronaldo Krzesinski 2011-07-12 13:23:52 UTC
I got a similar bug reported here:
https://bugs.launchpad.net/bugs/793796

Indeed it's because of elevator being deallocated (elevator_exit, e->ops = NULL) and later e->ops trying to be used again.

I did 2 patches checking for queue being dead (0004 and 0005 at http://people.canonical.com/~herton/lp793796/r5/), reporter confirmed that fixed his problems with them, but indeed it seems other elevator calls could access e->ops after elevator_exit, the patch from James seems more complete.

I'll try to get reporter test with only his patch.
Comment 6 Herton Ronaldo Krzesinski 2011-07-15 15:48:17 UTC
I got the reporter to test the patch from http://marc.info/?l=linux-scsi&m=131007155700831&w=2

The results are that the patch doesn't work for him, so something more is needed, he still gets the panic on elv_completed_request: https://launchpadlibrarian.net/75257538/r6panic.jpg

(gdb) l *(elv_completed_request+0x3f)
0xffffffff812bd22f is in elv_completed_request (/home/herton/clean/ubuntu-natty/block/elevator.c:812).
807              * request is released from the driver, io must be done
808              */
809             if (blk_account_rq(rq)) {
810                     q->in_flight[rq_is_sync(rq)]--;
811                     if ((rq->cmd_flags & REQ_SORTED) &&
812                         e->ops->elevator_completed_req_fn)
813                             e->ops->elevator_completed_req_fn(q, rq);
814             }
815     }