Bug 5998 - oops on mount "kernel access of bad area, sig: 11 [#1]"
Summary: oops on mount "kernel access of bad area, sig: 11 [#1]"
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: IEEE1394
Hardware: i386 Linux
Importance: P2 normal
Assignee: Stefan Richter
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-02-02 15:03 UTC by john stultz
Modified: 2006-06-26 04:41 UTC

See Also:
Kernel Version: 2.6.15-rc5
Subsystem:
Regression: ---
Bisected commit-id:


Description john stultz 2006-02-02 15:03:42 UTC
Most recent kernel where this bug did not occur: unknown
Distribution: Ubuntu
Hardware Environment: ppc32 Apple Mac mini
Problem Description:

After more than 10 days of uptime, during which I repeatedly mounted and
unmounted my external FireWire hard drive for backups, I got the following oops
today when trying to mount the drive. I have seen something similar when the
cable gets bumped loose, but I have not been able to verify whether the cable
was secure when this occurred. The mount command is still hung, but the box
otherwise seems to be running fine.

ieee1394: sbp2: aborting sbp2 command
sd 0:0:0:0: 
        command: cdb[0]=0x28: 28 00 09 27 c0 03 00 00 02 00
ieee1394: sbp2: aborting sbp2 command
sd 0:0:0:0: 
        command: cdb[0]=0x0: 00 00 00 00 00 00
Oops: kernel access of bad area, sig: 11 [#1]
PREEMPT 
NIP: C0023FF8 LR: C0023FF8 SP: EFC0FEB0 REGS: efc0fe00 TRAP: 0300    Not tainted
MSR: 00001032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11
DAR: 00000000, DSISR: 40000000
TASK = c134b230[317] 'scsi_eh_0' THREAD: efc0e000
Last syscall: -1 
GPR00: 00000000 EFC0FEB0 C134B230 00000001 C05BDE28 FFFFFFFF C0650000 C0664E84 
GPR08: 00040000 00000001 C13DF400 EFC0E000 C0650000 00000000 00000000 00000000 
GPR16: 00000000 C02EF5A0 C06070EC C0586F7C C06070EC C0586F7C C06070EC C0650000 
GPR24: 00000003 C13EE204 C13EE268 00000001 00009032 00000000 C13EE1C0 C02EF440 
NIP [c0023ff8] complete+0x28/0x90
LR [c0023ff8] complete+0x28/0x90
Call trace:
 [c02ef45c] scsi_eh_done+0x1c/0x30
 [c03248b8] sbp2scsi_abort+0x158/0x170
 [c02efbcc] scsi_send_eh_cmnd+0x10c/0x1a0
 [c02efce8] scsi_eh_tur+0x88/0xe0
 [c02f0930] scsi_error_handler+0x450/0xa10
 [c00437e8] kthread+0x108/0x110
 [c0007534] kernel_thread+0x44/0x60
note: scsi_eh_0[317] exited with preempt_count 1
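
For readers decoding the log above: the first aborted command has opcode 0x28, which is SCSI READ(10), and the second has opcode 0x00, which is TEST UNIT READY (consistent with scsi_eh_tur in the call trace). The following is a minimal sketch, not part of the original report, that unpacks the READ(10) bytes. The helper name decode_read10 is hypothetical; the field layout assumed is the standard one (bytes 2-5 are the big-endian logical block address, bytes 7-8 the transfer length in blocks).

```python
def decode_read10(cdb_hex: str) -> dict:
    """Decode a 10-byte SCSI READ(10) CDB given as space-separated hex.

    Hypothetical helper for illustration; assumes the standard READ(10) layout.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    return {
        "opcode": cdb[0],
        # Bytes 2..5: logical block address, big-endian.
        "lba": int.from_bytes(cdb[2:6], "big"),
        # Bytes 7..8: transfer length in blocks, big-endian.
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


# CDB bytes copied from the first "aborting sbp2 command" entry in the log.
info = decode_read10("28 00 09 27 c0 03 00 00 02 00")
print(hex(info["lba"]), info["blocks"])
```

Under those assumptions, the aborted read was a 2-block transfer starting at LBA 0x0927c003.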


Steps to reproduce: Not easily reproduced.
Comment 1 Stefan Richter 2006-02-21 11:30:31 UTC
My working hypothesis is that there are race conditions in sbp2, because sbp2
handles much of the protocol without waiting for 1394 transactions to complete.
(It does so because most of it runs in hardware interrupt or soft IRQ context.)
I am planning to rework this, but don't expect fast-paced progress...
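
The general class of race hypothesized above can be illustrated with a userspace analogy. This is only a sketch and is not sbp2 code: the Command class, its finish method, and the two "paths" are hypothetical names. It assumes the hazard is that a normal completion path and an abort path may both try to signal that the same command is done, and shows one common guard, which is to claim the callback under a lock so that it runs exactly once.

```python
import threading


class Command:
    """Hypothetical stand-in for an in-flight command with a 'done' callback."""

    def __init__(self, done):
        self._lock = threading.Lock()
        self._done = done
        self.finished = False

    def finish(self, status):
        # Claim the callback under the lock so only one caller can run it.
        with self._lock:
            if self.finished:
                return False  # the other path already finished this command
            self.finished = True
            done, self._done = self._done, None
        # Invoke outside the lock to avoid holding it during the callback.
        done(status)
        return True


calls = []
cmd = Command(calls.append)

# Two paths race to finish the same command: normal completion and abort.
paths = [
    threading.Thread(target=cmd.finish, args=(status,))
    for status in ("completed", "aborted")
]
for t in paths:
    t.start()
for t in paths:
    t.join()

print(calls)  # exactly one status, whichever path won the race
```

Without the lock and the finished flag, both paths could invoke the callback, or one could invoke it after the other had already torn the command down.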
Comment 2 Stefan Richter 2006-06-26 04:41:47 UTC
In early April, a patch went into 2.6.17-rcX and 2.6.16.2 which cleaned up the
locking regime in sbp2scsi_abort. This _may_ have fixed it.
http://marc.theaimsgroup.com/?l=linux-kernel&m=114391907428954

If you ever see the problem again, please reopen this bug, or notify me if you
lack the permissions to reopen it.
