Bug 5921
| Field | Value | Field | Value |
|---|---|---|---|
| Summary | AIC7xxx: SCSI bus crash when formatting CDRW | | |
| Product | SCSI Drivers | Reporter | Emmanuel Fust (emmanuel.fuste) |
| Component | Other | Assignee | linux-scsi (linux-scsi) |
| Status | CLOSED OBSOLETE | | |
| Severity | high | CC | alan, hare, protasnb |
| Priority | P2 | | |
| Hardware | i386 | | |
| OS | Linux | | |
| Kernel Version | 2.6.25 | Subsystem | |
| Regression | No | Bisected commit-id | |
| Attachments | Kernel Log, New kernel log, Patch 0, Patch 1/5, Patch 2/5, Patch 3/5, Patch 4/5, Patch 5/5, Patch 6/5, crash log march 30 2008 | | |
Description
Emmanuel Fust
2006-01-19 01:33:32 UTC
Created attachment 7070
Kernel Log
2.6.15 with James Bottomley's "aic7xxx timer handling bug" fix. It's better! The kernel no longer panics, but the SCSI bus never recovers.

First, when inserting/ejecting a non-initialised CDRW, I got this:

Eject:

    Jan 12 21:14:26 rafale kernel: sr 0:0:3:0: Attempting to queue an ABORT message
    Jan 12 21:14:26 rafale kernel: CDB: 0x1b 0x0 0x0 0x0 0x2 0x0
    Jan 12 21:14:26 rafale kernel: sr 0:0:3:0: Command not found
    Jan 12 21:14:26 rafale kernel: aic7xxx_abort returns 0x2002
    Jan 12 21:15:17 rafale kernel: Device not ready. Make sure there is a disc in the drive.

Insert/eject:

    Jan 12 21:16:31 rafale kernel: sr 0:0:3:0: Attempting to queue an ABORT message
    Jan 12 21:16:31 rafale kernel: CDB: 0x1b 0x0 0x0 0x0 0x2 0x0
    Jan 12 21:16:31 rafale kernel: sr 0:0:3:0: Command not found
    Jan 12 21:16:31 rafale kernel: aic7xxx_abort returns 0x2002

Insert:

    Jan 12 21:17:08 rafale kernel: cdrom: This disc doesn't have any tracks I recognize!
    Jan 12 21:17:18 rafale kernel: cdrom: This disc doesn't have any tracks I recognize!

Next, I tried cdrwtool -d /dev/sr0 -q followed by Ctrl-C to give the kernel a chance to recover. I got tons of errors, but got the shell back. The SCSI bus never really recovers, but I did get some syslog traces. They are large, so they are attached gzipped (with the complete boot sequence, in case it helps). The console trace finishes with messages like:

    sd 0:0:0:0: SCSI error : return code = 0x6000000
    end_request: I/O error, dev sda2, logical block 973373
    lost page write due to I/O error on sda2
    ...

These messages are indeed not in the syslog trace.

Hope this helps. If no one has an idea about how to fix this, I will try the 2.6.13 driver (the one before the sync with Adaptec, if I am not mistaken; the one that initialised 80% of the CD before crashing) with only the timer handling bug fix applied.

One more thing: the drive's eject button no longer works with 2.6.15 (it did with 2.6.14). Bug #5659 may be the same.
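The hex values in the eject trace and in the console trace are easier to read once decoded. The Python sketch below is a hypothetical helper (not part of the kernel, cdrwtool, or any existing tool); its constant tables are transcribed from the 2.6-era include/scsi/scsi.h as best understood and should be checked against the actual header. Under those assumptions, the aborted CDB is START STOP UNIT with LoEj=1 and Start=0 (an eject request), the 0x2002 returned by aic7xxx_abort is the error-handler code SUCCESS, and return code 0x6000000 carries DRIVER_TIMEOUT in the driver byte with DID_OK in the host byte.

```python
"""Hypothetical decoder for the SCSI values quoted in this bug report."""

# START STOP UNIT is SCSI opcode 0x1B; in CDB byte 4, bit 0 is START and bit 1 is LOEJ.
START_STOP_UNIT = 0x1B

# Driver byte (bits 24-31 of the SCSI result word), assumed from 2.6-era <scsi/scsi.h>.
DRIVER_BYTES = {
    0x00: "DRIVER_OK", 0x01: "DRIVER_BUSY", 0x02: "DRIVER_SOFT",
    0x03: "DRIVER_MEDIA", 0x04: "DRIVER_ERROR", 0x05: "DRIVER_INVALID",
    0x06: "DRIVER_TIMEOUT", 0x07: "DRIVER_HARD", 0x08: "DRIVER_SENSE",
}

# Host byte (bits 16-23); only the first few codes are listed here.
HOST_BYTES = {
    0x00: "DID_OK", 0x01: "DID_NO_CONNECT", 0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT", 0x04: "DID_BAD_TARGET", 0x05: "DID_ABORT",
}

# Return codes of error-handler callbacks such as the abort handler.
EH_RETURN_CODES = {0x2002: "SUCCESS", 0x2003: "FAILED"}


def parse_cdb(line):
    """Turn the hex tokens after 'CDB:' in a kernel log line into ints."""
    _, _, hexpart = line.partition("CDB:")
    return [int(tok, 16) for tok in hexpart.split()]


def decode_start_stop_unit(cdb):
    """Decode a 6-byte START STOP UNIT CDB into its flags and intent."""
    if len(cdb) != 6 or cdb[0] != START_STOP_UNIT:
        raise ValueError("not a 6-byte START STOP UNIT CDB")
    start = bool(cdb[4] & 0x01)
    loej = bool(cdb[4] & 0x02)
    if loej:
        action = "load medium" if start else "eject medium"
    else:
        action = "start unit" if start else "stop unit"
    return {"immed": bool(cdb[1] & 0x01), "start": start, "loej": loej, "action": action}


def decode_result(result):
    """Split a Linux SCSI result word into driver/host/msg/status bytes."""
    driver = (result >> 24) & 0xFF
    host = (result >> 16) & 0xFF
    return {
        "driver": DRIVER_BYTES.get(driver, hex(driver)),
        "host": HOST_BYTES.get(host, hex(host)),
        "msg": (result >> 8) & 0xFF,
        "status": result & 0xFF,  # low byte: SCSI status byte from the target
    }


if __name__ == "__main__":
    cdb = parse_cdb("Jan 12 21:14:26 rafale kernel: CDB: 0x1b 0x0 0x0 0x0 0x2 0x0")
    print(decode_start_stop_unit(cdb)["action"])  # eject medium
    print(decode_result(0x6000000))
    print(EH_RETURN_CODES.get(0x2002, "unknown"))  # SUCCESS
```

Decoding the values this way only restates what the log already says; it does not explain why the abort reports "Command not found" yet returns SUCCESS.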
There are many reports of problems with other SCSI cards and SCSI CD/DVD readers/writers since 2.6.14/2.6.15, but not with 2.6.13. I think I hit two bugs: the timer one, specific to aic7xxx, and a more generic one present since 2.6.14.

Created attachment 7209
New kernel log
2.6.15 + James Bottomley's "aic7xxx timer handling bug" fix + "Turn off ordered flush barriers for SCSI driver" + "semaphore to completion conversion" from Christoph Hellwig.
Better: I was able to reboot the computer (slooowwwly), but the CD writer never started to initialise the CDRW and the card never completely recovered.
I don't know if I hit the same bug, but with kernel 2.6.15 my system freezes on any CD-ROM access. There is no info in any log file, I can't SSH into the machine, and all I can do is turn off the computer. This is on x86-64 (Opteron, non-SMP). CD-ROM access worked perfectly up to 2.6.13; I never tested 2.6.14. I use a Plextor SCSI CD-ROM and a Pioneer IDE DVD recorder (with an Acard IDE->SCSI bridge); both drives, as well as a few SCSI HDDs, are connected to an Adaptec 19160 controller.

The bug seems to be present in 2.6.14, 2.6.15, 2.6.16 and 2.6.16.1. Tested with gentoo-sources (http://dev.gentoo.org/~dsd/genpatches/).

OK, new patches: no more freeze, but they don't solve my CDRW problem. Patch 0 is from the git tree; patches 1-5 port the latest aic79xx code to aic7xxx; patch 6 corrects a bug introduced by patch 1.

Created attachment 7703
Patch 0
Created attachment 7704
Patch 1/5
Created attachment 7705
Patch 2/5
Created attachment 7706
Patch 3/5
Created attachment 7707
Patch 4/5
Created attachment 7708
Patch 5/5
Created attachment 7709
Patch 6/5
Any updates on the bug? Is it still present in 2.6.22+? Thanks.

As of 2.6.21 the bug is still there. I will try 2.6.22+ and report results soon.

Ping... how is it working with the present kernel?

(In reply to comment #17)
> Ping... how is it working with present kernel?

I will re-power the affected computer and try a 2.6.25-rc5 or newer kernel. Expect a report in the next 10 days.

Created attachment 15515
crash log march 30 2008
I was able to cleanly restart the computer with the Ctrl+Alt+Del key combination, but it took about 1 hour.
sr0 is on the narrow bus.
The recovery path still impacts uninvolved devices such as sda, which is on the wide bus and consequently never fully recovers either.
There are two questions (two bugs too?):
- why formatting a CDRW crashes the HBA;
- why the recovery code never recovers and drags sr0 AND sda into a never-ending spiral.
Complete log attached.
The latest tested kernel is 2.6.25. Will retry with 2.6.27.