Bug 8735
Summary: | BUG: scheduling while atomic | ||
---|---|---|---|
Product: | Drivers | Reporter: | Gregor Jasny (gjasny) |
Component: | IEEE1394 | Assignee: | Stefan Richter (stefanr) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | ||
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | Today's linux-2.6 git pull (2.6.22+) | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: | [PATCH] firewire: fw-ohci: fix "scheduling while atomic" |
Description
Gregor Jasny
2007-07-11 05:20:17 UTC
Created attachment 12017
[PATCH] firewire: fw-ohci: fix "scheduling while atomic"
Does this work for you?
Also, is it a PPC Mini or Intel Mini? (I have an Intel Mini but didn't encounter the bug yet.)
Hi Stefan,

It's an early Core Duo Mini (T2300@1.66GHz). I've booted several times without any BUG/OOPS. Can you please push the patch into the stable series?

During operation my system froze for about 5 seconds and logged the following lines to syslog:

Jul 11 15:31:11 Mini kernel: firewire_sbp2: status write for unknown orb
Jul 11 15:31:41 Mini kernel: firewire_sbp2: sbp2_scsi_abort

I've never observed this behaviour with the old stack. Is there anything I can do to debug this?

Thanks,
Gregor

I'm going to send the patch to Linus soon, then to -stable.

----

I've got a 1.66 GHz Core Duo Mini too, but from after the product revision when they discontinued the Core Solo type and added the 1.83 GHz type. From lspci:

00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express Memory Controller Hub (rev 03)
...
03:03.0 FireWire (IEEE 1394): Agere Systems FW323 (rev 61)

I guess it looks the same on your Mini. But I replaced the Core Duo with a Core 2 Duo and am running an x86-64 build on it. It's not my primary machine, hence not extensively tested.

----

About the "status write for unknown orb" ...pause... "sbp2_scsi_abort":

This is known to happen sometimes, but the reason is unclear. Perhaps there was a bus reset, which may happen sporadically if the bus is electrically unstable. There might be hidden bugs in bus reset handling. Was there a line like "phy config: card 2, new root=ffc1, gap_count=5" near the "...unknown orb" message? This message is only logged for some bus resets though, not for all.

Alas, I don't get those "...unknown orb" messages with my own setups, or at least haven't noticed them so far. If I get similar errors at all, they occur early during device discovery and usually end in a total inability to start using the disk.

The later "sbp2_scsi_abort" is caused by the Linux SCSI stack calling into sbp2 to abort the command after a timeout. This happens because sbp2 was unable to finish the command properly earlier: it didn't recognize the status write from the device for what it was. Your system froze because you have the root filesystem on that disk, and no I/O goes on until the timeout and sbp2_scsi_abort.

This may take a while to debug, especially as I'm currently busy with projects entirely unrelated to Linux. I don't know if Kristian Høgsberg is available at the moment.

Please open a new bug for this and point to your comment #2 here as the description. Also make a note of what brand/model of SBP-2 enclosure this is and, if possible, which SBP-2 bridge chip is in there.

Fix was committed to Linus' tree and sent to the -stable team.

On "status write for unknown orb": I saw this once during heavy traffic on an otherwise perfectly working setup, but without a subsequent I/O error. I.e. I'm able to reproduce it at least partially on rare occasions.
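
For readers following the "status write for unknown orb" then "sbp2_scsi_abort" sequence discussed above, here is a minimal userspace sketch of the mechanism as described in this thread. It is not the firewire_sbp2 source: the struct pending_orb type, the pending_list variable, and the functions orb_submit(), status_write() and abort_if_timed_out() are hypothetical names made up for illustration. The assumption is that each outstanding command is remembered by an address the device echoes back in its status write, and that a command whose status write was not recognized stays pending until the timeout path aborts it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for one outstanding command (ORB). */
struct pending_orb {
    uint64_t bus_address;      /* address the device echoes in its status write */
    int done;                  /* set once a matching status write has arrived */
    struct pending_orb *next;  /* singly linked list of outstanding ORBs */
};

/* Hypothetical list of commands still waiting for their status write. */
struct pending_orb *pending_list = NULL;

/* Remember a command until the device reports its completion. */
void orb_submit(struct pending_orb *orb, uint64_t bus_address)
{
    assert(orb != NULL);
    orb->bus_address = bus_address;
    orb->done = 0;
    orb->next = pending_list;
    pending_list = orb;
}

/*
 * Device wrote a status block that refers to orb_address.
 * Returns 1 if it matched a pending ORB (which is then completed and
 * unlinked), 0 if nothing matched. In the unmatched case the command
 * the device meant to complete stays pending.
 */
int status_write(uint64_t orb_address)
{
    struct pending_orb **p;

    for (p = &pending_list; *p != NULL; p = &(*p)->next) {
        if ((*p)->bus_address == orb_address) {
            struct pending_orb *orb = *p;

            *p = orb->next;
            orb->next = NULL;
            orb->done = 1;
            return 1;
        }
    }
    fprintf(stderr, "status write for unknown orb\n");
    return 0;
}

/*
 * Timeout path, standing in for the SCSI layer asking the driver to
 * abort a command. Returns 1 if the command had to be aborted, 0 if it
 * had already completed normally.
 */
int abort_if_timed_out(struct pending_orb *orb)
{
    struct pending_orb **p;

    if (orb->done)
        return 0;
    for (p = &pending_list; *p != NULL; p = &(*p)->next) {
        if (*p == orb) {
            *p = orb->next;
            break;
        }
    }
    orb->next = NULL;
    fprintf(stderr, "sbp2_scsi_abort\n");
    return 1;
}
```

In this sketch, a status write carrying an address that was never submitted logs the first message and completes nothing, so the command it was meant for is only cleaned up when abort_if_timed_out() runs. That gap corresponds to the pause between the two syslog lines, during which I/O to the disk is stalled.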