Bug 9817
Summary: | oops in jbd2_journal_commit_transaction after journal abort | ||
---|---|---|---|
Product: | File System | Reporter: | Eric Sandeen (sandeen) |
Component: | ext4 | Assignee: | fs_ext4 (fs_ext4) |
Status: | REJECTED UNREPRODUCIBLE | ||
Severity: | normal | CC: | jarod, protasnb, stefanr |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.24-0.174.rc8.git7 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: | more dmesg; local copy | ||
Description
Eric Sandeen
2008-01-25 14:22:09 UTC
Created attachment 14579 [details]
more dmesg
Created attachment 14580 [details]
local copy
Can't recall at the moment whether or not the kernel had Stefan's latest sbp2 login/reconnect fixes or not. My thought is not. Should be reasonably easy to reproduce the situation sometime next week though.

Oops. By "latest", I meant the latest ones already in linux1394-git as of right now, not the ones just posted to lkml earlier today. However, those latest ones look like they could be relevant to the firewire side of this issue too.

From the attachment:
> firewire_core: created new fw device fw0 (0 config rom retries, S400)
> firewire_core: created new fw device fw1 (0 config rom retries, S100)
> firewire_core: phy config: card 0, new root=ffc1, gap_count=5
> firewire_core: created new fw device fw2 (0 config rom retries, S400)
> firewire_core: created new fw device fw3 (0 config rom retries, S400)
> firewire_core: phy config: card 1, new root=ffc1, gap_count=5
> firewire_core: created new fw device fw4 (0 config rom retries, S800)
...
> firewire_core: created new fw device fw5 (0 config rom retries, S800)
...
> firewire_core: created new fw device fw6 (0 config rom retries, S800)

How many FireWire devices have actually been on the bus? And is it true that one of them was a 100 Mbit/s device?

Jarod and I did some improvements regarding firewire-sbp2's login and reconnection behavior lately. However, there is still something to do in the area of reconnection while IO is going on (bug 9734).

That said, it's always good if a filesystem doesn't oops if a connection suddenly goes away. So, if you filesystem guys are interested in stabilizing this, just take a USB disk, do some filesystem I/O, and unplug during the I/O...

Agreed; we shouldn't have oopsed. Still working out where/when/why we did.

BTW, I just filed bug 9828 for the "WARNING: at fs/sysfs/dir.c:424 sysfs_add_one()" in the attachment. This warning is unrelated to this bug here.
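One way to exercise the unplug-during-I/O scenario suggested above is a small writer that keeps forcing data to disk until the device goes away. This is only a sketch under stated assumptions: the function name `write_until_error`, the block size, and any target path are illustrative and not part of this report, and running it does not by itself guarantee hitting the oops described here.

```python
import os


def write_until_error(path, block_size=1 << 20, max_blocks=None):
    """Write fsync'ed blocks to `path` until an OSError occurs.

    Hypothetical helper for illustration only.
    Returns (blocks_written, error). `error` is None only when the loop
    stopped because `max_blocks` was reached.
    """
    block = b"\xa5" * block_size
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        while max_blocks is None or written < max_blocks:
            try:
                # os.write may write fewer bytes than requested, so loop
                # until the whole block has been handed to the kernel.
                view = memoryview(block)
                while view:
                    n = os.write(fd, view)
                    view = view[n:]
                # Force the data out so the journal and the device stay busy.
                os.fsync(fd)
            except OSError as exc:
                # Typically EIO or EROFS once the device is gone or the
                # journal has been aborted.
                return written, exc
            written += 1
        return written, None
    finally:
        try:
            os.close(fd)
        except OSError:
            # Closing can also fail after the device has disappeared.
            pass
```

To use it, call `write_until_error()` with a file on the mounted removable disk (for example a hypothetical `/mnt/usbdisk/stress.bin`), unplug the disk while it runs, and then check dmesg for the journal abort and anything that follows it.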
(In reply to comment #5)
> From the attachment:
> > firewire_core: created new fw device fw0 (0 config rom retries, S400)
> > firewire_core: created new fw device fw1 (0 config rom retries, S100)
> > firewire_core: phy config: card 0, new root=ffc1, gap_count=5
> > firewire_core: created new fw device fw2 (0 config rom retries, S400)
> > firewire_core: created new fw device fw3 (0 config rom retries, S400)
> > firewire_core: phy config: card 1, new root=ffc1, gap_count=5
> > firewire_core: created new fw device fw4 (0 config rom retries, S800)
> ...
> > firewire_core: created new fw device fw5 (0 config rom retries, S800)
> ...
> > firewire_core: created new fw device fw6 (0 config rom retries, S800)
>
> How many FireWire devices have actually been on the bus? And is it true that
> one of them was a 100 Mbit/s device?

This is one of my test boxes, with 3 firewire controllers in it. The S100 device (fw1) is a dv camera hooked to the first card (fw0), the second controller (fw2) has a webcam (fw3) hooked to it. fw4 is a FW800 card, with two FW800 drives (fw5 and fw6) hooked up to it.

> Jarod and I did some improvements regarding firewire-sbp2's login and
> reconnection behavior lately. However, there is still something to do in the
> area of reconnection while IO is going on (bug 9734).

I plan to dig into this one more next week and see if I can reproduce with the latest and greatest linux1394-git tree and with some of the pending/under review patches as well.

A few "error status: 0:9", then "failed to reconnect", then "logged in" should happen less often with patch "try to increase reconnect_hold". The target refused reconnect but accepted re-login because fw-sbp2 was already too late with its first reconnect attempt.

PS: As noted in the changelog of the patch, the patch is ineffective with Initio and Prolific bridges which don't give us extended reconnect_hold.

Any updates on this problem, testing with latest kernel? Thanks.
This bug report is so old, and I've seen multiple cases where we don't oops after the journal is aborted, so I'm going to close this bug. If you can reproduce this with a newer kernel, please open a new bug. Thanks!!