Bug 14290 - Multiple oopses in the block layer when logging in with X
Summary: Multiple oopses in the block layer when logging in with X
Status: RESOLVED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Block Layer
Hardware: All Linux
Importance: P1 normal
Assignee: Jens Axboe
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-10-01 17:51 UTC by Daniel Vetter
Modified: 2009-10-02 14:13 UTC
0 users

See Also:
Kernel Version: 2.6.32-rc1
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
oops captured from 2.6.32-rc2-00141-g817b33d (3.97 KB, text/plain)
2009-10-01 17:51 UTC, Daniel Vetter
full dmesg from 2.6.31-03706-ga9bbd21 (46.96 KB, text/plain)
2009-10-01 17:55 UTC, Daniel Vetter

Description Daniel Vetter 2009-10-01 17:51:53 UTC
Created attachment 23222
oops captured from 2.6.32-rc2-00141-g817b33d

I first noticed the problem in 2.6.31-04082-g1824090.

Symptoms: Usually when logging into an X session (kde4), but sometimes already when kdm starts up, X hangs (logging in via the serial console sometimes still works). Unfortunately the output on the serial console is completely mangled and incomplete, but it contains at least a few oopses.

Hardware: 4 core dual socket Opteron with 4GB RAM, 64bit kernel & userspace, SATA HDs on sata_sil:
01:0b.0 Mass storage controller [0180]: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller [1095:3114] (rev 02)

Storage-layering:
- ext4->lvm->dm-crypt->raid1->sata_sil
- ext4->lvm->dm-crypt->sata_sil
- ext3->sata_sil

Bisecting was a pain because I ran into an unrelated (and already fixed) breakage in the tg3 driver, argh. With some wild guessing I found a recent enough good kernel: 2.6.31-03706-ga9bbd21.

Bisecting then turned up v2.6.31-18-ge7e503a as the first bad commit. I reverted it on top of the latest -linus (2.6.32-rc2-00141-g817b33d), but that didn't fix the problem. At least I could capture the first non-mangled oops (see the attachment). I'm not really sure whether this is the same problem, but it compares favourably to the mess I got before, and this oops was immediately followed by some more garbled oopses like before.

I'll also attach the full dmesg of a working kernel shortly.

-Daniel
Comment 1 Daniel Vetter 2009-10-01 17:55:54 UTC
Created attachment 23223
full dmesg from 2.6.31-03706-ga9bbd21
Comment 2 Jens Axboe 2009-10-01 17:59:06 UTC
The oops you are quoting is from ext4, and that is an unrelated problem. Ted
posted a fix for that on lkml today, but I can't seem to find it now.

I'll take a look at the bitops bit!
Comment 3 Daniel Vetter 2009-10-01 20:00:35 UTC
On Thu, Oct 01, 2009 at 05:59:07PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> --- Comment #2 from Jens Axboe <axboe@kernel.dk>  2009-10-01 17:59:06 ---
> The oops you are quoting is from ext4, and that is an unrelated problem. Ted
> posted a fix for that on lkml today, but I can't seem to find it now.

Thanks for the hint. I've applied Ted's latest queue (I've found a merge
request on lkml):

git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git for_linus

and applied it on top of the revert. Works without any fuss. So I'd say
bisecting pointed at the right commit.

> I'll take a look at the bitops bit!
Thanks, Daniel
Comment 4 Jens Axboe 2009-10-02 07:44:52 UTC
Can you post an oops from 2.6.32-rc2 with Ted's fix?
Comment 5 Daniel Vetter 2009-10-02 11:01:35 UTC
On Fri, Oct 02, 2009 at 07:44:53AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> --- Comment #4 from Jens Axboe <axboe@kernel.dk>  2009-10-02 07:44:52 ---
> Can you post an oops from 2.6.32-rc2 with Ted's fix?
Done. Looks like the problem is gone. I'll close this report once I've
retested in a few days, to rule out any accidental effect.

-Daniel
Comment 6 Jens Axboe 2009-10-02 14:13:12 UTC
I audited the bitops, and they look fine in the current kernel. So my guess would be that you were initially bitten by that problem, and then later (once that was fixed), the ext4 issue was the one that hit you.
Comment 7 Jens Axboe 2009-10-02 14:13:53 UTC
I'll close this one, reopen if you see anything suspicious.
