Created attachment 23222 [details]
oops captured from 2.6.32-rc2-00141-g817b33d
I first noticed the problem in 2.6.31-04082-g1824090.
Symptoms: X hangs, usually when logging into an X session (kde4), but sometimes already when kdm starts up (login via the serial console sometimes still works). Unfortunately the output on the serial console is completely mangled and incomplete, but it contains at least a few oopsen.
Hardware: 4 core dual socket Opteron with 4GB RAM, 64-bit kernel and userspace, SATA HDs on sata_sil:
01:0b.0 Mass storage controller : Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller [1095:3114] (rev 02)
Bisecting was a pain, because I've run into an unrelated (and already fixed) breakage in the tg3 driver, argh. With some wild guessing I've found a recent enough good kernel: 2.6.31-03706-ga9bbd21.
Bisecting then turned up v2.6.31-18-ge7e503a as the first bad commit. I've reverted this on top of latest -linus (2.6.32-rc2-00141-g817b33d), but that didn't fix the problem. But at least I could capture the first non-mangled oops (see the attachment). I'm not really sure whether this is the same problem, but it compares favourably to the mess I got beforehand, and this oops was immediately followed by some more garbled oopses like before.
I'll also attach the full dmesg of a working kernel shortly.
Created attachment 23223 [details]
full dmesg from 2.6.31-03706-ga9bbd21
The oops you are quoting is from ext4, and that is an unrelated problem. Ted
posted a fix for that on lkml today, but I can't seem to find it now.
I'll take a look at the bitops bit!
On Thu, Oct 01, 2009 at 05:59:07PM +0000, email@example.com wrote:
> --- Comment #2 from Jens Axboe <firstname.lastname@example.org> 2009-10-01 17:59:06 ---
> The oops you are quoting is from ext4, and that is an unrelated problem. Ted
> posted a fix for that on lkml today, but I can't seem to find it now.
Thanks for the hint. I've applied Ted's latest queue (I've found a merge
request on lkml):
and applied it on top of the revert. Works without any fuss. So I'd say
bisecting pointed at the right commit.
> I'll take a look at the bitops bit!
Can you post an oops from 2.6.32-rc2 with Ted's fix?
On Fri, Oct 02, 2009 at 07:44:53AM +0000, email@example.com wrote:
> --- Comment #4 from Jens Axboe <firstname.lastname@example.org> 2009-10-02 07:44:52 ---
> Can you post an oops from 2.6.32-rc2 with Ted's fix?
Done. Looks like the problem is gone. I'll close this report once I've
retested in a few days, to rule out any accidental effect.
I audited the bitops, and they look fine in the current kernel. So my guess would be that you were initially bitten by that problem, and then later (when that was fixed), the ext4 issue was the one that hit you.
I'll close this one, reopen if you see anything suspicious.