Bug 14436
Summary: | Computer becomes unusable without any apparent reason | ||
---|---|---|---|
Product: | Memory Management | Reporter: | Pitxyoki (Pitxyoki) |
Component: | Other | Assignee: | Andrew Morton (akpm) |
Status: | CLOSED CODE_FIX | ||
Severity: | high | CC: | rjw |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.32-rc4 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 14230 | ||
Attachments: |
System logs for both occurrences of the bug.
Fourth occurrence
Fifth occurrence |
Description
Pitxyoki
2009-10-18 18:32:04 UTC
Created attachment 23461 [details]
System logs for both occurrences of the bug.
This happened for the third time just now. /var/log/syslog has absolutely nothing about it this time.

Once more. This time with 2.6.32-rc5. Attaching corresponding syslog.

Created attachment 23572 [details]
Fourth occurrence
This happened once more yesterday. This time I tried shutting it down using ssh. I could log in and send the command, but the shutdown sequence didn't finish. I had to press the power button to turn it off.

I'm starting to fear for my filesystem's consistency. For the first time ever I have two files in an ext3 FS's lost+found. fsck reported more errors than I would consider acceptable if these hangups weren't happening. Aside from that, do you want me to continue submitting syslogs or is this enough? I'm considering going back to an older kernel.

Created attachment 23590 [details]
Fifth occurrence
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface).

On Sun, 18 Oct 2009 18:32:05 GMT bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=14436
>
> Summary: Computer becomes unusable without any apparent reason
> Product: Memory Management
> Version: 2.5
> Kernel Version: 2.6.32-rc4
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: high
> Priority: P1
> Component: Other
> AssignedTo: akpm@linux-foundation.org
> ReportedBy: Pitxyoki@gmail.com
> Regression: No
>
>
> Hi,
> This happened to me two times today.
> The first time, I wasn't even in front of the computer: I heard a system beep
> and when I looked at it, the computer was totally irresponsive. I couldn't
> input anything on the screen, the Num, Caps or Scroll Lock keys wouldn't do any
> effect on the keyboard lights, and the cursor wouldn't move. After a
> cold-reboot I ran memcheck86+ and fsck on all drives, but no errors appeared.
>
> The second time, I had just clicked on an URL on icedove (= Mozilla
> Thunderbird) to a (trusted) PDF file sent by a friend. When the file was
> opening, the system beep started sounding uninterruptedly and no input could be
> sent to the computer. After a cold-reboot, still no errors found.
>
> I'm attaching the syslog for both occurrences.
>

Reproducible oops in tty_devnum():
http://bugzilla.kernel.org/attachment.cgi?id=23572

I think it would be safe to assume that this is a regression.

Pitxyoki, was 2.6.31 OK?

Thanks.

> Reproducible oops in tty_devnum():
> http://bugzilla.kernel.org/attachment.cgi?id=23572
>
> I think it would be safe to assume that this is a regression.
Looks to me like a memory scribble or freeing up of stuff under the
kernel. The oopses are coming from the fact the task struct now contains
ascii.
Turn on slab poison and all the memory debug and try and repeat it. Grab
the oops and after that if you are using 4K stacks switch to 8K stacks and
repeat the attempt.
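
A quick way to test the "task struct now contains ascii" observation is to take a pointer-sized value from an oops and check whether all of its bytes are printable characters. The following is a minimal Python sketch, assuming a 32-bit little-endian x86 machine; the function name and both sample values are hypothetical and are not taken from the attached logs.

```python
from typing import Optional


def decode_if_ascii(value: int, width: int = 4) -> Optional[str]:
    """Return the bytes of `value` as text if every byte is printable ASCII.

    `width` is the pointer size in bytes (4 is an assumption for 32-bit x86).
    """
    raw = value.to_bytes(width, "little")  # x86 stores the low byte first
    if all(0x20 <= b <= 0x7E for b in raw):
        return raw.decode("ascii")
    return None


# Hypothetical values for illustration only.
print(decode_if_ascii(0x74786574))  # bytes 74 65 78 74 -> "text"
print(decode_if_ascii(0xC0123456))  # contains non-printable bytes -> None
```

A value that decodes cleanly does not prove corruption on its own, but several such values in one oops are a strong hint that string data was written over a kernel structure.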
On Tue, Nov 3, 2009 at 10:16 AM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > Reproducible oops in tty_devnum():
> > http://bugzilla.kernel.org/attachment.cgi?id=23572
> >
> > I think it would be safe to assume that this is a regression.
>
Thank you for paying attention to this.
I can't tell if 2.6.31 was OK, but I really never had these problems with it. Even with 2.6.32 everything seems to be OK. Sometimes everything is fine for some days, other times it crashes multiple times a day.

> Looks to me like a memory scribble or freeing up of stuff under the
> kernel. The oopses are coming from the fact the task struct now contains
> ascii.
>
> Turn on slab poison and all the memory debug and try and repeat it. Grabs
> the oops and after that if you are using 4K stacks switch to 8K stacks and
> repeat the attempt
>
I'm sorry, but I'm not sure I know how to do this. Are these options on the .config file? If not, can you please instruct me more clearly on how to do this?

Regards,
Luís Picciochi

> I'm sorry, but I'm not sure I know how to do this. Are these options on the
> .config file? If not, can you please instruct me more clearly on how to do
> this?
They are .config options
I would enable
DEBUG_KERNEL
DEBUG_PAGEALLOC
PAGE_POISONING
DEBUG_STACKOVERFLOW
DEBUG_STACK_USAGE
DEBUG_OBJECTS
DEBUG_OBJECTS_FREE
DEBUG_SLAB or DEBUG_SLUB or SQLB_DEBUG
and the stack size is configured with
4KSTACKS
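
To confirm that a rebuilt kernel actually carries the options listed above, the generated .config can be checked mechanically. The following is a minimal Python sketch, not part of the kernel tree: the helper names are hypothetical, the option names are spelled as in this thread with the kernel's CONFIG_ prefix added, and SLUB_DEBUG is included as an additional spelling on the assumption that it is how mainline Kconfig names the SLUB debug option.

```python
import re

# Options from the list above; each appears in .config as CONFIG_<name>=y.
REQUIRED = [
    "DEBUG_KERNEL", "DEBUG_PAGEALLOC", "PAGE_POISONING",
    "DEBUG_STACKOVERFLOW", "DEBUG_STACK_USAGE",
    "DEBUG_OBJECTS", "DEBUG_OBJECTS_FREE",
]
# Only one allocator debug option applies, depending on the allocator built in.
# The first three are spelled as in the thread; SLUB_DEBUG is an assumed
# alternative spelling, so verify it against your own tree.
ANY_OF = ["DEBUG_SLAB", "DEBUG_SLUB", "SQLB_DEBUG", "SLUB_DEBUG"]


def enabled_options(config_text):
    """Return the set of option names (without CONFIG_) that are set to y."""
    return set(re.findall(r"^CONFIG_(\w+)=y$", config_text, flags=re.MULTILINE))


def check(config_text):
    """Return (missing options, whether 4K stacks are still enabled)."""
    on = enabled_options(config_text)
    missing = [name for name in REQUIRED if name not in on]
    if not any(name in on for name in ANY_OF):
        missing.append(" or ".join(ANY_OF))
    return missing, "4KSTACKS" in on


# Made-up .config excerpt for illustration.
sample = """\
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_PAGEALLOC=y
# CONFIG_PAGE_POISONING is not set
CONFIG_DEBUG_SLAB=y
CONFIG_4KSTACKS=y
"""
missing, small_stacks = check(sample)
print("missing:", missing)
print("4K stacks enabled:", small_stacks)
```

For a real build, read the file with `open(".config").read()` and pass the text to `check`. If 4K stacks are reported as enabled, disabling that option is what switches the build to 8K stacks for the second test run.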
I'm suspecting more and more that this bug might be related to bug #12794. Please see my last attachment on that bug.

After I recompiled the kernel with the options you asked for, I haven't seen any messages in syslog that seemed related to the ones I reported here. When my computer hung again, I saw messages like the ones I reported on bug #12794, related to the rndis_wlan driver.

In the logs I attached to this bug you can see other programs associated with the oops, and not rndis_wlan... But you can see "BUG: unable to handle kernel paging request at xxx" in the rndis_wlan-related logs.

I can't be sure if these are the same bug, if they are related or if they are completely separate issues, but maybe you would know it better than me.

Thanks and regards,
Luís Picciochi

On Tue, Nov 3, 2009 at 11:17 AM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > I'm sorry, but I'm not sure I know how to do this. Are these options on the
> > .config file? If not, can you please instruct me more clearly on how to do
> > this?
>
> They are .config options
>
> I would enable
>
> DEBUG_KERNEL
> DEBUG_PAGEALLOC
> PAGE_POISONING
> DEBUG_STACKOVERFLOW
> DEBUG_STACK_USAGE
> DEBUG_OBJECTS
> DEBUG_OBJECTS_FREE
> DEBUG_SLAB or DEBUG_SLUB or SQLB_DEBUG
>
> and the stack size is configured with
>
> 4KSTACKS
>

As reported on bug #12794, I consider that bug to be resolved. Since I applied the last patch I have not had any more crashes like the ones reported here. These two bugs really seemed the same to me.
You may close this if you consider that you don't need any more info about this.

Regards,
Pitxyoki

On Tuesday 29 December 2009, Luís Picciochi Oliveira wrote:
> Hi,
> The bug is present on 2.6.32 and subsequent versions (2.6.32.1, 2.6.32.2).
> It has been resolved as of 2.6.33-rc1.
>
> Regards,
> Luís Picciochi
>
> On Tue, Dec 29, 2009 at 3:28 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.31 and 2.6.32.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.31 and 2.6.32. Please verify if it still should
> > be listed and let me know (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14436
> > Subject : Computer becomes unusable without any apparent reason
> > Submitter : Pitxyoki <Pitxyoki@gmail.com>
> > Date : 2009-10-18 18:32 (73 days old)
On Tuesday 29 December 2009, Luís Picciochi Oliveira wrote:
> On Tue, Dec 29, 2009 at 10:04 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Tuesday 29 December 2009, Luís Picciochi Oliveira wrote:
> >> Hi,
> >> The bug is present on 2.6.32 and subsequent versions (2.6.32.1, 2.6.32.2).
> >> It has been resolved as of 2.6.33-rc1.
> >
> > Thanks for the update.
> >
> > Is it known how it was fixed in 2.6.33-rc1 or do you just see that the bug is
> > not present in there any more?
>
> Hi,
> Like I reported at [1], I strongly believe this was the same bug as
> #12794, which was resolved by a patch resulting from my feedback and
> Jussi Kivilinna's work. After I enabled memory debug as suggested on
> bug #14436, everything pointed in the direction of that bug.
> The issue was solved after applying Jussi's patch. That patch has been
> committed to the mainline kernel and I can assert that since then the
> bug didn't occur again.