Bug 11296

Summary: 2.6.27-rc2-git4: suspend and power off fails on Asus M3A32-MVP
Product: Power Management
Reporter: Rafael J. Wysocki (rjw)
Component: cpufreq
Assignee: cpufreq
Status: CLOSED CODE_FIX    
Severity: normal
CC: bunk, mark.langsdorf
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.27-rc2-git4
Tree: Mainline
Regression: Yes
Bug Depends on:    
Bug Blocks: 7216, 11167    
Attachments: oops message
another crash report
yet another crash report

Description Rafael J. Wysocki 2008-08-09 15:03:30 UTC
Subject    : 2.6.27-rc2-git4: suspend and power off fails on Asus M3A32-MVP
Submitter  : "Rafael J. Wysocki" <rjw@sisk.pl>
Date       : 2008-08-09 21:21
References : http://marc.info/?l=linux-kernel&m=121831675111794&w=4

This entry is being used for tracking a regression from 2.6.26.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Rafael J. Wysocki 2008-08-15 10:57:26 UTC
Handled-By : "Langsdorf, Mark" <mark.langsdorf@amd.com>
Comment 2 Adrian Bunk 2008-08-20 10:40:32 UTC
fixed by commit f607e3a03c90e8c050cb0c12ec9967c2925cc812
Comment 3 Mark Langsdorf 2008-08-22 08:29:34 UTC
Created attachment 17374
oops message

We used hardware tools to break into a failing system and catch the OOPS.  It's slab corruption on resume, with a slab entry not having any valid links.
Comment 4 Mark Langsdorf 2008-08-22 08:30:21 UTC
Created attachment 17375
another crash report

I rebuilt the kernel with slab and suspend/resume debugging enabled and tried again.  The result was a different but similar crash report.
Comment 5 Mark Langsdorf 2008-08-22 08:30:57 UTC
Created attachment 17376
yet another crash report

One more crash report.  This shows the slab corruption again, but doesn't indicate why.
Comment 6 Rafael J. Wysocki 2008-08-22 09:17:42 UTC
This looks similar to the problem described in this e-mail thread:

http://marc.info/?t=121933979400002&r=1&w=4
Comment 7 Rafael J. Wysocki 2008-08-22 09:26:08 UTC
Well, no, it doesn't really.

Is that with SLUB or SLAB?  If this is with SLUB, then corrupting per-CPU memory could lead to that.

Would it be practicable to split the patch reverted by commit f607e3a03c90e8c050cb0c12ec9967c2925cc812 into two patches, one introducing the per-CPU variables and the other one actually causing them to be passed to acpi_processor_preregister_performance()?  Then we could verify which part of the original patch causes the problem to happen.
Comment 8 Mark Langsdorf 2008-08-22 10:24:38 UTC
No, it's SLAB.

I'll see what I can do to split the patch.  It won't be very useful that way, but I could create a bunch of unused per-cpu variables.
Comment 9 Rafael J. Wysocki 2008-08-22 11:21:56 UTC
(In reply to comment #8)
> No, it's SLAB.
> 
> I'll see what I can do to split the patch.  It won't be very useful that way,
> but I could create a bunch of unused per-cpu variables.

Well, if they are unused, we won't be able to check if per-CPU memory is corrupted.

In fact, I don't see anything other than some corruption of per-CPU memory that could result from this patch and lead to slab corruption.