Bug 22462

Summary: 2.6.33.1 Regression: Unexpected power-off during boot or when resuming from suspend on Vostro V13
Product: Platform Specific/Hardware
Reporter: Guillaume Pothier (gpothier)
Component: x86-64
Assignee: platform_x86_64 (platform_x86_64)
Status: RESOLVED DUPLICATE
Severity: normal
CC: acpi-bugzilla, andrea.cimitan, bjorn.helgaas, jbarnes, peponnet.cyril, rjw, xcoulon
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.33.1
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:    
Bug Blocks: 7216    
Attachments: Output of dmesg
Output of lsmod
Config file
ubuntu config that reproduces the bug even with latest git
diff between the ubuntu broken config and a working one
this *always* crashes during boot with bluetooth disabled in the bios
dump during a broken suspend/resume
this config crashes
diff between a broken config and a working one

Description Guillaume Pothier 2010-11-08 13:07:03 UTC
This bug was originally reported in the Launchpad bug tracker:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/588194

The bug is not present in kernel 2.6.31-22.

Pasting relevant parts of the summary here:
Using Ubuntu 10.04 32 bit Desktop, the system suddenly and unexpectedly powers off either when booting or when resuming from suspend or hibernate. If waking from suspend, this happens after the login box appears but even before I have begun typing my password. Unfortunately, this happens only once in about 7 or 8 suspends and not every time. This is on a Dell Vostro V13 laptop.

After powering on the system again immediately after the problem has occurred, I have checked kernel.log and /var/log/messages and found nothing out of the ordinary there.
Comment 1 Guillaume Pothier 2010-11-08 17:04:42 UTC
I did a bit more testing, and it seems the regression occurred between 2.6.33 and 2.6.33.1. I'm now trying to bisect to find the precise commit.
Comment 2 Guillaume Pothier 2010-11-09 11:32:54 UTC
I'm bisecting now, but it takes a long time, so I wanted to report this in the meantime:
in the previous comment I mentioned that the regression occurred between 2.6.33 and 2.6.33.1, but that seems to be true only for the Ubuntu mainline kernels downloaded from:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.33/
http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.33.1-lucid/

Compiling from kernel.org's git, the regression is between 2.6.34-rc2 and 2.6.34-rc6.

I don't know what could be the difference between Ubuntu's mainline kernels and the ones from git, except maybe the config.
Comment 3 Guillaume Pothier 2010-11-12 02:08:57 UTC
Here's the result of the bisect. Note that the last step did not build:

gpothier@tadzim:linux-git$ git bisect skip
There are only 'skip'ped commits left to test.
The first bad commit could be any of:
bb910a7040e90a0ca3d3e8245d6d5c128a5d1287
522dba7134d6b2e5821d3457f7941ec34f668e6d
We cannot bisect more!

The commit message for the first candidate is "PCI/PM Runtime: Make runtime PM of PCI devices inactive by default", and message for the second candidate is "Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6"


For testing, I considered a build "good" if I could do at least 10 suspend/resume cycles in a row.
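
As a rough check on this criterion, the chance that an affected kernel passes by luck can be estimated. The sketch below assumes each resume fails independently with a probability of about 1 in 7.5 (taken from the "once in about 7 or 8 suspends" figure in the description); the independence assumption and the function names are illustrative, not measured.

```python
import math


def false_good_probability(p_fail: float, cycles: int) -> float:
    """Chance that a buggy kernel survives `cycles` resumes in a row,
    assuming each resume fails independently with probability p_fail."""
    return (1.0 - p_fail) ** cycles


def cycles_needed(p_fail: float, max_false_good: float) -> int:
    """Smallest number of clean cycles that keeps the risk of a
    false "good" verdict at or below max_false_good."""
    return math.ceil(math.log(max_false_good) / math.log(1.0 - p_fail))


if __name__ == "__main__":
    p = 1 / 7.5  # assumed per-resume failure rate
    print(f"false good after 10 cycles: {false_good_probability(p, 10):.2f}")
    print(f"cycles needed for <=5% risk: {cycles_needed(p, 0.05)}")
```

Under these assumptions roughly a quarter of bad builds would still pass a 10-cycle test, so a "good" verdict during bisection is weaker evidence than a "bad" one.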

Is there anything else I can do to help?


Here is the full bisect log:
git bisect start
# bad: [f6f94e2ab1b33f0082ac22d71f66385a60d8157f] Linux 2.6.36
git bisect bad f6f94e2ab1b33f0082ac22d71f66385a60d8157f
# good: [57d54889cd00db2752994b389ba714138652e60c] Linux 2.6.34-rc1
git bisect good 57d54889cd00db2752994b389ba714138652e60c
# skip: [e9a5f426b85e429bffaee4e0b086b1e742a39fa6] CPU: Avoid using unititialized error variable in disable_nonboot_cpus()
git bisect skip e9a5f426b85e429bffaee4e0b086b1e742a39fa6
# bad: [fd4dc88e46c4d9dd845ffef50a975ceea110fd85] staging: hv: Fix error checking in channel.c
git bisect bad fd4dc88e46c4d9dd845ffef50a975ceea110fd85
# bad: [c6c352371c1ce486a62f4eb92e545b05cfcef76b] ARM: 5965/1: Fix soft lockup in at91 udc driver
git bisect bad c6c352371c1ce486a62f4eb92e545b05cfcef76b
# bad: [11130736c99c37e253f45b2d3fd30b07313f83c6] ACPI: processor: refactor internal map_lapic_id()
git bisect bad 11130736c99c37e253f45b2d3fd30b07313f83c6
# skip: [91e013827c0bcbb187ecf02213c5446b6f62d445] Merge branch 'master' into for-linus
git bisect skip 91e013827c0bcbb187ecf02213c5446b6f62d445
# skip: [c251c7f738cd94eb3a1febda318078c661eccb4d] drivers/net/tulip/eeprom.c: fix bogus "(null)" in tulip init messages
git bisect skip c251c7f738cd94eb3a1febda318078c661eccb4d
# good: [0636b33c5f2fac4e274464ae6867805f080fc433] Merge branches 'cxgb3', 'ipoib', 'misc' and 'nes' into for-next
git bisect good 0636b33c5f2fac4e274464ae6867805f080fc433
# skip: [41dcc17c735d4e99a91002b09850d0f09ee4ab4b] ARM: mach-shmobile: pfc-sh7377: modify KEYIN settings
git bisect skip 41dcc17c735d4e99a91002b09850d0f09ee4ab4b
# bad: [21768639be419d00275ac4e58b863361d0c24ee4] edac: mpc85xx mask ecc syndrome correctly
git bisect bad 21768639be419d00275ac4e58b863361d0c24ee4
# bad: [2afb18981739a1426af2a6c952e03c5966b3dfc6] broadsheetfb: add MMIO hooks
git bisect bad 2afb18981739a1426af2a6c952e03c5966b3dfc6
# bad: [416d8d2888db392c562fb8afaf9136730ef0da9e] drivers/block/floppy.c: remove REPEAT macro
git bisect bad 416d8d2888db392c562fb8afaf9136730ef0da9e
# bad: [045f98363080ddbbcef6b8b8306ec58a818406a0] drivers/block/floppy.c: remove used once CHECK_READY macro
git bisect bad 045f98363080ddbbcef6b8b8306ec58a818406a0
# bad: [516a82422209e078345d0ca54b16793d7bfd4782] sdio: recognize io card without powercycle
git bisect bad 516a82422209e078345d0ca54b16793d7bfd4782
# bad: [522dba7134d6b2e5821d3457f7941ec34f668e6d] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
git bisect bad 522dba7134d6b2e5821d3457f7941ec34f668e6d
# good: [51d0f6d1f50349579f007adf5c0b51aaedd93b94] Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
git bisect good 51d0f6d1f50349579f007adf5c0b51aaedd93b94
# skip: [bb910a7040e90a0ca3d3e8245d6d5c128a5d1287] PCI/PM Runtime: Make runtime PM of PCI devices inactive by default
git bisect skip bb910a7040e90a0ca3d3e8245d6d5c128a5d1287
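
To see at a glance how much of the bisection was skipped, the verdict comments in a log like the one above can be tallied. This is a minimal sketch that only relies on the "# good:", "# bad:" and "# skip:" comment lines visible in the log; the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches comment lines such as "# bad: [<sha>] <subject>" from `git bisect log`.
_VERDICT = re.compile(r"^# (good|bad|skip): \[([0-9a-f]+)\] (.*)$")


def summarize_bisect_log(text: str) -> Counter:
    """Count how many commits were marked good, bad, or skip."""
    counts = Counter()
    for line in text.splitlines():
        match = _VERDICT.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts
```

For the log above this gives 10 bad, 3 good and 5 skipped commits.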
Comment 4 Rafael J. Wysocki 2010-11-16 22:05:50 UTC
The change made by "PCI/PM Runtime: Make runtime PM of PCI devices inactive
by default" has no effect on the behavior of PCI devices that don't
support runtime PM.

Can you check if 2.6.37-rc2 fixes the problem for you, please?
Comment 5 Guillaume Pothier 2010-11-17 02:33:27 UTC
Thanks for taking a look at this issue. The problem persists in 2.6.37-rc2. 
Poweroff occurs either during boot or when resuming from suspend.
Comment 6 Rafael J. Wysocki 2010-11-17 20:29:50 UTC
What happens if you do:

# echo core > /sys/power/pm_test
# echo mem > /sys/power/state

(it should carry out a fake suspend-resume cycle and return to the command
prompt in about 5-10 sec.)?

Does it reboot too?
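
For repeated runs, the two writes above can be wrapped in a small helper. This is only a sketch: it assumes root privileges and a kernel that exposes /sys/power/pm_test (typically one built with CONFIG_PM_DEBUG), and the function name and the `sysfs_power` parameter are illustrative.

```python
from pathlib import Path

# Test levels accepted by /sys/power/pm_test.
PM_TEST_LEVELS = ("none", "core", "processors", "platform", "devices", "freezer")


def fake_suspend(level: str, sysfs_power: Path = Path("/sys/power")) -> None:
    """Run one test-mode suspend cycle at the given pm_test level."""
    if level not in PM_TEST_LEVELS:
        raise ValueError(f"unknown pm_test level: {level}")
    (sysfs_power / "pm_test").write_text(level)
    try:
        # On a real system this write returns once the fake cycle has finished.
        (sysfs_power / "state").write_text("mem")
    finally:
        # Leave test mode so that later suspends are real ones again.
        (sysfs_power / "pm_test").write_text("none")
```

Calling `fake_suspend("core")` corresponds to the two echo commands above; passing another directory as `sysfs_power` allows a dry run without touching the real sysfs.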
Comment 7 Guillaume Pothier 2010-11-17 20:48:53 UTC
Yes, I got a poweroff at the first attempt (with 2.6.37-rc2). The poweroff occurred after about 10s.
Comment 8 Rafael J. Wysocki 2010-11-22 23:25:21 UTC
Does it also happen with the following test (after a fresh boot):

# echo devices > /sys/power/pm_test
# echo mem > /sys/power/state
Comment 9 Guillaume Pothier 2010-11-23 04:19:10 UTC
Yes, it still happens with "echo devices" instead of "echo core".
Comment 10 Rafael J. Wysocki 2010-11-30 22:17:49 UTC
Does it happen 100% of the time?

Please attach a dmesg log and the output of lsmod from your system.
Comment 11 Guillaume Pothier 2010-12-01 00:00:49 UTC
Created attachment 38692 [details]
Output of dmesg
Comment 12 Guillaume Pothier 2010-12-01 00:01:26 UTC
Created attachment 38702 [details]
Output of lsmod
Comment 13 Guillaume Pothier 2010-12-01 00:06:09 UTC
Requested information attached.
It doesn't happen 100% of the time, but I've never been able to perform more than 5-6 suspend/resume cycles with the faulty kernels.
I have been playing with bios settings and found a way to trigger it 100% of the time (assuming it is the same bug) by disabling bluetooth. Here is the original comment about this:

I did some more testing and although it seems several devices cause suspend/resume problems, the biggest issue is with bluetooth and usb.
I went to the bios settings and disabled all the integrated devices. In that configuration I did 10 suspend/resume cycles without problem.
Then I started enabling a few devices. With WLAN enabled I had one lock-up out of 10 suspend/resume cycles, but it was a lockup during suspend, not during resume, so it is probably another issue.
However enabling bluetooth makes it impossible to suspend/resume more than once or twice (after the second resume, the machine powers off during resume).
But the strangest thing is this: disabling bluetooth but enabling the external USB ports systematically triggers the unexpected power-off during boot, so the system becomes unbootable.
Comment 14 Guillaume Pothier 2010-12-01 00:31:21 UTC
Correction: I just tested that again and disabling bluetooth doesn't seem to have any effect anymore on 2.6.37-rc2. 
With bluetooth disabled I had 4 poweroffs out of 10 boots, and with bluetooth enabled I had 3 out of 10. I don't think there is a statistically significant difference.
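
The comparison of 4 out of 10 against 3 out of 10 can be checked with Fisher's exact test. The sketch below is a plain-Python version of the two-sided test built on `math.comb`; the function name and argument order are illustrative.

```python
from math import comb


def fisher_exact_two_sided(fail_a: int, n_a: int, fail_b: int, n_b: int) -> float:
    """Two-sided Fisher exact p-value for fail_a/n_a versus fail_b/n_b."""
    total_fail = fail_a + fail_b
    lowest = max(0, total_fail - n_b)
    highest = min(n_a, total_fail)
    # Unnormalized hypergeometric weight of seeing k failures in group A.
    weights = {
        k: comb(n_a, k) * comb(n_b, total_fail - k)
        for k in range(lowest, highest + 1)
    }
    observed = weights[fail_a]
    # Sum every outcome that is at most as likely as the observed one.
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / sum(weights.values())


if __name__ == "__main__":
    print(fisher_exact_two_sided(4, 10, 3, 10))
```

For 4/10 against 3/10 the p-value is 1.0, so these counts give no evidence that the bluetooth setting matters.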
Comment 15 Andrea Cimitan 2010-12-02 09:43:08 UTC
Same issue on my brand-new Dell Vostro V13. If you need any info, I'm glad to help.
Comment 16 Rafael J. Wysocki 2010-12-02 20:11:42 UTC
Hmm.  Does booting with pci=nocrs make things better?
Comment 17 Rafael J. Wysocki 2010-12-02 20:17:25 UTC
Also please try to get the contents of /proc/iomem before suspend and
after a successful resume, if possible.
Comment 18 Guillaume Pothier 2010-12-02 20:32:10 UTC
pci=nocrs seems to mitigate the problem. Actually, I was going to say it makes the problem disappear because I was able to perform 15 suspend/resume cycles, but when attempting the 16th to get the contents of /proc/iomem, it powered off...

Anyway, here are the contents of /proc/iomem before and after a suspend/resume cycle, on 2.6.37-rc without pci=nocrs (or do you need it with pci=nocrs?):


gpothier@tadzim:~$ cat /proc/iomem 
00000000-0000ffff : reserved
00010000-0009d3ff : System RAM
0009d400-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
000d2000-000d3fff : reserved
000d4000-000d7fff : PCI Bus 0000:00
000d8000-000dbfff : PCI Bus 0000:00
000dc000-000fffff : reserved
00100000-bbaa0fff : System RAM
  01000000-015880b5 : Kernel code
  015880b6-01a9d9cf : Kernel data
  01b89000-01c6f57f : Kernel bss
bbaa1000-bbaa6fff : reserved
bbaa7000-bbbb9fff : System RAM
bbbba000-bbc0efff : reserved
bbc0f000-bbd07fff : System RAM
bbd08000-bbf0efff : reserved
bbf0f000-bbf18fff : System RAM
bbf19000-bbf1efff : reserved
bbf1f000-bbf64fff : System RAM
bbf65000-bbf9efff : ACPI Non-volatile Storage
bbf9f000-bbfe4fff : System RAM
bbfe5000-bbffefff : ACPI Tables
bbfff000-bbffffff : System RAM
c0000000-dfffffff : PCI Bus 0000:00
  c8000000-c9ffffff : PCI Bus 0000:09
    c8000000-c8001fff : 0000:09:00.0
      c8000000-c8001fff : iwlagn
  ca000000-cbffffff : PCI Bus 0000:07
  cc000000-cdffffff : PCI Bus 0000:05
  ce000000-cfffffff : PCI Bus 0000:03
  d0000000-dfffffff : 0000:00:02.0
e0000000-efffffff : PCI MMCONFIG 0000 [bus 00-ff]
  e0000000-efffffff : pnp 00:0a
f0000000-febfffff : PCI Bus 0000:00
  f0000000-f1ffffff : PCI Bus 0000:01
  f2000000-f3ffffff : PCI Bus 0000:03
  f4000000-f7ffffff : PCI Bus 0000:05
    f4000000-f4003fff : 0000:05:00.0
      f4000000-f4003fff : r8169
    f6000000-f6000fff : 0000:05:00.0
      f6000000-f6000fff : r8169
    f7fe0000-f7ffffff : 0000:05:00.0
  f8000000-f9ffffff : PCI Bus 0000:07
  fa000000-fbffffff : PCI Bus 0000:09
  fc000000-fdffffff : PCI Bus 0000:01
  fe000000-fe3fffff : 0000:00:02.0
  fe400000-fe4fffff : 0000:00:02.1
  fe700000-fe703fff : 0000:00:1b.0
    fe700000-fe703fff : ICH HD audio
  fe704000-fe7047ff : 0000:00:1f.2
    fe704000-fe7047ff : ahci
  fe704800-fe704bff : 0000:00:1a.7
    fe704800-fe704bff : ehci_hcd
  fe704c00-fe704fff : 0000:00:1d.7
    fe704c00-fe704fff : ehci_hcd
  febfe000-febfefff : Intel Flush Page
  febfff00-febfffff : 0000:00:1f.3
fec00000-fec003ff : IOAPIC 0
fed00000-fed003ff : HPET 0
  fed00000-fed003ff : pnp 00:03
fed10000-fed13fff : pnp 00:0a
fed18000-fed18fff : pnp 00:0a
fed19000-fed19fff : pnp 00:0a
fed1c000-fed1ffff : pnp 00:0a
fed20000-fed3ffff : pnp 00:0a
fed45000-fed8ffff : pnp 00:0a
fee00000-fee00fff : Local APIC
100000000-13fffffff : System RAM
gpothier@tadzim:~$ sudo pm-suspend
[sudo] password for gpothier: 
gpothier@tadzim:~$ cat /proc/iomem 
00000000-0000ffff : reserved
00010000-0009d3ff : System RAM
0009d400-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
000d2000-000d3fff : reserved
000d4000-000d7fff : PCI Bus 0000:00
000d8000-000dbfff : PCI Bus 0000:00
000dc000-000fffff : reserved
00100000-bbaa0fff : System RAM
  01000000-015880b5 : Kernel code
  015880b6-01a9d9cf : Kernel data
  01b89000-01c6f57f : Kernel bss
bbaa1000-bbaa6fff : reserved
bbaa7000-bbbb9fff : System RAM
bbbba000-bbc0efff : reserved
bbc0f000-bbd07fff : System RAM
bbd08000-bbf0efff : reserved
bbf0f000-bbf18fff : System RAM
bbf19000-bbf1efff : reserved
bbf1f000-bbf64fff : System RAM
bbf65000-bbf9efff : ACPI Non-volatile Storage
bbf9f000-bbfe4fff : System RAM
bbfe5000-bbffefff : ACPI Tables
bbfff000-bbffffff : System RAM
c0000000-dfffffff : PCI Bus 0000:00
  c8000000-c9ffffff : PCI Bus 0000:09
    c8000000-c8001fff : 0000:09:00.0
      c8000000-c8001fff : iwlagn
  ca000000-cbffffff : PCI Bus 0000:07
  cc000000-cdffffff : PCI Bus 0000:05
  ce000000-cfffffff : PCI Bus 0000:03
  d0000000-dfffffff : 0000:00:02.0
e0000000-efffffff : PCI MMCONFIG 0000 [bus 00-ff]
  e0000000-efffffff : pnp 00:0a
f0000000-febfffff : PCI Bus 0000:00
  f0000000-f1ffffff : PCI Bus 0000:01
  f2000000-f3ffffff : PCI Bus 0000:03
  f4000000-f7ffffff : PCI Bus 0000:05
    f4000000-f4003fff : 0000:05:00.0
      f4000000-f4003fff : r8169
    f6000000-f6000fff : 0000:05:00.0
      f6000000-f6000fff : r8169
    f7fe0000-f7ffffff : 0000:05:00.0
  f8000000-f9ffffff : PCI Bus 0000:07
  fa000000-fbffffff : PCI Bus 0000:09
  fc000000-fdffffff : PCI Bus 0000:01
  fe000000-fe3fffff : 0000:00:02.0
  fe400000-fe4fffff : 0000:00:02.1
  fe700000-fe703fff : 0000:00:1b.0
    fe700000-fe703fff : ICH HD audio
  fe704000-fe7047ff : 0000:00:1f.2
    fe704000-fe7047ff : ahci
  fe704800-fe704bff : 0000:00:1a.7
    fe704800-fe704bff : ehci_hcd
  fe704c00-fe704fff : 0000:00:1d.7
    fe704c00-fe704fff : ehci_hcd
  febfe000-febfefff : Intel Flush Page
  febfff00-febfffff : 0000:00:1f.3
fec00000-fec003ff : IOAPIC 0
fed00000-fed003ff : HPET 0
  fed00000-fed003ff : pnp 00:03
fed10000-fed13fff : pnp 00:0a
fed18000-fed18fff : pnp 00:0a
fed19000-fed19fff : pnp 00:0a
fed1c000-fed1ffff : pnp 00:0a
fed20000-fed3ffff : pnp 00:0a
fed45000-fed8ffff : pnp 00:0a
fee00000-fee00fff : Local APIC
100000000-13fffffff : System RAM
Comment 19 Rafael J. Wysocki 2010-12-02 21:48:39 UTC
This very likely is a problem with the allocation of PCI resources which the
BIOS apparently doesn't give us useful information about.
Comment 20 Rafael J. Wysocki 2010-12-02 21:58:34 UTC
Guillaume, why did you skip so many commits during the bisection?
Comment 21 Rafael J. Wysocki 2010-12-02 22:05:35 UTC
Also, would it be possible to bisect again, starting from good=v2.6.34-rc2
and bad=v2.6.34-rc6 (if that's what you're still seeing)?
Comment 22 Guillaume Pothier 2010-12-02 22:11:17 UTC
I had to skip commits because either the kernel didn't compile, or produced a kernel panic on boot.
I'll try to bisect again, but as it takes a long time, would it be useful if I try first with 2.6.37-rc2 minus the "PCI/PM Runtime: Make runtime PM of PCI devices inactive by default" commit?
Comment 23 Rafael J. Wysocki 2010-12-02 22:13:42 UTC
Yes, please do.
Comment 24 Bjorn Helgaas 2010-12-02 22:48:16 UTC
All the reports I've seen are on the Dell Vostro V13:
  Anoop P B (launchpad 588194 reporter)
  Kenneth (launchpad 588194 comment 22)
  gpothier (launchpad 588194 comment 24, kernel.org 22462 reporter)
  znotdead (launchpad 588194 comment 25)
  Emile (launchpad 588194 comment 40)
  Alejandro Arcos (launchpad 588194 comment 43)
  Andrea Cimitan (launchpad 588194 comment 45, kernel.org comment 15)

No /proc/iomem difference from before/after (comment 18).
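
A before/after comparison like this can be done mechanically rather than by eye. The following sketch uses Python's standard `difflib`; the function name and the snapshot labels are illustrative.

```python
import difflib
from typing import List


def iomem_diff(before: str, after: str) -> List[str]:
    """Unified-diff lines between two /proc/iomem snapshots.

    An empty list means the two snapshots are identical.
    """
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="iomem.before",
            tofile="iomem.after",
            lineterm="",
        )
    )
```

Saving the output of `cat /proc/iomem` to a file before suspend and again after resume, then passing both texts to `iomem_diff`, yields an empty list when nothing moved.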

pci=nocrs should only affect PCI resource assignment.  From the dmesg
in attachment 38692 [details], I think we do three assignments (the 02.0 "Flush
Page" actually doesn't show in dmesg, but it's in the iomem above):

  pci 0000:00:1f.3: BAR 0: assigned [mem 0xfebfff00-0xfebfffff 64bit]
  pci 0000:05:00.0: BAR 6: assigned [mem 0xf7fe0000-0xf7ffffff pref]
  pci 0000:00:02.0: Flush Page assigned [mem 0xfebfe000-0xfebfefff]

00:1f.3: i2c-i801 SMBus controller; we assign BAR 0, but I don't think
you (Guillaume) have the driver loaded, so I doubt this matters.

05:00.0: r8169 NIC; we assign the option ROM, but I don't think the
driver looks at it, so I doubt this matters either.

00:02.0: i915 video; hmm...  intel-gtt.c allocates this flush page, and
this *does* matter.  It'd be interesting to see if removing intel_gtt
from the picture makes a difference.

The btusb_intr_complete complaints in the dmesg are also interesting.
Clearly *something* is wrong there, and it looks like it's connected
via the 00:1a.0 USB controller, which we didn't touch.
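
To pull such assignments out of a dmesg log and see how large each window is, a small parser is enough. This sketch assumes the line format of the three "assigned [mem ...]" lines quoted above; the regular expression and function name are illustrative and may need adjusting for other kernel versions.

```python
import re

_ASSIGNED = re.compile(
    r"pci (?P<dev>[0-9a-f:.]+): (?P<what>.+?):? assigned "
    r"\[mem 0x(?P<start>[0-9a-f]+)-0x(?P<end>[0-9a-f]+)"
)


def assigned_resources(dmesg: str):
    """Return (device, resource, size_in_bytes) for each assignment line."""
    found = []
    for line in dmesg.splitlines():
        match = _ASSIGNED.search(line)
        if match:
            start = int(match.group("start"), 16)
            end = int(match.group("end"), 16)
            # Ranges are inclusive, hence the +1.
            found.append((match.group("dev"), match.group("what"), end - start + 1))
    return found
```

Applied to the three lines above, it reports 256 bytes for the SMBus BAR, 128 KiB for the r8169 option ROM, and 4 KiB for the flush page.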
Comment 25 peponnet.cyril 2010-12-03 15:06:36 UTC
In case it helps: from SLED11 GA up to the present, the laptop has been freezing during rsync transfers over full-duplex gigabit (about 4GB). When rebooting afterwards, we hear error beeps from the main board, which indicate an NVRAM error according to Dell Support.

I haven't tested rsync transfers on SLED11 SP1.

The power loss at boot/resume, and the occasional loss of the network interface, never occurs on SLED11 GA; it has only occurred since we started using the SLED11 SP1 kernel.

SLED11 GA : 2.6.27.45
SLED11 SP1 : 2.6.32.24

I tried to bisect too, but I gave up.

Personally, I thought all of this was a network-side issue (sometimes, after a poweroff on boot, I don't see any network card in the BIOS).


I also know that some people lose the network interface when dual booting (boot Linux first, resume from sleep, then reboot into Windows). Sometimes the network interface is missing, in the BIOS as well, and this may cause Linux to power off the system when loading the kernel.
Comment 26 Rafael J. Wysocki 2010-12-03 20:47:20 UTC
(In reply to comment #25)
> For the powerloss at boot / resume and sometimes losing network interface, it
> never occurs on SLED11GA, it only occures since we are using the SLED11 SP1
> kernel.

This bug is for tracking this particular issue, which seems to have been
introduced way after 2.6.27.  Please report the earlier issues separately.

> SLED11 GA : 2.6.27.45
> SLED11 SP1 : 2.6.32.24
> 
> I've tried to bissect to but I give up.
> 
> On my own, I though all this stuff is a network side issue (sometimes after a
> poweroff on boot, I don't see any network card in BIOS).
> 
> I know too, that some people are experiencing network interface loss when
> using
> dual boot (Linux first, resume from sleep, and reboot windows os). Sometimes
> the network interface is not here, and not in the BIOS too and may cause
> linux
> to poweroff the system when loading the kernel.

That may be a result of the network adapter's PCI resources overlapping with
something else.
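
One way to look for such an overlap is to compare resource ranges pairwise. The sketch below is hypothetical: it takes (start, end, name) tuples, for example parsed from /proc/iomem, and reports only partial overlaps, since fully nested ranges (a device window inside its bus window) are normal.

```python
from itertools import combinations


def partial_overlaps(ranges):
    """Return name pairs whose inclusive [start, end] ranges intersect
    without one range fully containing the other."""
    found = []
    for (s1, e1, n1), (s2, e2, n2) in combinations(ranges, 2):
        intersects = s1 <= e2 and s2 <= e1
        nested = (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2)
        if intersects and not nested:
            found.append((n1, n2))
    return found
```

An empty result does not rule out a conflict with a region the firmware never reported, which is the situation suspected in this bug.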
Comment 27 Andrea Cimitan 2010-12-07 10:16:33 UTC
*** Bug 24192 has been marked as a duplicate of this bug. ***
Comment 28 Guillaume Pothier 2011-01-01 19:33:49 UTC
Sorry for the silence, I've been very busy this month. I just tested with the latest kernel (cloned today from kernel.org), and it seems the problem got fixed: I went through 20 suspend/resume cycles without a hiccup.
Comment 29 Andrea Cimitan 2011-01-03 10:26:37 UTC
(In reply to comment #28)
> Sorry for the silence, I've been very busy this month. I just tested with the
> latest kernel (cloned today from kernel.org), and it seems the problem got
> fixed: I went through 20 suspend/resume cycles without a hiccup.

With mainline? I did a git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-2.6,
compiled it on Ubuntu Maverick using the config from the previous kernel, and it fails to resume.
Which config did you use? Which kernel did you grab?
Comment 30 Guillaume Pothier 2011-01-03 11:29:03 UTC
@Andrea:
Yes, with mainline. I followed the instructions from this page:
https://wiki.kubuntu.org/KernelTeam/GitKernelBuild
So that's a fresh clone from 2 days ago. When did you clone?
I also took the config from a previous kernel (2.6.31-22), I'll attach it for reference.
Comment 31 Guillaume Pothier 2011-01-03 11:30:45 UTC
Created attachment 42232 [details]
Config file
Comment 32 Andrea Cimitan 2011-01-03 13:48:16 UTC
(In reply to comment #31)
> Created an attachment (id=42232) [details]
> Config file

This works (though I had to adjust it for the latest git).
Now compiling with the Ubuntu default config, to see whether it's the config file that breaks things.
Comment 33 Andrea Cimitan 2011-01-03 16:36:25 UTC
OK, using the Ubuntu config it crashes with git. So it's not fixed, but something in the Ubuntu config is triggering the bug. I'll attach the Ubuntu config, and a diff between the Ubuntu config and the config that does not reproduce the bug.
Comment 34 Andrea Cimitan 2011-01-03 16:38:23 UTC
Created attachment 42282 [details]
ubuntu config that reproduces the bug even with latest git
Comment 35 Andrea Cimitan 2011-01-03 16:39:07 UTC
Created attachment 42292 [details]
diff between the ubuntu broken config and a working one
Comment 36 Andrea Cimitan 2011-01-05 19:16:16 UTC
Created attachment 42422 [details]
this *always* crashes during boot with bluetooth disabled in the bios
Comment 37 Andrea Cimitan 2011-01-05 20:00:23 UTC
Created attachment 42432 [details]
dump during a broken suspend/resume

I don't know if those are useful or not
Comment 38 Rafael J. Wysocki 2011-01-05 20:14:49 UTC
One thing that might make a difference is that your failing kernels have
CONFIG_HIGHMEM64G set.

Can you please set CONFIG_HIGHMEM4G instead in your failing .config and see if
that helps?
Comment 39 Rafael J. Wysocki 2011-01-05 20:16:12 UTC
Relatedly, CONFIG_X86_PAE=y looks like it might be the culprit.
Comment 40 Rafael J. Wysocki 2011-01-05 20:23:37 UTC
Other suspicious .config options apparently set in the failing .configs are:

CONFIG_ACPI_BLACKLIST_YEAR=2000
CONFIG_ACPI_APEI=y
CONFIG_X86_APM_BOOT=y
CONFIG_X86_SPEEDSTEP_ICH=y
CONFIG_X86_SPEEDSTEP_SMI=y
CONFIG_INTEL_IDLE=y

There are probably more, but I'd start from checking the above (and the
ones in the two previous comments).
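
Checking a .config for these options can be scripted. The sketch below assumes the usual .config syntax ("CONFIG_FOO=value" and "# CONFIG_FOO is not set"); the option list is the one named in this and the two previous comments, and the function names are illustrative.

```python
import re

SUSPECTS = (
    "CONFIG_HIGHMEM64G",
    "CONFIG_X86_PAE",
    "CONFIG_ACPI_BLACKLIST_YEAR",
    "CONFIG_ACPI_APEI",
    "CONFIG_X86_APM_BOOT",
    "CONFIG_X86_SPEEDSTEP_ICH",
    "CONFIG_X86_SPEEDSTEP_SMI",
    "CONFIG_INTEL_IDLE",
)


def parse_config(text: str) -> dict:
    """Map option name to value; '# CONFIG_FOO is not set' becomes 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        unset = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if unset:
            options[unset.group(1)] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def suspects_enabled(text: str) -> dict:
    """Suspect options that are set to anything other than 'n'."""
    options = parse_config(text)
    return {
        name: options[name]
        for name in SUSPECTS
        if options.get(name, "n") != "n"
    }
```

Running `suspects_enabled` on a failing and on a working .config shows which of the suspects differ between the two.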
Comment 41 Andrea Cimitan 2011-01-06 02:09:01 UTC
Created attachment 42522 [details]
this config crashes
Comment 42 Andrea Cimitan 2011-01-06 02:09:35 UTC
(In reply to comment #40)
> Other suspicious .config options apparently set in the failing .configs are:
> 
> CONFIG_ACPI_BLACKLIST_YEAR=2000
> CONFIG_ACPI_APEI=y
> CONFIG_X86_APM_BOOT=y
> CONFIG_X86_SPEEDSTEP_ICH=y
> CONFIG_X86_SPEEDSTEP_SMI=y
> CONFIG_INTEL_IDLE=y
> 
> There are probably more, but I'd start from checking the above (and the
> ones in the two previous comments).

tried, it crashes. see above
Comment 43 Andrea Cimitan 2011-01-06 16:42:39 UTC
Created attachment 42562 [details]
diff between a broken config and a working one

Continuing the config-bisect :) Here is a diff between the config of a broken kernel and that of a kernel that works (strangely, it doesn't seem to touch ACPI).
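
A config-bisect starts from the set of options that differ between the two configs. This is a minimal sketch of computing that set, assuming plain "CONFIG_FOO=value" lines and treating anything else (including "is not set" comments) as unset; the function name is illustrative.

```python
def config_delta(broken: str, working: str) -> dict:
    """Map each differing option to (value_in_broken, value_in_working).

    An option that is unset in one config appears as None on that side.
    """
    def options(text):
        result = {}
        for line in text.splitlines():
            line = line.strip()
            if line.startswith("CONFIG_") and "=" in line:
                name, value = line.split("=", 1)
                result[name] = value
        return result

    a, b = options(broken), options(working)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

Each bisect step then flips roughly half of the differing options in the working config and retests, halving the candidate set each time.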
Comment 44 Andrea Cimitan 2011-01-06 19:20:36 UTC
(In reply to comment #43)
> Created an attachment (id=42562) [details]
> diff between the a broken config and a working one
> 
> continue doing a config-bisect :) here is a diff between a broken kernel and
> a
> kernel that works (strange, it doesn't seem to touch acpi)

The strange thing is that if I apply that diff to the Ubuntu config, it doesn't work :D
Comment 45 Rafael J. Wysocki 2011-01-12 23:27:40 UTC
First, in what sense doesn't it work?  Do you mean it doesn't build?

Second, I _think_ we can rule out things that are built as modules in your
failing .config, unless one of these modules is actually loaded.  Would it be
possible to check if they aren't loaded?
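
Checking a list of candidate modules against what is actually loaded can be done from saved `lsmod` output. A minimal sketch, with illustrative function names; note that `lsmod` prints module names with underscores, so candidates should be spelled the same way.

```python
def loaded_modules(lsmod_output: str) -> set:
    """Module names from `lsmod` output, skipping the header row."""
    names = set()
    for line in lsmod_output.splitlines():
        fields = line.split()
        if fields and fields[0] != "Module":
            names.add(fields[0])
    return names


def still_suspect(candidates, lsmod_output: str) -> list:
    """Candidate modules that are actually loaded, sorted by name."""
    return sorted(set(candidates) & loaded_modules(lsmod_output))
```

Modules that appear in the failing .config as =m but are absent from the result can be set aside for now.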
Comment 46 Andrea Cimitan 2011-01-14 17:40:36 UTC
The patch attached to this bug fixes the issue:
https://bugzilla.kernel.org/show_bug.cgi?id=15416

Patch is: https://bugzilla.kernel.org/attachment.cgi?id=41862
Comment 47 Rafael J. Wysocki 2011-01-15 22:02:15 UTC
Cool, thanks for the information.

*** This bug has been marked as a duplicate of bug 15416 ***