Bug 15621 - BUG: unable to handle kernel paging request - comm: pccardd
Summary: BUG: unable to handle kernel paging request - comm: pccardd
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCMCIA
Hardware: All Linux
Importance: P1 normal
Assignee: linux-pcmcia
URL:
Keywords:
Depends on:
Blocks: 15310
Reported: 2010-03-24 10:07 UTC by Ozgur Yuksel
Modified: 2010-04-04 19:17 UTC

See Also:
Kernel Version: 2.6.34-rc2 ae6be51ed01d6c4aaf249a207b4434bc7785853b
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Dump for first failure (3.12 KB, application/octet-stream)
2010-03-24 10:09 UTC, Ozgur Yuksel
Recurring failures that follow (12.65 KB, application/octet-stream)
2010-03-24 10:09 UTC, Ozgur Yuksel
Serial console log with pci=nocrs set (38.49 KB, application/octet-stream)
2010-03-29 08:46 UTC, Ozgur Yuksel
dmesg iomem ioports acpidump outputs (65.35 KB, application/x-bzip2)
2010-03-29 09:08 UTC, Ozgur Yuksel
Serial console log without pci=nocrs and ignore_loglevel (49.42 KB, application/octet-stream)
2010-04-01 08:30 UTC, Ozgur Yuksel
Serial console log without pci=nocrs and with ignore_loglevel (66.16 KB, application/octet-stream)
2010-04-01 08:31 UTC, Ozgur Yuksel
Serial console log without pci=nocrs and with acpi debugging (186.82 KB, application/octet-stream)
2010-04-01 11:49 UTC, Ozgur Yuksel

Description Ozgur Yuksel 2010-03-24 10:07:25 UTC
After building ae6be51ed01d6c4aaf249a207b4434bc7785853b, booting produces:
[   75.245698] BUG: unable to handle kernel paging request at 746f7274
[   75.249007] IP: [<c014ded0>] iomem_map_sanity_check+0x70/0x170
[   75.249007] *pdpt = 000000002371c001 *pde = 0000000000000000
[   75.249007] Oops: 0000 [#1] SMP
[   75.249007] last sysfs file: /sys/devices/pnp0/00:0e/id
[   75.272054] Modules linked in: sbp2 ip_tables snd yenta_socket ppdev psmouse soundcort
[   75.272054]
[   75.272054] Pid: 998, comm: pccardd Not tainted 2.6.34-rc2 #1 0KU184/Latitude D630
[   75.306331] EIP: 0060:[<c014ded0>] EFLAGS: 00010202 CPU: 1
[   75.306331] EIP is at iomem_map_sanity_check+0x70/0x170
[   75.306331] EAX: 746f7270 EBX: 000f4800 ECX: 01100018 EDX: 746f7270
[   75.306331] ESI: 00000000 EDI: 00001000 EBP: e4701d34 ESP: e4701cd0
[   75.306331]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[   75.306331] Process pccardd (pid: 998, ti=e4700000 task=e36f3fc0 task.ti=e4700000)
[   75.306331] Stack:
[   75.359751]  c04a0f5c 00000004 00000000 00000002 00000000 28172cf5 e47d4390 00000013
[   75.359751] <0> 00000000 e4701d04 f4800fff 00000000 000f4800 00000000 000f4800 0000000
[   75.359751] <0> f4800000 00000000 f4800fff 00000000 f4801000 00000000 f4800000 0000000
[   75.359751] Call Trace:
[   75.359751]  [<c04a0f5c>] ? raw_pci_write+0x7c/0x80
[   75.359751]  [<c012a02e>] ? __ioremap_caller+0xae/0x3f0
[   75.359751]  [<c01f001b>] ? kmem_cache_alloc_notrace+0x6b/0xb0
[   75.421337]  [<c014eb6e>] ? __request_region+0x1e/0x210
[   75.421337]  [<c043963b>] ? usb_hcd_pci_probe+0x17b/0x3f0
[   75.436059]  [<c012a43a>] ? ioremap_nocache+0x1a/0x20
[   75.436059]  [<c043963b>] ? usb_hcd_pci_probe+0x17b/0x3f0
[   75.436059]  [<c043963b>] ? usb_hcd_pci_probe+0x17b/0x3f0
[   75.436059]  [<c024de78>] ? sysfs_add_one+0x18/0x100
[   75.436059]  [<c024d477>] ? sysfs_new_dirent+0x67/0x100
[   75.436059]  [<c033c0be>] ? local_pci_probe+0xe/0x10
[   75.436059]  [<c033ce40>] ? pci_device_probe+0x60/0x80
[   75.436059]  [<c03bba69>] ? driver_probe_device+0x69/0x150
[   75.436059]  [<c03bbb91>] ? __device_attach+0x41/0x50
[   75.436059]  [<c03badd8>] ? bus_for_each_drv+0x48/0x70
[   75.436059]  [<c03bbd8d>] ? device_attach+0x6d/0x80
[   75.436059]  [<c03bbb50>] ? __device_attach+0x0/0x50
[   75.436059]  [<c03bac2d>] ? bus_probe_device+0x1d/0x40
[   75.436059]  [<c03b997a>] ? device_add+0x48a/0x560
[   75.436059]  [<c033a43e>] ? pci_set_cacheline_size+0x8e/0xe0
[   75.436059]  [<c03374a7>] ? pci_bus_add_device+0x17/0x40
[   75.436059]  [<c0337510>] ? pci_bus_add_devices+0x40/0x120
[   75.436059]  [<f8b3e9ba>] ? cb_alloc+0xca/0xe0 [pcmcia_core]
[   75.436059]  [<f8b3de29>] ? socket_insert+0xd9/0x100 [pcmcia_core]
[   75.436059]  [<f8b3e289>] ? pccardd+0x309/0x400 [pcmcia_core]
[   75.436059]  [<f8b3df80>] ? pccardd+0x0/0x400 [pcmcia_core]
[   75.436059]  [<c0161e5c>] ? kthread+0x6c/0x80
[   75.436059]  [<c0161df0>] ? kthread+0x0/0x80
[   75.436059]  [<c01035c6>] ? kernel_thread_helper+0x6/0x10
[   75.436059] Code: 55 ec 89 4d d8 8b 4d f0 89 5d dc 89 75 e0 83 c2 ff 83 d1 ff 89 55 c
[   75.436059] EIP: [<c014ded0>] iomem_map_sanity_check+0x70/0x170 SS:ESP 0068:e4701cd0
[   75.436059] CR2: 00000000746f7274
[   75.439957] ---[ end trace c9fcf1971e726fcf ]---

The kernel continues to boot, but later fails with the following:

[  141.736006] BUG: soft lockup - CPU#0 stuck for 61s! [modprobe:573]
[  141.736006] Modules linked in: auth_rpcgss iwl3945(+) snd_timer uinput snd_seq_devicet
[  141.736006] Modules linked in: auth_rpcgss iwl3945(+) snd_timer uinput snd_seq_devicet
[  141.736006]
[  141.736006] Pid: 573, comm: modprobe Tainted: G      D 2.6.34-rc2 #1 0KU184/Latitu
[  141.736006] EIP: 0060:[<c058ff4c>] EFLAGS: 00000287 CPU: 0
[  141.736006] EIP is at __write_lock_failed+0xc/0x20
[  141.736006] EAX: c077e2e4 EBX: fe8fffff ECX: e4713240 EDX: e4713240
[  141.736006] ESI: 00000000 EDI: c077e2c0 EBP: e454bd70 ESP: e454bd70
[  141.736006]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[  141.736006] Process modprobe (pid: 573, ti=e454a000 task=e3425940 task.ti=e454a000)
[  141.736006] Stack:
[  141.736006]  e454bd78 c0590151 e454bda8 c014ebc9 00000000 0000000d e4713240 c04a0f5c
[  141.736006] <0> 0000000d 00000040 e4713240 00000000 00001000 00000000 e454bdf0 c0339c8
[  141.736006] <0> 00001000 00000000 f87b3d33 00000000 00000000 fe8fffff 00000000 0000100
[  141.736006] Call Trace:
[  141.736006]  [<c0590151>] ? _raw_write_lock+0x11/0x20
[  141.736006]  [<c014ebc9>] ? __request_region+0x79/0x210
[  141.736006]  [<c04a0f5c>] ? raw_pci_write+0x7c/0x80
[  141.736006]  [<c0339cd8>] ? __pci_request_region+0x158/0x1c0
[  141.736006]  [<c0339f07>] ? __pci_request_selected_regions+0x37/0x70
[  141.736006]  [<c0339f92>] ? pci_request_selected_regions+0x12/0x20
[  141.736006]  [<c0339faf>] ? pci_request_regions+0xf/0x20
[  141.736006]  [<f87a8602>] ? iwl3945_pci_probe+0x112/0x9d0 [iwl3945]
[  141.736006]  [<c058ed34>] ? mutex_lock+0x14/0x40
[  141.736006]  [<c033c0be>] ? local_pci_probe+0xe/0x10
[  141.736006]  [<c033ce40>] ? pci_device_probe+0x60/0x80
[  141.736006]  [<c03bba69>] ? driver_probe_device+0x69/0x150
[  141.736006]  [<c03bbe19>] ? __driver_attach+0x79/0x80
[  141.736006]  [<c03baed8>] ? bus_for_each_dev+0x48/0x70
[  141.736006]  [<c03bb919>] ? driver_attach+0x19/0x20
[  141.736006]  [<c03bbda0>] ? __driver_attach+0x0/0x80
[  141.736006]  [<c03bb21f>] ? bus_add_driver+0xbf/0x2a0
[  141.736006]  [<c033cd80>] ? pci_device_remove+0x0/0x40
[  141.736006]  [<c03bbf35>] ? driver_register+0x65/0x120
[  141.736006]  [<f87cc2e8>] ? ieee80211_rate_control_register+0xc8/0x120 [mac80211]
[  141.736006]  [<c033d060>] ? __pci_register_driver+0x40/0xb0
[  141.736006]  [<f86da050>] ? iwl3945_init+0x50/0x6e [iwl3945]
[  141.736006]  [<c010112c>] ? do_one_initcall+0x2c/0x190
[  141.736006]  [<f86da000>] ? iwl3945_init+0x0/0x6e [iwl3945]
[  141.736006]  [<c017c011>] ? sys_init_module+0xb1/0x220
[  141.736006]  [<c0102fe3>] ? sysenter_do_call+0x12/0x28
[  141.736006] Code: c7 45 f8 01 00 00 00 e8 03 fe ff ff 89 d8 83 c4 10 5b 5d c3 90 90 9

After that, the boot loops through dumps with the same or a similar stack.
Comment 1 Ozgur Yuksel 2010-03-24 10:09:07 UTC
Created attachment 25680
Dump for first failure
Comment 2 Ozgur Yuksel 2010-03-24 10:09:43 UTC
Created attachment 25681
Recurring failures that follow
Comment 3 Andrew Morton 2010-03-24 14:14:17 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Wed, 24 Mar 2010 10:07:54 GMT bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=15621
> 
>            Summary: BUG: unable to handle kernel paging request  - comm:
>                     pccardd
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 2.6.34-rc2 ae6be51ed01d6c4aaf249a207b4434bc7785853b
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: PCMCIA
>         AssignedTo: linux-pcmcia@lists.infradead.org
>         ReportedBy: ozgur.yuksel@oracle.com
>         Regression: Yes

It looks like the iomem_resource tree got wrecked.  Has anyone been
changing anything in there lately?
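
A side observation on the register dump, not a confirmed diagnosis: EAX and EDX both hold 746f7270 and the faulting address (CR2) is 746f7274, i.e. 4 bytes past it. Read as little-endian bytes, both values are printable ASCII, which would be consistent with a pointer in the resource tree having been overwritten by string data. The sketch below only does the byte decoding; the helper name `as_ascii` is made up for illustration, and nothing here identifies where the string came from.

```python
import struct


def as_ascii(value: int) -> str:
    """Render a 32-bit value as its 4 little-endian bytes ('.' if unprintable)."""
    raw = struct.pack("<I", value)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


eax = 0x746F7270  # EAX/EDX from the oops
cr2 = 0x746F7274  # faulting address (CR2) from the oops

print(as_ascii(eax))  # bytes of the bad pointer as text
print(as_ascii(cr2))  # bytes of the faulting address as text
print(cr2 - eax)      # offset of the faulting access from the bad pointer
```

The same decoding can be applied to any suspicious register value in an oops; a result that reads as text is a hint, not proof, of memory corruption by string data.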
Comment 4 Bjorn Helgaas 2010-03-25 16:53:12 UTC
> It looks like the iomem_resource tree got wrecked.  Has anyone been
> changing anything in there lately?

My pci=use_crs patches change the contents of the iomem_resource tree,
and it's possible they broke some assumptions PCMCIA was making, so
you might see if "pci=nocrs" makes any difference.  If it does, please
attach an acpidump and the entire dmesg logs with and without that option.
Comment 5 Anonymous Emailer 2010-03-25 18:20:33 UTC
Reply-To: linux@dominikbrodowski.net

On Thu, Mar 25, 2010 at 10:51:39AM -0600, Bjorn Helgaas wrote:
> > It looks like the iomem_resource tree got wrecked.  Has anyone been
> > changing anything in there lately?
> 
> My pci=use_crs patches change the contents of the iomem_resource tree,
> and it's possible they broke some assumptions PCMCIA was making, so
> you might see if "pci=nocrs" makes any difference.  If it does, please
> attach an acpidump and the entire dmesg logs with and without that option.

... and /proc/iomem as well as /proc/ioports , please.
Comment 7 Ozgur Yuksel 2010-03-29 08:46:20 UTC
Created attachment 25755
Serial console log with pci=nocrs set
Comment 8 Ozgur Yuksel 2010-03-29 09:08:06 UTC
Created attachment 25756
dmesg iomem ioports acpidump outputs
Comment 9 Ozgur Yuksel 2010-03-29 09:13:32 UTC
Thu, Mar 25, 2010 at 06:01:38PM +0100 was the time for Dominik Brodowski to speak thus:
> 
> On Thu, Mar 25, 2010 at 10:51:39AM -0600, Bjorn Helgaas wrote:
> > > It looks like the iomem_resource tree got wrecked.  Has anyone been
> > > changing anything in there lately?
> > 
> > My pci=use_crs patches change the contents of the iomem_resource tree,
> > and it's possible they broke some assumptions PCMCIA was making, so
> > you might see if "pci=nocrs" makes any difference.  If it does, please
> > attach an acpidump and the entire dmesg logs with and without that option.
> 
> ... and /proc/iomem as well as /proc/ioports , please.
Using pci=nocrs works around the problem. As for data collection: since the
boot does not complete without the workaround, only dmesg is available for
that case.

With pci=nocrs, a process reading /proc/iomem gets killed by the kernel for
some reason.

/proc/iomem, /proc/ioports and acpidump are provided from the
2.6.31-20-generic-pae kernel for comparison.
Comment 10 Bjorn Helgaas 2010-03-30 23:12:37 UTC
Rafael, this is a regression from 2.6.33, in case it's not on your
list yet.

Ozgur, thanks for attaching the logs.  There's some interesting stuff
there that I don't understand yet, such as this from the pci=nocrs dmesg:

  [    1.577758] pci 0000:00:1e.0: PCI bridge to [bus 03-04]
  [    1.583031] pci 0000:00:1e.0:   bridge window [io  0x5000-0x5fff]
  [    1.551889] pci 0000:03:01.0: CardBus bridge to [bus 04-07]
  [    1.557507] pci 0000:03:01.0:   bridge window [io  0x5000-0x50ff]
  [    1.603303] PCI: No. 2 try to assign unassigned res
  [    1.688208] pci 0000:03:01.0: CardBus bridge to [bus 04-07]
  [    1.693826] pci 0000:03:01.0:   bridge window [io  0x0000-0x00ff]

Apparently we moved that CardBus I/O window from [0x5000-0x50ff] to
[0x0-0xff].  I'm dubious about that because the upstream bridge at
00:1e.0 only positively decodes [0x5000-0x5fff] (though it *is* in
subtractive decode mode, so it will forward more).  I wish we had
a little more debug output about when & why we moved that window.

I'm especially dubious because your /proc/ioports with pci=nocrs
from comment 8 (which is the case that's supposed to be working)
contains this:

  5000-5fff : PCI Bus 0000:03
    0000-00ff : PCI CardBus 0000:04
    0000-00ff : PCI CardBus 0000:04

That looks completely broken in terms of the hierarchy.  It looks
like you have a USB device in the CardBus slot (ohci_hcd 0000:04:00.0).
Maybe the broken hierarchy doesn't cause problems with this device
because it doesn't use I/O ports.
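
To make the "broken hierarchy" point concrete, here is a minimal sketch of a containment check over /proc/ioports-style text: each indented entry should lie inside the range of the entry it is nested under. The function name `find_uncontained` is hypothetical, and the parser assumes the usual "start-end : name" layout with indentation marking nesting; it is not the kernel's own validation logic.

```python
import re

# "  0000-00ff : PCI CardBus 0000:04" -> indent, start, end, name
LINE = re.compile(r"^(\s*)([0-9a-f]+)-([0-9a-f]+) : (.*)$")


def find_uncontained(text):
    """Return (child, parent) tuples where the child range falls outside its parent."""
    stack = []     # enclosing entries as (indent, start, end, name)
    problems = []
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        indent = len(m.group(1))
        start, end = int(m.group(2), 16), int(m.group(3), 16)
        name = m.group(4)
        # Drop entries that are siblings or deeper than the current line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            _, pstart, pend, pname = stack[-1]
            if start < pstart or end > pend:
                problems.append(((start, end, name), (pstart, pend, pname)))
        stack.append((indent, start, end, name))
    return problems


sample = """\
5000-5fff : PCI Bus 0000:03
  0000-00ff : PCI CardBus 0000:04
  0000-00ff : PCI CardBus 0000:04
"""

for child, parent in find_uncontained(sample):
    print("%04x-%04x (%s) is outside %04x-%04x (%s)" % (child + parent))
```

Run against the excerpt above, both CardBus entries are reported, since 0x0000-0x00ff does not fall within 0x5000-0x5fff.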

Anyway, I'd like to see the entire dmesg log when booted *without*
pci=nocrs, because that's the case that fails.  Since the system doesn't
boot, you'll have to use a serial console or netconsole to collect the
whole thing.  The serial console log in comment 7 is corrupted; it looks
like all the lines got truncated to 80 columns or something.  And please
boot with "ignore_loglevel" so we see all the debug messages on the console.
Also, no need to tar up and compress your attachments -- I always figure
if bugzilla wants to compress stuff, it can do it internally without
bothering us.
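
A rough way to spot the kind of truncation mentioned above before attaching a log is sketched here, assuming the capture is available as a list of lines. The heuristic (many lines sharing exactly the maximum length), the function name `likely_truncation_width`, and the `min_share` threshold are all illustrative assumptions, not an established tool.

```python
from collections import Counter


def likely_truncation_width(lines, min_share=0.2):
    """Return the maximum line length if a large share of lines hit it, else None."""
    lengths = [len(line.rstrip("\n")) for line in lines if line.strip()]
    if not lengths:
        return None
    widest = max(lengths)
    share = Counter(lengths)[widest] / len(lengths)
    # In an intact log few lines share the exact maximum length;
    # a terminal that clips at N columns makes many lines exactly N long.
    return widest if share >= min_share else None


clipped = ["x" * 80] * 5 + ["short line"] * 5
intact = ["a" * 100, "b" * 50, "c" * 30, "d" * 20, "e" * 10, "f" * 5]

print(likely_truncation_width(clipped))
print(likely_truncation_width(intact))
```

A non-None result only suggests clipping at that width; short logs or logs with many identical long lines can trigger it as well.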
Comment 11 Bjorn Helgaas 2010-03-31 21:51:02 UTC
Ozgur, can you please apply the patch from bug 15533, comment 5, turn on
CONFIG_ACPI_DEBUG, and boot with
"acpi.debug_level=0x00010000 acpi.debug_layer=0x00000100"
when you collect the console log without "pci=nocrs"?
Comment 12 Ozgur Yuksel 2010-04-01 08:30:37 UTC
Created attachment 25797
Serial console log without pci=nocrs and ignore_loglevel
Comment 13 Ozgur Yuksel 2010-04-01 08:31:34 UTC
Created attachment 25798
Serial console log without pci=nocrs and with ignore_loglevel
Comment 14 Ozgur Yuksel 2010-04-01 08:33:32 UTC
Interestingly, when ignore_loglevel is used, the problem does not reproduce. I'll now proceed with the actions in comment #11.
Comment 15 Ozgur Yuksel 2010-04-01 09:19:29 UTC
Tue, Mar 30, 2010 at 05:10:59PM -0600 was the time for Bjorn Helgaas to speak thus:
> Anyway, I'd like to see the entire dmesg log when booted *without*
> pci=nocrs, because that's the case that fails.  Since the system doesn't
> boot, you'll have to use a serial console or netconsole to collect the
> whole thing.  The serial console log in comment 7 is corrupted; it looks
> like all the lines got truncated to 80 columns or something.  And please
> boot with "ignore_loglevel" so we see all the debug messages on the console.

Interestingly, when ignore_loglevel is used, the problem does not reproduce.
I'll now proceed with the actions in comment #11.
Comment 16 Ozgur Yuksel 2010-04-01 11:49:57 UTC
Created attachment 25799
Serial console log without pci=nocrs and with acpi debugging
Comment 17 Ozgur Yuksel 2010-04-01 12:15:46 UTC
Applied the patch from bug 15533, comment 5, turned on CONFIG_ACPI_DEBUG, and booted with "acpi.debug_level=0x00010000 acpi.debug_layer=0x00000100" without "pci=nocrs" - problem did not reproduce and the console log is uploaded.
Comment 18 Bjorn Helgaas 2010-04-01 17:36:04 UTC
Using ignore_loglevel shouldn't affect the problem, so I'm confused.
Can you reproduce the original problem and attach the entire serial
console log?
Comment 19 Ozgur Yuksel 2010-04-02 17:03:07 UTC
Thu, Apr 01, 2010 at 11:34:13AM -0600 was the time for Bjorn Helgaas to speak thus:
> Using ignore_loglevel shouldn't affect the problem, so I'm confused.
> Can you reproduce the original problem and attach the entire serial
> console log?

It seems that the problem does not reproduce at all now. Unfortunately, I no
longer have the images I built on 2010-03-29 08:46, and building from a fresh
ae6be51ed01d6c4aaf249a207b4434bc7785853b does not reproduce the problem. The
most likely cause is the specific .config I used at the time (which I do not
have anymore). I have also been doing other builds on the same system, so
maybe it was just a stale module or something.

FWIW, the problem does not reproduce with 2.6.34-rc3 either (on the very same
hardware).