Bug 19942

Summary: Not an Intel bug: kernel BUG at drivers/pci/intel-iommu.c:1656
Product: Drivers
Reporter: MartinG (gronslet)
Component: Network
Assignee: drivers_network (drivers_network)
Status: RESOLVED OBSOLETE
Severity: normal
CC: alan
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.36-0.35.rc7.git0.fc15.x86_64
Subsystem:
Regression: No
Bisected commit-id:

Description MartinG 2010-10-09 10:07:08 UTC
On my Fedora Rawhide system, I keep getting these errors, which kill my wifi and require me to reboot my Lenovo ThinkPad T400. Please also see:
https://bugzilla.redhat.com/show_bug.cgi?id=637554
https://bugs.freedesktop.org/show_bug.cgi?id=30722

In the latter, I was asked to file the bug here, as it isn't an Intel bug.
Fedora Rawhide, kernel-2.6.36-0.35.rc7.git0.fc15.x86_64,
xorg-x11-drv-intel-2.12.0-6.fc14.1.x86_64, xorg-x11-drivers-7.4-1.fc14.x86_64
xorg-x11-server-utils-7.4-20.fc15.x86_64, NetworkManager-0.8.1-7.git20100831.fc15.x86_64

This happens when I resume my laptop after suspend to ram:

[24572.218077] PM: resume devices took 0.987 seconds
[24572.239068] PM: Finishing wakeup.
[24572.239216] Restarting tasks ... 
[24572.239332] usb 2-4: USB disconnect, address 2
[24572.245520] done.
[24572.245702] video LNXVIDEO:00: Restoring backlight state
[24572.249109] ehci_hcd 0000:00:1d.7: dma_pool_free buffer-2048, ffff880134f9d000/ffffb000 (bad dma)
[24572.249631] ehci_hcd 0000:00:1d.7: dma_pool_free buffer-2048, ffff880134f9d080/ffffb080 (bad dma)
[24572.249977] cdc_ether 2-4:1.7: wwan0: unregister 'cdc_ether' usb-0000:00:1d.7-4, Mobile Broadband Network Device
[24573.685674] ------------[ cut here ]------------
[24573.685709] kernel BUG at drivers/pci/intel-iommu.c:1656!
[24573.685734] invalid opcode: 0000 [#1] SMP 
[24573.685761] last sysfs file: /sys/devices/system/cpu/sched_mc_power_savings
[24573.685791] CPU 0 
[24573.685803] Modules linked in: rfcomm sunrpc sco bnep l2cap cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT xt_physdev nf_conntrack_ipv6 ip6table_filter ipt_MASQUERADE iptable_nat ip6_tables nf_nat sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt uinput arc4 ecb snd_hda_codec_conexant snd_hda_intel iwlagn snd_hda_codec snd_hwdep zaurus iwlcore snd_seq snd_seq_device r852 sm_common cdc_ether nand nand_ids nand_ecc microcode mac80211 uvcvideo usbnet mtd mii cdc_acm snd_pcm btusb cdc_wdm bluetooth videodev iTCO_wdt i2c_i801 iTCO_vendor_support joydev cfg80211 thinkpad_acpi v4l1_compat v4l2_compat_ioctl32 e1000e snd_timer rfkill snd_page_alloc wmi snd soundcore ipv6 sdhci_pci sdhci firewire_ohci mmc_core firewire_core yenta_socket crc_itu_t i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan]
[24573.686007] 
[24573.686007] Pid: 8321, comm: NetworkManager Not tainted 2.6.36-0.35.rc7.git0.fc15.x86_64 #1 6474AR4/6474AR4
[24573.686007] RIP: 0010:[<ffffffff8126ea48>]  [<ffffffff8126ea48>] __domain_mapping+0x43/0x1ce
[24573.686007] RSP: 0018:ffff880133727648  EFLAGS: 00010206
[24573.694051] RAX: 0000000001ffffff RBX: ffff8800b4687400 RCX: 000000000000001b
[24573.694051] RDX: 000000000008b621 RSI: 000ffffffffffdff RDI: ffff8801320f6dc0
[24573.694051] RBP: ffff880133727698 R08: 0000000000000001 R09: 0000000000000003
[24573.694051] R10: ffff8801320f6df8 R11: 0000000000000000 R12: 0000000000000000
[24573.694051] R13: ffff8801320f6dc0 R14: ffff88013bc04ff8 R15: 0000000000000001
[24573.694051] FS:  00007fb24c872800(0000) GS:ffff880002c00000(0000) knlGS:0000000000000000
[24573.694051] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[24573.694051] CR2: 000000000042da00 CR3: 000000012fbc2000 CR4: 00000000000006f0
[24573.694051] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[24573.694051] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[24573.694051] Process NetworkManager (pid: 8321, threadinfo ffff880133726000, task ffff88012f8f0000)
[24573.694051] Stack:
[24573.694051]  ffff88013707e240 ffff8801320f6dc0 ffff880133727698 000ffffffffffdff
[24573.694051] <0> 0000000000000000 ffff8800b4687400 000000008b621000 ffff8801320f6dc0
[24573.694051] <0> ffff88013bc04ff8 0000000000000000 ffff8801337276f8 ffffffff8126f710
[24573.694051] Call Trace:
[24573.694051]  [<ffffffff8126f710>] __intel_map_single.clone.25+0xdc/0x16b
[24573.694051]  [<ffffffff8126f88c>] intel_alloc_coherent+0xae/0xd5
[24573.694051]  [<ffffffffa018c128>] e1000_alloc_ring_dma.clone.28+0x94/0xc0 [e1000e]
[24573.694051]  [<ffffffffa018e359>] e1000e_setup_tx_resources+0x65/0xaa [e1000e]
[24573.694051]  [<ffffffffa018e891>] e1000_open+0x64/0x41e [e1000e]
[24573.694051]  [<ffffffff813eeb45>] __dev_open+0x9b/0xd2
[24573.694051]  [<ffffffff813eed87>] __dev_change_flags+0xad/0x130
[24573.694051]  [<ffffffff813eee8b>] dev_change_flags+0x21/0x56
[24573.694051]  [<ffffffff813f90f9>] do_setlink+0x2ba/0x61f
[24573.694051]  [<ffffffff8107d8c7>] ? print_lock_contention_bug+0x1b/0xd5
[24573.694051]  [<ffffffff81249f7f>] ? debug_check_no_obj_freed+0x65/0x18a
[24573.694051]  [<ffffffff8107d8c7>] ? print_lock_contention_bug+0x1b/0xd5
[24573.694051]  [<ffffffff813f96be>] rtnl_setlink+0xd0/0xf2
[24573.694051]  [<ffffffff813f99ac>] rtnetlink_rcv_msg+0x1eb/0x201
[24573.694051]  [<ffffffff813f97c1>] ? rtnetlink_rcv_msg+0x0/0x201
[24573.694051]  [<ffffffff8140d3e5>] netlink_rcv_skb+0x45/0x90
[24573.694051]  [<ffffffff813f8d29>] rtnetlink_rcv+0x26/0x2d
[24573.694051]  [<ffffffff8140cec0>] netlink_unicast+0xee/0x157
[24573.694051]  [<ffffffff8140d1e1>] netlink_sendmsg+0x2b8/0x2d6
[24573.694051]  [<ffffffff813da64e>] __sock_sendmsg+0x6b/0x77
[24573.694051]  [<ffffffff813da9a8>] sock_sendmsg+0xa8/0xc1
[24573.694051]  [<ffffffff8107ff07>] ? lock_acquire+0xee/0xfd
[24573.694051]  [<ffffffff810fb080>] ? might_fault+0x5c/0xac
[24573.694051]  [<ffffffff8107fe0d>] ? lock_release+0x19a/0x1a6
[24573.694051]  [<ffffffff810fb0c9>] ? might_fault+0xa5/0xac
[24573.694051]  [<ffffffff813e4dbb>] ? copy_from_user+0x2f/0x31
[24573.694051]  [<ffffffff813e51ae>] ? verify_iovec+0x57/0x99
[24573.694051]  [<ffffffff813dc971>] sys_sendmsg+0x235/0x2b3
[24573.694051]  [<ffffffff8112bb26>] ? rcu_read_lock+0x0/0x35
[24573.694051]  [<ffffffff8107ff07>] ? lock_acquire+0xee/0xfd
[24573.694051]  [<ffffffff8112bb26>] ? rcu_read_lock+0x0/0x35
[24573.694051]  [<ffffffff813dc405>] ? sys_sendto+0x125/0x152
[24573.694051]  [<ffffffff8112c5a2>] ? fput+0x22/0x1d6
[24573.694051]  [<ffffffff8112c4ae>] ? fget_light+0x79/0x83
[24573.694051]  [<ffffffff81133e5b>] ? path_put+0x22/0x27
[24573.694051]  [<ffffffff810a8443>] ? audit_syscall_entry+0x11c/0x148
[24573.694051]  [<ffffffff8149da45>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[24573.694051]  [<ffffffff81009cf2>] system_call_fastpath+0x16/0x1b
[24573.694051] Code: d4 48 89 ca 48 89 7d b8 6b 8f 84 00 00 00 09 48 89 75 c8 4d 89 c7 83 c1 12 83 f9 3f 7f 0f 4a 8d 44 06 ff 48 d3 e8 48 85 c0 74 02 <0f> 0b 41 f6 c1 03 b8 ea ff ff ff 0f 84 6b 01 00 00 41 81 e1 03 
[24573.694051] RIP  [<ffffffff8126ea48>] __domain_mapping+0x43/0x1ce
[24573.694051]  RSP <ffff880133727648>
[24573.821392] ---[ end trace 391efc8948e1496b ]---
[24573.832050] NetworkManager used greatest stack depth: 2064 bytes left
[24574.026042] usb 4-2: new full speed USB device using uhci_hcd and address 3
[24574.187102] usb 4-2: New USB device found, idVendor=0a5c, idProduct=2145
[24574.188244] usb 4-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[24574.189418] usb 4-2: Product: ThinkPad Bluetooth with Enhanced Data Rate II
[24574.190567] usb 4-2: Manufacturer: Lenovo Computer Corp
[24576.230085] usb 2-4: new high speed USB device using ehci_hcd and address 3
[24577.080715] usb 2-4: New USB device found, idVendor=0bdb, idProduct=1900
[24577.081862] usb 2-4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[24577.083009] usb 2-4: Product: Ericsson F3507g Mobile Broadband Minicard Composite Device
[24577.084140] usb 2-4: Manufacturer: Ericsson
[24577.085263] usb 2-4: SerialNumber: 3541430207407750
[24577.144202] cdc_acm 2-4:1.1: ttyACM0: USB ACM device
[24577.163044] cdc_acm 2-4:1.3: ttyACM1: USB ACM device
[24577.174389] cdc_wdm 2-4:1.5: cdc-wdm0: USB WDM device
[24577.183588] cdc_wdm 2-4:1.6: cdc-wdm1: USB WDM device
[26974.894966] thinkpad_acpi: EC reports that Thermal Table has changed


Note that I have explicitly disabled the IOMMU for Intel graphics (igfx_off):
# cat /proc/cmdline 
ro root=/dev/VolGroup00/lv_root rhgb quiet selinux=0 vga=0x318 SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=no intel_iommu=igfx_off

I've seen this on 2.6.36-0.35.rc7.git0.fc15.x86_64, 2.6.36-0.27.rc5.git6.fc15.x86_64, and 2.6.36-0.32.rc6.git2.fc15.x86_64.
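
A side note on the command line above: according to the kernel's documented parameters, `intel_iommu=igfx_off` only exempts the integrated graphics device from DMA remapping. It does not turn the IOMMU off for other devices such as the e1000e NIC that appears in the trace; `intel_iommu=off` is the option that disables the driver entirely. The following is a minimal sketch for checking which options are in effect. The helper name `intel_iommu_options` is hypothetical, and the command line is copied from this report rather than read from a live system.

```python
def intel_iommu_options(cmdline):
    """Collect the comma-separated values of every intel_iommu= argument."""
    options = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "intel_iommu" and sep:
            options.extend(v for v in value.split(",") if v)
    return options


# Command line copied from the report; on a live system read /proc/cmdline.
cmdline = ("ro root=/dev/VolGroup00/lv_root rhgb quiet selinux=0 vga=0x318 "
           "SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=no "
           "intel_iommu=igfx_off")

opts = intel_iommu_options(cmdline)
print(opts)
print("IOMMU fully disabled:", "off" in opts)
```

With only `igfx_off` present, DMA from the network card would still be expected to go through intel-iommu, which is consistent with `intel_alloc_coherent` showing up in the call trace.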
Comment 1 MartinG 2010-10-09 17:44:11 UTC
I can actually trigger this bug by simply running "service NetworkManager restart". It seems I am unable to start any applications after the bug occurs.

NetworkManager-0.8.1-8.git20100831.fc15.x86_64 (actually updated from NetworkManager-0.8.1-7.git20100831.fc15.x86_64 right before I restarted it. I don't think that NetworkManager itself is the cause, since I've seen this bug several times on the previous version of NM.)

This is my network controller:
03:00.0 Network controller: Intel Corporation Ultimate N WiFi Link 5300
        Subsystem: Intel Corporation Device 1011
        Physical Slot: 1
        Flags: bus master, fast devsel, latency 0, IRQ 50
        Memory at f4300000 (64-bit, non-prefetchable) [size=8K]
        Capabilities: [c8] Power Management version 3
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [e0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 00-16-ea-ff-ff-e3-60-d4
        Kernel driver in use: iwlagn
        Kernel modules: iwlagn



These are my modules: 
coretemp sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf rfcomm ip6t_REJECT nf_conntrack_ipv6 xt_physdev ipt_MASQUERADE iptable_nat ip6table_filter nf_nat ip6_tables sco bnep l2cap sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt uinput snd_hda_codec_conexant arc4 ecb snd_hda_intel iwlagn snd_hda_codec snd_hwdep snd_seq microcode snd_seq_device iwlcore zaurus r852 sm_common cdc_ether nand snd_pcm uvcvideo usbnet nand_ids nand_ecc mac80211 joydev mii btusb videodev cdc_wdm cdc_acm mtd i2c_i801 iTCO_wdt v4l1_compat v4l2_compat_ioctl32 iTCO_vendor_support bluetooth cfg80211 snd_timer thinkpad_acpi rfkill snd_page_alloc e1000e snd wmi soundcore ipv6 sdhci_pci sdhci firewire_ohci firewire_core crc_itu_t mmc_core yenta_socket i915 drm_kms_helper drm i2c_algo_bit i2c_core video output

Specifically:
/lib/modules/2.6.36-0.35.rc7.git0.fc15.x86_64/kernel/drivers/net/e1000e/e1000e.ko
/lib/modules/2.6.36-0.35.rc7.git0.fc15.x86_64/kernel/drivers/net/wireless/iwlwifi/iwlagn.ko
/lib/modules/2.6.36-0.35.rc7.git0.fc15.x86_64/kernel/drivers/net/wireless/iwlwifi/iwlcore.ko
Comment 2 Andrew Morton 2010-10-11 20:45:55 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).


On Sat, 9 Oct 2010 10:07:15 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> I've seen this on 2.6.36-0.35.rc7.git0.fc15.x86_64,
> 2.6.36-0.27.rc5.git6.fc15.x86_64,2.6.36-0.32.rc6.git2.fc15.x86_64.

I don't have any of those kernel versions here, but I'm guessing that
this test is triggering:

	BUG_ON(addr_width < BITS_PER_LONG && (iov_pfn + nr_pages - 1) >> addr_width);

It could be that e1000e is feeding in garbage, or it could be that
intel-iommu is screwed up.


It's a bit hard to tell what's happening because that BUG_ON was quite
poorly thought out.  It tests three different variables, doesn't tell
us their values and even though it _could_ cleanly recover and allow
the machine to continue to operate it simply whacks the box.

So we now have a pickle on our hands, because you use prebuilt kernels
and are probably not in a position to test patches.
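
The values that the BUG_ON does not print can, tentatively, be recovered from the register dump. The sketch below is a userspace Python model of the check, not kernel code. It assumes, from a hand decode of the `Code:` bytes in the oops, that RCX holds addr_width, RSI holds iov_pfn and R8 holds nr_pages; that register mapping is an inference and is not confirmed anywhere in this report. The helper names `range_overflow` and `check_mapping_range` are hypothetical. The second one returns a message instead of halting, to illustrate the kind of recoverable check described above.

```python
BITS_PER_LONG = 64          # x86_64, as in the reported kernel
MASK64 = (1 << 64) - 1      # emulate unsigned long wrap-around


def range_overflow(addr_width, iov_pfn, nr_pages):
    """Return the bits of the last PFN that lie above addr_width (0 if none)."""
    if addr_width >= BITS_PER_LONG:
        return 0  # the BUG_ON condition short-circuits in this case
    last_pfn = (iov_pfn + nr_pages - 1) & MASK64
    return last_pfn >> addr_width


def check_mapping_range(addr_width, iov_pfn, nr_pages):
    """Hypothetical recoverable variant: report the values instead of halting."""
    if range_overflow(addr_width, iov_pfn, nr_pages):
        return ("mapping out of range: addr_width=%d iov_pfn=%#x nr_pages=%d"
                % (addr_width, iov_pfn, nr_pages))
    return None


# Register values copied from the oops (assumed mapping, see above).
RCX = 0x1b                   # assumed addr_width
RSI = 0x000ffffffffffdff     # assumed iov_pfn
R8 = 0x1                     # assumed nr_pages

excess = range_overflow(RCX, RSI, R8)
print(hex(excess))
print(check_mapping_range(RCX, RSI, R8))
```

Under these assumptions the shifted value is 0x1ffffff, which equals RAX in the dump, so the check would have fired because iov_pfn lies far outside a 27-bit PFN space while nr_pages is just 1. That points at a bad I/O virtual address rather than at the page count, but by itself it does not say whether e1000e or intel-iommu produced it.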
Comment 3 MartinG 2010-10-11 21:18:29 UTC
On Mon, Oct 11, 2010 at 10:45 PM,  <bugzilla-daemon@bugzilla.kernel.org> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=19942
>
> --- Comment #2 from Andrew Morton <akpm@linux-foundation.org>  2010-10-11
> 20:45:55 ---
> I don't have any of those kernel versions here, but I'm guessing that
> this test is triggering:
>
>    BUG_ON(addr_width < BITS_PER_LONG && (iov_pfn + nr_pages - 1) >> addr_width);
>
> It could be that e1000e is feeding in garbage, or it could be that
> intel-iommu is screwed up.
>
>
> It's a bit hard to tell what's happening because that BUG_ON was quite
> poorly thought out.  It tests three different variables, doesn't tell
> us their values and even though it _could_ cleanly recover and allow
> the machine to continue to operate it simply whacks the box.
>
> So we now have a pickle on our hands, because you use prebuilt kernels
> and are probably not in a position to test patches.

Thank you for your response! I'd be happy to cook a vanilla kernel and
test. It's been a while since I did that, but I hope this is the
correct thing to do:
localhost:~/linux-2.6.36-rc7:$ cp /boot/config-2.6.36-0.35.rc7.git0.fc15.x86_64 .config
localhost:~/linux-2.6.36-rc7:$ make -j3
(still building)

Please let me know if I should use a different version and/or a
different config file. I'll post back when/if I get the bug also with
the vanilla kernel.

Thanks,
MartinG
Comment 4 MartinG 2010-10-12 19:05:16 UTC

Okay, so this is what I found: with the vanilla
kernel-2.6.36rc7-2.x86_64 (built with the Fedora Rawhide config file
config-2.6.36-0.35.rc7.git0.fc15.x86_64), I cannot reproduce the bug.
With the Fedora Rawhide kernel-2.6.36-0.35.rc7.git0.fc15.x86_64, I can
reproduce it every time by running "service NetworkManager restart".

So what do you suggest? Should I apply some patches to the vanilla
kernel to see what is causing this? Suggestions appreciated.

Thanks,
MartinG