Bug 19942 - Not a intel bug: kernel BUG at drivers/pci/intel-iommu.c:1656
Summary: Not a intel bug: kernel BUG at drivers/pci/intel-iommu.c:1656
Status: RESOLVED OBSOLETE
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_network@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-10-09 10:07 UTC by MartinG
Modified: 2013-12-10 22:13 UTC
CC List: 1 user

See Also:
Kernel Version: 2.6.36-0.35.rc7.git0.fc15.x86_64
Subsystem:
Regression: No
Bisected commit-id:



Description MartinG 2010-10-09 10:07:08 UTC
On my Fedora Rawhide system, I keep getting these errors, which kill my wifi and require me to reboot my Lenovo Thinkpad T400. Please also see
https://bugzilla.redhat.com/show_bug.cgi?id=637554
https://bugs.freedesktop.org/show_bug.cgi?id=30722

In the latter, I was asked to file the bug here, as it isn't an Intel graphics bug.
Fedora Rawhide, kernel-2.6.36-0.35.rc7.git0.fc15.x86_64,
xorg-x11-drv-intel-2.12.0-6.fc14.1.x86_64, xorg-x11-drivers-7.4-1.fc14.x86_64
xorg-x11-server-utils-7.4-20.fc15.x86_64, NetworkManager-0.8.1-7.git20100831.fc15.x86_64

This happens when I resume my laptop after suspend to ram:

[24572.218077] PM: resume devices took 0.987 seconds
[24572.239068] PM: Finishing wakeup.
[24572.239216] Restarting tasks ... 
[24572.239332] usb 2-4: USB disconnect, address 2
[24572.245520] done.
[24572.245702] video LNXVIDEO:00: Restoring backlight state
[24572.249109] ehci_hcd 0000:00:1d.7: dma_pool_free buffer-2048, ffff880134f9d000/ffffb000 (bad dma)
[24572.249631] ehci_hcd 0000:00:1d.7: dma_pool_free buffer-2048, ffff880134f9d080/ffffb080 (bad dma)
[24572.249977] cdc_ether 2-4:1.7: wwan0: unregister 'cdc_ether' usb-0000:00:1d.7-4, Mobile Broadband Network Device
[24573.685674] ------------[ cut here ]------------
[24573.685709] kernel BUG at drivers/pci/intel-iommu.c:1656!
[24573.685734] invalid opcode: 0000 [#1] SMP 
[24573.685761] last sysfs file: /sys/devices/system/cpu/sched_mc_power_savings
[24573.685791] CPU 0 
[24573.685803] Modules linked in: rfcomm sunrpc sco bnep l2cap cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT xt_physdev nf_conntrack_ipv6 ip6table_filter ipt_MASQUERADE iptable_nat ip6_tables nf_nat sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt uinput arc4 ecb snd_hda_codec_conexant snd_hda_intel iwlagn snd_hda_codec snd_hwdep zaurus iwlcore snd_seq snd_seq_device r852 sm_common cdc_ether nand nand_ids nand_ecc microcode mac80211 uvcvideo usbnet mtd mii cdc_acm snd_pcm btusb cdc_wdm bluetooth videodev iTCO_wdt i2c_i801 iTCO_vendor_support joydev cfg80211 thinkpad_acpi v4l1_compat v4l2_compat_ioctl32 e1000e snd_timer rfkill snd_page_alloc wmi snd soundcore ipv6 sdhci_pci sdhci firewire_ohci mmc_core firewire_core yenta_socket crc_itu_t i915 drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: scsi_wait_scan]
[24573.686007] 
[24573.686007] Pid: 8321, comm: NetworkManager Not tainted 2.6.36-0.35.rc7.git0.fc15.x86_64 #1 6474AR4/6474AR4
[24573.686007] RIP: 0010:[<ffffffff8126ea48>]  [<ffffffff8126ea48>] __domain_mapping+0x43/0x1ce
[24573.686007] RSP: 0018:ffff880133727648  EFLAGS: 00010206
[24573.694051] RAX: 0000000001ffffff RBX: ffff8800b4687400 RCX: 000000000000001b
[24573.694051] RDX: 000000000008b621 RSI: 000ffffffffffdff RDI: ffff8801320f6dc0
[24573.694051] RBP: ffff880133727698 R08: 0000000000000001 R09: 0000000000000003
[24573.694051] R10: ffff8801320f6df8 R11: 0000000000000000 R12: 0000000000000000
[24573.694051] R13: ffff8801320f6dc0 R14: ffff88013bc04ff8 R15: 0000000000000001
[24573.694051] FS:  00007fb24c872800(0000) GS:ffff880002c00000(0000) knlGS:0000000000000000
[24573.694051] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[24573.694051] CR2: 000000000042da00 CR3: 000000012fbc2000 CR4: 00000000000006f0
[24573.694051] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[24573.694051] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[24573.694051] Process NetworkManager (pid: 8321, threadinfo ffff880133726000, task ffff88012f8f0000)
[24573.694051] Stack:
[24573.694051]  ffff88013707e240 ffff8801320f6dc0 ffff880133727698 000ffffffffffdff
[24573.694051] <0> 0000000000000000 ffff8800b4687400 000000008b621000 ffff8801320f6dc0
[24573.694051] <0> ffff88013bc04ff8 0000000000000000 ffff8801337276f8 ffffffff8126f710
[24573.694051] Call Trace:
[24573.694051]  [<ffffffff8126f710>] __intel_map_single.clone.25+0xdc/0x16b
[24573.694051]  [<ffffffff8126f88c>] intel_alloc_coherent+0xae/0xd5
[24573.694051]  [<ffffffffa018c128>] e1000_alloc_ring_dma.clone.28+0x94/0xc0 [e1000e]
[24573.694051]  [<ffffffffa018e359>] e1000e_setup_tx_resources+0x65/0xaa [e1000e]
[24573.694051]  [<ffffffffa018e891>] e1000_open+0x64/0x41e [e1000e]
[24573.694051]  [<ffffffff813eeb45>] __dev_open+0x9b/0xd2
[24573.694051]  [<ffffffff813eed87>] __dev_change_flags+0xad/0x130
[24573.694051]  [<ffffffff813eee8b>] dev_change_flags+0x21/0x56
[24573.694051]  [<ffffffff813f90f9>] do_setlink+0x2ba/0x61f
[24573.694051]  [<ffffffff8107d8c7>] ? print_lock_contention_bug+0x1b/0xd5
[24573.694051]  [<ffffffff81249f7f>] ? debug_check_no_obj_freed+0x65/0x18a
[24573.694051]  [<ffffffff8107d8c7>] ? print_lock_contention_bug+0x1b/0xd5
[24573.694051]  [<ffffffff813f96be>] rtnl_setlink+0xd0/0xf2
[24573.694051]  [<ffffffff813f99ac>] rtnetlink_rcv_msg+0x1eb/0x201
[24573.694051]  [<ffffffff813f97c1>] ? rtnetlink_rcv_msg+0x0/0x201
[24573.694051]  [<ffffffff8140d3e5>] netlink_rcv_skb+0x45/0x90
[24573.694051]  [<ffffffff813f8d29>] rtnetlink_rcv+0x26/0x2d
[24573.694051]  [<ffffffff8140cec0>] netlink_unicast+0xee/0x157
[24573.694051]  [<ffffffff8140d1e1>] netlink_sendmsg+0x2b8/0x2d6
[24573.694051]  [<ffffffff813da64e>] __sock_sendmsg+0x6b/0x77
[24573.694051]  [<ffffffff813da9a8>] sock_sendmsg+0xa8/0xc1
[24573.694051]  [<ffffffff8107ff07>] ? lock_acquire+0xee/0xfd
[24573.694051]  [<ffffffff810fb080>] ? might_fault+0x5c/0xac
[24573.694051]  [<ffffffff8107fe0d>] ? lock_release+0x19a/0x1a6
[24573.694051]  [<ffffffff810fb0c9>] ? might_fault+0xa5/0xac
[24573.694051]  [<ffffffff813e4dbb>] ? copy_from_user+0x2f/0x31
[24573.694051]  [<ffffffff813e51ae>] ? verify_iovec+0x57/0x99
[24573.694051]  [<ffffffff813dc971>] sys_sendmsg+0x235/0x2b3
[24573.694051]  [<ffffffff8112bb26>] ? rcu_read_lock+0x0/0x35
[24573.694051]  [<ffffffff8107ff07>] ? lock_acquire+0xee/0xfd
[24573.694051]  [<ffffffff8112bb26>] ? rcu_read_lock+0x0/0x35
[24573.694051]  [<ffffffff813dc405>] ? sys_sendto+0x125/0x152
[24573.694051]  [<ffffffff8112c5a2>] ? fput+0x22/0x1d6
[24573.694051]  [<ffffffff8112c4ae>] ? fget_light+0x79/0x83
[24573.694051]  [<ffffffff81133e5b>] ? path_put+0x22/0x27
[24573.694051]  [<ffffffff810a8443>] ? audit_syscall_entry+0x11c/0x148
[24573.694051]  [<ffffffff8149da45>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[24573.694051]  [<ffffffff81009cf2>] system_call_fastpath+0x16/0x1b
[24573.694051] Code: d4 48 89 ca 48 89 7d b8 6b 8f 84 00 00 00 09 48 89 75 c8 4d 89 c7 83 c1 12 83 f9 3f 7f 0f 4a 8d 44 06 ff 48 d3 e8 48 85 c0 74 02 <0f> 0b 41 f6 c1 03 b8 ea ff ff ff 0f 84 6b 01 00 00 41 81 e1 03 
[24573.694051] RIP  [<ffffffff8126ea48>] __domain_mapping+0x43/0x1ce
[24573.694051]  RSP <ffff880133727648>
[24573.821392] ---[ end trace 391efc8948e1496b ]---
[24573.832050] NetworkManager used greatest stack depth: 2064 bytes left
[24574.026042] usb 4-2: new full speed USB device using uhci_hcd and address 3
[24574.187102] usb 4-2: New USB device found, idVendor=0a5c, idProduct=2145
[24574.188244] usb 4-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[24574.189418] usb 4-2: Product: ThinkPad Bluetooth with Enhanced Data Rate II
[24574.190567] usb 4-2: Manufacturer: Lenovo Computer Corp
[24576.230085] usb 2-4: new high speed USB device using ehci_hcd and address 3
[24577.080715] usb 2-4: New USB device found, idVendor=0bdb, idProduct=1900
[24577.081862] usb 2-4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[24577.083009] usb 2-4: Product: Ericsson F3507g Mobile Broadband Minicard Composite Device
[24577.084140] usb 2-4: Manufacturer: Ericsson
[24577.085263] usb 2-4: SerialNumber: 3541430207407750
[24577.144202] cdc_acm 2-4:1.1: ttyACM0: USB ACM device
[24577.163044] cdc_acm 2-4:1.3: ttyACM1: USB ACM device
[24577.174389] cdc_wdm 2-4:1.5: cdc-wdm0: USB WDM device
[24577.183588] cdc_wdm 2-4:1.6: cdc-wdm1: USB WDM device
[26974.894966] thinkpad_acpi: EC reports that Thermal Table has changed


Note that I have explicitly disabled the IOMMU for the Intel graphics device (intel_iommu=igfx_off):
# cat /proc/cmdline 
ro root=/dev/VolGroup00/lv_root rhgb quiet selinux=0 vga=0x318 SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=no intel_iommu=igfx_off

I've seen this on 2.6.36-0.35.rc7.git0.fc15.x86_64, 2.6.36-0.27.rc5.git6.fc15.x86_64, and 2.6.36-0.32.rc6.git2.fc15.x86_64.
Comment 1 MartinG 2010-10-09 17:44:11 UTC
I can actually trigger this bug simply by running "service NetworkManager restart". It seems I am unable to start any apps after the bug occurs.

NetworkManager-0.8.1-8.git20100831.fc15.x86_64 (actually updated from NetworkManager-0.8.1-7.git20100831.fc15.x86_64 right before I restarted it. I don't think that NetworkManager itself is the cause, since I've seen this bug several times on the previous version of NM.)

This is my network controller:
03:00.0 Network controller: Intel Corporation Ultimate N WiFi Link 5300
        Subsystem: Intel Corporation Device 1011
        Physical Slot: 1
        Flags: bus master, fast devsel, latency 0, IRQ 50
        Memory at f4300000 (64-bit, non-prefetchable) [size=8K]
        Capabilities: [c8] Power Management version 3
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [e0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 00-16-ea-ff-ff-e3-60-d4
        Kernel driver in use: iwlagn
        Kernel modules: iwlagn



These are my modules: 
coretemp sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf rfcomm ip6t_REJECT nf_conntrack_ipv6 xt_physdev ipt_MASQUERADE iptable_nat ip6table_filter nf_nat ip6_tables sco bnep l2cap sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt uinput snd_hda_codec_conexant arc4 ecb snd_hda_intel iwlagn snd_hda_codec snd_hwdep snd_seq microcode snd_seq_device iwlcore zaurus r852 sm_common cdc_ether nand snd_pcm uvcvideo usbnet nand_ids nand_ecc mac80211 joydev mii btusb videodev cdc_wdm cdc_acm mtd i2c_i801 iTCO_wdt v4l1_compat v4l2_compat_ioctl32 iTCO_vendor_support bluetooth cfg80211 snd_timer thinkpad_acpi rfkill snd_page_alloc e1000e snd wmi soundcore ipv6 sdhci_pci sdhci firewire_ohci firewire_core crc_itu_t mmc_core yenta_socket i915 drm_kms_helper drm i2c_algo_bit i2c_core video output

Specifically:
/lib/modules/2.6.36-0.35.rc7.git0.fc15.x86_64/kernel/drivers/net/e1000e/e1000e.ko
/lib/modules/2.6.36-0.35.rc7.git0.fc15.x86_64/kernel/drivers/net/wireless/iwlwifi/iwlagn.ko
/lib/modules/2.6.36-0.35.rc7.git0.fc15.x86_64/kernel/drivers/net/wireless/iwlwifi/iwlcore.ko
Comment 2 Andrew Morton 2010-10-11 20:45:55 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).


I don't have any of those kernel versions here, but I'm guessing that
this test is triggering:

	BUG_ON(addr_width < BITS_PER_LONG && (iov_pfn + nr_pages - 1) >> addr_width);

It could be that e1000e is feeding in garbage, or it could be that
intel-iommu is screwed up.


It's a bit hard to tell what's happening because that BUG_ON was quite
poorly thought out.  It tests three different variables, doesn't tell
us their values and even though it _could_ cleanly recover and allow
the machine to continue to operate it simply whacks the box.

So we now have a pickle on our hands, because you use prebuilt kernels
and are probably not in a position to test patches.
Comment 3 MartinG 2010-10-11 21:18:29 UTC

Thank you for your response! I'd be happy to cook a vanilla kernel and
test. It's been a while since I did that, but I hope this is the
correct thing to do:
localhost:~/linux-2.6.36-rc7:$ cp /boot/config-2.6.36-0.35.rc7.git0.fc15.x86_64 .config
localhost:~/linux-2.6.36-rc7:$ make -j3
(still building)

Please let me know if I should use a different version and/or a
different config file. I'll post back when/if I get the bug also with
the vanilla kernel.

Thanks,
MartinG
Comment 4 MartinG 2010-10-12 19:05:16 UTC
On Mon, Oct 11, 2010 at 11:16 PM, MartinG <gronslet@gmail.com> wrote:
> On Mon, Oct 11, 2010 at 10:45 PM,  <bugzilla-daemon@bugzilla.kernel.org>
> wrote:
>> https://bugzilla.kernel.org/show_bug.cgi?id=19942
>>
>> --- Comment #2 from Andrew Morton <akpm@linux-foundation.org>  2010-10-11
>> 20:45:55 ---
>> (switched to email.  Please respond via emailed reply-to-all, not via the
>> bugzilla web interface).
>>
>>
>> I don't have any of those kernel versions here, but I'm guessing that
>> this test is triggering:
>>
>>    BUG_ON(addr_width < BITS_PER_LONG && (iov_pfn + nr_pages - 1) >> addr_width);
>>
>> It could be that e1000e is feeding in garbage, or it could be that
>> intel-iommu is screwed up.
>>
>>
>> It's a bit hard to tell what's happening because that BUG_ON was quite
>> poorly thought out.  It tests three different variables, doesn't tell
>> us their values and even though it _could_ cleanly recover and allow
>> the machine to continue to operate it simply whacks the box.
>>
>> So we now have a pickle on our hands, because you use prebuilt kernels
>> and are probably not in a position to test patches.
>
> Thank you for your response! I'd be happy to cook a vanilla kernel and
> test. It's been a while since I did that, but I hope this is the
> correct thing to do:
> localhost:~/linux-2.6.36-rc7:$ cp /boot/config-2.6.36-0.35.rc7.git0.fc15.x86_64 .config
> localhost:~/linux-2.6.36-rc7:$ make -j3
> (still building)
>
> Please let me know if I should use a different version and/or a
> different config file. I'll post back when/if I get the bug also with
> the vanilla kernel.

Okay, so this is what I found: with the vanilla
kernel-2.6.36rc7-2.x86_64 (built with the Fedora Rawhide config file
config-2.6.36-0.35.rc7.git0.fc15.x86_64), I cannot reproduce the bug,
while on the Fedora Rawhide kernel-2.6.36-0.35.rc7.git0.fc15.x86_64 I
can reproduce it every time by running "service NetworkManager restart".

So what do you suggest? Should I apply some patches to the vanilla
kernel to see what is causing this? Suggestions appreciated.

Thanks,
MartinG
