Bug 83201

Summary: CPU soft lockups in nouveau under load
Product: Drivers
Reporter: Ted Percival (ted)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED INVALID
Severity: normal
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 3.17.0-rc1-00231-g7be141d
Regression: No
Attachments: Full boot/run log

Description Ted Percival 2014-08-25 16:44:12 UTC
Created attachment 148071
Full boot/run log

I'm seeing a ton of CPU soft lockups in 3.17.0-rc1-00231-g7be141d when building a kernel (make -j8). The machine locks up hard enough that I have to force a power-off.

Sorry if this is the wrong component. I guessed that this is a bug in nouveau.

Full log attached.

Aug 25 10:30:27 slctperciva6520 kernel: [  367.753110] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [Xorg:4775]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753114] Modules linked in: bnep rfcomm binfmt_misc uinput nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc loop hid_generic usbhid hid x86_pkg_temp_thermal snd_hda_codec_hdmi ecb coretemp kvm_intel kvm btusb bluetooth ghash_clmulni_intel snd_hda_codec_idt snd_hda_codec_generic joydev aesni_intel snd_hda_intel snd_hda_controller arc4 snd_hda_codec aes_x86_64 brcmsmac cordic brcmutil b43 ablk_helper snd_hwdep cryptd snd_pcm lrw gf128mul snd_seq snd_timer snd_seq_device snd soundcore ehci_pci glue_helper mac80211 cfg80211 nouveau ssb rng_core pcmcia pcmcia_core mxm_wmi ttm dell_wmi sparse_keymap dell_laptop rfkill ehci_hcd bcma drm_kms_helper drm i2c_algo_bit wmi psmouse usbcore iTCO_wdt iTCO_vendor_support tpm_tis i2c_i801 lpc_ich mfd_core i2c_core evdev serio_raw usb_common dcdbas acpi_cpufreq tpm battery processor video ac button ext4 crc16 jbd2 mbcache sg sd_mod sr_mod crc_t10dif cdrom crct10dif_common crc32c_intel microcode ahci libahci libata scsi_mod firewire_ohci sdhci_pci sdhci mmc_core firewire_core crc_itu_t thermal thermal_sys e1000e ptp pps_core
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753181] CPU: 2 PID: 4775 Comm: Xorg Tainted: G        W    L 3.17.0-rc1-00231-g7be141d #4
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753183] Hardware name: Dell Inc. Latitude E6520/0692FT, BIOS A13 05/17/2012
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753185] task: ffff8800ce2d7750 ti: ffff880222c28000 task.ti: ffff880222c28000
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753187] RIP: 0010:[<ffffffff8109e30a>]  [<ffffffff8109e30a>] csd_lock_wait.isra.1+0x7/0xa
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753193] RSP: 0018:ffff880222c2ba90  EFLAGS: 00000202
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753194] RAX: 0000000000000003 RBX: ffff88022dc54d48 RCX: 0000000000000002
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753196] RDX: ffff88022dc77b98 RSI: fffffffffffffffc RDI: ffff88022dc77bb0
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753197] RBP: 0000000000000003 R08: ffff88022dc54d48 R09: 0000000000000000
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753199] R10: 0000000000000008 R11: ffff8800ced85d80 R12: 0000000000000002
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753200] R13: 00000002000c0000 R14: ffff88022dc0de00 R15: 0000000000000296
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753202] FS:  00007f8b97f35880(0000) GS:ffff88022dc40000(0000) knlGS:0000000000000000
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753204] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753205] CR2: 00007f8b90560000 CR3: 0000000223ae7000 CR4: 00000000000407e0
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753206] Stack:
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753208]  ffffffff8109e8c2 ffff88022364c800 ffffffff00000007 0000000000000007
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753210]  ffffffff810452d7 ffff88022364ca08 ffffffff810452d7 0000000000000000
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753212]  0000000000000001 ffff880222c2bbe8 ffff880222c2bbf0 0000000000000000
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753215] Call Trace:
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753219]  [<ffffffff8109e8c2>] ? smp_call_function_many+0x1e3/0x21a
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753223]  [<ffffffff810452d7>] ? leave_mm+0x9a/0x9a
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753225]  [<ffffffff810452d7>] ? leave_mm+0x9a/0x9a
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753228]  [<ffffffff8109e914>] ? smp_call_function+0x1b/0x1f
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753231]  [<ffffffff8109e940>] ? on_each_cpu+0x12/0x3a
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753233]  [<ffffffff810455df>] ? flush_tlb_kernel_range+0x50/0x55
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753238]  [<ffffffff813d0f4d>] ? _raw_spin_trylock+0x5/0x13
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753241]  [<ffffffff8110ea6f>] ? __purge_vmap_area_lazy+0x2ea/0x351
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753244]  [<ffffffff811fc314>] ? __bitmap_weight+0x27/0x58
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753248]  [<ffffffff8110ef34>] ? free_vmap_area_noflush+0x4f/0x55
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753252]  [<ffffffff8110fc87>] ? remove_vm_area+0x53/0x67
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753254]  [<ffffffff8110fdd2>] ? __vunmap+0xb1/0xc4
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753260]  [<ffffffffa02e8c18>] ? ttm_dma_tt_fini+0x30/0x4a [ttm]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753281]  [<ffffffffa0387f6b>] ? nouveau_sgdma_destroy+0xe/0x19 [nouveau]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753286]  [<ffffffffa02e9091>] ? ttm_bo_cleanup_memtype_use+0x36/0x5a [ttm]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753291]  [<ffffffffa02e9e6e>] ? ttm_bo_release+0xe4/0x1c2 [ttm]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753296]  [<ffffffffa02e9d8a>] ? ttm_bo_delayed_workqueue+0x21/0x21 [ttm]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753300]  [<ffffffffa02e9030>] ? kref_sub+0x32/0x3c [ttm]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753318]  [<ffffffffa038a8b2>] ? nouveau_gem_object_del+0x50/0x56 [nouveau]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753324]  [<ffffffffa0261b79>] ? drm_gem_object_unreference_unlocked+0x38/0x55 [drm]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753331]  [<ffffffffa0261d26>] ? drm_gem_handle_delete+0xa4/0xb3 [drm]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753338]  [<ffffffffa02627cb>] ? drm_ioctl+0x288/0x3e3 [drm]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753342]  [<ffffffff81111e65>] ? free_pages_and_swap_cache+0x45/0x5b
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753351]  [<ffffffffa02621c6>] ? drm_gem_handle_create+0x37/0x37 [drm]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753356]  [<ffffffff811026b5>] ? tlb_finish_mmu+0xb/0x2f
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753360]  [<ffffffff8111fb34>] ? __cache_free.isra.45+0x1e8/0x1f7
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753364]  [<ffffffff813d1016>] ? _raw_spin_unlock_irqrestore+0xc/0xd
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753382]  [<ffffffffa038505a>] ? nouveau_drm_ioctl+0x74/0xa7 [nouveau]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753385]  [<ffffffff811399d4>] ? do_vfs_ioctl+0x3ed/0x436
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753389]  [<ffffffff8106cc36>] ? vtime_account_user+0x35/0x40
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753392]  [<ffffffff810e23b0>] ? context_tracking_user_exit+0x48/0xa3
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753395]  [<ffffffff81139a66>] ? SyS_ioctl+0x49/0x77
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753397]  [<ffffffff813d16d8>] ? tracesys+0x7e/0xe2
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753399]  [<ffffffff813d1737>] ? tracesys+0xdd/0xe2
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753400] Code: 90 66 66 90 c3 c3 f3 48 0f b8 c7 c3 ff c7 50 48 89 f0 48 63 d7 be 00 02 00 00 48 89 c7 e8 27 fb 15 00 5a c3 eb 02 f3 90 f6 07 01 <75> f9 c3 41 57 31 c0 49 89 d7 41 56 49 89 ce b9 08 00 00 00 41

Comment 1 Ted Percival 2014-08-25 17:00:16 UTC
Earlier in the log I see a machine check exception:

[  300.954634] mce: [Hardware Error]: Machine check events logged

Maybe this is just the result of some hardware overheating?

I didn't see the problem on my old 3.2 distro kernel (3.2.60-1+deb7u3) though.

Comment 2 Ted Percival 2014-08-27 17:06:13 UTC
I've now seen this on my old 3.2 kernel too. My hardware sucks. Sorry about the noise.