Bug 83201 - CPU soft lockups in nouveau under load
Status: RESOLVED INVALID
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-08-25 16:44 UTC by Ted Percival
Modified: 2014-08-27 17:06 UTC

See Also:
Kernel Version: 3.17.0-rc1-00231-g7be141d
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Full boot/run log (22.68 KB, application/x-gzip)
2014-08-25 16:44 UTC, Ted Percival

Description Ted Percival 2014-08-25 16:44:12 UTC
Created attachment 148071
Full boot/run log

I'm seeing many CPU soft lockups on 3.17.0-rc1-00231-g7be141d while building a kernel (make -j8). The machine locks up hard enough that I have to power it off.

Sorry if this is the wrong component. I guessed that this is a bug in nouveau.

Full log attached.

Aug 25 10:30:27 slctperciva6520 kernel: [  367.753110] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [Xorg:4775]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753114] Modules linked in: bnep rfcomm binfmt_misc uinput nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc loop hid_generic usbhid hid x86_pkg_temp_thermal snd_hda_codec_hdmi ecb coretemp kvm_intel kvm btusb bluetooth ghash_clmulni_intel snd_hda_codec_idt snd_hda_codec_generic joydev aesni_intel snd_hda_intel snd_hda_controller arc4 snd_hda_codec aes_x86_64 brcmsmac cordic brcmutil b43 ablk_helper snd_hwdep cryptd snd_pcm lrw gf128mul snd_seq snd_timer snd_seq_device snd soundcore ehci_pci glue_helper mac80211 cfg80211 nouveau ssb rng_core pcmcia pcmcia_core mxm_wmi ttm dell_wmi sparse_keymap dell_laptop rfkill ehci_hcd bcma drm_kms_helper drm i2c_algo_bit wmi psmouse usbcore iTCO_wdt iTCO_vendor_support tpm_tis i2c_i801 lpc_ich mfd_core i2ccore evdev serio_raw usb_common dcdbas acpi_cpufreq tpm battery processor video ac button ext4 crc16 jbd2 mbcache sg sd_mod sr_mod crc_t10dif cdrom crct10dif_common crc32c_intel microcode ahci libahci libata scsi_mod firewire_ohci sdhci_pci sdhci mmc_core firewire_core crc_itu_t thermal thermal_sys e1000e ptp pps_core
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753181] CPU: 2 PID: 4775 Comm: Xorg Tainted: G        W    L 3.17.0-rc1-00231-g7be141d #4
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753183] Hardware name: Dell Inc. Latitude E6520/0692FT, BIOS A13 05/17/2012
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753185] task: ffff8800ce2d7750 ti: ffff880222c28000 task.ti: ffff880222c28000
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753187] RIP: 0010:[<ffffffff8109e30a>]  [<ffffffff8109e30a>] csd_lock_wait.isra.1+0x7/0xa
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753193] RSP: 0018:ffff880222c2ba90  EFLAGS: 00000202
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753194] RAX: 0000000000000003 RBX: ffff88022dc54d48 RCX: 0000000000000002
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753196] RDX: ffff88022dc77b98 RSI: fffffffffffffffc RDI: ffff88022dc77bb0
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753197] RBP: 0000000000000003 R08: ffff88022dc54d48 R09: 0000000000000000
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753199] R10: 0000000000000008 R11: ffff8800ced85d80 R12: 0000000000000002
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753200] R13: 00000002000c0000 R14: ffff88022dc0de00 R15: 0000000000000296
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753202] FS:  00007f8b97f35880(0000) GS:ffff88022dc40000(0000) knlGS:0000000000000000
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753204] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753205] CR2: 00007f8b90560000 CR3: 0000000223ae7000 CR4: 00000000000407e0
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753206] Stack:
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753208]  ffffffff8109e8c2 ffff88022364c800 ffffffff00000007 0000000000000007
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753210]  ffffffff810452d7 ffff88022364ca08 ffffffff810452d7 0000000000000000
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753212]  0000000000000001 ffff880222c2bbe8 ffff880222c2bbf0 0000000000000000
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753215] Call Trace:
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753219]  [<ffffffff8109e8c2>] ? smp_call_function_many+0x1e3/0x21a
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753223]  [<ffffffff810452d7>] ? leave_mm+0x9a/0x9a
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753225]  [<ffffffff810452d7>] ? leave_mm+0x9a/0x9a
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753228]  [<ffffffff8109e914>] ? smp_call_function+0x1b/0x1f
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753231]  [<ffffffff8109e940>] ? on_each_cpu+0x12/0x3a
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753233]  [<ffffffff810455df>] ? flush_tlb_kernel_range+0x50/0x55
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753238]  [<ffffffff813d0f4d>] ? _raw_spin_trylock+0x5/0x13
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753241]  [<ffffffff8110ea6f>] ? __purge_vmap_area_lazy+0x2ea/0x351
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753244]  [<ffffffff811fc314>] ? __bitmap_weight+0x27/0x58
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753248]  [<ffffffff8110ef34>] ? free_vmap_area_noflush+0x4f/0x55
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753252]  [<ffffffff8110fc87>] ? remove_vm_area+0x53/0x67
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753254]  [<ffffffff8110fdd2>] ? __vunmap+0xb1/0xc4
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753260]  [<ffffffffa02e8c18>] ? ttm_dma_tt_fini+0x30/0x4a [ttm]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753281]  [<ffffffffa0387f6b>] ? nouveau_sgdma_destroy+0xe/0x19 [nouveau]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753286]  [<ffffffffa02e9091>] ? ttm_bo_cleanup_memtype_use+0x36/0x5a [ttm]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753291]  [<ffffffffa02e9e6e>] ? ttm_bo_release+0xe4/0x1c2 [ttm]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753296]  [<ffffffffa02e9d8a>] ? ttm_bo_delayed_workqueue+0x21/0x21 [ttm]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753300]  [<ffffffffa02e9030>] ? kref_sub+0x32/0x3c [ttm]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753318]  [<ffffffffa038a8b2>] ? nouveau_gem_object_del+0x50/0x56 [nouveau]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753324]  [<ffffffffa0261b79>] ? drm_gem_object_unreference_unlocked+0x38/0x55 [drm]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753331]  [<ffffffffa0261d26>] ? drm_gem_handle_delete+0xa4/0xb3 [drm]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753338]  [<ffffffffa02627cb>] ? drm_ioctl+0x288/0x3e3 [drm]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753342]  [<ffffffff81111e65>] ? free_pages_and_swap_cache+0x45/0x5b
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753351]  [<ffffffffa02621c6>] ? drm_gem_handle_create+0x37/0x37 [drm]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753356]  [<ffffffff811026b5>] ? tlb_finish_mmu+0xb/0x2f
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753360]  [<ffffffff8111fb34>] ? __cache_free.isra.45+0x1e8/0x1f7
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753364]  [<ffffffff813d1016>] ? _raw_spin_unlock_irqrestore+0xc/0xd
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753382]  [<ffffffffa038505a>] ? nouveau_drm_ioctl+0x74/0xa7 [nouveau]
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753385]  [<ffffffff811399d4>] ? do_vfs_ioctl+0x3ed/0x436
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753389]  [<ffffffff8106cc36>] ? vtime_account_user+0x35/0x40
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753392]  [<ffffffff810e23b0>] ? context_tracking_user_exit+0x48/0xa3
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753395]  [<ffffffff81139a66>] ? SyS_ioctl+0x49/0x77
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753397]  [<ffffffff813d16d8>] ? tracesys+0x7e/0xe2
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753399]  [<ffffffff813d1737>] ? tracesys+0xdd/0xe2
Aug 25 10:30:27 slctperciva6520 kernel: [  367.753400] Code: 90 66 66 90 c3 c3 f3 48 0f b8 c7 c3 ff c7 50 48 89 f0 48 63 d7 be 00 02 00 00 48 89 c7 e8 27 fb 15 00 5a c3 eb 02 f3 90 f6 07 01 <75> f9 c3 41 57 31 c0 49 89 d7 41 56 49 89 ce b9 08 00 00 00 41
Comment 1 Ted Percival 2014-08-25 17:00:16 UTC
Earlier in the log I see a machine check exception:

[  300.954634] mce: [Hardware Error]: Machine check events logged

Maybe this is just the result of overheating some hardware?

I didn't see the problem on my old 3.2 distro kernel (3.2.60-1+deb7u3) though.
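
For anyone triaging a similar log, here is a minimal sketch of how one might check whether a machine check event was logged before the first soft lockup. It assumes the log lines carry the usual bracketed kernel timestamp (seconds since boot) and contain the strings "mce: [Hardware Error]" and "soft lockup" as in the excerpts above. The function names scan_log and mce_precedes_lockups are made up for illustration and are not part of any kernel tooling.

```python
import re

# Matches the kernel timestamp in square brackets, e.g. "[  367.753110]".
TS_RE = re.compile(r"\[\s*(\d+\.\d+)\]")


def scan_log(lines):
    """Return (mce_times, lockup_times) as lists of seconds since boot.

    Hypothetical helper: it only looks for the two message strings seen
    in this report and ignores lines without a kernel timestamp.
    """
    mce_times, lockup_times = [], []
    for line in lines:
        match = TS_RE.search(line)
        if not match:
            continue
        ts = float(match.group(1))
        if "mce: [Hardware Error]" in line:
            mce_times.append(ts)
        elif "soft lockup" in line:
            lockup_times.append(ts)
    return mce_times, lockup_times


def mce_precedes_lockups(lines):
    """True if at least one MCE was logged before the first soft lockup."""
    mce, lockups = scan_log(lines)
    return bool(mce) and bool(lockups) and min(mce) < min(lockups)


# Two lines taken from this report.
sample = [
    "[  300.954634] mce: [Hardware Error]: Machine check events logged",
    "Aug 25 10:30:27 slctperciva6520 kernel: [  367.753110] NMI watchdog: "
    "BUG: soft lockup - CPU#2 stuck for 22s! [Xorg:4775]",
]

if __name__ == "__main__":
    print(scan_log(sample))
    print(mce_precedes_lockups(sample))
```

An MCE that precedes the lockups does not prove a hardware fault on its own, but it is a reasonable hint to check the hardware before blaming a driver.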
Comment 2 Ted Percival 2014-08-27 17:06:13 UTC
I've seen this on my old 3.2 kernel too. My hardware sucks. Sorry about the noise.
