Bug 86301

Summary: BUG_ON triggered in nouveau driver on !CONFIG_SMP systems
Product: Drivers
Reporter: dave.mueller
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED CODE_FIX
Severity: normal
CC: pachoramos1
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: 3.17
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: Patch I used to fix this bug

Description dave.mueller 2014-10-15 06:51:52 UTC
Since the update to 3.17, a BUG_ON is triggered in the nouveau driver, as shown in the log below.

The problem is caused by the fact that "spin_is_locked()" is always false on kernels configured without CONFIG_SMP (!CONFIG_SMP), so a BUG_ON that asserts the lock is held fires even when the lock has been taken correctly.

Guarding each "spin_is_locked()" in the driver with "if (IS_ENABLED(CONFIG_SMP))" fixes the problem for me.
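
To illustrate the failure mode and the guard described above, here is a minimal userspace sketch. It is not the nouveau code and not the attached patch: MODEL_SMP, struct model_spinlock, model_spin_is_locked() and the two *_would_bug() helpers are hypothetical names invented for this example, and the helpers return a flag instead of calling BUG_ON so the behaviour can be checked without crashing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for CONFIG_SMP: 0 models a UP kernel, 1 an SMP one. */
#define MODEL_SMP 0

/* Hypothetical lock model; the real spinlock_t looks nothing like this. */
struct model_spinlock {
    bool held;
};

/*
 * Models the behaviour described in the report: without SMP the check
 * reports "not locked" no matter what the real lock state is.
 */
static bool model_spin_is_locked(const struct model_spinlock *lock)
{
#if MODEL_SMP
    return lock->held;
#else
    (void)lock;
    return false;
#endif
}

/* Models BUG_ON(!spin_is_locked(lock)); true means the BUG would fire. */
bool unguarded_check_would_bug(const struct model_spinlock *lock)
{
    return !model_spin_is_locked(lock);
}

/*
 * Models: if (IS_ENABLED(CONFIG_SMP)) BUG_ON(!spin_is_locked(lock));
 * The assertion is only evaluated when the lock state is meaningful.
 */
bool guarded_check_would_bug(const struct model_spinlock *lock)
{
    if (MODEL_SMP)
        return !model_spin_is_locked(lock);
    return false;
}
```

With MODEL_SMP set to 0 and a lock whose held flag is true, unguarded_check_would_bug() reports that the BUG would fire, while guarded_check_would_bug() does not. Setting MODEL_SMP to 1 makes both helpers agree with the real lock state.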

 
nouveau  [  DEVICE][0000:01:00.0] BOOT0  : 0x017200a5
nouveau  [  DEVICE][0000:01:00.0] Chipset: NV17 (NV17)
nouveau  [  DEVICE][0000:01:00.0] Family : NV11
nouveau  [   VBIOS][0000:01:00.0] checking PRAMIN for image...
nouveau  [   VBIOS][0000:01:00.0] ... appears to be valid
nouveau  [   VBIOS][0000:01:00.0] using image from PRAMIN
nouveau  [   VBIOS][0000:01:00.0] BMP version 5.15
nouveau  [   VBIOS][0000:01:00.0] version 04.17.00.45.00
nouveau W[  PTIMER][0000:01:00.0] unknown input clock freq
nouveau  [     PFB][0000:01:00.0] RAM type: DDR1
nouveau  [     PFB][0000:01:00.0] RAM size: 64 MiB
nouveau  [     PFB][0000:01:00.0]    ZCOMP: 0 tags
nouveau  [     CLK][0000:01:00.0] --:   
------------[ cut here ]------------
kernel BUG at drivers/gpu/drm/nouveau/core/core/event.c:42!
invalid opcode: 0000 [#1] PREEMPT 
Modules linked in: nouveau(+) cfbfillrect cfbimgblt wmi video backlight snd_emu10k1x cfbcopyarea i2c_algo_bit snd_ac97_codec drm_kms_helper ac97_bus snd_rawmidi ttm snd_seq_device snd_pcm drm snd_timer microcode snd ohci_pci soundcore e100 psmouse ohci_hcd uhci_hcd firewire_ohci i2c_dev sr_mod ehci_pci lpc_ich i2c_i801 processor ehci_hcd firewire_core mfd_core mii evdev serio_raw thermal_sys usbcore firmware_class cdrom parport_pc crc_itu_t i2c_core hwmon usb_common intel_rng intel_agp rtc_cmos parport rng_core intel_gtt floppy agpgart button loop
CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 3.17.0 #2
Hardware name: Dell Computer Corporation Dimension 4550               /      , BIOS A08 09/23/2003
Workqueue: events nouveau_pstate_work [nouveau]
task: f604cca0 ti: f6058000 task.ti: f6058000
EIP: 0060:[<f8800d7e>] EFLAGS: 00010046 CPU: 0
EIP is at nvkm_event_get+0x0/0x2 [nouveau]
EAX: f4da0e70 EBX: 00000282 ECX: 00000000 EDX: 00000001
ESI: f575cae0 EDI: b0541620 EBP: f67ee300 ESP: f6059f18
 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
CR0: 8005003b CR2: 080b572c CR3: 46329000 CR4: 000007d0
Stack:
 f8803aef f6026c80 b01390e7 b04afda6 f6059f34 b04c66b3 b0541780 00000046
 00000000 00000000 b0541664 f6026c80 b0541620 f6026c98 b0541630 b013992e
 b0541620 f604cca0 f604cca0 b0541664 b0541780 f6022240 f6026c80 b013984a
Call Trace:
 [<f8803aef>] ? nvkm_notify_get+0x4a/0x4c [nouveau]
 [<b01390e7>] ? process_one_work+0xe2/0x294
 [<b013992e>] ? worker_thread+0xe4/0x40a
 [<b013984a>] ? max_active_store+0x46/0x46
 [<b013ca38>] ? kthread+0x9c/0xb3
 [<b040f7c0>] ? ret_from_kernel_thread+0x20/0x30
 [<b013c99c>] ? kthread_worker_fn+0x10a/0x10a
Code: 51 8c f8 e8 8e 9a c0 b7 8b 03 f7 d0 8b 54 24 08 21 c2 eb cd 89 54 24 04 c7 04 24 78 51 8c f8 e8 72 9a c0 b7 eb c9 66 90 90 0f 0b <0f> 0b 55 57 56 53 83 ec 14 89 c5 89 14 24 8b 44 24 28 89 44 24
EIP: [<f8800d7e>] nvkm_event_get+0x0/0x2 [nouveau] SS:ESP 0068:f6059f18
---[ end trace 285724fbb6399f8b ]---
Comment 1 Pacho Ramos 2014-11-15 18:14:04 UTC
I can confirm this on Gentoo too, with kernel 3.17.2.
Comment 2 Pacho Ramos 2014-12-11 13:47:43 UTC
Still the case with 3.18

Dave, maybe attaching your patch here would help get it reviewed/committed sooner.

Thanks
Comment 3 dave.mueller 2014-12-11 14:32:29 UTC
Created attachment 160391
Patch I used to fix this bug

I think the problem is that nobody (besides you and me) is interested in UP kernels nowadays.
Comment 4 Pacho Ramos 2014-12-11 14:52:07 UTC
Well, I still need to disable SMP, either in the kernel config or by booting with "nosmp"; otherwise the affected machine tends to crash :S
Comment 6 Pacho Ramos 2015-01-10 13:59:13 UTC
Thanks!