Bug 86011 - [drm] "Memory manager not clean during takedown" after vga-switcheroo turn off nvidia card
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-10-10 21:58 UTC by Joaquín Aramendía
Modified: 2014-12-17 11:49 UTC
CC List: 3 users

See Also:
Kernel Version: 3.16.4
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
dmesg for crashed/freezed X session (83.88 KB, text/plain)
2014-10-10 21:58 UTC, Joaquín Aramendía
bisect log (1.98 KB, text/plain)
2014-10-16 11:03 UTC, Joaquín Aramendía
fix for runtime power management in nouveau (3.13 KB, patch)
2014-12-01 04:00 UTC, Joaquín Aramendía

Description Joaquín Aramendía 2014-10-10 21:58:02 UTC
Created attachment 153161
dmesg for crashed/freezed X session

Hi there.
I'm using a Dell Vostro 3500 laptop with a PRIME-like dual-GPU setup (Nvidia GeForce 310M + Intel i5 integrated graphics). vga-switcheroo used to turn my dedicated video card off at startup with no problem, and I was able to use the DRI_PRIME variable to run some programs on it. The card switched on and off correctly.
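For anyone trying to reproduce this, a quick way to confirm whether the discrete card really is powered off is to look at the vga_switcheroo state file. Below is a minimal Python sketch that parses it. It assumes debugfs is mounted and that the file lives at /sys/kernel/debug/vgaswitcheroo/switch with one "index:client:active:power:pci-address" entry per line; the helper name and the PCI addresses in the sample are only illustrative, not taken from this machine.

```python
def parse_switcheroo(text):
    """Parse vga_switcheroo 'switch' file contents into a list of dicts.

    Assumed line format: "<index>:<IGD|DIS>:<+ or space>:<power>:<pci address>",
    for example "1:DIS: :Off:0000:01:00.0".
    """
    clients = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The PCI address itself contains colons, so split only four times.
        index, client, active, power, pci = line.split(":", 4)
        clients.append({
            "index": int(index),
            "client": client,          # "IGD" (integrated) or "DIS" (discrete)
            "active": active == "+",   # "+" marks the GPU currently driving the display
            "power": power,            # e.g. "Pwr", "Off", "DynPwr", "DynOff"
            "pci": pci.strip(),
        })
    return clients


# Hypothetical sample contents; on a real system, read the file as root instead.
sample = "0:IGD:+:Pwr:0000:00:02.0\n1:DIS: :Off:0000:01:00.0\n"
for entry in parse_switcheroo(sample):
    print(entry["client"], entry["power"], entry["pci"])
```

If the DIS entry reports a power state ending in "Off" while the IGD entry is active, the discrete card was switched off as expected.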
For the last couple of kernels, the laptop has been freezing at the X login screen. I run mainline for daily use (love unstable and breakable systems :) ).
I'm attaching the 'dmesg' output for a session that crashes. It does look like a bug in the nouveau driver (near line 876 in dmesg):

...
[   30.907692] ------------[ cut here ]------------
[   30.907724] WARNING: CPU: 0 PID: 97 at drivers/gpu/drm/drm_mm.c:765 drm_mm_takedown+0x2e/0x30 [drm]()
[   30.907725] Memory manager not clean during takedown.
[   30.907727] Modules linked in: ctr ccm bnep ecb btusb bluetooth joydev mousedev hid_generic usbhid hid uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev media coretemp intel_powerclamp kvm_intel kvm snd_hda_codec_hdmi nouveau arc4 brcmsmac cordic brcmutil crc32c_intel snd_hda_codec_idt snd_hda_codec_generic ttm hwmon iTCO_wdt mac80211 serio_raw iTCO_vendor_support microcode cfg80211 dell_wmi sparse_keymap mxm_wmi dell_laptop led_class snd_hda_intel rfkill snd_hda_controller snd_hda_codec psmouse snd_hwdep snd_pcm dcdbas r8169 atkbd libps2 snd_timer mii wmi evdev snd bcma thermal acpi_cpufreq mei_me mei tpm_tis tpm battery processor ac intel_agp mac_hid soundcore i8042 serio lpc_ich i2c_i801 dell_smo8800 intel_ips shpchp ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom crc_t10dif
[   30.907768]  crct10dif_common ahci libahci libata ehci_pci scsi_mod ehci_hcd usbcore usb_common i915 button intel_gtt i2c_algo_bit video drm_kms_helper drm i2c_core
[   30.907778] CPU: 0 PID: 97 Comm: kworker/u8:5 Not tainted 3.17.0-1-mainline #1
[   30.907780] Hardware name: Dell Inc. Vostro 3500/0NVXFV, BIOS A10 10/25/2010
[   30.907787] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
[   30.907789]  0000000000000000 0000000085188984 ffff8800a630fad0 ffffffff815346e0
[   30.907792]  ffff8800a630fb18 ffff8800a630fb08 ffffffff8106e54d ffff8800a6f26628
[   30.907794]  ffff8800a6f261c0 0000000000000000 0000000000000000 ffff880037452500
[   30.907797] Call Trace:
[   30.907805]  [<ffffffff815346e0>] dump_stack+0x4d/0x6f
[   30.907810]  [<ffffffff8106e54d>] warn_slowpath_common+0x7d/0xa0
[   30.907812]  [<ffffffff8106e5cc>] warn_slowpath_fmt+0x5c/0x80
[   30.907823]  [<ffffffffa002426e>] drm_mm_takedown+0x2e/0x30 [drm]
[   30.907831]  [<ffffffffa003481b>] drm_vma_offset_manager_destroy+0x1b/0x30 [drm]
[   30.907836]  [<ffffffffa059616a>] ttm_bo_device_release+0xfa/0x130 [ttm]
[   30.907857]  [<ffffffffa06f88d8>] nouveau_ttm_fini+0x58/0x80 [nouveau]
[   30.907870]  [<ffffffffa06f3703>] nouveau_drm_unload+0x63/0xc0 [nouveau]
[   30.907879]  [<ffffffffa0020049>] drm_dev_unregister+0x29/0xb0 [drm]
[   30.907887]  [<ffffffffa0020313>] drm_put_dev+0x23/0x70 [drm]
[   30.907901]  [<ffffffffa06f2ed3>] nouveau_drm_device_remove+0x83/0xb0 [nouveau]
[   30.907913]  [<ffffffffa06f2f15>] nouveau_drm_remove+0x15/0x20 [nouveau]
[   30.907917]  [<ffffffff812e42db>] pci_device_remove+0x3b/0xc0
[   30.907922]  [<ffffffff813a9f3f>] __device_release_driver+0x7f/0xf0
[   30.907924]  [<ffffffff813a9fd3>] device_release_driver+0x23/0x30
[   30.907927]  [<ffffffff812de454>] pci_stop_bus_device+0x94/0xa0
[   30.907929]  [<ffffffff812de572>] pci_stop_and_remove_bus_device+0x12/0x20
[   30.907933]  [<ffffffff812fe637>] disable_slot+0x57/0xb0
[   30.907936]  [<ffffffff812fee18>] acpiphp_check_bridge.part.9+0xe8/0x100
[   30.907938]  [<ffffffff812ff204>] acpiphp_hotplug_notify+0xc4/0x240
[   30.907941]  [<ffffffff812ff140>] ? acpiphp_post_dock_fixup+0xd0/0xd0
[   30.907944]  [<ffffffff813266b1>] acpi_device_hotplug+0x3b2/0x40e
[   30.907947]  [<ffffffff8131fd22>] acpi_hotplug_work_fn+0x1e/0x29
[   30.907950]  [<ffffffff81086b85>] process_one_work+0x145/0x400
[   30.907953]  [<ffffffff8108714b>] worker_thread+0x6b/0x4a0
[   30.907955]  [<ffffffff810870e0>] ? init_pwq.part.22+0x10/0x10
[   30.907958]  [<ffffffff8108c06a>] kthread+0xea/0x100
[   30.907961]  [<ffffffff8108bf80>] ? kthread_create_on_node+0x1b0/0x1b0
[   30.907965]  [<ffffffff8153a5fc>] ret_from_fork+0x7c/0xb0
[   30.907968]  [<ffffffff8108bf80>] ? kthread_create_on_node+0x1b0/0x1b0
[   30.907970] ---[ end trace 3a16cce408c11c34 ]---
...

This is followed by other similar traces. I'm blacklisting nouveau as a temporary workaround since that seems not to trigger the bug.
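Since the log contains several of these traces, it can help to pull them out of the full dmesg automatically. The following is a rough Python sketch that collects every block between a "cut here" marker and the matching "end trace" marker and lists the functions in each call trace. The function names and regular expressions are my own and assume the line layout shown in the excerpt above.

```python
import re

# Kernel timestamp prefix, e.g. "[   30.907692] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s?")


def extract_warning_blocks(dmesg_text):
    """Return a list of blocks; each block is a list of lines without timestamps."""
    blocks, current = [], None
    for raw in dmesg_text.splitlines():
        line = TIMESTAMP.sub("", raw)
        if "[ cut here ]" in line:
            current = []                 # a new warning block starts here
        elif "[ end trace" in line and current is not None:
            blocks.append(current)       # block is complete
            current = None
        elif current is not None:
            current.append(line)
    return blocks


def summarize(block):
    """Return the WARNING line and the function names found in the call trace."""
    warning = next((l for l in block if l.startswith("WARNING:")), "")
    # Call-trace entries look like " [<address>] name+0xoff/0xlen" (optionally "? name").
    funcs = re.findall(r"\]\s+(?:\? )?([\w.]+)\+0x", "\n".join(block))
    return warning, funcs
```

Running `summarize` on each block from `extract_warning_blocks(open("dmesg.txt").read())` (the file name is just a placeholder) makes it easy to see whether the follow-up traces go through the same nouveau_drm_unload path.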

This behaviour was first observed in version 3.17-rc7 and is still present in 3.17 final. Before that, the laptop worked normally (I tested Linux 3.16.4 and it works with no issues).

Any help is much appreciated. Hope it can be fixed soon.
Comment 1 Joaquín Aramendía 2014-10-13 11:12:14 UTC
After further tests, the same bug is present in version 3.16.4 but is NOT present in version 3.16.3. I'll try to bisect it.
Comment 2 Joaquín Aramendía 2014-10-16 11:03:15 UTC
Created attachment 153941
bisect log

I finally bisected it and found that the first bad commit is related to an i/o 32 bug. I'm not sure how it's related, but reverting it worked for me.
Comment 3 Joaquín Aramendía 2014-11-26 22:59:14 UTC
Disregard the last bisect log; I got it wrong.

Last good commit is 4659be275b14d7d865573b9d82c8afdb23f875aa from stable kernel tree. First bad commit is 97d30fa3524ff60b43d450012abe8f961d280478.
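For readers unfamiliar with how a bisect arrives at a "last good / first bad" pair like the one above, here is a small Python sketch of the underlying binary search. It is only an illustration of the idea, not what git does internally: the commit names and the `is_bad` callback are hypothetical, and history is assumed to be a simple linear list ordered from oldest to newest.

```python
def find_boundary(commits, is_bad):
    """Return (last_good, first_bad) for a linear list of commits.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1      # lo is always good, hi is always bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):      # build and boot this commit, check for the freeze
            hi = mid
        else:
            lo = mid
    return commits[lo], commits[hi]


# Hypothetical history of eight commits where the regression lands in "c5".
history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
print(find_boundary(history, lambda c: c >= "c5"))
```

Each step halves the remaining range, so a single wrong good/bad answer sends the search into the wrong half, which is how a bisect log can end up pointing at an unrelated commit.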
Comment 4 Alex Deucher 2014-11-27 02:24:34 UTC
It looks like an acpiphp problem.  See bug 61891.  I don't know why that patch (97d30fa3524ff6) would trigger it, other than perhaps the kernel you are using does not have the acpiphp fix.  Someone more familiar with the nouveau driver would need to comment.  Feel free to revert the commit.
Comment 5 Joaquín Aramendía 2014-12-01 04:00:49 UTC
Created attachment 159291
fix for runtime power management in nouveau

I made this patch, which fixed the issue for me. It applies on top of commit a4c5f39. On newer kernels it should still apply fine, just with an offset.
Comment 7 Joaquín Aramendía 2014-12-17 11:49:39 UTC
Confirmed. It was also backported today to the 3.17.y branch, so it's fixed for good.
