When booting my system (GeForce 6150SE nForce 430), it hangs on X-server start-up with a garbled screen.
100% reproducible; kernel and Xorg logs are empty and show nothing unusual.
This is a regression from previous kernel versions; 3.6.6 is fine.
The rest of the system is a standard installation of openSUSE 12.2.
Last good commit is
drm/nv04-nv40/instmem: remove use of nouveau_gpuobj_new_fake()
First bad commit is
drm/nouveau: port all engines to new engine module format
Can't test the 16 commits in between (70ee6f1cd6911098ddd4c11ee21b69dbe51fb3f9 to ac1499d9573f4aadd1d2beac11fe23af8ce90c24).
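To illustrate why untestable commits leave a range rather than a single culprit, here is a minimal sketch. It assumes a linear history and a regression that stays broken once introduced; the commit names are hypothetical placeholders and only the shape (one good commit, 16 skipped, one bad) follows the report above.

```python
def candidate_commits(history):
    """Return the commits that could be the first bad one.

    `history` is an oldest-to-newest list of (commit, status) pairs,
    where status is "good", "bad" or "skip" (could not be tested).
    """
    # Newest commit known to be good.
    last_good = max(i for i, (_, s) in enumerate(history) if s == "good")
    # Oldest commit after it that is known to be bad.
    first_bad = min(i for i, (_, s) in enumerate(history)
                    if s == "bad" and i > last_good)
    # Everything after the last good commit, up to and including the
    # first known-bad one, remains a suspect.
    return [c for c, _ in history[last_good + 1:first_bad + 1]]


# Hypothetical placeholder names, not real commit IDs.
history = ([("last-good", "good")]
           + [(f"skipped-{n:02d}", "skip") for n in range(1, 17)]
           + [("first-bad", "bad")])
candidates = candidate_commits(history)
print(len(candidates))  # 17
```

With 16 skipped commits, the suspects are those 16 plus the first commit known to be bad.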
Please attach dmesgs from both 3.6.x and 3.7-rcLatest.
What's the problem with bisection? If you can't continue because of a build error, please attach it too.
I hope ebb945a94bba2ce8dff7b0942ff2b3f2a52a0a69 is not the culprit... (146 files changed, 14219 insertions(+), 11099 deletions(-))
Concerning bisection problems:
Starting with commit 70ee6f1cd6911098ddd4c11ee21b69dbe51fb3f9 (the first skipped commit), system starts up in VESA modes. Somewhere in the middle of those 16 commits, it changes to a kernel panic (NULL pointer dereference) very early during the boot process. No compilation problems.
Created attachment 86421 [details]
dmesg output with kernel 3.6.6
The machine dies when hitting this bug with 3.7, so no dmesg output is possible. I can post the content of /var/log/messages, but there is nothing of interest.
I've updated to today's 3.7-rc5, and on the first reboot the system started normally (!). I was able to retrieve the dmesg output before the machine died again after ~2 minutes (with the same garbled screen).
The next 5 reboots again failed on X-server start.
Created attachment 86431 [details]
dmesg output with kernel 3.7-rc5+
Created attachment 86461 [details]
does this patch help?
No change :(
Created attachment 86471 [details]
fix for bisection
Can you apply this one on top of 70ee6f1cd6911098ddd4c11ee21b69dbe51fb3f9 (the first commit which fails), verify it helps, and continue with bisection?
Are you sure that you attached the right patch? It seems to be the same as the last one...
Will test tomorrow.
Yes, it's the same patch, just ported to mid-rework tree.
Doesn't change anything. This is what happens with commit 70ee6f1cd6911098ddd4c11ee21b69dbe51fb3f9:
[ 3.619558] [drm] Initialized drm 1.1.0 20060810
[ 3.680948] nouveau 0000:00:0d.0: setting latency timer to 64
[ 3.681173] ACPI: PCI Interrupt Link [AIGP] enabled at IRQ 23
[ 3.681447] nouveau [ DEVICE][0000:00:0d.0] BOOT0 : 0x04c000a2
[ 3.681449] nouveau [ DEVICE][0000:00:0d.0] Chipset: NV4C
[ 3.681451] nouveau [ DEVICE][0000:00:0d.0] Family : NV40
[ 3.682383] nouveau [ VBIOS][0000:00:0d.0] checking PRAMIN for image...
[ 3.719160] nouveau [ VBIOS][0000:00:0d.0] ... appears to be valid
[ 3.719162] nouveau [ VBIOS][0000:00:0d.0] using image from PRAMIN
[ 3.719293] nouveau [ VBIOS][0000:00:0d.0] BIT signature found
[ 3.719297] nouveau [ VBIOS][0000:00:0d.0] version 05.61.32.14
[ 3.719433] nouveau [ PFB][0000:00:0d.0] RAM type: unknown
[ 3.719435] nouveau [ PFB][0000:00:0d.0] RAM size: 256 MiB
[ 3.719729] [drm] nouveau 0000:00:0d.0: Detected an NV40 generation card (0x04c000a2)
[ 3.720676] [drm] nouveau 0000:00:0d.0: BIT BIOS found
[ 3.720679] [drm] nouveau 0000:00:0d.0: Bios version 05.61.32.14
[ 3.720681] [drm] nouveau 0000:00:0d.0: TMDS table version 1.1
[ 3.720682] [drm] nouveau 0000:00:0d.0: TMDS table script pointers not stubbed
[ 3.720684] [drm] nouveau 0000:00:0d.0: MXM: no VBIOS data, nothing to do
[ 3.720686] [drm] nouveau 0000:00:0d.0: DCB version 3.0
[ 3.720688] [drm] nouveau 0000:00:0d.0: DCB outp 00: 01000310 00000023
[ 3.720690] [drm] nouveau 0000:00:0d.0: DCB outp 01: 00110204 974f0000
[ 2.718650] ACPI: Invalid Power Resource to register!
[ 3.720692] [drm] nouveau 0000:00:0d.0: DCB conn 00: 0000
[ 3.721518] [TTM] Zone kernel: Available graphics memory: 418854 kiB
[ 3.721520] [TTM] Zone highmem: Available graphics memory: 2691562 kiB
[ 3.721521] [TTM] Initializing pool allocator
[ 3.724190] [drm] nouveau 0000:00:0d.0: 512 MiB GART (aperture)
[ 3.728066] [drm] nouveau 0000:00:0d.0: Saving VGA fonts
[ 3.762935] [drm] nouveau 0000:00:0d.0: DCB type 4 not known
[ 3.762937] [drm] nouveau 0000:00:0d.0: Unknown-1 has no encoders, removing
[ 3.764808] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[ 3.764809] [drm] No driver support for vblank timestamp query.
[ 3.770228] [drm] nouveau 0000:00:0d.0: 1 available performance level(s)
[ 3.770231] [drm] nouveau 0000:00:0d.0: 0: core 425MHz shader 425MHz fanspeed 100%
[ 3.770233] [drm] nouveau 0000:00:0d.0: c:
[ 3.770358] [drm] nouveau 0000:00:0d.0: Failed to idle channel 0.
[ 3.770447] [drm] nouveau 0000:00:0d.0: Setting dpms mode 3 on vga encoder (output 0)
[ 3.791615] [drm] nouveau 0000:00:0d.0: Restoring VGA fonts
[ 3.794408] [drm:drm_mm_takedown] *ERROR* Memory manager not clean. Delaying takedown
[ 3.794440] [drm:drm_mm_takedown] *ERROR* Memory manager not clean. Delaying takedown
[ 3.794586] [TTM] Finalizing pool allocator
[ 3.794619] [TTM] Zone kernel: Used memory at exit: 8 kiB
[ 3.794621] [TTM] Zone highmem: Used memory at exit: 8 kiB
[ 3.794885] [drm:drm_mm_takedown] *ERROR* Memory manager not clean. Delaying takedown
[ 3.795205] ------------[ cut here ]------------
[ 3.795238] WARNING: at drivers/gpu/drm/nouveau/nouveau_gpuobj.c:241 nouveau_gpuobj_takedown+0x103/0x110 [nouveau]()
[ 3.795239] Hardware name: System Product Name
[ 3.795240] Modules linked in: nouveau(+) ttm drm_kms_helper drm i2c_algo_bit mxm_wmi video wmi fan thermal button processor thermal_sys scsi_dh_alua scsi_dh_hp_sw scsi_dh_emc scsi_dh_rdac scsi_dh ata_generic pata_amd pata_jmicron sata_nv
[ 3.795257] Pid: 32, comm: kworker/0:1 Not tainted 3.6.0-2.10-desktop+ #4
[ 3.795258] Call Trace:
[ 3.795266] [<c023726d>] warn_slowpath_common+0x6d/0xa0
[ 3.795290] [<f99b3ae3>] ? nouveau_gpuobj_takedown+0x103/0x110 [nouveau]
[ 3.795312] [<f99b3ae3>] ? nouveau_gpuobj_takedown+0x103/0x110 [nouveau]
[ 3.795316] [<c02372bd>] warn_slowpath_null+0x1d/0x20
[ 3.795339] [<f99b3ae3>] nouveau_gpuobj_takedown+0x103/0x110 [nouveau]
[ 3.795359] [<f996ab64>] ? nv40_instmem_takedown+0x54/0x70 [nouveau]
[ 3.795382] [<f99af4fe>] nouveau_card_init+0x68e/0xec0 [nouveau]
[ 3.795386] [<c022d61a>] ? ioremap_nocache+0x1a/0x20
[ 3.795409] [<f99b02fd>] nouveau_load+0x49d/0x8f0 [nouveau]
[ 3.795432] [<f99ac3aa>] nouveau_drm_load+0x21a/0x250 [nouveau]
[ 3.795445] [<f7abd330>] ? drm_get_minor+0x1f0/0x2b0 [drm]
[ 3.795455] [<f7abf6a3>] drm_get_pci_dev+0x133/0x250 [drm]
[ 3.795460] [<c04a183e>] ? pcibios_set_master+0x7e/0xb0
[ 3.795483] [<f99f631c>] nouveau_pci_probe+0xd/0xf [nouveau]
[ 3.795504] [<f99f62f8>] nouveau_drm_probe+0x95/0xac [nouveau]
[ 3.795507] [<c04a367a>] local_pci_probe+0x5a/0xd0
[ 3.795511] [<c024f3ec>] work_for_cpu_fn+0xc/0x20
[ 3.795513] [<c0251095>] process_one_work+0x115/0x420
[ 3.795516] [<c024f6a0>] ? hweight_long+0x10/0x10
[ 3.795519] [<c024f3e0>] ? move_linked_works+0x80/0x80
[ 3.795521] [<c02516a1>] worker_thread+0x111/0x3b0
[ 3.795524] [<c0262239>] ? complete+0x49/0x60
[ 3.795527] [<c0251590>] ? rescuer_thread+0x1c0/0x1c0
[ 3.795530] [<c025678d>] kthread+0x6d/0x80
[ 3.795532] [<c0256720>] ? kthread_freezable_should_stop+0x50/0x50
[ 3.795536] [<c070caf6>] kernel_thread_helper+0x6/0xd
[ 3.795538] ---[ end trace 7467d93187e1277b ]---
[ 3.795945] nouveau: probe of 0000:00:0d.0 failed with error -12
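As a side note on the last log line: kernel functions conventionally return negative errno values, so "error -12" can be decoded. The sketch below is only an illustration; the helper name is made up, and it assumes the machine running Python uses the same errno numbering as the kernel (true on Linux).

```python
import errno
import os
import re

# Matches lines such as "nouveau: probe of 0000:00:0d.0 failed with error -12".
PROBE_FAILURE = re.compile(r"probe of (\S+) failed with error (-\d+)")


def decode_probe_failure(line):
    """Return (device, errno name, description), or None if no match."""
    m = PROBE_FAILURE.search(line)
    if m is None:
        return None
    code = -int(m.group(2))  # the kernel reports the negated errno
    name = errno.errorcode.get(code, "unknown")
    return m.group(1), name, os.strerror(code)


line = "[ 3.795945] nouveau: probe of 0000:00:0d.0 failed with error -12"
result = decode_probe_failure(line)
print(result)
```

Here errno 12 is ENOMEM, i.e. an allocation failed somewhere during driver initialization.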
Can you confirm 70ee6f1cd6911098ddd4c11ee21b69dbe51fb3f9 really is the first commit which fails to initialize?
Sorry, it's 5787640db6ae722aeadb394d480c7ca21b603e34 (the commit before it).
Created attachment 86541 [details]
does 5787640db6ae722aeadb394d480c7ca21b603e34 initialize with this patch?
To make it short: please disregard comment #15.
My original statement was right, 5787640db6ae722aeadb394d480c7ca21b603e34 is the last good commit and 70ee6f1cd6911098ddd4c11ee21b69dbe51fb3f9 doesn't start up.
No idea what went wrong yesterday; triple-checked now.
I applied your patch on top of 70ee6f but it doesn't help.
Created attachment 86551 [details]
debug patch 2
Try this one on top of 70ee6f1cd6911098ddd4c11ee21b69dbe51fb3f9.
Created attachment 86561 [details]
another fix for bisection
Ok, this should do the trick. Try it on top of 70ee6f1cd6911098ddd4c11ee21b69dbe51fb3f9 and continue bisection.
(Mainline has this fixed.)
Still no change. Is there anything I can do to debug this further?
I've tested with 3.8.0-rc1, and now there is a good chance that the system starts.
The critical point is shortly before the KDE desktop appears (near the end of the KDE startup progress screen). If it survives this point, the machine runs stable for hours.
Maybe this gives us a hint of what could be going on?
Could this have something to do with memory allocation? Remember that this is an onboard device using a part of the system's RAM.
Maybe I should add: x86 32-bit system with 6 GB RAM.
Do you use kexec for reboot, or does this happen on cold boot?
I am hitting a bug with very similar symptoms, probably the same bug. Reported at https://bugs.freedesktop.org/show_bug.cgi?id=61321 with bisection also pointing to 70ee6f1 as the first bad commit.
I see the following trivial (but interesting to me) difference in the same situation:
Older (good) kernels show these messages on boot:
[ 0.898887] [drm] nouveau 0000:00:0d.0: ======= misaligned reg 0x0060081D =======
[ 0.898919] [drm] nouveau 0000:00:0d.0: ======= misaligned reg 0x0060081D =======
Newer (broken) kernels do not. I could not find anything even remotely similar in nouveau_bios.c (quick search only, sorry).
PS: In my case (Gentoo, openbox, tint2, feh; a minimalistic setup) I get a guaranteed hang (broken screen and no messages) when starting Mozilla (my own SeaMonkey build, which forces heavy GL usage) and, I believe, with mplayer/xv (but that was some time ago, so I am unsure). I have not checked other cases, but at least Gtk+ (in Xfce's Terminal) is stable.
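For reference, a small sketch of what "misaligned" could mean for the offset in those messages. It assumes the warning refers to bit 0 of the register offset being set, as the `reg & 0x1` check quoted later in this thread suggests; the helper is hypothetical and is not the driver's code.

```python
def is_misaligned(reg, mask=0x1):
    """True if any bit selected by `mask` is set in the register offset."""
    return (reg & mask) != 0


reg = 0x0060081D  # offset taken from the "misaligned reg" messages
print(is_misaligned(reg))  # True, because 0x1D is odd
```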
(In reply to comment #27)
> I see next trivial (but interesting to me) differences in same case:
> Older (good) kernels show messages on boot:
> [ 0.898887] [drm] nouveau 0000:00:0d.0: ======= misaligned reg 0x0060081D
> [ 0.898919] [drm] nouveau 0000:00:0d.0: ======= misaligned reg 0x0060081D
That's a separate issue and has nothing to do with this bug.
See https://bugs.freedesktop.org/show_bug.cgi?id=47182
In most cases, it is no bug at all.
(In reply to comment #28)
> That's a separate issue and has nothing to do with this bug.
> See https://bugs.freedesktop.org/show_bug.cgi?id=47182
> In most cases, it is no bug at all.
OK, I only suggested it because the new DRI code does not detect this misalignment, so it could be a source of this bug; it is just a visible difference.
This is -still- an issue for all kernels >= 3.7. Only kernels up to 3.6.x avoid the locking issue which makes the later ones unusable on systems with this video device, here an on-board nVidia Corporation C61 [GeForce 6100 nForce 405] (rev a2).
This still affects 3.9.9 from Fedora 19.
$ lspci -nn | grep VGA
00:0d.0 VGA compatible controller : nVidia Corporation C61 [GeForce 6150SE nForce 430] [10de:03d0] (rev a2)
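Since the marketing names differ between reports in this thread, the numeric vendor:device pair is the more reliable way to compare hardware. A minimal sketch for pulling it out of an `lspci -nn` line follows; the helper name is made up for illustration.

```python
import re

# Vendor and device IDs appear as "[vvvv:dddd]" in `lspci -nn` output.
PCI_ID = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]")


def pci_ids(lspci_line):
    """Return (vendor, device) as hex strings, or None if absent."""
    m = PCI_ID.search(lspci_line)
    return None if m is None else (m.group(1), m.group(2))


line = ("00:0d.0 VGA compatible controller : nVidia Corporation C61 "
        "[GeForce 6150SE nForce 430] [10de:03d0] (rev a2)")
ids = pci_ids(line)
print(ids)  # ('10de', '03d0')
```

Vendor ID 10de is NVIDIA; the device ID 03d0 is what other reporters can check against their own output.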
Indeed, all kernels > 3.6 are unusable. Sooner or later the system freezes.
Similar downstream report at https://bugs.gentoo.org/show_bug.cgi?id=472200
Created attachment 131911 [details]
patch: port from <3.7
Try this patch. I have had some happy uptime with it (I have this Gigabyte motherboard in my work desktop and used 3.4 before), but this bug can be very unpredictable. It is a port of the similar alignment handling from <3.7, just minimized. I also did not bring over the "misaligned reg" warning, as nobody has cared about it since 2012. I copied the old notes into comments, so you can find the original corresponding places (was: if (reg & 0x1) reg&=...) by these comments.
Oops! It crashed just now. No fix ;(