Bug 16492 - Kernel OOPS with Radeon KMS, bisected.
Status: CLOSED CODE_FIX
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All
OS: Linux
Importance: P1 normal
Assigned To: drivers_video-dri
Depends on:
Blocks: 16055
 
Reported: 2010-08-02 13:48 UTC by Andrew Clayton
Modified: 2010-08-29 22:47 UTC
CC List: 4 users

See Also:
Kernel Version: 2.6.35
Tree: Mainline
Regression: Yes


Attachments
possible fix (5.89 KB, patch)
2010-08-02 15:41 UTC, Alex Deucher

Description Andrew Clayton 2010-08-02 13:48:59 UTC
Software is Fedora 12 x86_64. 
The hardware is a PCI-E Radeon X1950

01:00.0 VGA compatible controller: ATI Technologies Inc R580 [Radeon X1900]
01:00.1 Display controller: ATI Technologies Inc Device 7264

On an Intel G31-based motherboard.

With a 2.6.35 kernel using Radeon KMS, the GPU seemingly locks up at the point where gdm should take over from plymouth. One screen stays on, showing the plymouth boot screen, and the other display powers off (dual 19" LCDs). The machine can still be accessed remotely.

We get the following oops:

Aug  2 09:52:38 zeus kernel: divide error: 0000 [#1] SMP 
Aug  2 09:52:38 zeus kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:07:03.0/local_cpus
Aug  2 09:52:38 zeus kernel: CPU 0 
Aug  2 09:52:38 zeus kernel: Pid: 946, comm: plymouthd Not tainted 2.6.35 #45 DG33BU/        
Aug  2 09:52:38 zeus kernel: RIP: 0010:[<ffffffff812e2ca4>]  [<ffffffff812e2ca4>] rv515_bandwidth_avivo_update+0x450/0x5b9
Aug  2 09:52:38 zeus kernel: RSP: 0018:ffff88007a5f3ae8  EFLAGS: 00010246
Aug  2 09:52:38 zeus kernel: RAX: 0000000000000000 RBX: ffff88007ae68000 RCX: 0000000000000000
Aug  2 09:52:38 zeus kernel: RDX: 0000000000000000 RSI: ffff88007e8da000 RDI: ffff88007ae68000
Aug  2 09:52:38 zeus kernel: RBP: 0000000000000000 R08: 000000000000e808 R09: 00000000cdcdcdcd
Aug  2 09:52:38 zeus kernel: R10: dead000000200200 R11: dead000000100100 R12: 0000000000000000
Aug  2 09:52:38 zeus kernel: R13: 0000000000000001 R14: ffff88007e93a000 R15: 0000000000000000
Aug  2 09:52:38 zeus kernel: FS:  00007fa3b25c3700(0000) GS:ffff880001a00000(0000) knlGS:0000000000000000
Aug  2 09:52:38 zeus kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug  2 09:52:38 zeus kernel: CR2: 0000003847647488 CR3: 000000007a4cd000 CR4: 00000000000406f0
Aug  2 09:52:38 zeus kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug  2 09:52:38 zeus kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug  2 09:52:38 zeus kernel: Process plymouthd (pid: 946, threadinfo ffff88007a5f2000, task ffff88007e875cc0)
Aug  2 09:52:38 zeus kernel: Stack:
Aug  2 09:52:38 zeus kernel: 0000000000000004 ffffffff812b77a2 ffff88007a5f3b28 ffff880000000000
Aug  2 09:52:38 zeus kernel: <0> 0000000000000000 ffff88007ae68000 0000000000000004 ffffffff812ab402
Aug  2 09:52:38 zeus kernel: <0> 000000000000e808 0000000000000282 ffff88007ae68000 ffffffff812e2e19
Aug  2 09:52:38 zeus kernel: Call Trace:
Aug  2 09:52:38 zeus kernel: [<ffffffff812b77a2>] ? atom_execute_table+0x4d/0x58
Aug  2 09:52:38 zeus kernel: [<ffffffff812ab402>] ? radeon_atom_get_memory_clock+0x18/0x20
Aug  2 09:52:38 zeus kernel: [<ffffffff812e2e19>] ? rv515_bandwidth_update+0xc/0x85
Aug  2 09:52:38 zeus kernel: [<ffffffff8130100f>] ? radeon_pm_set_clocks+0x507/0x541
Aug  2 09:52:38 zeus kernel: [<ffffffff812b77a2>] ? atom_execute_table+0x4d/0x58
Aug  2 09:52:38 zeus kernel: [<ffffffff8130125e>] ? radeon_pm_compute_clocks+0x215/0x226
Aug  2 09:52:38 zeus kernel: [<ffffffff81279eb8>] ? drm_helper_disable_unused_functions+0x122/0x15b
Aug  2 09:52:38 zeus kernel: [<ffffffff8127abc1>] ? drm_crtc_helper_set_config+0x5e2/0x77f
Aug  2 09:52:38 zeus kernel: [<ffffffff8128778b>] ? drm_framebuffer_cleanup+0x55/0xd7
Aug  2 09:52:38 zeus kernel: [<ffffffff81288505>] ? drm_mode_rmfb+0x0/0xe4
Aug  2 09:52:38 zeus kernel: [<ffffffff812c38b1>] ? radeon_user_framebuffer_destroy+0x21/0x2a
Aug  2 09:52:38 zeus kernel: [<ffffffff812885d4>] ? drm_mode_rmfb+0xcf/0xe4
Aug  2 09:52:38 zeus kernel: [<ffffffff8127e70c>] ? drm_ioctl+0x21a/0x300
Aug  2 09:52:38 zeus kernel: [<ffffffff810a4d1c>] ? unmap_region+0x103/0x144
Aug  2 09:52:38 zeus kernel: [<ffffffff810c9cae>] ? vfs_ioctl+0x23/0x93
Aug  2 09:52:38 zeus kernel: [<ffffffff810ca202>] ? do_vfs_ioctl+0x461/0x49b
Aug  2 09:52:38 zeus kernel: [<ffffffff810ca278>] ? sys_ioctl+0x3c/0x5e
Aug  2 09:52:38 zeus kernel: [<ffffffff81021dab>] ? system_call_fastpath+0x16/0x1b
Aug  2 09:52:38 zeus kernel: Code: 40 76 18 89 c0 8b 74 24 04 48 c1 e0 0d 31 d2 48 f7 f6 48 8d 48 01 48 d1 e9 eb 04 8b 4c 24 04 89 c9 31 d2 8b 44 24 24 48 c1 e0 0d <48> f7 f1 8b 54 24 0c 48 8d 48 01 8b 44 24 10 48 d1 e9 89 c6 39 
Aug  2 09:52:38 zeus kernel: RIP  [<ffffffff812e2ca4>] rv515_bandwidth_avivo_update+0x450/0x5b9
Aug  2 09:52:38 zeus kernel: RSP <ffff88007a5f3ae8>
Aug  2 09:52:38 zeus kernel: ---[ end trace d58e56ed7c958880 ]---
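The `Code:` line marks the faulting instruction with angle brackets: `<48> f7 f1`. As a sketch, the hypothetical POSIX shell helper `decode_f7` below (not an existing tool) decodes only the register form of x86 opcode F7 to show that this instruction is `div %rcx`. RCX is 0000000000000000 in the register dump, so the divide error is a division by zero inside rv515_bandwidth_avivo_update(). In practice objdump or the kernel's scripts/decodecode would do this decoding; the hand decoder is only for illustration.

```shell
#!/bin/sh
# decode_f7 REX OPCODE MODRM
# Names the "group 3" instruction encoded by opcode F7
# (TEST/NOT/NEG/MUL/IMUL/DIV/IDIV) and its register operand.
# Illustrative only: handles just the register form (ModRM.mod == 3)
# and ignores REX.R/REX.B (both are clear in 0x48).
decode_f7() {
  rex=$1 opcode=$2 modrm=$3
  if [ $(( $opcode )) -ne $(( 0xf7 )) ] || [ $(( ($modrm >> 6) & 3 )) -ne 3 ]; then
    echo "unsupported encoding"
    return 1
  fi
  reg=$(( ($modrm >> 3) & 7 ))   # ModRM.reg selects the operation for F7
  rm=$(( $modrm & 7 ))           # ModRM.rm selects the register operand

  set -- test test not neg mul imul div idiv
  shift "$reg"; op=$1
  set -- ax cx dx bx sp bp si di
  shift "$rm"; r=$1

  # REX.W (bit 3 of the prefix) selects 64-bit operands (%rcx), else 32-bit (%ecx).
  if [ $(( $rex & 8 )) -ne 0 ]; then r="r$r"; else r="e$r"; fi
  echo "$op %$r"
}

decode_f7 0x48 0xf7 0xf1    # bytes at <48> in the oops; prints: div %rcx
```

DIV divides RDX:RAX by its operand, and both RDX and RAX are also 0 in the dump, so only the zero divisor in RCX explains the fault. The same helper decodes the earlier `48 f7 f6` in the `Code:` line as `div %rsi`, a second division in the same function.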

And in the Xorg log we have:

Fatal server error:
xf86OpenConsole: VT_WAITACTIVE failed: Interrupted system call


A git bisect identifies the following commit as the culprit:

ce8f53709bf440100cb9d31b1303291551cf517f is the first bad commit
commit ce8f53709bf440100cb9d31b1303291551cf517f
Author: Alex Deucher <alexdeucher@gmail.com>
Date:   Fri May 7 15:10:16 2010 -0400

    drm/radeon/kms/pm: rework power management

    - Separate dynpm and profile based power management methods.  You can select the pm method
      by echoing the selected method ("dynpm" or "profile") to power_method in sysfs.
    - Expose basic 4 profile in profile method
      "default" - default clocks
      "auto" - select between low and high based on ac/dc state
      "low" - DC, low power mode
      "high" - AC, performance mode
      The current base profile is "default", but it should switched to "auto" once we've tested
      on more systems.  Switching the state is a matter of echoing the requested profile to
      power_profile in sysfs.  The lowest power states are selected automatically when dpms turns
      the monitors off in all states but default.
    - Remove dynamic fence-based reclocking for the moment.  We can revisit this later once we
      have basic pm in.
    - Move pm init/fini to modesetting path.  pm is tightly coupled with display state.  Make sure
      display side is initialized before pm.
    - Add pm suspend/resume functions to make sure pm state is properly reinitialized on resume.
    - Remove dynpm module option.  It's now selectable via sysfs.

    Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
    Signed-off-by: Dave Airlie <airlied@redhat.com>

:040000 040000 fa3acba12b9886453c2de619577ae633abfc97bc faf940f2c2791f1382ed1abbfa54e22df7e1c936 M      drivers
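The commit message above describes a sysfs interface: write "dynpm" or "profile" to power_method, and, for the profile method, one of "default", "auto", "low" or "high" to power_profile. On radeon KMS these attributes normally live in the DRM device directory, for example /sys/class/drm/card0/device/; the card0 path is an assumption and depends on the system. A minimal sketch of a hypothetical helper, `set_radeon_pm`, that validates the names listed in the commit before writing them:

```shell
#!/bin/sh
# set_radeon_pm DEVDIR METHOD [PROFILE]   (hypothetical helper)
# Writes METHOD to DEVDIR/power_method and, when METHOD is "profile"
# and PROFILE is given, writes PROFILE to DEVDIR/power_profile.
# Valid names are taken from the commit message quoted above.
set_radeon_pm() {
  dev=$1 method=$2 profile=$3

  # Validate everything before writing anything.
  case $method in
    dynpm|profile) ;;
    *) echo "unknown power method: $method" >&2; return 1 ;;
  esac
  if [ -n "$profile" ]; then
    case $profile in
      default|auto|low|high) ;;
      *) echo "unknown power profile: $profile" >&2; return 1 ;;
    esac
  fi

  echo "$method" > "$dev/power_method" || return 1
  if [ "$method" = profile ] && [ -n "$profile" ]; then
    echo "$profile" > "$dev/power_profile" || return 1
  fi
}

# Example (needs root; card0 is an assumption, check /sys/class/drm/):
# set_radeon_pm /sys/class/drm/card0/device profile low
```

Taking the device directory as a parameter keeps the helper usable on multi-GPU systems and lets it be exercised against an ordinary directory without radeon hardware.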


Unfortunately that commit doesn't seem to revert cleanly, so I can't actually test it...

$ git revert ce8f53709bf440100cb9d31b1303291551cf517f
warning: too many files (created: 1221 deleted: 217), skipping inexact rename detection
Automatic revert failed.  After resolving the conflicts,
mark the corrected paths with 'git add <paths>' or 'git rm <paths>'
and commit the result.

I'm stuck at that point.
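When a large commit no longer reverts cleanly against a later tree, one way out is to abandon the conflicted revert and, to double-check the bisect result, build and boot the culprit and its parent (`<commit>^`), which the bisect implies is good. The toy repository below (the file pm.c and the commit messages are hypothetical) is a minimal sketch that reproduces a conflicting revert, backs out with `git reset --hard`, and checks out the culprit's parent. Note that `git reset --hard` discards all uncommitted changes in the working tree.

```shell
#!/bin/sh
# Toy reproduction in a throwaway repository (hypothetical names).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email demo@example.com
git config user.name demo

echo "clock = 1" > pm.c; git add pm.c; git commit -qm "base"
echo "clock = 2" > pm.c; git commit -qam "rework (the culprit)"
culprit=$(git rev-parse HEAD)
echo "clock = 3" > pm.c; git commit -qam "later change to the same line"

# A later commit touched the same line, so reverting the culprit conflicts.
if git revert --no-edit "$culprit"; then
  echo "unexpected: revert applied cleanly"
fi

# Abandon the half-finished revert and return to a clean tree.
git reset -q --hard HEAD

# Check out the culprit's parent (detached HEAD) to build and test it instead.
git checkout -q "$culprit^"
cat pm.c    # prints: clock = 1
```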

Cheers,
Andrew
Comment 1 Andrew Clayton 2010-08-02 13:56:09 UTC
I should perhaps have mentioned that booting with nomodeset works fine.
Comment 2 Alex Deucher 2010-08-02 15:41:25 UTC
Created attachment 27319 [details]
possible fix

Does the attached patch fix the issue?
Comment 3 Andrew Clayton 2010-08-02 16:05:36 UTC
Yeah, nice one!

Cheers,
Andrew
Comment 4 Alex Deucher 2010-08-02 16:14:19 UTC
I've sent the patch to Dave.
Comment 5 Rafael J. Wysocki 2010-08-03 22:26:43 UTC
Patch : https://bugzilla.kernel.org/attachment.cgi?id=27319
Handled-By : Alex Deucher <alexdeucher@gmail.com>
Comment 6 Rafael J. Wysocki 2010-08-29 22:47:29 UTC
Fixed by commit e06b14ee91a2ddefc9a67443a6cd8ee0fa800115.
