Bug 75211
Summary: | Division error in radeon_compute_pll_avivo (X hang) | ||
---|---|---|---|
Product: | Drivers | Reporter: | Darren Salt (bugspam) |
Component: | Video(DRI - non Intel) | Assignee: | drivers_video-dri |
Status: | NEW --- | ||
Severity: | normal | CC: | deathsimple, szg00000 |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 3.15-rc3 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | Possible fix. |
Description
Darren Salt
2014-05-01 01:45:16 UTC
(In reply to Darren Salt from comment #0) > I'm seeing a divide error during X startup, causing X to hang (requiring > reboot to clear). The trigger is a commit in xserver git: > > commit 4c3932620c29c91dfbbc8eb09c84efcaa7ec873e > Author: Keith Packard <keithp@keithp.com> > Date: Fri Apr 25 08:22:15 2014 -0700 > > hw/xfree86: Restore API compatibility for cursor loading functions That commit broke the Xorg video driver ABI. Did you rebuild xf86-video-ati Git[0] when crossing this xserver Git commit? Apart from the X problem the kernel shouldn't run into a div zero error but return -EINVAL instead when some of the parameters are invalid. Going to take a look into that direction as well. I didn't rebuild the DDX. (May be worth doing just to see if the kernel bug is still triggered.) It appears that you do actually need mismatched builds of xserver git and xf86-video-ati git, the latter built against (ideally) 1.15.99.902, to trigger this kernel bug... I wonder why it's gone apparently unnoticed until now. ☺ (In reply to Darren Salt from comment #4) > It appears that you do actually need mismatched builds of xserver git and > xf86-video-ati git, the latter built against (ideally) 1.15.99.902, to > trigger this kernel bug... I wonder why it's gone apparently unnoticed until > now. ☺ I've reworked that function for 3.15. It's likely that with mismatched DDX and X you pass some parameters as zero into the kernel and so trigger a divide by zero error. Allready digging into this, just give me a day or two to catch up with the bugs. And please provide the output of: gdb /lib/modules/$(uname -r)/kernel/drivers/gpu/drm/radeon/radeon.ko -ex 'list *(radeon_compute_pll_avivo+0xb6)' -ex q Thanks in advance, Christian. Can't do that (radeon module is built in, and no debug symbols – should switch that on), but ksymoops output (below) is clear enough for me to determine that the marked line is where the bug makes itself known: if (pll->flags & RADEON_PLL_USE_FRAC_FB_DIV) { vco_min *= 10; vco_max *= 10; } /* here */ post_div_min = vco_min / target_clock; if ((target_clock * post_div_min) < vco_min) ++post_div_min; if (post_div_min < pll->min_post_div) post_div_min = pll->min_post_div; Which means that target_clock is zero, which means that freq is 0 or freq/10 is 0. >>RIP; ffffffff812513d0 <radeon_compute_pll_avivo+b6/324> <===== >>RAX; 00000000000927c0 <__per_cpu_end+804c0/fedd00> >>RBX; ffff880236dc8210 <phys_startup_64+ffff880235dc8210/ffffffff80000000> >>RCX; 0000000000124f80 <__per_cpu_end+112c80/fedd00> >>RSI; 00000000000927c0 <__per_cpu_end+804c0/fedd00> >>RBP; ffff8800bd6d2c00 <phys_startup_64+ffff8800bc6d2c00/ffffffff80000000> >>R08; ffff88021b08fb08 <phys_startup_64+ffff88021a08fb08/ffffffff80000000> >>R09; ffff88021b08fb00 <phys_startup_64+ffff88021a08fb00/ffffffff80000000> >>R12; ffff880236dc8000 <phys_startup_64+ffff880235dc8000/ffffffff80000000> >>R14; 0000000000004ff6 <exception_stacks+ff6/6000> Trace; ffffffff81227da9 <drm_detect_hdmi_monitor+54/69> Trace; ffffffff8123ed95 <atombios_crtc_mode_set+100/593> Trace; ffffffff812d6725 <class_dev_iter_next+c/33> Trace; ffffffff812d68ed <class_for_each_device+8a/9f> Trace; ffffffff81213ccd <drm_crtc_helper_set_mode+2a3/424> Trace; ffffffff812145a5 <drm_crtc_helper_set_config+601/853> Trace; ffffffff81250650 <radeon_crtc_set_config+3e/d9> Trace; ffffffff81222183 <drm_mode_set_config_internal+48/c0> Trace; ffffffff81224843 <drm_mode_setcrtc+3d3/487> Trace; ffffffff81250dac <radeon_crtc_load_lut+275/630> Trace; ffffffff81219f06 <drm_ioctl+337/3a6> Trace; ffffffff81224470 <drm_mode_setcrtc+0/487> Trace; ffffffff81468fcc <_raw_spin_unlock_irqrestore+f/21> Trace; ffffffff81234864 <radeon_drm_ioctl+42/6e> Trace; ffffffff810b10a7 <do_vfs_ioctl+356/420> Trace; ffffffff810b8747 <__fget+64/6c> Trace; ffffffff810b11a4 <sys_ioctl+33/58> Trace; ffffffff81469ca6 <system_call_fastpath+1a/1f> Code; ffffffff812513a5 <radeon_compute_pll_avivo+8b/324> 0000000000000000 <_RIP>: Code; ffffffff812513a5 <radeon_compute_pll_avivo+8b/324> 0: 09 8b 6b 08 89 6c or %ecx,0x6c89086b(%rbx) Code; ffffffff812513ab <radeon_compute_pll_avivo+91/324> 6: 24 20 and $0x20,%al Code; ffffffff812513ad <radeon_compute_pll_avivo+93/324> 8: eb 5f jmp 69 <_RIP+0x69> Code; ffffffff812513af <radeon_compute_pll_avivo+95/324> a: 80 e5 20 and $0x20,%ch Code; ffffffff812513b2 <radeon_compute_pll_avivo+98/324> d: 74 08 je 17 <_RIP+0x17> Code; ffffffff812513b4 <radeon_compute_pll_avivo+9a/324> f: 8b 73 1c mov 0x1c(%rbx),%esi Code; ffffffff812513b7 <radeon_compute_pll_avivo+9d/324> 12: 8b 4b 20 mov 0x20(%rbx),%ecx Code; ffffffff812513ba <radeon_compute_pll_avivo+a0/324> 15: eb 06 jmp 1d <_RIP+0x1d> Code; ffffffff812513bc <radeon_compute_pll_avivo+a2/324> 17: 8b 73 14 mov 0x14(%rbx),%esi Code; ffffffff812513bf <radeon_compute_pll_avivo+a5/324> 1a: 8b 4b 18 mov 0x18(%rbx),%ecx Code; ffffffff812513c2 <radeon_compute_pll_avivo+a8/324> 1d: 85 ff test %edi,%edi Code; ffffffff812513c4 <radeon_compute_pll_avivo+aa/324> 1f: 74 06 je 27 <_RIP+0x27> Code; ffffffff812513c6 <radeon_compute_pll_avivo+ac/324> 21: 6b f6 0a imul $0xa,%esi,%esi Code; ffffffff812513c9 <radeon_compute_pll_avivo+af/324> 24: 6b c9 0a imul $0xa,%ecx,%ecx Code; ffffffff812513cc <radeon_compute_pll_avivo+b2/324> 27: 31 d2 xor %edx,%edx Code; ffffffff812513ce <radeon_compute_pll_avivo+b4/324> 29: 89 f0 mov %esi,%eax Code; ffffffff812513d0 <radeon_compute_pll_avivo+b6/324> <===== 2b: 41 f7 f5 div %r13d <===== Code; ffffffff812513d3 <radeon_compute_pll_avivo+b9/324> 2e: 89 c2 mov %eax,%edx Code; ffffffff812513d5 <radeon_compute_pll_avivo+bb/324> 30: 8d 68 01 lea 0x1(%rax),%ebp Code; ffffffff812513d8 <radeon_compute_pll_avivo+be/324> 33: 41 0f af d5 imul %r13d,%edx Code; ffffffff812513dc <radeon_compute_pll_avivo+c2/324> 37: 39 f2 cmp %esi,%edx Code; ffffffff812513de <radeon_compute_pll_avivo+c4/324> 39: 8b 53 30 mov 0x30(%rbx),%edx Code; ffffffff812513e1 <radeon_compute_pll_avivo+c7/324> 3c: 0f 43 e8 cmovae %eax,%ebp Code; ffffffff812513e4 <radeon_compute_pll_avivo+ca/324> 3f: 89 .byte 0x89 Created attachment 134701 [details]
Possible fix.
Please try to reproduce the issue with the attached patch applied.
It would still not work correctly (specifying no clock can't work correctly), but it shouldn't crash any more.
No crash now. However, the kernel log is spammed with [drm:drm_crtc_helper_set_config] *ERROR* failed to set mode on [CRTC:14] I did notice what looked like some X spam too, but that's been lost over an X restart; fairly sure that it was ‘invalid argument’ (which would correspond to that -EINVAL). I had to kill it (telling it that it had segfaulted worked nicely) and there's no evidence of that in Xorg.0.log.old. I'd say that that DRM error needs to be rate-limited and/or X needs to give up, or at least do something different than what it currently does, if it can't set the mode. (I may be able to get more information about what X is doing if it's needed, but it seemed more important to report these findings first; and that information belongs elsewhere anyway.) (In reply to Darren Salt from comment #9) > No crash now. Good, thanks for testing. > However, the kernel log is spammed with > [drm:drm_crtc_helper_set_config] *ERROR* failed to set mode on [CRTC:14] Well, X is sending totally nonsense to the kernel. It's actually good that the logs get spammed so somebody notices that something is really going wrong here. |