Bug 15096 - Resume lock up -- bisected, commit 3a1151e3f124fd1a2c54b8153f510f1a7c715369 - ATI Technologies Inc Mobility Radeon HD 3400
Summary: Resume lock up -- bisected, commit 3a1151e3f124fd1a2c54b8153f510f1a7c715369 -...
Status: CLOSED PATCH_ALREADY_AVAILABLE
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Video (show other bugs)
Hardware: All Linux
: P1 high
Assignee: Zhang Rui
URL:
Keywords:
Depends on:
Blocks: 7216 14230
  Show dependency tree
 
Reported: 2010-01-20 23:15 UTC by Rafał Miłecki
Modified: 2010-04-16 19:53 UTC (History)
7 users (show)

See Also:
Kernel Version: 2.6.32
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Patch that workarounds resume issue (686 bytes, patch)
2010-01-25 09:07 UTC, Rafał Miłecki
Details | Diff
dmesg (66.80 KB, text/plain)
2010-01-25 09:09 UTC, Rafał Miłecki
Details
Patch that workadounds resume issue and adds more debugging (2.38 KB, patch)
2010-01-25 10:14 UTC, Rafał Miłecki
Details | Diff
dmesg with patch applied (67.33 KB, text/plain)
2010-01-25 10:15 UTC, Rafał Miłecki
Details
patch: init video->backlight->props.brightness before use it (1008 bytes, patch)
2010-01-26 05:53 UTC, Zhang Rui
Details | Diff
lspci -vvxxx (53.39 KB, text/plain)
2010-02-04 08:07 UTC, Rafał Miłecki
Details
ACPI: Restore backlight state at the end of resume (3.59 KB, patch)
2010-02-28 14:42 UTC, Rafael J. Wysocki
Details | Diff
ACPI / PM: Move video resume to a PM notifier (2.97 KB, patch)
2010-02-28 19:12 UTC, Rafael J. Wysocki
Details | Diff

Description Rafał Miłecki 2010-01-20 23:15:14 UTC
Resuming doesn't work for me anymore in 2.6.32. It did in 2.6.31. I test s2ram from init 3 and I bisected problem to:


3a1151e3f124fd1a2c54b8153f510f1a7c715369 is the first bad commit
commit 3a1151e3f124fd1a2c54b8153f510f1a7c715369
Author: Stefan Bader <stefan.bader@canonical.com>
Date:   Fri Aug 21 11:03:05 2009 +0200

    ACPI: video: Loosen strictness of video bus detection code

    BugLink: http://bugs.launchpad.net/bugs/333386

    Currently a video bus device must (beside other criteria) define _DOD and
    _DOS methods to be considered a video device.
    Some broken BIOSes prevented working backlight control by only defining both
    for one (non-existing bus) and only _DOD for the rest. With this patch in
    place the other bus definitions were considered too and backlight control
    started to work again.

    Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
    Acked-by: Zhang Rui <rui.zhang@intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>


This commit gratefully introduced backlight control for me but causes regression at the same time.
Comment 1 Zhang Rui 2010-01-21 01:56:45 UTC
so s2ram works well in 2.6.32 if you blacklist the acpi video driver, right?
Comment 2 Rafał Miłecki 2010-01-21 09:31:10 UTC
Zhang: even more, if I let it load and just remove it (rmmod) before s2ram, it works fine.
Comment 3 Rafał Miłecki 2010-01-21 11:01:24 UTC
Of course that commit didn't cause real regression in code, it just enabled code that doesn't work for my machine.

Diff: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3a1151e3f124fd1a2c54b8153f510f1a7c715369
Comment 4 Rafał Miłecki 2010-01-21 11:01:57 UTC
Well, it mostly does as I can control backlight, just suspend&resume does not :)
Comment 5 Zhang Rui 2010-01-25 01:15:33 UTC
please attach the acpidump output of your laptop.
please comment this line in driver/acpi/video.c
                .resume = acpi_video_resume,
and see if it helps.
Comment 6 Rafał Miłecki 2010-01-25 07:31:55 UTC
@Zhang can you use acpi dump from bug #11682 comment #3 ? It's exactly the same machine, without any BIOS update or any hardware change.
Comment 7 Rafał Miłecki 2010-01-25 09:07:08 UTC
Created attachment 24701 [details]
Patch that workarounds resume issue

@Zhang you were right, acpi_video_resume is guilty one. This patch workarounds problem and add some (I believe that very interesting) debugging messages.
Comment 8 Rafał Miłecki 2010-01-25 09:09:37 UTC
Created attachment 24702 [details]
dmesg

The most important part that comes from my debugging patch:
[  154.681802] ACPI: set_brightness with brightness -268374449 should be called
[  154.681803] ACPI: set_brightness with brightness 0 should be called
[  154.681805] ACPI: set_brightness with brightness -268374449 should be called
[  154.681806] ACPI: set_brightness with brightness -268374449 should be called

So we have 4 (four!) devices, 3 with uninitialized properties and left 1 wants backlight level 0... That's weird for me.
Comment 9 Zhang Rui 2010-01-25 09:17:05 UTC
(In reply to comment #8)
> Created an attachment (id=24702) [details]
> dmesg
> 
> The most important part that comes from my debugging patch:
> [  154.681802] ACPI: set_brightness with brightness -268374449 should be
> called
> [  154.681803] ACPI: set_brightness with brightness 0 should be called
> [  154.681805] ACPI: set_brightness with brightness -268374449 should be
> called
> [  154.681806] ACPI: set_brightness with brightness -268374449 should be
> called
> 
> So we have 4 (four!) devices, 3 with uninitialized properties and left 1
> wants
> backlight level 0... That's weird for me.

the 3 devices with uninitialized properties stands for CRT/TV/DVI devices that don't support backlight switching.

we only need to take care of the LCD device which has backlight support.
> [  154.681803] ACPI: set_brightness with brightness 0 should be called

0 is also a weird value to me.
can you please check your backlight values by "cat /sys/class/backlight/acpi_video0/actual_brightness", and check if the video driver wants to restore the same value during resume.
Comment 10 Rafał Miłecki 2010-01-25 10:14:29 UTC
Created attachment 24703 [details]
Patch that workadounds resume issue and adds more debugging
Comment 11 Rafał Miłecki 2010-01-25 10:15:59 UTC
Created attachment 24704 [details]
dmesg with patch applied

The most important parts:
[   29.819204] ACPI: [suspend] have video device without brightness
[   29.819348] ACPI: [suspend] have video device with brightness (get:8 props:0)
[   29.819352] ACPI: [suspend] have video device without brightness
[   29.819356] ACPI: [suspend] have video device without brightness
...
[   30.262803] ACPI: [resume] set_brightness should not be called
[   30.262805] ACPI: [resume] set_brightness with brightness 0 should be called
[   30.262806] ACPI: [resume] set_brightness should not be called
[   30.262807] ACPI: [resume] set_brightness should not be called
Comment 12 Rafał Miłecki 2010-01-25 10:17:32 UTC
@Zhang: ah sure, I dropper important condition in my first workaround, sorry.

Instead of testing "cat..." I created patch that should give more info. Attached patch and dmesg.

Do you know what we have brightness property not set correctly?
Comment 13 Rafał Miłecki 2010-01-25 11:54:42 UTC
I added more debugging messages and have following findings:

On booting, acpi_video_device_lcd_set_level is called twice:
acpi_video_device_lcd_set_level(f6775840, 100)
acpi_video_device_lcd_set_level(f6775840, 100)

Each time it loops from state 2 to state 10 (inclusive).

Condition:
> if (level == device->brightness->levels[state])
is true for state==10 only

Condition:
> if (device->backlight)
is never true. And so we never set props.brightness at boot time.



To understand reason you've to track acpi_video_device_find_cap.

This function calls "acpi_video_init_brightness" which calls "acpi_video_device_lcd_set_level(device, max_level);" which does not change props.brightness (because we don't have device->backlight).

It's later when function acpi_video_device_find_cap does:
> device->backlight = backlight_device_register(name,
>                       NULL, device, &acpi_backlight_ops);

We should somehow modify this code to call
acpi_video_device_lcd_set_level(device, max_level);
AFTER registering backlight (class) device.
Comment 14 Rafał Miłecki 2010-01-25 12:12:21 UTC
I've tried following:
echo 7 > /sys/class/backlight/acpi_video0/brightness && sleep 5s && s2ram

It gave me following:

Jan 25 12:39:30 linux-g0th kernel: [  153.313254] ACPI handle has no context!
(...)
Jan 25 12:39:30 linux-g0th kernel: [  153.570206] ACPI: [suspend] have video device without brightness
Jan 25 12:39:30 linux-g0th kernel: [  153.570346] ACPI: [suspend] have video device with brightness (get:7 props:7)
Jan 25 12:39:30 linux-g0th kernel: [  153.570350] ACPI: [suspend] have video device without brightness
Jan 25 12:39:30 linux-g0th kernel: [  153.570354] ACPI: [suspend] have video device without brightness

[suspended here]

Jan 25 12:39:30 linux-g0th kernel: [  154.014039] ACPI: [resume] set_brightness should not be called
Jan 25 12:39:30 linux-g0th kernel: [  154.014041] ACPI: [resume] set_brightness with brightness 7 should be called
Jan 25 12:39:30 linux-g0th kernel: [  154.014042] ACPI: [resume] set_brightness should not be called
Jan 25 12:39:30 linux-g0th kernel: [  154.014043] ACPI: [resume] set_brightness should not be called

but it still locks up... Should I care about context error? It's also visible in attached dmesg log.
Comment 15 Zhang Rui 2010-01-26 05:53:55 UTC
Created attachment 24711 [details]
patch: init video->backlight->props.brightness before use it

this is the patch that fixes the video->backlight->props.brightness uninitialized issue.
Thanks for find this issue! :)
Comment 16 Zhang Rui 2010-01-26 05:58:58 UTC
so the problem is:
system locks up if changing backlight during resume, right?
Comment 17 Zhang Rui 2010-01-26 06:03:58 UTC
please attach the output of "grep . /sys/class/backlight/acpi_video*/device/*".
Comment 18 Rafael J. Wysocki 2010-01-27 00:41:07 UTC
First-Bad-Commit : 3a1151e3f124fd1a2c54b8153f510f1a7c715369
Comment 19 Jérôme Glisse 2010-02-02 13:09:24 UTC
Rafael before this patch backlight didn't worked right ? I am wondering if the issue is not with kms deinit the gpu and acpi trying to bang the gpu backlight register after resume before radeon kms resume the gpu.
Comment 20 Rafał Miłecki 2010-02-02 19:08:20 UTC
(In reply to comment #16)
> so the problem is:
> system locks up if changing backlight during resume, right?

That's right. I had idea that maybe we lack some ACPI initialization on resume, decided to check it.
I used my workaround to do not restore backlight on resume and after successful resuming I tried:
# echo 5 > /sys/class/backlight/acpi_video0/brightness

It worked fine.

So this lock up happens when we change brightness while whole resume only. If we change brightness before suspending or after resume is done, it works fine.

I suspect that maybe there is some other driver doing it's resume which affects us. Maybe it does something just before or just after video.c resume? Maybe we don't have enough delay between out resume and other driver resume? Can this be the case?
Comment 21 Rafał Miłecki 2010-02-02 19:09:08 UTC
(In reply to comment #17)
> please attach the output of "grep .
> /sys/class/backlight/acpi_video*/device/*".

/sys/class/backlight/acpi_video0/device/hid:device
/sys/class/backlight/acpi_video0/device/modalias:acpi:device:
/sys/class/backlight/acpi_video0/device/path:\_SB_.PCI0.PEGP.VGA_.LCD_
/sys/class/backlight/acpi_video0/device/uevent:MODALIAS=acpi:device:
Comment 22 Rafał Miłecki 2010-02-02 19:13:13 UTC
(In reply to comment #19)
> Rafael before this patch backlight didn't worked right ? I am wondering if
> the
> issue is not with kms deinit the gpu and acpi trying to bang the gpu
> backlight
> register after resume before radeon kms resume the gpu.

Jérôme: I wrote "... I test s2ram from init 3...", but I agree I was not clear enough. I test suspend/resume in init 3, without radeon even loaded (blacklisted it for testing time). I don't have any FB (radeonfb) even compiled, I boot with "vga=normal".
Comment 23 Rafał Miłecki 2010-02-02 19:37:15 UTC
(In reply to comment #15)
> Created an attachment (id=24711) [details]
> patch: init video->backlight->props.brightness before use it
> 
> this is the patch that fixes the video->backlight->props.brightness
> uninitialized issue.
> Thanks for find this issue! :)

Tested-by: Rafał Miłecki <zajec5@gmail.com>


But as I already wrote, having props.brightness properly filled doesn't fix real issue for me.
Comment 24 Rafał Miłecki 2010-02-02 22:23:49 UTC
Zhand's patch didn't apply cleanly to my git (drm-radeon-testing) so I updated Linus's tree, patched and tried again. Bug still exists in current torvalds/linux-2.6.

I've also tried pm-suspend instead of s2ram, same result.
Comment 25 Rafał Miłecki 2010-02-03 12:08:30 UTC
Default params for my s2ram are: -p -m. I need them to have display resume working correctly (in vga=0, init 3, no radeon mode).

So I though it may be conflict between s2ram which restores GPU mode and video which restores brightness. So I've used "s2ram -f". This caused my display didn't come back (expected, because I need -p -m for that) but Caps Lock didn't work as well. So I locked up again, even without restoring GPU mode by s2ram.

No more ideas :|
Comment 26 Rafał Miłecki 2010-02-03 20:37:13 UTC
OK, I have one more idea. In case of my machine brightness in controlled by GPU (even using ACPI causes modification of GPU register).

Maybe we try to restore brightness (by indirectly touching GPU register) before we wake up GPU? Or maybe before GPU has enough time to wake up?

Could you drop "NEEDINFO" state?
Comment 27 Zhang Rui 2010-02-04 03:22:54 UTC
Are you using the external graphics or integrated Intel graphics?
please attach the acpidump output.
please attach the output of "cat /sys/class/backlight/acpi_video0/devices/path".
Comment 28 Rafał Miłecki 2010-02-04 07:21:03 UTC
(In reply to comment #27)
> Are you using the external graphics or integrated Intel graphics?

It's non-IGP GPU: ATI Mobility Radeon 34xx. You can find more details about how controlling brightness really happens in bug #11682


> please attach the acpidump output.

You already asked for this in comment #5 and I replied in comment #6 . Is linked-attached dump somehow incorrect?


> please attach the output of "cat
> /sys/class/backlight/acpi_video0/devices/path".

\_SB_.PCI0.PEGP.VGA_.LCD_


Reporter should really be allowed to change status to "ASSIGNED" :/

Zhang: do you think there may be something correct in my comment #26 ? Could we somehow make video resume happen after other resume operations?
Comment 29 Zhang Rui 2010-02-04 07:48:28 UTC
(In reply to comment #28)
> (In reply to comment #27)
> > Are you using the external graphics or integrated Intel graphics?
> 
> It's non-IGP GPU: ATI Mobility Radeon 34xx. You can find more details about
> how
> controlling brightness really happens in bug #11682
> 
> 
> > please attach the acpidump output.
> 
> You already asked for this in comment #5 and I replied in comment #6 . Is
> linked-attached dump somehow incorrect?
> 
Sorry, my bad, I mean lspci -vvxxx output.
Comment 30 Zhang Rui 2010-02-04 07:52:28 UTC
I'm wondering if we're poking the _BCM for Intel IGD while using an ATI graphics.
If yes, this doesn't sound right. what do you think?
Comment 31 Rafał Miłecki 2010-02-04 08:07:14 UTC
Created attachment 24905 [details]
lspci -vvxxx
Comment 32 Rafał Miłecki 2010-02-04 08:09:56 UTC
(In reply to comment #30)
> I'm wondering if we're poking the _BCM for Intel IGD while using an ATI
> graphics.
> If yes, this doesn't sound right. what do you think?

Sorry, but I have no understanding of how ACPI actually works :| Can not say if this is correct or not.

I can test it and I see it working great with just exception for resume. As you can see in bug #11682 I had big problems with brightness on this machine.
Comment 33 Rafael J. Wysocki 2010-02-08 19:57:10 UTC
(In reply to comment #19)
> Rafael before this patch backlight didn't worked right ? I am wondering if
> the
> issue is not with kms deinit the gpu and acpi trying to bang the gpu
> backlight
> register after resume before radeon kms resume the gpu.

That sounds plausible.
Comment 34 Rafał Miłecki 2010-02-08 20:09:25 UTC
(In reply to comment #33)
> (In reply to comment #19)
> > Rafael before this patch backlight didn't worked right ? I am wondering if
> the
> > issue is not with kms deinit the gpu and acpi trying to bang the gpu
> backlight
> > register after resume before radeon kms resume the gpu.
> 
> That sounds plausible.

It does not. See comment #22
Comment 35 Rafael J. Wysocki 2010-02-08 20:45:39 UTC
It doesn't matter in your test case, but in my opinion it may be an issue in general.

I wonder what happens if you comment out acpi_video_resume() entirely?
Comment 36 Rafał Miłecki 2010-02-15 20:41:34 UTC
(In reply to comment #35)
> I wonder what happens if you comment out acpi_video_resume() entirely?

I *suspect* that:
1) Resume will work fine
2) Brightness level won't be resumed (it will be max)
3) Driver will report bad brightness level info, until changing it (without lock up).

Let me recompile and confirm.
Comment 37 Rafał Miłecki 2010-02-15 20:59:53 UTC
Suspicions from comment #36 confirmed.

Can I somehow change order of resume, to move ACPI resume happen later than currently it does? Maybe if we do ACPI resume after GPU resume, it will solve problem?

Keep in mind, that after applying workaround (for example by commenting out acpi_video_resume()) I can resume *and* change brightness without problem.
Comment 38 Zhang Rui 2010-02-21 07:29:22 UTC
well, the devices resume order is the same as the order they are initialized.
it means:
1. ACPI video device is initialized before the ATI graphics device.
2. the ACPI video driver changes the backlight levels for several times during initialization.
so I'm wondering why the problem doesn't exist during system boot phase.
Comment 39 Rafał Miłecki 2010-02-21 10:11:20 UTC
Zhang: is this NEEDINFO for me? Not sure what else info I can provide :(

If order is the same on boot and on resume, maybe that's some timing issue? Maybe in case of booting we are slower and there is some bigger delay between loading video and loading radeon?
Comment 40 Rafael J. Wysocki 2010-02-21 22:25:10 UTC
On Sunday 21 February 2010, Rafał Miłecki wrote:
> 2010/2/21 Rafael J. Wysocki <rjw@sisk.pl>:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.31 and 2.6.32.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.31 and 2.6.32.  Please verify if it still should
> > be listed and let the tracking team know (either way).
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=15096
> > Subject         : Resume lock up -- bisected, commit
> 3a1151e3f124fd1a2c54b8153f510f1a7c715369
> > Submitter       : Rafał Miłecki <zajec5@gmail.com>
> > Date            : 2010-01-20 23:15 (33 days old)
> 
> Should be, ready for providing info if requested (current NEEDINFO
> probably incorrect).
Comment 41 Rafael J. Wysocki 2010-02-21 22:46:08 UTC
(In reply to comment #37)
> Suspicions from comment #36 confirmed.
> 
> Can I somehow change order of resume, to move ACPI resume happen later than
> currently it does? Maybe if we do ACPI resume after GPU resume, it will solve
> problem?

No, we only need to move the ACPI video driver's resume past the radeon resume.

> Keep in mind, that after applying workaround (for example by commenting out
> acpi_video_resume()) I can resume *and* change brightness without problem.

Hmm.  Perhaps we can call acpi_video_resume() directly from acpi_pm_end().  Can you try that yourself or do you want me to prepare a patch?
Comment 42 Rafał Miłecki 2010-02-22 00:02:04 UTC
Rafael: sorry, but I don't know relation between sleep.c and video.c. How can I call
acpi_video_resume(struct acpi_device *device)
from acpi_pm_end? Is that acpi_device something global, available in one instance only? Should I take it in sleep.c using acpi_bus_get_device?

I've to remove acpi_driver.ops.resume to avoid lock up, so we probably have to make acpi_video_resume not static?

Generally: can you help me with this?
Comment 43 Zhang Rui 2010-02-22 01:59:32 UTC
(In reply to comment #41)
> (In reply to comment #37)
> > Suspicions from comment #36 confirmed.
> > 
> > Can I somehow change order of resume, to move ACPI resume happen later than
> > currently it does? Maybe if we do ACPI resume after GPU resume, it will
> solve
> > problem?
> 
> No, we only need to move the ACPI video driver's resume past the radeon
> resume.
> 
does this mean setting backlight in ACPI video driver before radeon driver loaded is also dangerous...?
Comment 44 Matthew Garrett 2010-02-22 18:23:33 UTC
If you're reinitialising the GPU through userspace, there's no way that this can be expected to work - the gpu won't be reinitialised until after userspace has started. Does it work if you're using kms?
Comment 45 Rafael J. Wysocki 2010-02-28 13:46:06 UTC
(In reply to comment #43)
> (In reply to comment #41)
> > (In reply to comment #37)
> > > Suspicions from comment #36 confirmed.
> > > 
> > > Can I somehow change order of resume, to move ACPI resume happen later
> than
> > > currently it does? Maybe if we do ACPI resume after GPU resume, it will
> > > solve problem?
> > 
> > No, we only need to move the ACPI video driver's resume past the radeon
> > resume.
> > 
> does this mean setting backlight in ACPI video driver before radeon driver
> loaded is also dangerous...?

In principle it's not the right thing to do, because we have no idea what the AML involved in that is going to do.
Comment 46 Rafael J. Wysocki 2010-02-28 14:42:07 UTC
Created attachment 25276 [details]
ACPI: Restore backlight state at the end of resume

For completeness, this patch moves the ACPI video resume to the end of resume.

If the problem is reproducible with the Radeon KMS, please check if this patch helps.
Comment 47 Rafael J. Wysocki 2010-02-28 19:12:01 UTC
Created attachment 25280 [details]
ACPI / PM: Move video resume to a PM notifier

Actually, scratch that.

Please check if this patch helps without the KMS.
Comment 48 Rafał Miłecki 2010-03-04 21:36:18 UTC
(In reply to comment #47)
> Created an attachment (id=25280) [details]
> ACPI / PM: Move video resume to a PM notifier
> 
> Actually, scratch that.
> 
> Please check if this patch helps without the KMS.

OK, now that's for sure. We should not try to touch video (ACPI) before GPU is initialized.

With this patch applied it works great when using KMS. That's what happens on resume when we use KMS:
1) radeon KMS initializes GPU
2) end of resume, video (APCI) restores backlight (with Rafael's patch)
and it works great!

Now, resume without KMS
1) no kernel modules initializes GPU
2) video (ACPI) restores backlight
3) userspace (like pm-utils) initializes GPU
as a result we touch GPU before initialization, it locks up and we do not even hit step 3.

The solution would be to have info about KMS state (is is used?) or about GPU maybe (is is initialized?). We should restore backlight level when GPU is initialized only (like KMS case).


Thanks a lot Rafael for your patch. Maybe I'll start KMS status thread on dri-devel.
Comment 49 Len Brown 2010-04-16 19:53:33 UTC
From ac7729da880e742613129ee6dea0045328670d2d Mon Sep 17 00:00:00 2001
From: Rafael J. Wysocki <rjw@sisk.pl>
Date: Mon, 5 Apr 2010 01:43:51 +0200
Subject: [PATCH] ACPI / PM: Move ACPI video resume to a PM notifier

shipped in linux-2.6.34-rc4
closed

Note You need to log in before you can comment on or make changes to this bug.