Bug 21102 - MWAIT 0x30 Atom state vs USB audio playback (acpi_idle and intel_idle) - MSI Wind U110
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Power Management
Classification: Unclassified
Component: cpuidle
Hardware: All Linux
Importance: P1 normal
Assignee: Len Brown
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-10-24 17:59 UTC by Dennis Jansen
Modified: 2014-06-02 20:34 UTC
CC List: 4 users

See Also:
Kernel Version: 3.3.10
Subsystem:
Regression: No
Bisected commit-id:


Attachments
C4 Disabled grep . /sys/devices/system/cpu/cpu*/cpuidle/*/* (2.77 KB, text/plain)
2010-10-25 07:01 UTC, Dennis Jansen
C4 enabled grep . /sys/devices/system/cpu/cpu*/cpuidle/*/* (3.33 KB, text/plain)
2010-10-25 13:33 UTC, Dennis Jansen
side by side diff of both; c4 active on the left side (3.70 KB, text/plain)
2010-10-25 13:36 UTC, Dennis Jansen
AC: grep . /sys/devices/system/cpu/cpu*/cpuidle/*/* (2.04 KB, text/plain)
2010-11-24 18:19 UTC, Dennis Jansen
Battery: grep . /sys/devices/system/cpu/cpu*/cpuidle/*/* (3.33 KB, text/plain)
2010-11-24 18:20 UTC, Dennis Jansen
acpidump by pmtools-20101221 (152.27 KB, text/plain)
2011-02-15 18:20 UTC, Dennis Jansen

Description Dennis Jansen 2010-10-24 17:59:26 UTC
Previously I reported here (https://bugzilla.kernel.org/show_bug.cgi?id=19762) that intel_idle didn't support my Atom C4 and C6 idle states. Now that's fixed.

But now USB audio playback on my computer is so corrupted that it is unusable. The problem is tied not only to intel_idle, but specifically to the C4 state:
1. I diagnosed the problem.
2. I booted with max_cstate=2 -> no problems.
3. I booted with max_cstate=4 -> extremely bad problems with sound.
4. I patched intel_idle and disabled (commented out) the C4 state -> no problems.

Any idea what a prettier solution to the problem might be?
Thanks! :)
Comment 1 Len Brown 2010-10-24 19:15:37 UTC
does the problem go away if you boot with "nolapic_timer"?
Comment 2 Len Brown 2010-10-24 19:33:19 UTC
When you modify intel_idle.c's atom_cstates[]
to disable ATM-C4 yet keep ATM-C6, you do not see the problem?

Please share the output from
grep . /sys/devices/system/cpu/cpu*/cpuidle/*/*
for that working scenario.

When you disable ATM-C6 yet keep ATM-C4 enabled,
you have sound problems?  Please share the output from
grep . /sys/devices/system/cpu/cpu*/cpuidle/*/*
for that failing scenario.
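Each line of that grep output has the form `path:value`, because grep prefixes every match with the file name when it searches several files. To compare the C4-enabled and C4-disabled snapshots attribute by attribute, the lines can be split back into CPU, state, attribute and value. A minimal C sketch, assuming the `/sys/devices/system/cpu/cpuN/cpuidle/stateM/attr` layout seen here; the struct and function names are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed line of the cpuidle sysfs grep output. */
struct cpuidle_line {
    unsigned cpu;    /* N in cpuN */
    unsigned state;  /* M in stateM */
    char attr[64];   /* e.g. "name", "desc", "latency", "usage", "time" */
    char value[128]; /* everything after the ':' that ends the path */
};

/* Returns 0 on success, -1 if the line does not have the expected shape. */
int parse_cpuidle_line(const char *line, struct cpuidle_line *out)
{
    assert(line != NULL && out != NULL);
    int n = sscanf(line,
                   "/sys/devices/system/cpu/cpu%u/cpuidle/state%u/%63[^:]:%127[^\n]",
                   &out->cpu, &out->state, out->attr, out->value);
    return n == 4 ? 0 : -1;
}

/* Nonzero if the parsed line is attribute `attr` of idle state `state`. */
int is_state_attr(const struct cpuidle_line *l, unsigned state, const char *attr)
{
    return l->state == state && strcmp(l->attr, attr) == 0;
}
```

Running both attachments through parse_cpuidle_line() and pairing the `usage` and `time` values by state gives a per-field version of the side-by-side diff.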
Comment 3 Dennis Jansen 2010-10-25 07:01:29 UTC
Created attachment 34922 [details]
C4 Disabled grep . /sys/devices/system/cpu/cpu*/cpuidle/*/*

Yes, when I disable C4 so that only C6 is used, the problem does not appear. Since it is *much* worse with only C4, it seems that passing through C4 is what causes the problem in the first place.

I have now attached the grep output with C4 disabled.
I will add the grep output with C4 enabled and try nolapic_timer in a few hours.
Comment 4 Dennis Jansen 2010-10-25 13:33:09 UTC
Created attachment 34992 [details]
C4 enabled grep . /sys/devices/system/cpu/cpu*/cpuidle/*/*

I've tried nolapic_timer. The CPU stays in polling mode 100% of the time, and the problem does not appear either. I've also attached the grep output with C4 enabled. That should be everything you asked for.
Comment 5 Dennis Jansen 2010-10-25 13:36:33 UTC
Created attachment 35002 [details]
side by side diff of both; c4 active on the left side
Comment 6 Dennis Jansen 2010-10-25 16:47:20 UTC
I forgot to make this clear: I think the bug exists with processor.ko as well. The only difference is that there, C4 gets disabled when I plug in AC, so I usually don't notice it. Maybe the assumption that bus-master activity can be treated as a no-op does not hold here? Just random guessing... but then it wouldn't make sense that it happens *only* in C4.
Comment 7 Dennis Jansen 2010-10-27 06:28:29 UTC
Any ideas?
Comment 8 Dennis Jansen 2010-11-04 22:29:30 UTC
Ok, I know this sounds crazy, but it's completely reproducible, so there has to be a cause somewhere. Could it be that the CPU does not stay in C4 long enough, or something like that?
Comment 9 Dennis Jansen 2010-11-17 22:37:47 UTC
I guess we can change the status to wontfix, right?
Comment 10 Len Brown 2010-11-18 06:31:48 UTC
not ready to give up on addressing this --
just haven't immediately had any good ideas or time:-(

So the problem is with MWAIT 0x30, even though MWAIT 0x52 works, yes?

Please verify that the same failure occurs when using acpi_idle
(e.g. boot with intel_idle.max_cstate=0): show the output from
grep . /sys/devices/system/cpu/cpu*/cpuidle/*/*
to verify that ACPI is indeed using MWAIT 0x30, and verify
that the problem happens there too.
Comment 11 Dennis Jansen 2010-11-24 18:19:23 UTC
Created attachment 38082 [details]
AC: grep . /sys/devices/system/cpu/cpu*/cpuidle/*/*

On AC, as before, there is no problem, as it only goes up to C2.
Comment 12 Dennis Jansen 2010-11-24 18:20:15 UTC
Created attachment 38092 [details]
Battery: grep . /sys/devices/system/cpu/cpu*/cpuidle/*/*

On battery, as before, it's pretty bad every time it passes through C3.
Comment 13 Dennis Jansen 2010-11-24 18:22:45 UTC
I actually had the problem with ACPI first, but only after unplugging AC, which enables the deeper states. Remember the other bug, where someone wrote a patch because the ACPI changes didn't show up correctly? ;)

Thanks for checking this out! Maybe we should just test increasing the latency of C3 and see if the problem persists?
Comment 14 Len Brown 2010-11-26 18:55:33 UTC
Thanks for confirming that the issue is present
with ACPI on DC -- when it exposes ACPI C1/C2/C3/C4.

In that scenario, residency is much higher in the
ACPI C4 MWAIT 0x52 state than it is in the problematic
ACPI C3 MWAIT 0x30 state.

Please confirm that in ACPI mode, if you boot with
processor.max_cstate=3 to get rid of this C4 state,
residency in ACPI C3 increases and the problem
gets worse along with it.
Comment 15 Dennis Jansen 2010-11-27 10:45:28 UTC
Sorry, in #12 I made a wrong statement: I meant C4, as there is no C3 on my system. The list is C1, C2, C4, C6.
The problem always occurs in C4.

I just booted with intel_idle.max_cstate=0 processor.max_cstate=4
and if anything the problem was worse than with intel_idle. It was very, very bad. I heard more noise than sound.

We have now tested:
= without intel_idle =
* up to C2, resulting in no problems
(default without intel_idle on AC)

* up to C6, resulting in some residency in C4 and some problems
(default without intel_idle on DC)

* up to C4, resulting in very serious problems and high residency in C4.

And we tested the same with intel_idle with the same results.

This means we can definitely conclude that with or without intel_idle, the C4 state is to blame.
Comment 16 Len Brown 2010-11-27 20:12:43 UTC
Yes, the terminology can be confusing, here is a decoder:

ACPI C0 = Atom C0
ACPI C1 = Atom C1 = MWAIT 0x0
ACPI C2 = Atom C2 = MWAIT 0x10
ACPI C3 = Atom C4 = MWAIT 0x30
ACPI C4 = Atom C6 = MWAIT 0x52
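The Atom names in this decoder follow the MWAIT hint itself: in the hint passed in EAX, bits 7:4 encode the target C-state minus one (0xF meaning C0) and bits 3:0 select a sub-state within it, per the MWAIT description in the Intel SDM. A small C sketch of that decoding (struct and function names are illustrative only):

```c
#include <assert.h>

/* Target C-state and sub-state selected by an MWAIT hint in EAX[7:0]. */
struct mwait_hint {
    unsigned cstate;   /* 1 = C1, 2 = C2, ...; 0 = C0 */
    unsigned substate; /* sub C-state within that C-state */
};

struct mwait_hint decode_mwait_hint(unsigned eax)
{
    assert(eax <= 0xFF); /* this sketch only looks at the hint byte */
    unsigned field = (eax >> 4) & 0xF;
    struct mwait_hint h;
    h.cstate = (field == 0xF) ? 0 : field + 1; /* EAX[7:4] = C-state - 1 */
    h.substate = eax & 0xF;                    /* EAX[3:0] = sub-state */
    return h;
}
```

So 0x30 decodes to C4 sub-state 0 and 0x52 to C6 sub-state 2, matching the ATM-C4 and ATM-C6 rows.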

Thanks for the confirmation that the MWAIT 0x30 state
causes the issue for both acpi_idle or intel_idle,
and that the MWAIT 0x52 state is not an issue.

---
I don't understand comment #3.
when you boot with "nolapic_timer" the system doesn't enter
deep c-states, but instead cpuidle chooses polling?
---
It is mysterious that MWAIT 0x30 should cause a problem
while MWAIT 0x52 does not, because 0x52 is expected to
be a higher latency C-state.

What do you see if you edit intel_idle.c atom_cstates[]
MWAIT C4 to have exit_latency = 140 and target_residency = 560
and then boot with intel_idle.max_cstate=4 to disable MWAIT 0x52?
Do you still see a lot of MWAIT 0x30 residency and bad sound?
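The idea behind this edit is that the cpuidle governor only picks a state when the predicted idle period covers its target residency and its exit latency fits the current latency limit. The following is a deliberately simplified model of that rule, not the menu governor's real code. The 100 us exit latency for MWAIT 0x30 and the 140/560 us values for MWAIT 0x52 come from this thread; the C1 and C2 numbers and the 400 us C4 target residency are assumptions to check against atom_cstates[] in the running kernel.

```c
#include <assert.h>
#include <stddef.h>

struct idle_state {
    const char *name;
    unsigned mwait_hint;
    unsigned exit_latency_us;
    unsigned target_residency_us;
};

/* Ordered shallow to deep; values partly assumed (see comments). */
static const struct idle_state atom_states[] = {
    { "ATM-C1", 0x00,   1,   4 }, /* assumed */
    { "ATM-C2", 0x10,  20,  80 }, /* assumed */
    { "ATM-C4", 0x30, 100, 400 }, /* exit latency from the thread, residency assumed */
    { "ATM-C6", 0x52, 140, 560 }, /* from the thread */
};

/* Simplified selection: state 0 is always allowed, and the deepest state
 * satisfying both limits wins. */
size_t pick_state(const struct idle_state *s, size_t n,
                  unsigned predicted_idle_us, unsigned latency_limit_us)
{
    assert(s != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (s[i].target_residency_us > predicted_idle_us)
            continue; /* idle period too short for this state to pay off */
        if (s[i].exit_latency_us > latency_limit_us)
            continue; /* waking up would take too long */
        best = i;
    }
    return best;
}
```

In this model a predicted 500 us idle period selects ATM-C4 with the stock numbers, but falls back to ATM-C2 once C4 is given 140/560, so C4 residency should shrink under the suggested edit.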

What can you tell me about the sound device?
what does lsusb show?
when it is active, what does powertop show about its interrupt rate?
What do you see if you "watch -d cat /proc/interrupts" when it is active?
Comment 17 Dennis Jansen 2010-11-28 10:31:18 UTC
You probably mean comment #4. Yes, that's exactly what happened: it went into polling mode. I will try it again now and report back only if I get different results this time.

----
I know, it's very weird, but it's what happens. I can make a video if you like :)

I tried editing it as you suggested (the C4 state with the C6 values). It didn't change anything: C4 still caused the same problem. I would also have thought that this is where the problem was. Maybe, just maybe, the CPU in my particular machine is broken?

The sound device is a "Creative SoundBlaster X-Fi Surround 5.1 USB":
1 [S51            ]: USB-Audio - SB X-Fi Surround 5.1
Creative Technology SB X-Fi Surround 5.1 at usb-0000:00:1d.0-1, full speed

powertop shows 130-330 interrupts per second for it.

/proc/interrupts shows:
* about 300 per interval (2 s) for
  (IO-APIC-fasteoi   uhci_hcd:usb1)
* timer interrupts rising from 80-120 to about 300 per 2 seconds
  (LOC:     441745     505490   Local timer interrupts)
Everything else is unchanged.
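Rates like these come from diffing two snapshots of /proc/interrupts taken a known interval apart, summing the per-CPU columns of the line of interest. A minimal C sketch of that arithmetic, assuming the usual layout of a label, a colon, one counter per CPU and then a free-text description; the function names are hypothetical:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Sum the per-CPU counters on one /proc/interrupts line.
 * Returns 0 and fills label/total on success, -1 if the line has no
 * "label:" prefix (for example the CPU0 CPU1 header line). */
int sum_irq_line(const char *line, char *label, size_t label_len,
                 unsigned long long *total)
{
    assert(line != NULL && label != NULL && total != NULL);
    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return -1;
    const char *start = line;
    while (isspace((unsigned char)*start))
        start++;
    size_t n = (size_t)(colon - start);
    if (n == 0 || n >= label_len)
        return -1;
    memcpy(label, start, n);
    label[n] = '\0';

    *total = 0;
    const char *p = colon + 1;
    for (;;) {
        char *end;
        unsigned long long v = strtoull(p, &end, 10);
        if (end == p)
            break; /* first non-numeric token starts the description */
        *total += v;
        p = end;
    }
    return 0;
}

/* Per-second rate between two samples of the same counter. */
double irq_rate(unsigned long long before, unsigned long long after,
                double seconds)
{
    assert(after >= before && seconds > 0.0);
    return (double)(after - before) / seconds;
}
```

For the LOC line quoted above the two CPU columns sum to 947235; sampling again two seconds later and passing both totals to irq_rate() with 2.0 gives interrupts per second.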
Comment 18 Dennis Jansen 2010-12-21 13:57:57 UTC
Ok, with nolapic_timer in 2.6.36:

C0 (cpu running) 1,6 %
polling 5.5ms 98,4 %
C1 mwait 0.0ms 0,0 %
C2 mwait 0.0ms 0,0 %
C6 mwait 0.0ms 0,0 %

wakes per second: 180

idle sources:
ca 800-1000 (94%) [extra timer interrupts].
Comment 19 ykzhao 2011-02-15 02:04:20 UTC
Hi, Dennis
    Does the issue still exist on the latest Linux kernel (for example 2.6.38-rc4)?

    It would be great if you could attach the acpidump output from your box. Please use the latest acpidump tool, from pmtools-20101221, which can be downloaded from:
    http://www.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/

Thanks.
Comment 20 Dennis Jansen 2011-02-15 18:20:02 UTC
Created attachment 47902 [details]
acpidump by pmtools-20101221
Comment 21 Dennis Jansen 2011-02-16 06:08:35 UTC
And the issue still exists in 2.6.38-rc4.
Comment 22 Len Brown 2011-08-01 16:55:05 UTC
changing category to "cpuidle" from "intel_idle",
since acpi_idle and intel_idle both see the same problem
with MWAIT 0x30 on this box.

Dennis,
just for grins...
with MWAIT 0x30 disabled,
what do you see if you change MWAIT 0x52
to 0x50 or 0x51?  Do those states work fine?
Comment 23 Zhang Rui 2012-01-18 02:22:02 UTC
It's great that kernel bugzilla is back.

can you please verify if the problem still exists in the latest upstream
kernel?
Comment 24 Dennis Jansen 2012-04-10 20:19:31 UTC
Yes, still exists in 3.3.0.

Len, let me know if this made you smile... 0x50 caused the problems to reappear. I'll try 0x51 if you want. And for some reason it told me to contact a certain lenb@kernel.org...

417:[    0.417673] intel_idle: MWAIT substates: 0x3020220
418:[    0.417678] intel_idle: v0.4 model 0x1C
419:[    0.417682] intel_idle: lapic_timer_reliable_states 0x2
420:[    0.417696] intel_idle: unaware of model 0x1c MWAIT 4 please contact lenb@kernel.org
422:[    0.417760] intel_idle: unaware of model 0x1c MWAIT 4 please contact lenb@kernel.org
594:[    9.194608] ACPI: acpi_idle yielding to intel_idle
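The `MWAIT substates: 0x3020220` value in this log is EDX of CPUID leaf 5, which packs the number of MWAIT sub-states supported for each C-state into 4-bit fields (bits 3:0 for C0, bits 7:4 for C1, and so on). A short C sketch to decode it (the function name is illustrative):

```c
#include <assert.h>

/* Number of MWAIT sub-states that CPUID leaf 5 EDX reports for a C-state:
 * one 4-bit count per C-state, C0 in bits 3:0, C1 in bits 7:4, ... */
unsigned mwait_substates(unsigned edx, unsigned cstate)
{
    assert(cstate < 8); /* eight 4-bit fields in a 32-bit register */
    return (edx >> (cstate * 4)) & 0xF;
}
```

For 0x3020220 this gives two sub-states each for C1, C2 and C4 and three for C6, so hints 0x50, 0x51 and 0x52 all name valid C6 sub-states.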
Comment 25 Dennis Jansen 2012-04-10 20:36:42 UTC
Something interesting again: the 0x51 version produced pure static, while the 0x50 version produced only intermittent static. On the other hand, I tested the 0x50 version on the console and the 0x51 version in X, so that is more likely the cause of the difference.
Comment 26 Alan 2012-08-14 11:25:34 UTC
Any reason for not just blacklisting C4 on that platform ?
Comment 27 Len Brown 2014-05-20 03:41:55 UTC
3 choices to resolve this bug:

1. blacklist deep C-states on this platform in the kernel.
Please attach the dmidecode output, and we will create a patch to do this.

2. use PM-QoS to fix this by having the sound driver
or user space tell Linux not to use C-states with a latency
of 100 usec and above (100 usec is the listed exit_latency for MWAIT 0x30).

3. wait a month, and if Dennis doesn't reply, close as Documented --
since there are cmdline workarounds available for fellow travelers.
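For option 2 from user space, the PM QoS interface exposes /dev/cpu_dma_latency: a process writes its maximum acceptable wake-up latency in microseconds as a binary 32-bit integer, and the request stays in force only while that file descriptor remains open. Since cpuidle skips states whose exit latency is greater than the request, asking for 99 us keeps the CPU out of the 100 us MWAIT 0x30 state. A minimal C sketch with hypothetical helper names; opening the device normally needs root:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Largest request that still excludes a state with this exit latency. */
int32_t latency_request_excluding(int32_t exit_latency_us)
{
    assert(exit_latency_us > 0);
    return exit_latency_us - 1;
}

/* Register a CPU latency limit via /dev/cpu_dma_latency. Returns the open
 * descriptor (keep it open for as long as the limit is needed; closing it
 * drops the request) or -1 on failure. */
int hold_cpu_dma_latency(int32_t max_latency_us)
{
    int fd = open("/dev/cpu_dma_latency", O_RDWR);
    if (fd < 0)
        return -1;
    if (write(fd, &max_latency_us, sizeof max_latency_us) !=
        (ssize_t)sizeof max_latency_us) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A playback wrapper could call hold_cpu_dma_latency(latency_request_excluding(100)) before starting audio and close() the descriptor afterwards. This also rules out MWAIT 0x52 (140 us). The in-kernel variant, where the sound driver registers the request itself, is not shown.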
Comment 28 Len Brown 2014-06-02 20:34:56 UTC
If this is still an issue,
please re-open and supply the output from

grep . /sys/class/dmi/id/*
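For option 1, the DMI strings from that grep are what a platform quirk would match on. The kernel keeps its own DMI match tables for this; the user-space C sketch below only mirrors the idea, comparing sys_vendor and product_name against a quirk entry by substring. The struct, function names and any vendor/product strings are placeholders until the real output is attached.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical quirk entry: both substrings must appear for a match. */
struct dmi_quirk {
    const char *sys_vendor;
    const char *product_name;
};

/* Read the first line of a sysfs file into buf; returns 0 on success. */
int read_sysfs_line(const char *path, char *buf, size_t len)
{
    assert(path != NULL && buf != NULL && len > 0);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0'; /* drop the trailing newline */
    return 0;
}

/* Nonzero if both DMI strings contain the quirk's substrings. */
int dmi_quirk_matches(const struct dmi_quirk *q, const char *vendor,
                      const char *product)
{
    assert(q != NULL && vendor != NULL && product != NULL);
    return strstr(vendor, q->sys_vendor) != NULL &&
           strstr(product, q->product_name) != NULL;
}
```

In use, /sys/class/dmi/id/sys_vendor and /sys/class/dmi/id/product_name would be read with read_sysfs_line() and passed to dmi_quirk_matches().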
