Bug 27532

Summary: ath9k prevents CPU from entering lower C-states
Product: Drivers Reporter: Thomas Bächler (thomas)
Component: network-wirelessAssignee: drivers_network-wireless (drivers_network-wireless)
Status: CLOSED CODE_FIX    
Severity: high CC: ath9k-devel, florian, jirislaby, linville, maciej.rutecki, mcgrof, r.schtz, rjw, senthilkumar, shafi.wireless
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.37 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 7216, 21782    
Attachments: throughput with the patch applied
throughput without the patch applied
/proc/cpuinfo
lspci
diff patch which addresses ath9k CPU C3 state regression
final fix for ath9k-prevents CPU entering C3 states

Description Thomas Bächler 2011-01-24 22:43:03 UTC
In 2.6.37, when ath9k is active (interface up and connected to wireless), my CPU (Intel Core i5-520M) is not entering the lowest C-state anymore (according to powertop).

This is caused by commit [1] and reported on the linux-wireless list at [2]. Reverting the commit fixes the problem and returns it to 2.6.36 behaviour.


As a response to the mailing list post following [2]: I don't care about alleged throughput problems on some Pinetrail platform I do not have. Nor do I care that this may be "fixed" by making the QoS values configurable in 2.6.38.

What I do care about is that this commit is keeping my CPU in a higher C state than necessary constantly, probably diminishing my battery life significantly. This is a serious regression for me.

[1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=10598c124ecabbbfd7522f74de19b8f7d52a1bee
[2] http://permalink.gmane.org/gmane.linux.kernel.wireless.general/63430
Comment 1 shafi 2011-01-25 14:16:24 UTC
(In reply to comment #0)
> In 2.6.37, when ath9k is active (interface up and connected to wireless), my
> CPU (Intel Core i5-520M) is not entering the lowest C-state anymore
> (according
> to powertop).
> 
> This is caused by commit [1] and reported on the linux-wireless list at [2].
> Reverting the commit fixes the problem and returns it to 2.6.36 behaviour.
> 
> 
> As a response to the mailing list post following [2]: I don't care about
> alleged throughput problems on some Pinetrail platform I do not have. Nor do
> I
> care that this may be "fixed" by making the QoS values configurable in
> 2.6.38.
> 
> What I do care about is that this commit is keeping my CPU in a higher C
> state
> than necessary constantly, probably diminishing my battery life
> significantly.
> This is a serious regression for me.
> 
> [1]
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=10598c124ecabbbfd7522f74de19b8f7d52a1bee
> [2] http://permalink.gmane.org/gmane.linux.kernel.wireless.general/63430

We will look into it that, but reverting the commit may not be a correct solution.
Comment 2 Thomas Bächler 2011-01-25 16:59:23 UTC
I would think it is (for now). However, I did that locally anyway, so I am happy for the moment.
Comment 3 shafi 2011-01-26 12:53:17 UTC
(In reply to comment #2)
> I would think it is (for now). However, I did that locally anyway, so I am
> happy for the moment.

Thanks! If you need any help please free to ask.
Comment 4 John W. Linville 2011-01-26 15:50:07 UTC
Nevertheless, that only buys a little time.  When can we expect to see patches to address this?  Reverting the patch is still on the table -- we generally don't trade one bug for another.
Comment 5 Richard Schütz 2011-01-28 19:11:34 UTC
"Throughput was severely affected in Intel Pinetrail platforms
because of a DMA problem in C3 state. This patch fixes this
issue."

Was there a discussion/bugreport about this so-called several performance issue? My netbook is based on Pinetrail and I haven't noticed it so far. So I reverted the patch locally to save power, too.
Comment 6 John W. Linville 2011-01-28 19:13:34 UTC
I am inclined to revert it as well -- I was trying to give the ath9k guys a chance to fix both issues together...
Comment 7 Luis Chamberlain 2011-01-28 19:45:04 UTC
Um I think this is an issue not with ath9k but with certain BIOSes which do not have a fix for C3 state. Did you upgrade your BIOS?
Comment 8 Richard Schütz 2011-01-28 19:51:18 UTC
It is an Samsung N145 Plus. I am still using the original BIOS version. It's timestamp is 20100818.
Comment 9 Thomas Bächler 2011-01-28 20:01:34 UTC
(In reply to comment #7)
> Um I think this is an issue not with ath9k but with certain BIOSes which do
> not
> have a fix for C3 state. Did you upgrade your BIOS?

Luis, I am not sure I can follow (and Richard and I are in disagreement about what you actually meant by that statement). Can you elaborate a bit more?
Comment 10 Rafael J. Wysocki 2011-01-28 20:03:30 UTC
(In reply to comment #7)
> Um I think this is an issue not with ath9k but with certain BIOSes which do
> not
> have a fix for C3 state. Did you upgrade your BIOS?

I hope you're asking because that information would help you fix the problem
and you're not going to say that a BIOS upgrade is necessary to avoid the
problem.
Comment 11 shafi 2011-01-29 09:16:35 UTC
(In reply to comment #5)
> "Throughput was severely affected in Intel Pinetrail platforms
> because of a DMA problem in C3 state. This patch fixes this
> issue."
> 
> Was there a discussion/bugreport about this so-called several performance
> issue? My netbook is based on Pinetrail and I haven't noticed it so far. So I
> reverted the patch locally to save power, too.

Please look at the performance/throughput when the battery is discharging(ie the power adapter is not connected to power supply).
Comment 12 Richard Schütz 2011-01-29 11:59:12 UTC
Created attachment 45442 [details]
throughput with the patch applied
Comment 13 Richard Schütz 2011-01-29 12:04:38 UTC
Created attachment 45452 [details]
throughput without the patch applied

The troughput is somewhat lower, but that is nothing I would call severe. I think that is quite normal, because the CPU is changing between the C-States all the time and resuming from C4 (lowest C-state of my Atom) takes quite long in comparison to other C-state transitions.
Comment 14 John W. Linville 2011-02-03 16:02:16 UTC
10% is a significant difference -- not sure if it is "severe" or not.

Can you quantify the difference in power consumption?
Comment 15 Richard Schütz 2011-02-03 17:12:27 UTC
(In reply to comment #14)
> 10% is a significant difference -- not sure if it is "severe" or not.
> 
> Can you quantify the difference in power consumption?


powertop measures the following:

without the patch:
idle: 6.4W
wireless traffic: 9.1W

with the patch:
idle: 7.5W
wireless traffic: 9.2W
Comment 16 shafi 2011-02-04 05:29:35 UTC
(In reply to comment #15)
> (In reply to comment #14)
> > 10% is a significant difference -- not sure if it is "severe" or not.
> > 
> > Can you quantify the difference in power consumption?
> 
> 
> powertop measures the following:
> 
> without the patch:
> idle: 6.4W
> wireless traffic: 9.1W
> 
> with the patch:
> idle: 7.5W
> wireless traffic: 9.2W
Richard I do accept the results are going to be inconsistent, but it was reported up-to 50% drop in throughput and thats why PM-QOS patch was sent.
Comment 17 Richard Schütz 2011-02-04 12:03:14 UTC
(In reply to comment #16)
> Richard I do accept the results are going to be inconsistent, but it was
> reported up-to 50% drop in throughput and thats why PM-QOS patch was sent.

Could this be tracked down to a certain hardware combination?
Comment 18 Senthil Balasubramanian 2011-02-04 12:52:31 UTC
I believe we should address this cleanly as obviously, we don't want to restrict the lowest C-state for all the laptops. The patch was supposed to address only the problematic laptops.. not all. We shall analyze this and shall keep you posted.
Comment 19 Luis Chamberlain 2011-02-04 17:18:24 UTC
The C3 issue on Pinetrail is fixed by a BIOS upgrade, if no BIOS upgrade is available then the patch addresses the issue through a workaround by preventing the CPU from going into C3. If such a hack is desired upstream then a module parameter should likely be the thing to do but I do wonder -- ath9k should not be the only device affected.. so instead is it too much wishful thinking for this to be done properly through some C3-workaround hack for Pinetrail specifically designed to detect the poo somehow?

If you are seeing this issue please post the output of lspci and cat /proc/cpuinfo
Comment 20 Thomas Bächler 2011-02-04 22:07:46 UTC
Created attachment 46262 [details]
/proc/cpuinfo

As you can see, not a Pinetrail platform at all, still your "fix" affects me.
Comment 21 Thomas Bächler 2011-02-04 22:08:57 UTC
Created attachment 46272 [details]
lspci
Comment 22 Luis Chamberlain 2011-02-04 22:10:21 UTC
IMHO this "fix" and other on other drivers should be reverted and instead userspace / distributions should figure out a way to let users enable a polling mechanism for cases where this is necessary, hopefully from userspace.

http://johannes.sipsolutions.net/files/netlatency.c.txt

like that but for CPU dma latency. Or foo /dev/cpu_dma_latency
Comment 23 shafi 2011-02-07 05:59:50 UTC
(In reply to comment #20)
> Created an attachment (id=46262) [details]
> /proc/cpuinfo
> 
> As you can see, not a Pinetrail platform at all, still your "fix" affects me.

Hi, can you please try this when you load the ath9k driver
sudo modprobe -r ath9k
sudo modprobe ath9k pmqos=0
(or)
sudo modprobe ath9k pmqos=-1
*with this please see whether the CPU is entering lower C-states.
*If it works we can have the pmqos value by default 0/-1
Comment 24 Thomas Bächler 2011-02-08 12:17:43 UTC
Interesting suggestion:

$ modinfo ath9k|grep ^parm:
parm:           debug:Debugging mask (uint)
parm:           nohwcrypt:Disable hardware encryption (int)
parm:           blink:Enable LED blink on activity (int)
Comment 25 shafi 2011-02-08 15:18:02 UTC
(In reply to comment #24)
> Interesting suggestion:
> 
> $ modinfo ath9k|grep ^parm:
> parm:           debug:Debugging mask (uint)
> parm:           nohwcrypt:Disable hardware encryption (int)
> parm:           blink:Enable LED blink on activity (int)

Looks like the patch to make pmqos value as modparam is not there. Can you please try with the latest compat wireless/wireless testing.
Any way I will soon send a RFC patch to address this issue
Comment 26 shafi 2011-02-08 15:31:09 UTC
Created attachment 46842 [details]
diff patch which addresses ath9k CPU C3 state regression
Comment 27 Jiri Slaby 2011-02-15 10:05:48 UTC
Is there any progress here? This problem is seen on AMD Brazos with latest BIOS too.
Comment 28 Jiri Slaby 2011-02-15 10:07:23 UTC
(In reply to comment #26)
> Created an attachment (id=46842) [details]
> diff patch which addresses ath9k CPU C3 state regression

BTW this fixes it...
Comment 29 John W. Linville 2011-02-15 14:36:57 UTC
A version of that patch was posted, there were some comments, and I don't recall seeing the follow-up yet...?
Comment 30 shafi 2011-02-15 14:48:14 UTC
(In reply to comment #29)
> A version of that patch was posted, there were some comments, and I don't
> recall seeing the follow-up yet...?

John/Jiri,
    The likely decision is to completely move it to the user space and remove it completely remove it from the driver code.
Comment 31 Jiri Slaby 2011-02-15 14:51:21 UTC
(In reply to comment #30)
>     The likely decision is to completely move it to the user space and remove
> it completely remove it from the driver code.

Ok, could you send a patch to John, so that we can merge it?

Or do you expect people to disable it by the module parameter?
Comment 32 shafi 2011-02-15 15:06:58 UTC
(In reply to comment #31)
> (In reply to comment #30)
> >     The likely decision is to completely move it to the user space and
> remove
> > it completely remove it from the driver code.
> 
> Ok, could you send a patch to John, so that we can merge it?
> 
> Or do you expect people to disable it by the module parameter?

Jiri for the time being please use that temporary patch which by default disable pm-qos. I will be sending an RFC patch to completely remove the pm-qos thing from the driver. We are taking time as the platform which had this issue had to be validated by the script alone.
Comment 33 John W. Linville 2011-02-15 15:33:18 UTC
I think this has taken long enough...I'm going to revert commit 10598c124ecabbbfd7522f74de19b8f7d52a1bee and the related commit 4dc3530df7c0428b41c00399a7ee8c929406d181.
Comment 34 shafi 2011-02-15 15:41:01 UTC
(In reply to comment #33)
> I think this has taken long enough...I'm going to revert commit
> 10598c124ecabbbfd7522f74de19b8f7d52a1bee and the related commit
> 4dc3530df7c0428b41c00399a7ee8c929406d181.

John please wait I will send a RFC patch now.
Comment 35 John W. Linville 2011-02-15 15:51:51 UTC
Reverting commit 98c316e348bedffa730e6f1e4baeb8a3c3e0f28b as well...
Comment 36 shafi 2011-02-15 16:06:40 UTC
(In reply to comment #35)
> Reverting commit 98c316e348bedffa730e6f1e4baeb8a3c3e0f28b as well...

John already sent an RFC patch.
Comment 37 shafi 2011-02-16 12:34:56 UTC
(In reply to comment #31)
> (In reply to comment #30)
> >     The likely decision is to completely move it to the user space and
> remove
> > it completely remove it from the driver code.
> 
> Ok, could you send a patch to John, so that we can merge it?
> 
> Or do you expect people to disable it by the module parameter?

the patch which completely disable pm-qos in ath9k is merged in wireless-testing. Attaching the patch. Can some one please close the bug.
Comment 38 shafi 2011-02-16 12:36:02 UTC
Created attachment 47952 [details]
final fix for ath9k-prevents CPU entering C3  states
Comment 39 Florian Mickler 2011-02-20 15:08:25 UTC
Patch: https://bugzilla.kernel.org/attachment.cgi?id=47952
Comment 40 Rafael J. Wysocki 2011-02-21 22:19:42 UTC
Handled-By : Mohammed Shafi Shajakhan <mshajakhan@atheros.com>
Comment 41 Thomas Bächler 2011-02-27 13:48:27 UTC
Can this be submitted to stable, too? Thanks.
Comment 42 shafi 2011-02-28 05:24:55 UTC
(In reply to comment #41)
> Can this be submitted to stable, too? Thanks.

Ok.
Comment 43 shafi 2011-02-28 10:34:03 UTC
(In reply to comment #41)
> Can this be submitted to stable, too? Thanks.

Hi there, patch to stable is submitted today.
Comment 44 Florian Mickler 2011-03-04 23:55:43 UTC
merged for .38-rc7: 
commit 0f5cd45960173ba5b36727decbb4a241cbd35ef9
Author: Mohammed Shafi Shajakhan <mshajakhan@atheros.com>
Date:   Tue Feb 15 21:29:32 2011 +0530

    ath9k: Fix ath9k prevents CPU to enter C3 states