Bug 11398

Summary: hda_intel: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj.
Product: Drivers
Reporter: Frans Pop (elendil)
Component: Sound(ALSA)
Assignee: Jaroslav Kysela (perex)
Status: CLOSED WILL_NOT_FIX
Severity: normal
CC: bunk, cadu, rjw
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.27-rc4
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 11167

Description Frans Pop 2008-08-21 17:17:24 UTC
Latest working kernel version: 2.6.26
Earliest failing kernel version: 2.6.27-rc1
Distribution: Debian
Hardware Environment: Desktop, Intel D945GCZ mainboard, Pentium D processor
Software Environment: Debian unstable; self-compiled x86_64 kernel

Problem Description:
Ever since 2.6.27-rc1, I get the following message in my kernel log after booting the system:
hda-intel: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj.

As the sound card works correctly and has not shown any problems with either 2.6.26 or 2.6.27, I wonder whether this is a false positive.
If it is not a false positive, I wonder whether this shouldn't be fixed in the driver itself anyway, as the message is totally meaningless to an end user.

My sound controller is:
00:1b.0 Audio device [0403]: Intel Corporation 82801G (ICH7 Family)
        High Definition Audio Controller [8086:27d8] (rev 01)
    Kernel driver in use: HDA Intel
    Kernel modules: snd-hda-intel
Comment 1 Takashi Iwai 2008-08-26 06:11:01 UTC
If you see this message frequently, it means that something is really wrong.

When this message appears, the driver tries to delay the IRQ handling via a workqueue, which results in more CPU load, although you shouldn't see any issue with the sound quality (this hack is actually meant to improve the quality).

You can increase the bdl_pos_adj option value (default 1 for Intel chipsets, 32 for others) to check whether the warning disappears.
Comment 2 Frans Pop 2008-08-26 06:39:39 UTC
Thanks for the reply.

I see this message exactly once, just after the system has booted, but I get it after every boot.
It may be generated while KDE is starting up (after logging in) and playing its "welcome" sound. I'll check the time of the message to see whether that is the case.

If this were a structural problem, would the message be repeated, or is it only reported during initialization?

I have now created the following file:
$ cat /etc/modprobe.d/sound.local
options snd_hda_intel   bdl_pos_adj=5

Is a value of 5 high enough or should I try with something much higher?
Comment 3 Takashi Iwai 2008-08-26 07:08:18 UTC
The warning appears only once.  To reset it, you'd need to reload the driver.
For testing the bdl_pos_adj option, I'd choose a power of two, e.g. 2, 4, 8, 16 or 32.  Try one of them.

A bit more background: the HD-audio hardware tends to raise the IRQ before the data has actually been processed.  The bdl_pos_adj option specifies the number of samples (at a 48 kHz sample rate) by which to delay the IRQ handling.
In my tests, all Intel boards showed delay=1, while Nvidia and ATI showed delay=32, likely reflecting the FIFO size.  These are the default values the driver chooses when no value is given for that option.

I'm a bit surprised that this happens on your Intel machine.  But this may depend on the particular chipset rather than on the vendor.

BTW, a higher bdl_pos_adj is basically safer.  However, if the given value is larger than the real delay, it adds artificial latency.
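As a rough illustration of the arithmetic above: if bdl_pos_adj counts samples at 48 kHz, one sample is about 20.8 µs, so the Intel default of 1 is roughly 21 µs, the Nvidia/ATI default of 32 is about 0.67 ms, and 256 is about 5.3 ms. The sketch below assumes exactly that interpretation; the helper names and the "real delay" figure used in the example are hypothetical, not taken from the driver.

```python
# Sketch: convert a bdl_pos_adj value (in samples) into time, assuming the
# 48 kHz sample rate described above. Helper names are hypothetical.
SAMPLE_RATE_HZ = 48_000


def adj_to_ms(samples: int, rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Return the delay, in milliseconds, represented by `samples`."""
    return samples * 1000.0 / rate_hz


def artificial_latency_ms(adj: int, real_delay: int, rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Extra latency added when `adj` exceeds the hardware's real delay (both in samples)."""
    return adj_to_ms(max(0, adj - real_delay), rate_hz)


if __name__ == "__main__":
    # The defaults mentioned above (1, 32) and the largest value tested in this report (256).
    for adj in (1, 32, 256):
        print(f"bdl_pos_adj={adj:>3}: {adj_to_ms(adj):.3f} ms")
    # Hypothetical hardware whose real delay is 32 samples, configured with 256.
    print(f"extra latency: {artificial_latency_ms(256, 32):.3f} ms")
```

Because the extra latency grows linearly with the excess samples, overshooting the real delay by a few samples costs little, which is why a larger value is the "safer" direction.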
Comment 4 Frans Pop 2008-08-26 10:07:39 UTC
Thanks for the explanation. The message is definitely related to logging in to KDE. I'm not 100% sure whether it is triggered by the KDE sound server starting or by the first sound being played.

I tried increasing the bdl_pos_adj setting, but the only effect is that the message sometimes appears a bit later (not immediately during login); it still appears within about a minute (I've been playing music and a DVD to test).
I went up to 256; not sure if testing even higher values makes sense.

I did check each time that the value was set correctly; for example for the last test:
$ cat /sys/module/snd_hda_intel/parameters/bdl_pos_adj
256,-1,-1,-1,-1,-1,-1,-1
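The sysfs file holds one comma-separated value per card slot; here only card #0 was set to 256 and the other slots show -1. A small sketch for checking the value applied to a given card follows. It assumes (from the hda_intel source of that era, not from this report) that -1 means "not set, use the driver's chipset default"; the function names are hypothetical.

```python
# Sketch: read a per-card snd_hda_intel module parameter from sysfs.
# Assumption: a negative entry (-1) means "unset, driver default".
from pathlib import Path
from typing import List, Optional

PARAM_PATH = Path("/sys/module/snd_hda_intel/parameters/bdl_pos_adj")


def parse_per_card(text: str) -> List[int]:
    """Split a per-card parameter string such as '256,-1,-1' into integers."""
    return [int(field) for field in text.strip().split(",") if field.strip()]


def value_for_card(text: str, card: int = 0) -> Optional[int]:
    """Return the explicit value for `card`, or None if it is unset or missing."""
    values = parse_per_card(text)
    if card >= len(values) or values[card] < 0:
        return None
    return values[card]


if __name__ == "__main__":
    if PARAM_PATH.exists():
        print("card #0 bdl_pos_adj:", value_for_card(PARAM_PATH.read_text()))
    else:
        print("snd_hda_intel is not loaded (parameter file missing)")
```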


I also have a laptop (HP 2510p) with an Intel ICH8 chipset (sound controller [8086:284b]). On that system I don't see the problem in normal use. However, I do see the message in the syslog there while the system is being suspended to RAM. I was also able to "trigger" the message once by switching from wired to wireless networking while playing music in amarok from a network file system (nfs4). Probably amarok was briefly starved of data.

Please ask if you need any additional information.
Comment 5 Rafael J. Wysocki 2008-08-31 04:26:50 UTC
On Sunday, 31 of August 2008, Frans Pop wrote:
> On Saturday 30 August 2008, Rafael J. Wysocki wrote:
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=11398
> > Subject     : hda_intel: IRQ timing workaround is activated for card #0.
> >               Suggest a bigger bdl_pos_adj.
> > Submitter   : Frans Pop <elendil@planet.nl>
> > Date        : 2008-08-21 17:17 (10 days old)
> 
> Still there.
Comment 6 Takashi Iwai 2008-08-31 23:10:36 UTC
Sorry for the late response.

bdl_pos_adj=256 is definitely large enough (even too large).  The fact that the message still appears with such a value suggests that the problem lies rather in reading the DMA position.  By default, the driver reads from the position buffer mapped in memory.  On some devices, reading from a register seems more robust.  The two methods can be switched via the position_fix option.

Could you check whether passing the position_fix=1 option changes anything?
If it's stable, we can add this to a whitelist in hda_intel.c.
Comment 7 Frans Pop 2008-09-02 10:34:30 UTC
position_fix=1 does not make any difference.
I've tried it both without bdl_pos_adj and with bdl_pos_adj=8, but I still get the "IRQ timing workaround" message.
Comment 8 Carlos Dyonisio 2008-09-06 08:36:01 UTC
"When this message appears, the driver tries to delay the IRQ handling via a workqueue, which results in more CPU load, although you shouldn't see any issue with the sound quality (this hack is actually meant to improve the quality)."

I'm sorry, but this is not true on my machine. When this happens, the sound quality drops really badly, as if it were losing frames: every 15 minutes or so the sound pauses for a few milliseconds, like an old vinyl record. It is completely impossible to listen to any songs with this driver as it is.

I have been using git sources since 2.6.24 and update almost daily, and it still happens today with the kernel I built yesterday.

Please fix this before the 2.6.27 release.

If you need any information from my machine, feel free to ask. :)

Thanks!
Comment 9 Carlos Dyonisio 2008-09-06 08:37:28 UTC
Oh, and it was NOT happening 2 or 3 weeks ago... so it was something recently added to the kernel.
Comment 10 Takashi Iwai 2008-09-06 09:17:26 UTC
Hmm, your hardware must have bad IRQ handling.  It's really a pain to add workarounds for workarounds for broken hardware...  Could you give more details about your hardware?

Anyway, please try the methods in the comments above.  If none of them changes the behavior, check that the module options were passed correctly via the /sys/module/snd_hda_intel/parameters/* files, then try the sound-2.6.git master branch and pass bdl_pos_adj=0.
    git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6.git
Comment 11 Carlos Dyonisio 2008-09-06 10:04:31 UTC
I'm sorry, I think I had not included support for my codec: after enabling all codecs, it no longer happens. It's odd that I didn't have this problem before, and I don't think I changed my kernel configuration.

Sorry, my bad... if it happens again, I will let you know.

Thanks for your prompt reply anyway! :)
Comment 12 Rafael J. Wysocki 2008-09-14 17:06:28 UTC
On Saturday, 13 of September 2008, Takashi Iwai wrote:
> At Sat, 13 Sep 2008 09:37:51 +0200,
> Frans Pop wrote:
> > 
> > On Friday 12 September 2008, Rafael J. Wysocki wrote:
> > > The following bug entry is on the current list of known regressions
> > > from 2.6.26.  Please verify if it still should be listed and let me
> > > know (either way).
> > >
> > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11398
> > > Subject   : hda_intel: IRQ timing workaround is activated for card #0.
> > >             Suggest a bigger bdl_pos_adj.
> > 
> > Still there.
> 
> Yeah, the driver wasn't changed about this.
> 
> Basically it's a warning message that CPU usage got higher due to
> somehow misbehaving hardware.  The driver itself didn't do anything
> wrong.  That is, if the driver didn't show it, you wouldn't have
> noticed any change (or would have noticed improvements in some apps
> :)
> 
> Of course, it would be ideal if we could add a perfect workaround for
> it, but right now I have no better idea.  So, I don't think it's
> worth keeping this open as a regression.
Comment 13 Frans Pop 2008-09-15 02:29:27 UTC
Normally I would have suggested removing the "regression" status and the block rather than closing the report altogether. But the issue seems to have spontaneously disappeared, so closing is fine.

For a while, based on Takashi Iwai's suggestions, I had the following set:
   options snd_hda_intel  position_fix=1 bdl_pos_adj=8
And with that the message still appeared.

A few days ago I removed that and I have not seen the message since!
From my logs it looks as if something _has_ changed between rc5 and rc6 to fix the issue. Could the cause have been something outside ALSA?

To avoid misunderstandings: when I first saw the message I was not using any module parameters at all. It showed up with default settings. It looks like now it only shows up if you _add_ the settings shown above.

Strange...
Comment 14 Takashi Iwai 2008-09-15 14:31:03 UTC
Hmm, no, there are no relevant changes on the ALSA side since rc5...