Bug 13911

Summary: High IRQ latencies with AMD C1E enabled
Product: Platform Specific/Hardware
Reporter: Michael Laß (bevan)
Component: Other
Assignee: Mark Langsdorf (mark.langsdorf)
Status: RESOLVED OBSOLETE
Severity: normal
CC: alan, czoccolo, elemc, fmhstar, h.judt, jonie, mark.langsdorf, rui.zhang, yakui.zhao
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.32
Subsystem:
Regression: No
Bisected commit-id:
Bug Depends on:
Bug Blocks: 56331
Attachments:
  kernel.log after system startup
  Proposed workaround: patch 35743 plus pm_qos in bttv, Ubuntu Lucid (2.6.32) kernel

Description Michael Laß 2009-08-04 11:25:55 UTC
My PCI TV card (Pinnacle PCTV) stops working correctly when AMD C1E support is activated in the BIOS and the kernel is booted with ACPI support.

dmesg shows the following:
bttv0: timeout: drop=11 irq=50/50, risc=cfb5901c, bits: OFLOW
bttv0: timeout: drop=23 irq=104/104, risc=cf4f3bb4, bits: HSYNC OFLOW
bttv0: timeout: drop=34 irq=143/143, risc=cfa8dbb4, bits: HSYNC OFLOW
bttv0: timeout: drop=45 irq=183/183, risc=cf4f4bb4, bits: HSYNC OFLOW
and so on...
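
To follow how quickly the drop= counter grows across messages like the ones above, the lines can be parsed with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the regular expression is fitted to the message format as quoted in this report, not to a documented format.

```python
import re

# Fitted to lines such as:
#   bttv0: timeout: drop=11 irq=50/50, risc=cfb5901c, bits: OFLOW
TIMEOUT_RE = re.compile(
    r"(?P<dev>bttv\d+): timeout: drop=(?P<drop>\d+) irq=(?P<irq>\d+/\d+), "
    r"risc=(?P<risc>[0-9a-f]+), bits:(?P<bits>.*)"
)


def parse_bttv_timeouts(log_text):
    """Return one dict per bttv timeout line found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match is None:
            continue
        events.append({
            "dev": match.group("dev"),
            "drop": int(match.group("drop")),
            "irq": match.group("irq"),          # kept as the raw "a/b" pair
            "bits": match.group("bits").split(),
        })
    return events


def drop_deltas(events):
    """Difference of the drop= counter between consecutive timeout messages."""
    drops = [event["drop"] for event in events]
    return [later - earlier for earlier, later in zip(drops, drops[1:])]


# The four lines quoted in this report.
SAMPLE = """\
bttv0: timeout: drop=11 irq=50/50, risc=cfb5901c, bits: OFLOW
bttv0: timeout: drop=23 irq=104/104, risc=cf4f3bb4, bits: HSYNC OFLOW
bttv0: timeout: drop=34 irq=143/143, risc=cfa8dbb4, bits: HSYNC OFLOW
bttv0: timeout: drop=45 irq=183/183, risc=cf4f4bb4, bits: HSYNC OFLOW
"""
```

Calling drop_deltas(parse_bttv_timeouts(SAMPLE)) on the quoted lines gives [12, 11, 11], i.e. the counter rises by roughly the same amount between messages.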

And here is some debug information:

bttv0: irq: skipped frame [main=cf6a2000,o_vbi=cf6a2018,o_field=cfac6000,rc=cfac601c]
bttv0: Uhm. Looks like we have unusual high IRQ latencies.
bttv0: Lets try to catch the culpit red-handed ...
Pid: 0, comm: swapper Tainted: P           2.6.30-ARCH #1
Call Trace:
 <IRQ>  [<ffffffffa0f2b6a7>] ? bttv_irq+0x3c7/0x930 [bttv]
 [<ffffffff802708cb>] ? hrtimer_get_next_event+0xcb/0x120
 [<ffffffff8025ce75>] ? get_next_timer_interrupt+0x1d5/0x250
 [<ffffffff802a3c5b>] ? handle_IRQ_event+0x6b/0x210
 [<ffffffff8027127c>] ? ktime_get+0x1c/0x70
 [<ffffffff802a6192>] ? handle_fasteoi_irq+0x92/0x120
 [<ffffffff8020fa89>] ? handle_irq+0x29/0x50
 [<ffffffff8020f0e2>] ? do_IRQ+0x72/0x110
 [<ffffffff8020ce53>] ? ret_from_intr+0x0/0x11
 <EOI>  [<ffffffff80225400>] ? lapic_next_event+0x0/0x50
 [<ffffffff8022e512>] ? native_safe_halt+0x2/0x10
 [<ffffffff8021619a>] ? default_idle+0x5a/0x180
 [<ffffffff8027b07a>] ? clockevents_notify+0x3a/0xc0
 [<ffffffff80216608>] ? c1e_idle+0x68/0x150
 [<ffffffff8020b52a>] ? cpu_idle+0xba/0x120


lspci -vvv of the device:

04:05.0 Multimedia video controller: Brooktree Corporation Bt878 Video Capture (rev 11)
	Subsystem: Pinnacle Systems Inc. PCTV pro (TV + FM stereo receiver)
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64 (4000ns min, 10000ns max)
	Interrupt: pin A routed to IRQ 20
	Region 0: Memory at f6ffe000 (32-bit, prefetchable) [size=4K]
	Capabilities: [44] Vital Product Data
		No end tag found
	Capabilities: [4c] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Kernel driver in use: bttv
	Kernel modules: bttv

04:05.1 Multimedia controller: Brooktree Corporation Bt878 Audio Capture (rev 11)
	Subsystem: Pinnacle Systems Inc. PCTV pro (TV + FM stereo receiver, audio section)
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64 (1000ns min, 63750ns max)
	Interrupt: pin A routed to IRQ 10
	Region 0: Memory at f6fff000 (32-bit, prefetchable) [size=4K]
	Capabilities: [44] Vital Product Data
		No end tag found
	Capabilities: [4c] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-


When I disable C1E support in the BIOS or boot with the acpi=off parameter, everything works fine.

My System:
AMD Phenom II X4 955 BE
ASUS M4A78T-E (newest BIOS ver. 1604)
Arch Linux (same problem on other distros)

Please let me know if I can help with any further information.
Comment 1 Michael Laß 2009-08-09 17:12:25 UTC
Problem still exists after BIOS update to newest version 1610.
Comment 2 Michael Laß 2009-08-09 20:56:18 UTC
Created attachment 22653
kernel.log after system startup
Comment 3 Michael Laß 2009-08-18 10:14:08 UTC
Do you have any evidence that the problem is related to cpufreq? Disabling Cool&Quiet in BIOS and unloading powernow_k8 module does not help.

It seems that whenever the CPU goes into C1E state the tv card stops working. If at least one core is under heavy load, everything works fine.
Comment 4 Corrado Zoccolo 2009-08-19 08:23:24 UTC
C states are managed by cpuidle, not cpufreq. If the problem persists with 'performance' cpufreq governor, when C1E is enabled, then cpufreq is definitely not guilty.

The C1E state is a deep sleep state that is selected automatically on some occasions when entering C1, so the OS has very little control over it.
In particular, the latency control usually adopted to select the proper C state to enter is bypassed, since the hardware exports a single state, with varying latency.
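
To rule out cpufreq as suggested above, it helps to confirm that every core really runs the 'performance' governor. The following is a minimal sketch that reads the standard cpufreq sysfs files; the helper names are illustrative, and the directory is a parameter so the function can be pointed at a test tree.

```python
import glob
import os


def read_governors(sysfs_cpu_dir="/sys/devices/system/cpu"):
    """Map each CPU directory name (e.g. 'cpu0') to its current cpufreq governor."""
    pattern = os.path.join(sysfs_cpu_dir, "cpu[0-9]*", "cpufreq", "scaling_governor")
    governors = {}
    for path in sorted(glob.glob(pattern)):
        # .../<cpuN>/cpufreq/scaling_governor -> take the <cpuN> component
        cpu = path.split(os.sep)[-3]
        with open(path) as handle:
            governors[cpu] = handle.read().strip()
    return governors


def all_performance(governors):
    """True only if at least one CPU was found and all of them use 'performance'."""
    return bool(governors) and all(g == "performance" for g in governors.values())
```

If all_performance(read_governors()) is true and the TV card still fails with C1E enabled, frequency scaling is an unlikely cause.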
Comment 5 Michael Laß 2009-08-19 09:25:05 UTC
The problem definitely persists with 'performance' governor. So should this bug be assigned back to acpi_power-processor@kernel-bugs.osdl.org?

Do you see any chance to get this fixed or is it just impossible to use certain hardware with C1E enabled?
Comment 6 Corrado Zoccolo 2009-08-19 10:50:48 UTC
Yes.

I think the way to go is to disable C1E in the BIOS and implement a driver that enables the additional power savings given by C1E only when the latency requirement is satisfied.
Comment 7 Michael Laß 2009-08-23 22:23:57 UTC
Could anyone with sufficient rights assign this back to ACPI/Power-Processor, please?
Comment 8 ykzhao 2009-08-31 13:52:46 UTC
Could you please try the boot option "idle=poll" and see whether the issue still exists?

Thanks.
Comment 9 Michael Laß 2009-08-31 14:16:42 UTC
With boot option "idle=poll" everything works fine. Do you need any log file?
Comment 10 Michael Laß 2009-10-10 13:55:21 UTC
Same problem with kernel ver. 2.6.31.
Comment 11 jonie 2010-01-18 19:47:58 UTC
Same here with 2.6.32.3, a Leadtek Winfast TV2000/XP and an ASUS M3N78-EM, unless I disable C1E.
Comment 12 Venkatesh Pallipadi 2010-01-18 19:53:34 UTC
Looks like some hw issues with c1e? Copying Mark.
Comment 13 jonie 2010-01-19 14:13:06 UTC
It looks like the bug affects only PCI add-on cards on the PCI bridge that use the PCI latency timer. Integrated and PCI-E latencies are not impacted, so this might be a chipset issue. Mine is NVIDIA 8300.
Comment 14 Michael Laß 2010-01-19 18:18:58 UTC
Mine is AMD 790GX / AMD SB750.
Comment 15 jonie 2010-01-20 11:13:32 UTC
You can also disable the C1E state by adding idle=mwait to the boot options, though it might not apply to family 11 AMD CPUs. It is a workaround of sorts if you are dual booting or the BIOS does not provide an option to disable C1E. The c1e idle routine defined in arch/x86/kernel/process.c disables and enables local interrupts (many times a second, then), and that is the only difference. I only wonder whether it conflicts with PCI interrupts in general or just with the bttv driver.
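
When comparing results between the idle=poll and idle=mwait boot options mentioned in this report, it is easy to lose track of what the running kernel was actually booted with. Here is a small sketch that extracts the idle= value from a kernel command line string such as the contents of /proc/cmdline; the function name is illustrative.

```python
def idle_option(cmdline):
    """Return the value of the last idle= parameter on a kernel command line.

    Returns None if no idle= parameter is present.
    """
    value = None
    for token in cmdline.split():
        if token.startswith("idle="):
            value = token[len("idle="):]
    return value
```

On a running system the input would typically come from open("/proc/cmdline").read().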
Comment 16 jonie 2010-01-22 12:52:21 UTC
I am leaning more and more towards the conclusion that there is nothing wrong with the power infrastructure; perhaps this (quite old) bttv driver just needs to make use of pm_qos_requirement to be C1E aware. My PCI sound card works fine with C1E enabled, but the PCM layer does use pm_qos.
Comment 17 jonie 2010-01-23 09:29:18 UTC
I found a way to keep C1E enabled in the BIOS and usable in the kernel without it interrupting the operation of the bttv driver. I added the pm_qos mechanism in a similar way to ipw2100 and then completely disable C1 whenever the bttv device file is opened, by issuing pm_qos_update_requirement(PM_QOS_CPU_DMA_LATENCY, "bttv", 0). The default latency requirement is then restored with pm_qos_update_requirement(PM_QOS_CPU_DMA_LATENCY, "bttv", PM_QOS_DEFAULT_VALUE) once the file is closed. For this to work, however, there must be some kind of C1 idle accounting like the one for the C2 and C3 states. I used the instructions from patch 35743 (I don't know if it is still in sync, but it is simple enough to apply manually). Two questions remain, however. First, what is the real delay of C1E, a delay we will never get from the ACPI FADT, and is it really important? (I set it to "1" as in the patch.) Second, what is the real latency requirement of bttv to work smoothly? (This just needs trial-and-error testing.) I tested the proposed workaround on my Ubuntu box and found it working.
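
For experimenting without patching the driver, the same PM_QOS_CPU_DMA_LATENCY class can be requested from user space through /dev/cpu_dma_latency: a process writes a 32-bit latency value in microseconds and the request stays in force until the file descriptor is closed. The sketch below is a user-space analogue, not the patch from this comment; the class name is made up, and whether the request alone keeps the CPU out of C1E on the affected kernels is not established here, since the comment above indicates that additional C1 idle accounting was needed.

```python
import os
import struct


class CpuDmaLatency:
    """Hold a CPU DMA latency request while the 'with' block is active."""

    def __init__(self, latency_us=0, path="/dev/cpu_dma_latency"):
        self.latency_us = latency_us
        self.path = path          # parameterised so it can be tested on a plain file
        self._fd = None

    def __enter__(self):
        # Opening the device usually requires root.
        self._fd = os.open(self.path, os.O_WRONLY)
        # The value is written as a native 32-bit signed integer.
        os.write(self._fd, struct.pack("i", self.latency_us))
        return self

    def __exit__(self, exc_type, exc, tb):
        # Closing the descriptor withdraws the request.
        os.close(self._fd)
        self._fd = None
        return False
```

A capture session would then be wrapped as "with CpuDmaLatency(0): ..." so that the request is dropped automatically when capturing ends, mirroring the open/close pairing described above.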
Comment 18 Zhang Rui 2010-03-12 07:30:54 UTC
re-assign to Mark.
Comment 19 Mark Langsdorf 2010-03-15 16:14:47 UTC
Could I get the family and model numbers for the processors?  I need to track down some information.
Comment 20 jonie 2010-03-15 18:20:07 UTC
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 16
model		: 4
model name	: AMD Athlon(tm) 5000 Dual-Core Processor
stepping	: 2
cpu MHz		: 800.000
cache size	: 512 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt
bogomips	: 4400.23
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor	: 1
...
Comment 21 Michael Laß 2010-03-15 19:13:25 UTC
Same family and model numbers here:

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 16
model		: 4
model name	: AMD Phenom(tm) II X4 955 Processor
stepping	: 2
cpu MHz		: 800.000
cache size	: 512 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 5
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt
bogomips	: 6423.81
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

[...]
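
For anyone else adding their hardware to this report, the family and model numbers asked for above can be pulled out of /proc/cpuinfo instead of pasting the whole file. This is a minimal sketch with an illustrative function name; it assumes the x86 field names shown in the dumps above.

```python
def cpu_family_model(cpuinfo_text):
    """Return a list of (processor, family, model) tuples from /proc/cpuinfo text."""
    result = []
    current = {}
    # The trailing "" makes sure the last block is flushed as well.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends one processor block.
            if {"processor", "cpu family", "model"} <= current.keys():
                result.append((int(current["processor"]),
                               int(current["cpu family"]),
                               int(current["model"])))
            current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    return result
```

Fed the first block quoted above, it returns [(0, 16, 4)]; on a live system the input would be open("/proc/cpuinfo").read().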
Comment 22 jonie 2010-07-08 13:53:10 UTC
Created attachment 27040
Proposed workaround: patch 35743 plus pm_qos in bttv, Ubuntu Lucid (2.6.32) kernel
Comment 23 jonie 2010-07-08 13:54:50 UTC
Vanilla 2.6.35-rc1 still shows too much latency.
Comment 24 Alan 2012-08-29 17:16:57 UTC
If this is still seen with modern kernels please update the bug and version info. Thanks
Comment 25 Harald Judt 2012-10-20 19:09:50 UTC
linux-3.6.2: I experience problems which can be solved by setting idle=mwait. There are no timeout messages in dmesg, but the captured image shows parts of lines that are messed up, probably remainders from previous images. The more action/movement there is, the more of these lines are visible.

This is on a HP Compaq Elite 8100 CMT with an Intel processor. I'm not sure this belongs to this bug, but since idle=mwait solves it I believe it isn't far off. Before I knew of idle=mwait, I simply started enough cpuburn processes to cause heavy load, then the strange lines would go away.

/proc/cpuinfo:
processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 37
model name      : Intel(R) Core(TM) i3 CPU         540  @ 3.07GHz
stepping        : 2
microcode       : 0x9
cpu MHz         : 1199.000
cache size      : 4096 KB
physical id     : 0
siblings        : 4
core id         : 2
cpu cores       : 2
apicid          : 5
initial apicid  : 5
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm arat dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips        : 6118.11
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

lspci -vvv:
0d:00.0 Multimedia video controller: Brooktree Corporation Bt878 Video Capture (rev 11)
        Subsystem: Pinnacle Systems, Inc. (Wrong ID) Device ff00
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 32 (4000ns min, 10000ns max)
        Interrupt: pin A routed to IRQ 20
        Region 0: Memory at f0200000 (32-bit, prefetchable) [size=4K]
        Capabilities: [44] Vital Product Data
                No end tag found
        Capabilities: [4c] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Kernel driver in use: bttv

0d:00.1 Multimedia controller: Brooktree Corporation Bt878 Audio Capture (rev 11)
        Subsystem: Pinnacle Systems, Inc. (Wrong ID) Device ff00
        Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 32 (1000ns min, 63750ns max)
        Interrupt: pin A routed to IRQ 255
        Region 0: Memory at f0201000 (32-bit, prefetchable) [disabled] [size=4K]
        Capabilities: [44] Vital Product Data
                No end tag found
        Capabilities: [4c] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-

Is there an updated patch for the issue somewhere?