Bug 197769

Summary: kernel panic IO-APIC + timer (AMD CPU) from 4.13 onwards
Product: Timers Reporter: p_c_chan
Component: Interval Timers Assignee: timers_interval-timers
Status: RESOLVED CODE_FIX    
Severity: high CC: ecm4, iissmart, matzes, perdigao1, rvelascog
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 4.13 onwards Subsystem:
Regression: Yes Bisected commit-id:
Attachments: This the good dmesg from 4.12.14
this is hwinfo, also from 4.12.14
Crash Screenshot 4.13.12
screenshot on 4.19.63

Description p_c_chan 2017-11-03 23:03:44 UTC
Created attachment 260503 [details]
This the good dmesg from 4.12.14

My box works fine with 4.12.14 and older kernels.  After upgrading the kernel to 4.13.x, it hits "Kernel panic - not syncing: IO-APIC + timer doesn't work!".  The lines just before that read "...trying to set up timer as ExtINT IRQ..." and "..... failed :(".

Setting up the timer as ExtINT IRQ worked in 4.12.x, up to and including the last release, 4.12.14.  Please see the attached dmesg from 4.12.14.

I tried a number of 4.13.x and 4.14-rc kernels; they all failed the same way.  The problem seems to have been introduced in the very first 4.13.  It is probably inside this big commit:

https://github.com/torvalds/linux/commit/03ffbcdd7898c0b5299efeb9f18de927487ec1cf#diff-bfdbce36d815c6829b32b5eebc17923f

I can boot with noapic, but the box then usually freezes within a few hours, saying interrupt x is disabled.  apic=debug did not help, because the panic happens very early in the boot process, before logging is available.

I've downgraded my box back to 4.12.14.  It works without noapic and does not freeze.  This is what I am using now.  Please fix this; otherwise there is no future for my box.

media:~# uname -a
Linux media 4.12.14 #1 SMP Tue Oct 17 21:10:16 EDT 2017 x86_64 GNU/Linux


media:~# lspci
00:00.0 RAM memory: NVIDIA Corporation C51 Host Bridge (rev a2)
00:00.1 RAM memory: NVIDIA Corporation C51 Memory Controller 0 (rev a2)
00:00.2 RAM memory: NVIDIA Corporation C51 Memory Controller 1 (rev a2)
00:00.3 RAM memory: NVIDIA Corporation C51 Memory Controller 5 (rev a2)
00:00.4 RAM memory: NVIDIA Corporation C51 Memory Controller 4 (rev a2)
00:00.5 RAM memory: NVIDIA Corporation C51 Host Bridge (rev a2)
00:00.6 RAM memory: NVIDIA Corporation C51 Memory Controller 3 (rev a2)
00:00.7 RAM memory: NVIDIA Corporation C51 Memory Controller 2 (rev a2)
00:02.0 PCI bridge: NVIDIA Corporation C51 PCI Express Bridge (rev a1)
00:04.0 PCI bridge: NVIDIA Corporation C51 PCI Express Bridge (rev a1)
00:09.0 RAM memory: NVIDIA Corporation MCP51 Host Bridge (rev a2)
00:0a.0 ISA bridge: NVIDIA Corporation MCP51 LPC Bridge (rev a3)
00:0a.1 SMBus: NVIDIA Corporation MCP51 SMBus (rev a3)
00:0a.2 RAM memory: NVIDIA Corporation MCP51 Memory Controller 0 (rev a3)
00:0b.0 USB controller: NVIDIA Corporation MCP51 USB Controller (rev a3)
00:0b.1 USB controller: NVIDIA Corporation MCP51 USB Controller (rev a3)
00:0d.0 IDE interface: NVIDIA Corporation MCP51 IDE (rev a1)
00:0e.0 IDE interface: NVIDIA Corporation MCP51 Serial ATA Controller (rev a1)
00:0f.0 IDE interface: NVIDIA Corporation MCP51 Serial ATA Controller (rev a1)
00:10.0 PCI bridge: NVIDIA Corporation MCP51 PCI Bridge (rev a2)
00:10.1 Audio device: NVIDIA Corporation MCP51 High Definition Audio (rev a2)
00:14.0 Bridge: NVIDIA Corporation MCP51 Ethernet Controller (rev a3)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
02:00.0 VGA compatible controller: NVIDIA Corporation GK107 [GeForce GTX 650] (rev a1)
02:00.1 Audio device: NVIDIA Corporation GK107 HDMI Audio Controller (rev a1)
03:05.0 FireWire (IEEE 1394): LSI Corporation FW322/323 [TrueFire] 1394a Controller (rev 70)
03:09.0 SCSI storage controller: Initio Corporation INI-950 SCSI Adapter (rev 02)
03:0a.0 Communication controller: Conexant Systems, Inc. HSF 56k Data/Fax Modem


media:~# cat /proc/interrupts
           CPU0       CPU1       
  0:        131          0    XT-PIC   0  timer
  8:          0          0   IO-APIC   8-edge      rtc0
  9:          0          0   IO-APIC   9-fasteoi   acpi
 14:       3889      61217   IO-APIC  14-edge      pata_amd
 15:          0          0   IO-APIC  15-edge      pata_amd
 16:       3356    3160950   IO-APIC  16-fasteoi   snd_hda_intel:card1
 17:          0          7   IO-APIC  17-fasteoi   i91u
 20:          0          0   IO-APIC  20-fasteoi   sata_nv
 21:     749758    2707837   IO-APIC  21-fasteoi   sata_nv
 22:        951     810480   IO-APIC  22-fasteoi   snd_hda_intel:card0, ehci_hcd:usb1
 23:   10974858     187733   IO-APIC  23-fasteoi   ohci_hcd:usb2, eth0
 26:     948691    3863958   PCI-MSI 1048576-edge      nvidia
NMI:          0          0   Non-maskable interrupts
LOC:  138383605   70830622   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
IWI:     282099     335788   IRQ work interrupts
RTR:          0          0   APIC ICR read retries
RES:    6884223    5211313   Rescheduling interrupts
CAL:     178864     196368   Function call interrupts
TLB:     176751     171041   TLB shootdowns
THR:          0          0   Threshold APIC interrupts
DFR:          0          0   Deferred Error APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:        207        207   Machine check polls
ERR:          1
MIS:          0
PIN:          0          0   Posted-interrupt notification event
PIW:          0          0   Posted-interrupt wakeup event
media:~# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 75
model name      : AMD Athlon(tm) 64 X2 Dual Core Processor 4600+
stepping        : 2
cpu MHz         : 2400.000
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl cpuid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch vmmcall
bugs            : fxsave_leak sysret_ss_attrs null_seg swapgs_fence amd_e400
bogomips        : 4809.82
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 75
model name      : AMD Athlon(tm) 64 X2 Dual Core Processor 4600+
stepping        : 2
cpu MHz         : 2400.000
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl cpuid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch vmmcall
bugs            : fxsave_leak sysret_ss_attrs null_seg swapgs_fence amd_e400
bogomips        : 4809.82
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
Comment 1 p_c_chan 2017-11-03 23:08:05 UTC
Comment on attachment 260503 [details]
This the good dmesg from 4.12.14

In particular the dmesg from 4.12.14 shows 

[    0.015180] ..TIMER: vector=0x30 apic1=0 pin1=0 apic2=-1 pin2=-1
[    0.016000] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
[    0.016000] ...trying to set up timer (IRQ0) through the 8259A ...
[    0.016000] ..... (found apic 0 pin 0) ...
[    0.018000] ....... failed.
[    0.018000] ...trying to set up timer as Virtual Wire IRQ...
[    0.018000] ..... failed.
[    0.018000] ...trying to set up timer as ExtINT IRQ...
[    0.028870] ..... works.

For 4.13 or newer, trying to set up timer as ExtINT IRQ fails, giving us the kernel panic.
Comment 2 ecm4 2017-11-17 22:28:50 UTC
I am having the exact same problem as of 4.13.x. 4.12 and before worked fine.

I am able to boot with noapic option, but would prefer not to have to do that.  Also have only been running this way for a matter of hours, so not sure how stable it will be.

From kernel panic:

...trying to set up timer as ExtINT IRQ...
.....failed :(.
Kernel panic - not syncing: IO-APIC + timer doesn't work!  Boot with apic=debug and send a report.  Then try booting with the 'noapic' option.
CPU: 0 PID: 1  Comm: swapper/0 Not tainted 4.13.12-100.fc25.x86_64 #1
Hardware name: HP Pavilion 061 EW172AV-ABA a1530e/NAGAMI2, BIOS 3.11 09/19/2006

CPU Info:
$ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 47
model name      : AMD Athlon(tm) 64 Processor 3500+
stepping        : 2
cpu MHz         : 1000.000
cache size      : 512 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good nopl cpuid pni lahf_lm 3dnowprefetch vmmcall
bugs            : fxsave_leak sysret_ss_attrs null_seg swapgs_fence
bogomips        : 2004.03
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
Comment 3 p_c_chan 2017-11-18 01:25:25 UTC
Good to know I am not the only one.  My desktop is also an HP Pavilion; the model is a1677c.
Comment 4 p_c_chan 2017-11-18 01:44:23 UTC
Created attachment 260707 [details]
this is hwinfo, also from 4.12.14
Comment 5 p_c_chan 2017-11-19 13:56:31 UTC
Hit the same panic with 4.14.0.  The problem introduced in 4.13.0 is still there.
Comment 6 ecm4 2017-11-19 16:39:16 UTC
Created attachment 260717 [details]
Crash Screenshot 4.13.12
Comment 7 p_c_chan 2017-12-15 00:29:08 UTC
Problem also exists in 4.15-rc3.
Comment 8 p_c_chan 2018-01-12 16:24:22 UTC
Wonder when a fix for this would be available.
Comment 9 Luis 2018-01-22 14:43:22 UTC
I am having the same problem. With previous kernels, the system printed an error message and continued, but now it throws a kernel panic and hangs. My computer has an Asus M2N4-SLI motherboard. I have tried the 'noapic' option and many variations using nolapic, acpi=off/[others], pci=off/biosirq and other similar options; sometimes it boots, but then the mouse and keyboard don't work. Reverting to a previous kernel boots fine.
Comment 10 p_c_chan 2018-01-23 04:21:58 UTC
The problem still exists in 4.14.14.  There are some APIC-related commits, but they do not fix this issue.
Comment 11 ecm4 2018-01-26 19:20:29 UTC
I am still facing this issue as well.... unable to upgrade past version 4.12 on my system due to the bug.
Comment 12 p_c_chan 2018-03-12 02:03:27 UTC
Four months later, still nobody has looked at this issue.  We just keep adding features without even caring whether they work. :(
Comment 13 p_c_chan 2018-04-01 23:18:08 UTC
5 months after, status is still new!???
Comment 14 p_c_chan 2018-06-29 19:53:52 UTC
Ubuntu live DVDs are using the new kernels and I can't boot those any more.  

Status is still new!???
Comment 15 Matzes 2018-07-02 09:38:19 UTC
Seems FSC D2461 aka FSC Esprimo P5615 is also affected.
Comment 16 Matzes 2018-07-02 09:51:38 UTC
(In reply to p_c_chan from comment #14)
> Ubuntu live DVDs are using the new kernels and I can't boot those any more.  
> 
> Status is still new!???

Did you manage to identify/contact one of the developers of the code changes that you've isolated?
Comment 17 p_c_chan 2018-07-03 14:35:46 UTC
No, I don't know about any developers that I can contact.  It looked like nobody cares.    The problem was reported last year and the status is still new. 

My desktop got a bad update last weekend and the nvidia module wouldn't load.  Then I made a bad move and tried recompiling the kernel.  The problem turned out to be that kernel modules were being built in the wrong format.  No module would load with modprobe after recompilation!  I could not even log in on the command line after the reboot.  I was running 4.12.14, so that part is not really the fault of this bug.

Then, when I decided to reinstall the desktop with Ubuntu, this bug hit me hard.  Ubuntu is using 4.13 even for the oldest available 2016 LTS!  I burnt a pile of DVDs and none of them could get past this kernel panic.

(Some of the new gcc versions probably use PIC or something similar when compiling kernel modules.  I could not identify which one and it was getting late.  I ended up downgrading the desktop from unstable Debian to stable.  Luckily sshd, sftpd, ethernet and apt still worked.)

I am surprised ubuntu did not catch this bug. LTS are supposed to be stable for everybody.  

Perhaps we have to wait for redhat to catch it.  I guess redhat should be big enough to come up with a fix or ask for a fix.
Comment 18 Matzes 2018-07-05 07:12:00 UTC
Maybe this bug is limited to older systems whose hardware/chipset/BIOS is no longer in common use (at least among kernel developers and distribution makers). I think someone who is interested in solving the regression has to do some more analysis and try to identify and contact the developer who made the offending changes.

Unfortunately I'm short on time ;-) and at the moment content (to some extent) with one of the latest LTS-Kernels (4.9.110).
Comment 19 p_c_chan 2018-07-05 16:27:10 UTC
Debian is still using 4.9.  That works.  Ubuntu is already on 4.13 and above, which fails to boot unless we modify the ISO before burning it onto DVD.

It was a single big change from upstream.  Not sure how I can debug it, as it fails so early in boot time, to find out what and who did the damage.

My desktop is a HP with a standalone nvidia graphic card, not so noname, still has enough horses for running linux for web and youtube.  I actually downgraded it to stable debian, staying away from the leading edge after what happened last weekend.   
  
Anyway, updating is not always safe.  I virtually bricked my old Samsung tab 10.1 last week trying to bring it to 7.1. It failed to load the 7.1 zip.  I haven't found a way to put the old image or the official stock image back on from outside, even though CWM is still working.  :(
Comment 20 Matzes 2018-07-05 19:39:57 UTC
Seems this Debian bug report is at least very similar:
Debian Bug report logs - #883294
linux-image-4.13.0-1-amd64: Kernel panic prevents boot: regression (apic)
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=883294

---

@ p_c_chan: Since you've already isolated the offending commit (did you use git bisect and have you checked it or is it only a suspicion?) there is some contact for this commit (and APIC):
Thomas Gleixner <tglx@linutronix.de> (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT))
Comment 21 p_c_chan 2018-07-05 19:53:29 UTC
Looks similar.

"> > - Instead of not using IO-APIC completly, could you try to boot with
> > kernel parameter "no_timer_check" ?"

Did no_timer_check work?

I don't know about git bisect.  I tried narrowing it down manually bit by bit, the diff is way too big.
Comment 22 Matzes 2018-07-05 22:40:04 UTC
(In reply to p_c_chan from comment #21)
> Looks similar.
> 
> "> > - Instead of not using IO-APIC completly, could you try to boot with
> > > kernel parameter "no_timer_check" ?"
> 
> Did no_timer_check work?

No, does not work for me (4.15) - but perhaps due to other bugs who knows. 

> 
> I don't know about git bisect.  I tried narrowing it down manually bit by
> bit, the diff is way too big.


So how did you find out that this commit is the one? Did you check out one kernel with the suspicious commit and one from just before it, compile them both, and prove that this is the offending commit by showing that one works and the other does not?
Comment 23 p_c_chan 2018-07-05 22:56:11 UTC
The very last 4.12.14 is ok. The very first 4.13-rc I found failed.  I think I had tried rc1.
Comment 24 Matzes 2018-07-09 07:29:02 UTC
(In reply to p_c_chan from comment #23)
> The very last 4.12.14 is ok. The very first 4.13-rc I found failed.  I think
> I had tried rc1.

For me (on FSC Esprimo P5615)
The last working mainline kernel:

commit 1b044f1cfc65a7d90b209dfabd57e16d98b58c5b
Date:   Mon Jul 3 16:14:51 2017 -0700
Merge branch 'timers-core-for-linus'

The first one with the bug (the commit p_c_chan reported for this bug-report):

commit 03ffbcdd7898c0b5299efeb9f18de927487ec1cf
Date:   Mon Jul 3 16:50:31 2017 -0700
Merge branch 'irq-core-for-linus'
Comment 25 Luis 2018-07-09 14:58:24 UTC
Interesting info. I am not very good at spotting changes in git commits; maybe someone else is. Searching the latest kernel for "MP-BIOS", I found the message in the following file, which suggests the culprit could be there. As far as I remember, this has always been related to the APIC, which is consistent with the filename.

linux-4.17.5\arch\x86\kernel\apic\io_apic.c
Line 2153:
		if (!no_pin1)
			apic_printk(APIC_QUIET, KERN_ERR "..MP-BIOS bug: "
				    "8254 timer not connected to IO-APIC\n");
Comment 26 p_c_chan 2018-07-13 23:51:30 UTC
Just tried 4.17.6.  Still fails, i.e. isn't fixed.
Comment 27 Matzes 2018-07-14 07:02:27 UTC
(In reply to p_c_chan from comment #26)
> Just tried 4.17.6.  Still fails, i.e. isn't fixed.

Same is true for FSC Esprimo P5615.
Comment 28 ecm4 2018-07-26 16:49:50 UTC
I tried using the no_timer_check kernel option today on my system and it didn't work; it hangs early in the boot process with:

[ 0.002000] Spectre V2 : Spectre v2 mitigation: Filling RSB on context switch

So I had to go back to the noapic option. This is on Fedora 27 with 4.17.7.
Comment 29 p_c_chan 2018-11-28 23:49:28 UTC
Failed in 4.20-rc3 too.
Comment 30 p_c_chan 2019-04-17 21:52:14 UTC
Failed in 5.1-rc5 too.
Comment 31 ecm4 2019-04-18 13:48:03 UTC
I'm guessing this only affects a small subset of machines, but is there any way to get this bug corrected??
Comment 32 p_c_chan 2019-08-01 13:47:13 UTC
Ah, it has been almost 2 years since I raised this issue in another report, before raising this one for AMD. The problem still exists in every kernel after 4.12.  The report still has the NEW status. :( It looks like bugzilla is dead.

I spent some time last night adding printk's to timer_irq_works() and compiling kernels.  It showed that jiffies really does not increase at all with ExtINT in 4.19.63.  This newer 4.19 kernel also complained about a missing vector when using ExtINT, right before it fails. That is new and may point us to the bug.  I'll dig into the setup for using ExtINT.

In comparison, we receive the expected number of jiffies (within +/- 2) from ExtINT in 4.9.186.  Something broke going across 4.13.
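
The check that the added printk's were probing can be modeled in user space. This is only a sketch: it assumes the kernel helper waits roughly ten ticks and then requires jiffies to have advanced by more than four, compared in a wraparound-safe way, so verify those numbers against your own tree. The function name, parameters and 64-bit counter width below are illustrative, not kernel API.

```python
def timer_ticks_seen(jiffies_before, jiffies_after, min_ticks=4, bits=64):
    """Model of a 'did the timer interrupt fire?' check.

    Returns True when the counter advanced by MORE than `min_ticks`,
    treating the counter as an unsigned `bits`-wide value that may wrap.
    """
    modulus = 1 << bits
    # Difference past the threshold, reduced to the counter width.
    delta = (jiffies_after - (jiffies_before + min_ticks)) % modulus
    # Values in the upper half of the range stand for negative differences.
    return 0 < delta < (modulus >> 1)
```

In this model the failing 4.19.63 boot corresponds to `jiffies_after == jiffies_before`, which returns False, while the 4.9.186 boot with about ten ticks returns True.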
Comment 33 p_c_chan 2019-08-01 22:33:27 UTC
Created attachment 284079 [details]
screenshot on 4.19.63

From 4.19.63.  Added printk to show jiffies.
Comment 34 p_c_chan 2019-08-01 22:35:01 UTC
That error was coming from irq.c. Not sure why yet.
Comment 35 p_c_chan 2019-08-01 23:58:11 UTC
Rework of vector management?
Comment 36 p_c_chan 2019-09-30 22:06:49 UTC
Any chance of having someone looking into this before the problem turns 2 years old?
Comment 37 Rodrigo 2020-09-05 23:50:51 UTC
Hello,
I am trying to use an old box with a new Linux.
My MB is an Asus M2N4-SLI with an AMD Athlon(tm) 64 X2 Dual Core Processor 4200+.
The GPU is an Asus Nvidia GeForce 7600 GT.
Pretty old, but this box was working very well with older distros.
Now I am trying to boot a new Linux via USB or CD and nothing has worked. I've tried the newest stable Debian, Ubuntu, Lubuntu, Kali, OpenSuse, Fedora and Centos.

All failed.

I got a black screen or, at most, the error message: 8254 timer not connected to IO-APIC

My already installed Debian still works on this box, but I can't boot any other distro.

I guess this is a problem of an old BIOS and old hardware with a new kernel.
I've seen this problem in several linux sites and forums with no useful solutions.

Do you have any idea about what could be done to solve this issue?
Thanks a bunch
Rod
Comment 38 p_c_chan 2020-09-08 03:34:16 UTC
Unfortunately this bug just sits here forever.  

I have to downgrade the kernel to 4.9.x, the highest stream still receiving updates.
Comment 39 Thomas Gleixner 2020-09-21 08:30:33 UTC
On Mon, Sep 30 2019 at 22:06, bugzilla-daemon wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=197769
>
> --- Comment #36 from p_c_chan@hotmail.com ---
> Any chance of having someone looking into this before the problem turns 2
> years
> old?

Staring at it right now...
Comment 40 Thomas Gleixner 2020-09-21 10:04:10 UTC
On Mon, Sep 21 2020 at 08:30, bugzilla-daemon wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=197769
>
> --- Comment #39 from Thomas Gleixner (tglx@linutronix.de) ---
> On Mon, Sep 30 2019 at 22:06, bugzilla-daemon wrote:
>> https://bugzilla.kernel.org/show_bug.cgi?id=197769
>>
>> --- Comment #36 from p_c_chan@hotmail.com ---
>> Any chance of having someone looking into this before the problem turns 2
>> years
>> old?
>
> Staring at it right now...

can anyone please test the patch below?

Thanks,

        tglx
---
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2243,6 +2243,7 @@ static inline void __init check_timer(vo
 	legacy_pic->init(0);
 	legacy_pic->make_irq(0);
 	apic_write(APIC_LVT0, APIC_DM_EXTINT);
+	legacy_pic->unmask(0);
 
 	unlock_ExtINT_logic();
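
A toy model of what the added line changes may help readers who do not know the 8259A. It rests on a loudly stated assumption, not on anything confirmed in this thread: that line 0 of the legacy PIC is still masked when the ExtINT test runs, so timer pulses never reach the CPU. The class and function names are invented for illustration and have nothing to do with the kernel's legacy_pic structure.

```python
class ToyPic:
    """Toy interrupt controller: one mask bit per line, set bit = masked."""

    def __init__(self):
        # Assumption for this sketch: every line starts out masked.
        self.mask = 0xFF

    def unmask(self, irq):
        # Clear the mask bit so the line is forwarded again.
        self.mask &= ~(1 << irq) & 0xFF

    def forwards(self, irq):
        # A line is forwarded to the CPU only while its mask bit is clear.
        return ((self.mask >> irq) & 1) == 0


def count_ticks(pic, pulses, irq=0):
    """Count how many of `pulses` timer pulses on `irq` get past the PIC."""
    return sum(1 for _ in range(pulses) if pic.forwards(irq))
```

In this model ten pulses produce zero ticks before `unmask(0)` and ten ticks after it, which mirrors the "no increase of jiffies" observation from comment 32.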
Comment 41 Rodrigo 2020-09-21 11:23:15 UTC
Created attachment 292553 [details]
attachment-22006-0.html

Ok, I will check.

On Mon, 21 Sep 2020 at 7:04, <bugzilla-daemon@bugzilla.kernel.org> wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=197769
>
> --- Comment #40 from Thomas Gleixner (tglx@linutronix.de) ---
> On Mon, Sep 21 2020 at 08:30, bugzilla-daemon wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=197769
> >
> > --- Comment #39 from Thomas Gleixner (tglx@linutronix.de) ---
> > On Mon, Sep 30 2019 at 22:06, bugzilla-daemon wrote:
> >> https://bugzilla.kernel.org/show_bug.cgi?id=197769
> >>
> >> --- Comment #36 from p_c_chan@hotmail.com ---
> >> Any chance of having someone looking into this before the problem turns
> 2
> >> years
> >> old?
> >
> > Staring at it right now...
>
> can anyone please test the patch below?
>
> Thanks,
>
>         tglx
> ---
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -2243,6 +2243,7 @@ static inline void __init check_timer(vo
>         legacy_pic->init(0);
>         legacy_pic->make_irq(0);
>         apic_write(APIC_LVT0, APIC_DM_EXTINT);
> +       legacy_pic->unmask(0);
>
>         unlock_ExtINT_logic();
>
Comment 42 Matzes 2020-09-22 08:46:47 UTC
(In reply to Thomas Gleixner from comment #40)
> On Mon, Sep 21 2020 at 08:30, bugzilla-daemon wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=197769
> >
> > --- Comment #39 from Thomas Gleixner (tglx@linutronix.de) ---
> > On Mon, Sep 30 2019 at 22:06, bugzilla-daemon wrote:
> >> https://bugzilla.kernel.org/show_bug.cgi?id=197769
> >>
> >> --- Comment #36 from p_c_chan@hotmail.com ---
> >> Any chance of having someone looking into this before the problem turns 2
> >> years
> >> old?
> >
> > Staring at it right now...
> 
> can anyone please test the patch below?
> 
> Thanks,
> 
>         tglx
> ---
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -2243,6 +2243,7 @@ static inline void __init check_timer(vo
>       legacy_pic->init(0);
>       legacy_pic->make_irq(0);
>       apic_write(APIC_LVT0, APIC_DM_EXTINT);
> +     legacy_pic->unmask(0);
>  
>       unlock_ExtINT_logic();

TEST:

On FSC D2461 aka FSC Esprimo P5615
Tested with your patch (not yet tested w/o patch for this kernel version):
[    0.000000] Linux version 5.9.0-rc6-custom
[    0.000000] DMI: FUJITSU SIEMENS ESPRIMO P                     /D2461-A2, BIOS 6.00 R1.15.2461.A2               10/22/2007
[    0.280852] APIC: Switch to symmetric I/O mode setup
[    0.281533] ..TIMER: vector=0x30 apic1=0 pin1=0 apic2=-1 pin2=-1
[    0.334662] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
[    0.334665] ...trying to set up timer (IRQ0) through the 8259A ...
[    0.334668] ..... (found apic 0 pin 0) ...
[    0.387795] ....... failed.
[    0.387797] ...trying to set up timer as Virtual Wire IRQ...
[    0.440896] ..... failed.
[    0.440897] ...trying to set up timer as ExtINT IRQ...
[    0.656870] ..... works.
Comment 43 Thomas Gleixner 2020-09-22 11:30:06 UTC
On Tue, Sep 22 2020 at 08:46, bugzilla-daemon wrote:
> TEST:
>
> On FSC D2461 aka FSC Esprimo P5615
> Tested with your patch (not yet tested w/o patch for this kernel
> version):

It should be the same problem with an unmodified 5.9-rc kernel, but it
would be nice if you could confirm.

Thanks,

        tglx
Comment 44 Matzes 2020-09-22 12:09:33 UTC
(In reply to Thomas Gleixner from comment #43)
> On Tue, Sep 22 2020 at 08:46, bugzilla-daemon wrote:
> > TEST:
> >
> > On FSC D2461 aka FSC Esprimo P5615
> > Tested with your patch (not yet tested w/o patch for this kernel
> > version):
> 
> It should be the same problem with an unmodified 5.9-rc kernel, but it
> would be nice if you could confirm.
> 
> Thanks,
> 
>         tglx

Not sure if you got me right. To make it clearer:

[    0.656870] ..... works.

The test with the patched 5.9-rc6 kernel was successful for me (despite the "failed" messages, which were already there before our problem occurred; see "This the good dmesg from 4.12.14" in the post from 2017-11-03 23:08:05 UTC).

-> With patch: no kernel panic, no need for the noapic boot parameter, and the system is still running. I just still have to confirm that the problem is still there without the patch.
Comment 45 Thomas Gleixner 2020-09-22 13:24:39 UTC
On Tue, Sep 22 2020 at 12:09, bugzilla-daemon wrote:
> Not sure if you got me right. To make it clearer:

I did.

> -> With patch: No kernel panic, no need for noapic boot parameter and system
> still running. Just had to confirm that the problem without the patch is
> still
> there.

That's what I was asking for:

>> It should be the same problem with an unmodified 5.9-rc kernel, but it
>> would be nice if you could confirm.

unmodified == not patched
Comment 46 Matzes 2020-09-22 20:25:46 UTC
> 
> >> It should be the same problem with an unmodified 5.9-rc kernel, but it
> >> would be nice if you could confirm.
> 
> unmodified == not patched
TEST:

On FSC D2461 aka FSC Esprimo P5615
Tested without your patch =
TEST of unmodified 5.9-rc kernel for confirmation:

Only boots with "noapic"

[    0.000000] Linux version 5.9.0-rc6-unpatched ...
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.9.0-rc6-unpatched  ro noapic ...
[    0.000000] DMI: FUJITSU SIEMENS ESPRIMO P                     /D2461-A2, BIOS 6.00 R1.15.2461.A2               10/22/2007
[    0.277636] APIC: Switch to symmetric I/O mode setup
[    0.277639] Not enabling interrupt remapping due to skipped IO-APIC setup

Thanks for your patch, good job!
Comment 47 p_c_chan 2020-09-23 03:02:26 UTC
I patched 5.9.0-rc6 and it finally works.  Thank you very much.

My HP boots OK, but I can't build nvidia-kernel-dkms.  DKMS from Debian stable is likely too old for 5.9.  I will probably settle on 4.19 or 5.4 for the long term.
Comment 48 p_c_chan 2020-09-23 04:37:01 UTC
Also patched 4.19.146.  It works.  Good job.
Comment 49 p_c_chan 2020-09-23 15:08:51 UTC
Please commit patch to longterm releases as well.

Thanks.
Comment 50 p_c_chan 2020-10-03 02:52:51 UTC
4.19.149 has shown up with the fix.  Works fine.

Thank you.