Bug 188181

Summary: irq 16: nobody cared (try booting with the "irqpoll" option)
Product: Drivers
Reporter: Sudaraka Wijesinghe (sudaraka)
Component: PCI
Assignee: drivers_pci (drivers_pci)
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Severity: normal
CC: regressions, tj
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: v4.9
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: kernel config, lspci output, full dmesg, dmesg-4.8.10.txt, attachment-11200-0.sig

Description Sudaraka Wijesinghe 2016-11-19 14:52:03 UTC
Created attachment 245131 [details]
kernel config

The following kernel dump appears during long disk I/O operations, and system performance degrades from that point on.

--- dmesg ----
[  197.909313] irq 16: nobody cared (try booting with the "irqpoll" option)
[  197.909318] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc5-sw+ #20
[  197.909319] Hardware name: Acer Aspire E5-574/Zoro_SL , BIOS V1.10 11/27/2015
[  197.909320]  ffff880372403eb8 ffffffff812e4b1a ffff880363343000 ffff88036334309c
[  197.909323]  ffff880372403ee8 ffffffff810a7b75 ffff880363343000 0000000000000000
[  197.909325]  ffffffff818d9e20 0000000000000051 ffff880372403f20 ffffffff810a7ef4
[  197.909326] Call Trace:
[  197.909328]  <IRQ> 
[  197.909332]  [<ffffffff812e4b1a>] dump_stack+0x4d/0x63
[  197.909335]  [<ffffffff810a7b75>] __report_bad_irq+0x35/0xc0
[  197.909336]  [<ffffffff810a7ef4>] note_interrupt+0x234/0x270
[  197.909338]  [<ffffffff810a5284>] handle_irq_event_percpu+0x54/0x80
[  197.909339]  [<ffffffff810a52e9>] handle_irq_event+0x39/0x60
[  197.909340]  [<ffffffff810a88d3>] handle_fasteoi_irq+0x93/0x180
[  197.909342]  [<ffffffff810178ca>] handle_irq+0x1a/0x30
[  197.909344]  [<ffffffff8156b23b>] do_IRQ+0x4b/0xd0
[  197.909346]  [<ffffffff8156983f>] common_interrupt+0x7f/0x7f
[  197.909346]  <EOI> 
[  197.909349]  [<ffffffff81448f2c>] ? cpuidle_enter_state+0x12c/0x2d0
[  197.909350]  [<ffffffff81449107>] cpuidle_enter+0x17/0x20
[  197.909352]  [<ffffffff81096b86>] cpu_startup_entry+0x1b6/0x210
[  197.909353]  [<ffffffff81562e3f>] rest_init+0x7f/0x90
[  197.909355]  [<ffffffff818feef7>] start_kernel+0x38b/0x3ac
[  197.909357]  [<ffffffff818fe28e>] x86_64_start_reservations+0x2a/0x2c
[  197.909358]  [<ffffffff818fe408>] x86_64_start_kernel+0x178/0x18b
[  197.909359] handlers:
[  197.909361] [<ffffffff8141a4e0>] ahci_single_level_irq_intr
[  197.909363] Disabling IRQ #16
--- dmesg ----


I have tracked the issue down to commit 0b9e2988ab2261fd6d4a0039edf81ed1e3662be8. The issue occurs from that commit onward.
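
For anyone triaging a similar capture, the report in the description can be picked out of a saved dmesg file mechanically. The following is a minimal sketch, assuming the log uses the same line layout as the excerpt above (a bracketed timestamp, an "irq N: nobody cared" line, then a "handlers:" section). The function name `find_unhandled_irqs` is hypothetical and not part of any kernel tooling.

```python
import re

# Spurious-IRQ report line, e.g. 'irq 16: nobody cared (try booting ...)'
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
# Handler line, e.g. '[<ffffffff8141a4e0>] ahci_single_level_irq_intr'
HANDLER = re.compile(r"\[<[0-9a-f]+>\]\s+(\S+)\s*$")
# Leading dmesg timestamp, e.g. '[  197.909313] '
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def find_unhandled_irqs(dmesg_text):
    """Return {irq_number: [handler names]} for each 'nobody cared' report."""
    reports = {}
    current_irq = None
    in_handlers = False
    for line in dmesg_text.splitlines():
        body = TIMESTAMP.sub("", line)
        match = NOBODY_CARED.search(body)
        if match:
            # Start of a new report; handlers are listed further down.
            current_irq = int(match.group(1))
            reports.setdefault(current_irq, [])
            in_handlers = False
            continue
        if current_irq is None:
            continue
        if body.strip() == "handlers:":
            in_handlers = True
            continue
        if in_handlers:
            handler = HANDLER.match(body)
            if handler:
                reports[current_irq].append(handler.group(1))
            else:
                # First non-handler line ends the report.
                in_handlers = False
                current_irq = None
    return reports
```

Run against the excerpt above, this would map IRQ 16 to the single registered handler, `ahci_single_level_irq_intr`, which is what points the investigation at the AHCI driver.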
Comment 1 Sudaraka Wijesinghe 2016-11-19 14:52:51 UTC
Created attachment 245141 [details]
lspci output
Comment 2 Sudaraka Wijesinghe 2016-11-19 14:53:27 UTC
Created attachment 245151 [details]
full dmesg
Comment 3 The Linux kernel's regression tracker (Thorsten Leemhuis) 2016-11-20 14:09:07 UTC
JFYI: I added this report to the list of regressions for Linux 4.9.
I'll watch this place for further updates on this issue to document
progress in my weekly reports. Please let me know via
regressions@leemhuis.info in case the discussion moves somewhere else (a bugzilla entry, a mailing list, …).

If nothing happens here in the next two days, please report the problem to linux-ide@vger.kernel.org and the two people listed in this commit:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=0b9e2988ab2261fd6d4a0039edf81ed1e3662be8
Comment 4 The Linux kernel's regression tracker (Thorsten Leemhuis) 2016-11-20 14:17:44 UTC
Tejun, can you please take a look at this?
Comment 5 Tejun Heo 2016-11-28 19:05:27 UTC
Sorry about the delay. Can you please post dmesg output from a kernel which doesn't show the issue? ahci recently had MSI-related updates and there were some issues with enabling MSI, which are now fixed. Thanks.
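
One way to compare a good and a bad kernel on this point is to check which interrupt line the ahci driver ends up on, and whether it is an MSI vector or a shared legacy line. The following is a minimal sketch, assuming the common x86 layout of /proc/interrupts (a header row of CPU columns, then per-IRQ rows with counts, chip name, and action names). The layout varies between kernel versions and architectures, and the function name `irq_lines_for_driver` is hypothetical.

```python
def irq_lines_for_driver(interrupts_text, driver="ahci"):
    """Return (irq_label, chip_name, is_msi) for rows that mention `driver`.

    Assumes the first line is the CPU header row and that each IRQ row is
    laid out as: label, one count per CPU, chip name, then trigger type and
    action names.
    """
    lines = interrupts_text.splitlines()
    if not lines:
        return []
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    found = []
    for line in lines[1:]:
        tokens = line.split()
        # Need label + per-CPU counts + chip + at least one more field.
        if len(tokens) < ncpu + 3:
            continue
        label = tokens[0].rstrip(":")
        chip = tokens[1 + ncpu]
        actions = " ".join(tokens[2 + ncpu:])
        if driver in actions:
            found.append((label, chip, "MSI" in chip))
    return found


if __name__ == "__main__":
    with open("/proc/interrupts") as f:
        for label, chip, is_msi in irq_lines_for_driver(f.read()):
            print(f"IRQ {label}: chip={chip} msi={is_msi}")
```

If ahci shows up on a shared IO-APIC line on the failing kernel but on a PCI-MSI vector on the working one (or the reverse), that difference is worth including alongside the dmesg output.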
Comment 6 Sudaraka Wijesinghe 2016-11-29 02:01:31 UTC
Created attachment 246111 [details]
dmesg-4.8.10.txt

Attached the dmesg from kernel 4.8.10, which is the current kernel on Arch Linux.
The related .config is at
https://git.archlinux.org/svntogit/packages.git/tree/trunk/config.x86_64?h=packages/linux
Comment 7 Sudaraka Wijesinghe 2016-11-29 02:01:32 UTC
Created attachment 246121 [details]
attachment-11200-0.sig
Comment 8 Sudaraka Wijesinghe 2016-11-29 02:37:23 UTC
I just ran the test (running something that thrashes the disk) on the latest
kernel (4.9.0-rc7), and it did NOT throw the error, nor did it affect
system performance afterwards.

Hopefully the issue is fixed.

Please leave the bug open; I will monitor for a couple of days and report
back.

Thanks.
Comment 9 Sudaraka Wijesinghe 2016-12-03 06:44:21 UTC
Everything has been running smoothly. Thanks, everyone, for looking into this matter.