Bug 41132

Summary: [BISECTED][REGRESSION] Regression with the IRQ subsystem introduced in 2.6.39 (and present in the 3.x version)
Product: Other
Reporter: Rogério Brito (rbrito)
Component: Other
Assignee: Edward Donovan (edward.donovan)
Status: CLOSED CODE_FIX
Severity: normal
CC: edward.donovan, rbrito, tglx
Priority: P1
Hardware: All
OS: Linux
URL: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c75d720fca8a91ce99196d33adea383621027bf2
Kernel Version: 2.6.39+
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: dmesg that works
contents of /proc/cpuinfo
contents of /proc/interrupts
output of lspci
output of lspci -v
output of lspci -vv
output of lspci -vx
output of lspci -vvxx
log of Xorg when it works fine
output of acpidump on this computer
dmesg with kernel 3.1.0-rc4+
configuration of the kernel 3.1.0-rc4+

Description Rogério Brito 2011-08-14 10:58:05 UTC
Hi. First of all, the categorization here in Bugzilla is not fine-grained enough (or, at least, I could not find a category that deals with the IRQ subsystem).

Let me also say that I am a newbie, but I am willing to provide copious amounts of debugging information about my system if I am told what to do. Depending on the circumstances, I may even provide remote SSH access to this system of mine to ease debugging this nasty issue.

OK, now to the description, which is mostly a copy-and-paste of things that I sent to Thomas Gleixner, Dave Airlie and the LKML:


After experiencing problems with many post-2.6.38 kernels (including the 3.x kernels), I took some time to see where things went wrong. The symptom: whenever I used Firefox to view any web page that used some of the new HTML5 features, Firefox froze X completely and I could only log in remotely with SSH. To cut a long story short, I bisected the kernel and found that a commit of yours was the first one that introduced the problem on my system:

    fa27271bc8d230355c1f24ddea103824fdc12de6: genirq: Fixup poll handling

Originally, I thought that the only affected part of the system was the video, but it didn't take me long to test the system and discover that my USB printer was also foobar'ed.

In particular, I accidentally booted kernel 3.0 (I have many kernels in GRUB2) and printed a LaTeX document with only 2 pages of pure text. My printer is an inkjet printer connected via USB to my computer, and it took about 6 minutes to print those 2 pages. The printer head moved, printing something, then it paused a bit, then it printed some more, then it paused some more, and so on.

After just rebooting with 2.6.38, the very same document (with same userland etc.) was printed in 15 at most.

My question now: is there anything that I could do to rectify this situation? I can give you loads of dumps from here, as long as you tell me what would be useful.


Thanks,
Rogério Brito.
Comment 1 Rogério Brito 2011-08-16 05:49:04 UTC
Updating the title of this bug to make it clear that this is a regression *and* that I took the time to `git bisect` the trees and find the first culprit commit.
Comment 2 Rogério Brito 2011-08-16 05:51:23 UTC
Adding Thomas Gleixner to the CC list, as he is the author of the first bad commit.
Comment 3 Rogério Brito 2011-08-16 05:54:47 UTC
Adding a note that this bug was first reported on 2011-07-04:

    https://lkml.org/lkml/2011/7/24/54

As stated in that e-mail, I am willing to perform any tests to help fix this bug. Just ask me and I will do my best to give you copious amounts of data.


Thanks,
Rogério Brito.
Comment 4 Rogério Brito 2011-08-18 05:05:12 UTC
Hi.

I am attaching here various logs from the kernel that works. As nobody has told me what would be useful, I am trying to err on the side of too much detail rather than too little.

Please help me make more detailed bug reports, as I don't really know what could be more useful than these logs that I have provided *and* having taken the time to git-bisect the kernel to find the first problematic commit.

Thanks.
Comment 5 Rogério Brito 2011-08-18 05:06:30 UTC
Created attachment 69152 [details]
dmesg that works
Comment 6 Rogério Brito 2011-08-18 05:06:59 UTC
Created attachment 69162 [details]
contents of /proc/cpuinfo
Comment 7 Rogério Brito 2011-08-18 05:07:26 UTC
Created attachment 69172 [details]
contents of /proc/interrupts
Comment 8 Rogério Brito 2011-08-18 05:07:56 UTC
Created attachment 69182 [details]
output of lspci
Comment 9 Rogério Brito 2011-08-18 05:08:28 UTC
Created attachment 69192 [details]
output of lspci -v
Comment 10 Rogério Brito 2011-08-18 05:08:51 UTC
Created attachment 69202 [details]
output of lspci -vv
Comment 11 Rogério Brito 2011-08-18 05:09:24 UTC
Created attachment 69212 [details]
output of lspci -vx
Comment 12 Rogério Brito 2011-08-18 05:09:54 UTC
Created attachment 69222 [details]
output of lspci -vvxx
Comment 13 Rogério Brito 2011-08-18 05:10:25 UTC
Created attachment 69232 [details]
log of Xorg when it works fine
Comment 14 Rogério Brito 2011-08-23 22:03:06 UTC
Just for the record:

1 - I tried to change the assignee to Drivers/PCI here in Bugzilla, but it seems that I don't have the required privileges.

2 - I just tested with Linus' tree as of right now and the problem persists with 3.1-rc3-18-g35a177a.

If there is any other information that I can provide here, please let me know.


Thanks for any help.
Comment 15 Rogério Brito 2011-09-07 01:40:06 UTC
Created attachment 71802 [details]
output of acpidump on this computer
Comment 16 Rogério Brito 2011-09-07 01:43:06 UTC
Created attachment 71812 [details]
dmesg with kernel 3.1.0-rc4+

This kernel was booted with the "irqpoll" option and was compiled with the following diff applied, as per a private suggestion that I received:

diff --git a/kernel/irq/spurious.c b/kernel/irq/spurious.c
index aa57d5d..edcc8c1 100644
--- a/kernel/irq/spurious.c
+++ b/kernel/irq/spurious.c
@@ -84,7 +84,7 @@ static int try_one_irq(int irq, struct irq_desc *desc, bool force)
         */
        action = desc->action;
        if (!action || !(action->flags & IRQF_SHARED) ||
-           (action->flags & __IRQF_TIMER) || !action->next)
+           (action->flags & __IRQF_TIMER))
                goto out;
 
        /* Already running on another processor */
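
To make the effect of the one-line change above easier to see, here is a minimal user-space sketch of the two skip conditions. It is not kernel code: `struct model_action`, the `MODEL_*` flags and all function names are hypothetical stand-ins invented for illustration, and the flag values have no meaning outside this model. Under those assumptions, the only case where the two conditions disagree is a line marked as shared that has a single handler registered; the old test skips it, while the patched test lets it be tried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's IRQF_SHARED and __IRQF_TIMER;
 * the numeric values only matter inside this model. */
enum { MODEL_SHARED = 1 << 0, MODEL_TIMER = 1 << 1 };

/* Hypothetical, stripped-down analogue of struct irqaction. */
struct model_action {
    unsigned long flags;
    struct model_action *next; /* next handler on the same line, or NULL */
};

/* Skip test as it reads on the '-' side of the diff. */
bool skip_before(const struct model_action *a)
{
    return !a || !(a->flags & MODEL_SHARED) ||
           (a->flags & MODEL_TIMER) || !a->next;
}

/* Skip test as it reads on the '+' side of the diff. */
bool skip_after(const struct model_action *a)
{
    return !a || !(a->flags & MODEL_SHARED) ||
           (a->flags & MODEL_TIMER);
}

/* Exercise the cases that matter for comparing the two tests. */
void model_self_check(void)
{
    struct model_action lone = { MODEL_SHARED, NULL };
    struct model_action tail = { MODEL_SHARED, NULL };
    struct model_action head = { MODEL_SHARED, &tail };

    /* Shared line, one handler: the only case where the tests differ. */
    assert(skip_before(&lone) && !skip_after(&lone));
    /* Shared line, two handlers: tried either way. */
    assert(!skip_before(&head) && !skip_after(&head));
    /* No handler at all: skipped either way. */
    assert(skip_before(NULL) && skip_after(NULL));
}
```

Calling `model_self_check()` from a small `main()` is enough to run the comparison; the model says nothing about locking, the `force` argument, or anything else that `try_one_irq()` does after this check.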
Comment 17 Rogério Brito 2011-09-07 01:45:48 UTC
Created attachment 71822 [details]
configuration of the kernel 3.1.0-rc4+
Comment 18 Rogério Brito 2011-09-07 01:48:32 UTC
(In reply to comment #14)
> Just for the record:
> 
> 1 - I tried to change the assignee to Drivers/PCI here in Bugzilla, but it
> seems that I don't have the required privileges.

I have taken the liberty of adding drivers_pci@kernel-bugs.osdl.org to the list of CC'ed people, as I don't seem to have the required privileges cited above.

Thanks for any help.
Comment 19 Edward Donovan 2012-02-11 07:44:19 UTC
Rogério, from our conversation, I think that this particular bug is fixed: the regressions in 2.6.39 that broke irqfixup and irqpoll. Every user I've heard from who had them working in 2.6.38 has them working again now, in 3.2 and in the 3.0 and 3.1 updates.

For reference, here are the git commits:

  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c75d720fca8a91ce99196d33adea383621027bf2

  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=52553ddffad76ccf192d4dd9ce88d5818f57f62a

And here's one archive of our conversation with Linus:

  https://lkml.org/lkml/2011/11/27/226

Can you mark this as CLOSED?  I assigned it to myself (after the fact), but can only make it RESOLVED.  Thank you!

Edward
Comment 20 Edward Donovan 2012-02-14 04:08:32 UTC
Oops, I *tried* to put myself in the 'assigned' field, but it won't take.
Comment 21 Edward Donovan 2012-02-14 04:10:29 UTC
Ok, I managed to change the assignment, and mark it as closed.