Bug 8187

Summary: 2.6.20 "PCI: Quirks" patch breaks X11 on I82801
Product: Drivers Reporter: Kris Karas (bugs-a21)
Component: PCI Assignee: Greg Kroah-Hartman (greg)
Status: REJECTED WILL_NOT_FIX    
Severity: normal CC: alan, bunk, bzolnier, jbarnes, protasnb
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.20 Subsystem:
Regression: --- Bisected commit-id:

Description Kris Karas 2007-03-12 13:29:57 UTC
Most recent kernel where this bug did *NOT* occur:
Any 2.6.20-pre prior to commit 368c73d4f689dae0807d0a2aa74c61fd2b9b075f

Distribution:  Slackware 11.0
Hardware Environment:  HP/Compaq dc5000S (P4, 82801, 82865)
Software Environment:  Xorg 6.9.0
Problem Description:

Alan Cox introduced a "PCI: Quirks" patch (git commit
368c73d4f689dae0807d0a2aa74c61fd2b9b075f) in 2.6.20 that breaks X11 on this
I82801 platform.  Specifically, it causes the PCI initialisation to become
buggered; Xorg 6.9.0 dumps the following to the console:
	(EE) end of block range 0x177 < begin 0x3f0
	(EE) end of block range 0x177 < begin 0x3f0
	(WW) ****INVALID IO ALLOCATION**** b: 0x14d0 e: 0x14d7 correcting
[...]
	Backtrace:
	0: X(xf86SigHandler+0x8a) [0x8088b2a]
	1: [0xb7f2b420]
	2: /usr/X11R6/lib/modules/drivers/i810_drv.so [0xb797f592]
	3: X(InitOutput+0xb83) [0x8072713]
	4: X(main+0x226) [0x80d4496]
	5: /lib/tls/libc.so.6(__libc_start_main+0xd4) [0xb7da7e14]
	6: X [0x806ff61]

	Fatal server error:
	Caught signal 11.  Server aborting
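
For anyone triaging a similar report, the two resource complaints in the excerpt above can be picked out of a saved X log mechanically. The following is only a minimal sketch: the message formats are inferred from the three lines quoted in this report, not from the Xorg sources, and the names `scan_xorg_log` and `SAMPLE` are hypothetical.

```python
import re

# Patterns inferred from the log excerpt in this report (an assumption,
# not a documented Xorg message format).
BLOCK_RANGE = re.compile(
    r"\(EE\) end of block range (0x[0-9a-fA-F]+) < begin (0x[0-9a-fA-F]+)"
)
IO_ALLOC = re.compile(
    r"\(WW\) \*+INVALID IO ALLOCATION\*+ b: (0x[0-9a-fA-F]+) e: (0x[0-9a-fA-F]+)"
)


def scan_xorg_log(lines):
    """Return (kind, begin, end) tuples for each resource complaint found."""
    findings = []
    for line in lines:
        match = BLOCK_RANGE.search(line)
        if match:
            # The message prints the end first, then the begin.
            end = int(match.group(1), 16)
            begin = int(match.group(2), 16)
            findings.append(("inverted_range", begin, end))
            continue
        match = IO_ALLOC.search(line)
        if match:
            begin = int(match.group(1), 16)
            end = int(match.group(2), 16)
            findings.append(("invalid_io_allocation", begin, end))
    return findings


# The three lines quoted in this report.
SAMPLE = [
    "\t(EE) end of block range 0x177 < begin 0x3f0",
    "\t(EE) end of block range 0x177 < begin 0x3f0",
    "\t(WW) ****INVALID IO ALLOCATION**** b: 0x14d0 e: 0x14d7 correcting",
]

if __name__ == "__main__":
    for kind, begin, end in scan_xorg_log(SAMPLE):
        print(f"{kind}: begin={begin:#x} end={end:#x}")
```

An "inverted_range" entry is one where the reported end lies below the reported begin, which is the condition X complains about here before the i810 driver faults.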

Steps to reproduce:

Reverting the git commit mentioned above fixes the issue.  Apparently, this may
be limited to certain combinations of on-motherboard chipsets, as I haven't seen
many bug reports.  Googling shows some people having X11 segfault issues with
2.6.20 (e.g. freedesktop.org bug #9956) but in most of those cases it's due to
the evdev driver and not PCI initialisation.

I wrote to Alan (cc'ed Greg as he signed off on the patch) nearly two weeks ago
but have heard nothing, so I'm leaving a bug here instead.
Comment 1 Anonymous Emailer 2007-03-12 22:19:57 UTC
Reply-To: akpm@linux-foundation.org

argh.

Would we break more machines than we fix if we just revert that?

Comment 2 Greg Kroah-Hartman 2007-03-12 23:00:07 UTC
On Mon, Mar 12, 2007 at 10:19:52PM -0800, Andrew Morton wrote:
> 
> argh.
> 
> Would we break more machines than we fix if we just revert that?

I don't know, Alan?

thanks,

greg k-h

Comment 3 Bartlomiej Zolnierkiewicz 2007-03-13 04:11:30 UTC
On Tuesday 13 March 2007, Andrew Morton wrote:
> 
> argh.
> 
> Would we break more machines than we fix if we just revert that?

this should be fixed in 2.6.21-rc3,
commit ed8ccee0918ad063a4741c0656fda783e02df627

Bart

Comment 4 Kris Karas 2007-03-13 10:42:11 UTC
Confirmed.
commit ed8ccee0918ad063a4741c0656fda783e02df627 in 2.6.21-rc3 fixes this issue.

Thanks all...
Kris
Comment 5 Kris Karas 2007-04-26 10:43:40 UTC
Although fixed in 2.6.21-rc7, this bug has reappeared in 2.6.21 final.
Comment 6 Greg Kroah-Hartman 2007-04-26 10:55:48 UTC
How?  What fix broke it again?
Comment 7 Kris Karas 2007-04-27 09:55:41 UTC
The culprit appears to be this entry from the 2.6.21 changelog:

----------------------------------------------
commit 01abc2aa0f447bce2f6beb06dd0607ba0f01c5bb
Author: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Date:   Mon Apr 23 23:19:36 2007 +0200

    Revert "adjust legacy IDE resource setting (v2)"
    
    This reverts commit ed8ccee0918ad063a4741c0656fda783e02df627.
    
    It causes hang on boot for some users and we don't yet know why:
    
    http://bugzilla.kernel.org/show_bug.cgi?id=7562
    
    http://lkml.org/lkml/2007/4/20/404
    http://lkml.org/lkml/2007/3/25/113
    
    Just reverse it for 2.6.21-final, having broken X server is somehow
    better than unbootable system.
    
    Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
-----------------------------------------

That said, the /original/ "bug" (er, feature?) was in commit
368c73d4f689dae0807d0a2aa74c61fd2b9b075f from Alan.
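
One subtlety in the history above: a reverted fix is still an ancestor of the release, so checking for the fix commit alone would wrongly suggest that 2.6.21 is fixed. A minimal sketch of a check that accounts for the revert follows. It assumes a local clone of the kernel tree and relies on `git merge-base --is-ancestor`; the helper names (`contains`, `fix_is_effective`, `check_release`) are hypothetical.

```python
import subprocess

# Commit ids quoted in this thread.
FIX = "ed8ccee0918ad063a4741c0656fda783e02df627"     # "adjust legacy IDE resource setting (v2)"
REVERT = "01abc2aa0f447bce2f6beb06dd0607ba0f01c5bb"  # revert of the fix


def contains(repo_dir, commit, ref):
    """True if `commit` is an ancestor of `ref` in the clone at repo_dir."""
    result = subprocess.run(
        ["git", "merge-base", "--is-ancestor", commit, ref],
        cwd=repo_dir,
    )
    # Exit code 0 means ancestor, 1 means not an ancestor; anything else is an error.
    if result.returncode not in (0, 1):
        raise RuntimeError(f"git failed with exit code {result.returncode}")
    return result.returncode == 0


def fix_is_effective(has_fix, has_revert):
    """The fix only counts if it is present and has not been reverted."""
    return has_fix and not has_revert


def check_release(repo_dir, ref):
    """Report whether `ref` carries the fix without its revert."""
    return fix_is_effective(
        contains(repo_dir, FIX, ref),
        contains(repo_dir, REVERT, ref),
    )
```

Going by the reports in this thread, `check_release` on a clone would be expected to return True for the 2.6.21-rc7 tag and False for 2.6.21 final; the exact tag names in a given clone are an assumption to verify locally.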
Comment 8 Natalie Protasevich 2007-07-06 17:53:38 UTC
Any update on this? Is there a way to fix the quirk not to break X server?
Thanks.
Comment 9 Kris Karas 2007-07-09 13:05:37 UTC
No, I haven't seen any progress yet; the last production kernel I can run is 2.6.19.7.

Is anybody else having this issue too?  It occurs with HP/Compaq dc5000 workstations, of which I assume there are quite a few (mostly as corporate desktops I'd guess).

As for the comment immediately above from Bartlomiej, I note that in bug 7562, most of those folks are running tainted kernels.  We've made the tainted kernels happy at the expense of a stock, vanilla kernel?
Comment 10 Adrian Bunk 2007-07-09 17:48:39 UTC
(In reply to comment #9)
> No, I haven't seen any progress yet; the last production kernel I can run is
> 2.6.19.7
> 
> Is anybody else having this issue too?  It occurs with HP/Compaq dc5000
> workstations, of which I assume there are quite a few (mostly as corporate
> desktops I'd guess).

AFAIR we never figured out why the reverted commit had any effect at all.

> As for the comment immediately above from Bartlomiej, I note that in bug
> 7562,
> most of those folks are running tainted kernels.  We've made the tainted
> kernels happy at the expense of a stock, vanilla kernel?

That can't be true - the boot hangs that led to this commit being reverted happen long before any module gets loaded, and are therefore obviously on untainted kernels.
Comment 11 Bartlomiej Zolnierkiewicz 2008-02-16 11:29:27 UTC
IIRC this has been fixed?
Comment 12 Kris Karas 2008-02-18 07:47:35 UTC
The bug was fixed briefly, but the fix was reverted because it prevented some laptop owners from booting.  Apparently, only a few people are susceptible to this bug; nobody else seems to have added a "me too".

Greg, Andrew, Bart...  Shall we just mark this as WILL_NOT_FIX?  I mean, we could certainly #ifdef code sections for just those few people who are bitten by this, but it seems almost silly given the presumably small userbase.  I'm already used to hand-patching every kernel I compile, so it can't get any worse for me.
Comment 13 Jesse Barnes 2008-03-14 12:02:51 UTC
Kris, does this still happen with more recent versions of X, specifically the 1.5 pre-releases?  We ripped out much of the PCI code in that version...
Comment 14 Kris Karas 2008-03-14 14:32:37 UTC
Good question, Jesse...
OK, I took some time to hack my box into an amalgam of xorg 6.9.0 and 1.3 and tried with a vanilla kernel.  As Alan Cox had surmised, the bug no longer appears.  (Tested with a vanilla 2.6.24.3 kernel and Slackware's xorg-server-1.3.0.0 from the slackware-12.0 distro.)

So I'm going to take the liberty of closing this bug with Will-Not-Fix; it seems pointless to keep it open given how little impact it seems to have amongst the userbase...

Kris