Bug 8187 - 2.6.20 "PCI: Quirks" patch breaks X11 on I82801
Status: REJECTED WILL_NOT_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: i386 Linux
Importance: P2 normal
Assignee: Greg Kroah-Hartman
Reported: 2007-03-12 13:29 UTC by Kris Karas
Modified: 2008-03-14 14:32 UTC
Kernel Version: 2.6.20

Description Kris Karas 2007-03-12 13:29:57 UTC
Most recent kernel where this bug did *NOT* occur:
Any 2.6.20-pre prior to commit 368c73d4f689dae0807d0a2aa74c61fd2b9b075f

Distribution:  Slackware 11.0
Hardware Environment:  HP/Compaq dc5000S (P4, 82801, 82865)
Software Environment:  Xorg 6.9.0
Problem Description:

Alan Cox introduced a "PCI: Quirks" patch (git commit
368c73d4f689dae0807d0a2aa74c61fd2b9b075f) in 2.6.20 that breaks X11 on this
I82801 platform.  Specifically, it causes the PCI initialisation to become
buggered; Xorg 6.9.0 dumps the following to the console:
	(EE) end of block range 0x177 < begin 0x3f0
	(EE) end of block range 0x177 < begin 0x3f0
	(WW) ****INVALID IO ALLOCATION**** b: 0x14d0 e: 0x14d7 correcting
[...]
	Backtrace:
	0: X(xf86SigHandler+0x8a) [0x8088b2a]
	1: [0xb7f2b420]
	2: /usr/X11R6/lib/modules/drivers/i810_drv.so [0xb797f592]
	3: X(InitOutput+0xb83) [0x8072713]
	4: X(main+0x226) [0x80d4496]
	5: /lib/tls/libc.so.6(__libc_start_main+0xd4) [0xb7da7e14]
	6: X [0x806ff61]

	Fatal server error:
	Caught signal 11.  Server aborting

Steps to reproduce:

Reverting the git commit mentioned above fixes the issue.  Apparently, this may
be limited to certain combinations of on-motherboard chipsets, as I haven't seen
many bug reports.  Googling shows some people having X11 segfault issues with
2.6.20 (e.g. freedesktop.org bug #9956) but in most of those cases it's due to
the evdev driver and not PCI initialisation.

I wrote to Alan (cc'ed Greg as he signed off on the patch) nearly two weeks ago
but have heard nothing, so I'm leaving a bug here instead.
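
For anyone checking whether another machine hits the same failure, here is a minimal sketch (not part of the original report) that scans Xorg output for the inverted-range errors quoted above, where the end of an I/O block is below its start. The function name `inverted_ranges` and the sample text are illustrative assumptions; the message format is inferred only from the two (EE) lines in this report and may differ in other Xorg versions.

```python
import re

# Matches Xorg lines such as "(EE) end of block range 0x177 < begin 0x3f0".
# The pattern is an assumption based solely on the lines quoted in this report.
RANGE_ERROR = re.compile(
    r"\(EE\) end of block range (0x[0-9a-fA-F]+) < begin (0x[0-9a-fA-F]+)"
)


def inverted_ranges(log_text):
    """Return (begin, end) pairs for every inverted I/O range in the log."""
    pairs = []
    for line in log_text.splitlines():
        match = RANGE_ERROR.search(line)
        if match:
            # The message prints the end address first, then the begin address.
            end, begin = (int(value, 16) for value in match.groups())
            pairs.append((begin, end))
    return pairs


# Sample input copied from the console output in the description.
sample = """\
(EE) end of block range 0x177 < begin 0x3f0
(EE) end of block range 0x177 < begin 0x3f0
(WW) ****INVALID IO ALLOCATION**** b: 0x14d0 e: 0x14d7 correcting
"""

for begin, end in inverted_ranges(sample):
    print(f"inverted range: begin={begin:#x} end={end:#x}")
```

An empty result on a given log only means these particular messages were not printed; it does not rule out other PCI resource problems.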
Comment 1 Anonymous Emailer 2007-03-12 22:19:57 UTC
Reply-To: akpm@linux-foundation.org

argh.

Would we break more machines than we fix if we just revert that?

Comment 2 Greg Kroah-Hartman 2007-03-12 23:00:07 UTC
On Mon, Mar 12, 2007 at 10:19:52PM -0800, Andrew Morton wrote:
> argh.
> 
> Would we break more machines than we fix if we just revert that?

I don't know, Alan?

thanks,

greg k-h

Comment 3 Bartlomiej Zolnierkiewicz 2007-03-13 04:11:30 UTC
On Tuesday 13 March 2007, Andrew Morton wrote:
> argh.
> 
> Would we break more machines than we fix if we just revert that?

This should be fixed in 2.6.21-rc3,
commit ed8ccee0918ad063a4741c0656fda783e02df627

Bart

Comment 4 Kris Karas 2007-03-13 10:42:11 UTC
Confirmed.
commit ed8ccee0918ad063a4741c0656fda783e02df627 in 2.6.21-rc3 fixes this issue.

Thanks all...
Kris
Comment 5 Kris Karas 2007-04-26 10:43:40 UTC
Although fixed in 2.6.21-rc7, this bug has reappeared in 2.6.21 final.
Comment 6 Greg Kroah-Hartman 2007-04-26 10:55:48 UTC
How?  What fix broke it again?
Comment 7 Kris Karas 2007-04-27 09:55:41 UTC
The culprit appears to be this entry from the 2.6.21 changelog:

----------------------------------------------
commit 01abc2aa0f447bce2f6beb06dd0607ba0f01c5bb
Author: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Date:   Mon Apr 23 23:19:36 2007 +0200

    Revert "adjust legacy IDE resource setting (v2)"
    
    This reverts commit ed8ccee0918ad063a4741c0656fda783e02df627.
    
    It causes hang on boot for some users and we don't yet know why:
    
    http://bugzilla.kernel.org/show_bug.cgi?id=7562
    
    http://lkml.org/lkml/2007/4/20/404
    http://lkml.org/lkml/2007/3/25/113
    
    Just reverse it for 2.6.21-final, having broken X server is somehow
    better than unbootable system.
    
    Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
-----------------------------------------

That said, the /original/ "bug" (er, feature?) was in commit
368c73d4f689dae0807d0a2aa74c61fd2b9b075f from Alan.
Comment 8 Natalie Protasevich 2007-07-06 17:53:38 UTC
Any update on this? Is there a way to fix the quirk not to break X server?
Thanks.
Comment 9 Kris Karas 2007-07-09 13:05:37 UTC
No, I haven't seen any progress yet; the last production kernel I can run is 2.6.19.7.

Is anybody else having this issue too?  It occurs with HP/Compaq dc5000 workstations, of which I assume there are quite a few (mostly as corporate desktops I'd guess).

As for the comment immediately above from Bartlomiej, I note that in bug 7562, most of those folks are running tainted kernels.  We've made the tainted kernels happy at the expense of a stock, vanilla kernel?
Comment 10 Adrian Bunk 2007-07-09 17:48:39 UTC
(In reply to comment #9)
> No, I haven't seen any progress yet; the last production kernel I can run is
> 2.6.19.7
> 
> Is anybody else having this issue too?  It occurs with HP/Compaq dc5000
> workstations, of which I assume there are quite a few (mostly as corporate
> desktops I'd guess).

AFAIR we never figured out why the reverted commit had any effect at all.

> As for the comment immediately above from Bartlomiej, I note that in bug 7562,
> most of those folks are running tainted kernels.  We've made the tainted
> kernels happy at the expense of a stock, vanilla kernel?

That can't be true - the boot hangs that led to this commit being reverted happen long before any module gets loaded, and are therefore obviously on untainted kernels.
Comment 11 Bartlomiej Zolnierkiewicz 2008-02-16 11:29:27 UTC
IIRC this has been fixed?
Comment 12 Kris Karas 2008-02-18 07:47:35 UTC
The bug was fixed briefly, but the fix was reverted because it prevented some laptop owners from booting.  Apparently, only a few people are susceptible to this bug; nobody else seems to have added a "me too".

Greg, Andrew, Bart...  Shall we just mark this as WILL_NOT_FIX?  I mean, we could certainly #ifdef code sections for just those few people who are bitten by this, but it seems almost silly given the presumably small userbase.  I'm already used to hand-patching every kernel I compile, so it can't get any worse for me.
Comment 13 Jesse Barnes 2008-03-14 12:02:51 UTC
Kris, does this still happen with more recent versions of X, specifically the 1.5 pre-releases?  We ripped out much of the PCI code in that version...
Comment 14 Kris Karas 2008-03-14 14:32:37 UTC
Good question, Jesse...
OK, I took some time to hack my box into an amalgam of xorg 6.9.0 and 1.3 and tried with a vanilla kernel.  As Alan Cox had surmised, the bug no longer appears.  (Tested against a vanilla kernel 2.6.24.3 and then Slackware's xorg-server-1.3.0.0 from the slackware-12.0 distro.)

So I'm going to take the liberty of closing this bug with Will-Not-Fix; it seems pointless to keep it open given how little impact it seems to have amongst the userbase...

Kris
