Bug 20432

Summary: NVIDIA GPU doesn't work if loaded last, after all other device drivers have already been loaded
Product: Drivers
Reporter: Artem S. Tashkinov (aros)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED INVALID
Severity: high
CC: alan, andi-bz, bmaly, dmitry.torokhov, lenb, rui.zhang
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.36
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 56331
Attachments: My 2.6.36-rc8 .config-uration and miscellaneous info
             pci=biosirq dmesg where I cannot use NVIDIA GPU
             dmesg with no options and NVIDIA GPU *working* normally

Description Artem S. Tashkinov 2010-10-17 07:42:16 UTC
Created attachment 33782
My 2.6.36-rc8 .config-uration and miscellaneous info

I haven't run the NVIDIA GPU on this computer for almost a year, but I do remember that this option (pci=biosirq) wasn't necessary the last time I used this video accelerator.

Without this option, my GPU isn't assigned any interrupts, and I get this message when I try to start the X server:

(EE) NVIDIA(0): The NVIDIA kernel module does not appear to be receiving
(EE) NVIDIA(0):     interrupts generated by the NVIDIA graphics device
(EE) NVIDIA(0):     PCI:5:0:0.  Please see Chapter 8: Common Problems in the
(EE) NVIDIA(0):     README for additional information.
(EE) NVIDIA(0): Failed to initialize the NVIDIA graphics device!

Inspecting /proc/interrupts confirms that NVIDIA is indeed not listed there.
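The /proc/interrupts check can be scripted. The Python sketch below is only an illustration: the function name `irqs_for_handler` and the `SAMPLE` rows are hypothetical (they are not taken from the reporter's attachments), and it assumes the usual layout of one row per IRQ, with the IRQ label, a colon, per-CPU counters, the interrupt controller type and finally the comma-separated handler names.

```python
import re

def irqs_for_handler(proc_interrupts: str, name: str) -> list[str]:
    """Return the IRQ labels whose row lists a handler called `name`.

    Assumed layout: a header row of CPU columns, then one row per IRQ of
    the form "<irq>: <counters...> <controller> <handler>[, <handler>...]".
    """
    found = []
    for line in proc_interrupts.splitlines()[1:]:  # skip the CPU header row
        if ":" not in line:
            continue
        irq, rest = line.split(":", 1)
        # Compare whole tokens so "uhci_hcd:usb3" does not match "uhci".
        tokens = re.split(r"[,\s]+", rest.strip())
        if name in tokens:
            found.append(irq.strip())
    return found


# Hypothetical sample rows, for illustration only.
SAMPLE = """\
           CPU0       CPU1
  0:        128          0   IO-APIC-edge      timer
 16:      40123          0   IO-APIC-fasteoi   uhci_hcd:usb3, nvidia
 18:       2211          0   IO-APIC-fasteoi   ehci_hcd:usb1
"""

print(irqs_for_handler(SAMPLE, "nvidia"))  # ['16']
print(irqs_for_handler(SAMPLE, "uhci"))    # []
```

On a live system you would pass the contents of /proc/interrupts instead of `SAMPLE`; an empty list means no handler by that name is registered on any IRQ.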
Comment 1 Artem S. Tashkinov 2010-10-17 12:10:10 UTC
Created attachment 33832
pci=biosirq dmesg where I cannot use NVIDIA GPU

Every so often, seemingly at random, I can reproduce the problem even with the pci=biosirq option.

I'm going to try booting with irqpoll.
Comment 2 Artem S. Tashkinov 2010-10-17 12:22:10 UTC
Created attachment 33842
dmesg with no options and NVIDIA GPU *working* normally

It doesn't really matter whether I use the pci=biosirq or the irqpoll option.

What matters is the order in which modules and devices are initialized.

If the NVIDIA GPU is initialized last, it always fails. If the NVIDIA module loads before some of the other devices, it works fine without any special options.
Comment 3 Dmitry Torokhov 2010-10-18 23:49:26 UTC
This issue is better reported to NVIDIA developers.
Comment 4 Zhang Rui 2010-10-19 01:43:15 UTC
Why does this bug always turn up when I search for ACPI bugs?
It is still shown as an ACPI config bug...
Comment 5 Artem S. Tashkinov 2010-10-19 07:58:21 UTC
(In reply to comment #3)
> This issue is better reported to NVIDIA developers.

I'm not sure they will respond (positively):

Here's what their documentation says on this matter:
___________
 My X server fails to start, and my X log file contains the error:

 (EE) NVIDIA(0): The NVIDIA kernel module does not appear to
 (EE) NVIDIA(0):      be receiving interrupts generated by the NVIDIA graphics
 (EE) NVIDIA(0):      device PCI:x:x:x. Please see the COMMON PROBLEMS
 (EE) NVIDIA(0):      section in the README for additional information.
 This can be caused by a variety of problems, such as PCI IRQ routing errors,
 I/O APIC problems or conflicts with other devices sharing the IRQ (or their
 drivers).

 If possible, configure your system such that your graphics card does not share
 its IRQ with other devices (try moving the graphics card to another slot if
 applicable, unload/disable the driver(s) for the device(s) sharing the card's
 IRQ, or remove/disable the device(s)).

 Depending on the nature of the problem, one of (or a combination of) these
 kernel parameters might also help:

 Parameter    Behavior
 pci=noacpi   don't use ACPI for PCI IRQ routing
 pci=biosirq  use PCI BIOS calls to retrieve the IRQ routing table
 noapic       don't use I/O APICs present in the system
 acpi=off     disable ACPI
___________
In fact, none of the options listed as possible solutions help.

I tend to think it's a Linux kernel problem, because when I load the nvidia.ko module, it doesn't get listed in /proc/interrupts.
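The NVIDIA README quoted above recommends making sure the graphics card does not share its IRQ with other devices. A minimal Python sketch of how one might check that from /proc/interrupts follows; the `shared_irqs` helper and the sample rows are hypothetical, and it assumes each row ends with comma-separated handler names that contain no spaces or commas.

```python
def shared_irqs(proc_interrupts: str, name: str) -> dict[str, list[str]]:
    """Return {irq: other_handlers} for every IRQ that `name` shares.

    Assumes rows like "16: <counters> IO-APIC-fasteoi uhci_hcd:usb3, nvidia".
    """
    shared = {}
    for line in proc_interrupts.splitlines()[1:]:  # skip the CPU header row
        if ":" not in line:
            continue
        irq, rest = line.split(":", 1)
        parts = rest.split(",")
        first = parts[0].split()
        if not first:
            continue
        # The first handler is the last token before the first comma;
        # any further handlers follow, one per comma-separated part.
        handlers = [first[-1]] + [p.strip() for p in parts[1:]]
        if name in handlers and len(handlers) > 1:
            shared[irq.strip()] = [h for h in handlers if h != name]
    return shared


# Hypothetical sample rows, for illustration only.
SAMPLE = """\
           CPU0       CPU1
  0:        128          0   IO-APIC-edge      timer
 16:      40123          0   IO-APIC-fasteoi   uhci_hcd:usb3, nvidia
 18:       2211          0   IO-APIC-fasteoi   ehci_hcd:usb1
"""

print(shared_irqs(SAMPLE, "nvidia"))  # {'16': ['uhci_hcd:usb3']}
```

An empty dictionary means the handler either has its IRQ to itself or is not registered at all, so it is worth pairing this with a plain "is it listed" check.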
Comment 6 Alan 2010-10-19 11:01:25 UTC
It's a non-free driver. Only they have the source to all the parts, so only they can debug it.
Comment 7 Artem S. Tashkinov 2010-10-19 17:17:11 UTC
Aaron Plattner said I should just disable MSI because it's "problematic": "I talked to our kernel guy and he said that MSI is notoriously problematic throughout the hardware and software stack, and he recommended that you just stick with traditional interrupts."

OK, it seems I won't get help from either side on this bug. Maybe sharing interrupts isn't such a bad idea in 2010 after all.
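To see whether the card is actually using MSI: on kernels of this era, an MSI interrupt shows a controller type containing "MSI" (for example PCI-MSI-edge) in /proc/interrupts, while a traditional line interrupt shows an IO-APIC type. The sketch below, with a hypothetical `interrupt_mode` helper and illustrative sample rows, reports which mode a given handler uses; it assumes the same row layout as a standard /proc/interrupts listing.

```python
def interrupt_mode(proc_interrupts: str, name: str) -> dict[str, str]:
    """Map each IRQ used by handler `name` to "MSI" or "INTx" (legacy).

    Assumes the controller column contains "MSI" for message-signalled
    interrupts (e.g. "PCI-MSI-edge"), as on 2.6-era kernels.
    """
    modes = {}
    for line in proc_interrupts.splitlines()[1:]:  # skip the CPU header row
        if ":" not in line:
            continue
        irq, rest = line.split(":", 1)
        parts = rest.split(",")
        first = parts[0].split()
        if len(first) < 2:
            continue  # row has no controller/handler columns
        handlers = [first[-1]] + [p.strip() for p in parts[1:]]
        if name in handlers:
            controller = " ".join(first[:-1])  # counters plus controller type
            modes[irq.strip()] = "MSI" if "MSI" in controller else "INTx"
    return modes


# Hypothetical sample rows, for illustration only.
SAMPLE = """\
           CPU0       CPU1
 16:      40123          0   IO-APIC-fasteoi   uhci_hcd:usb3, ahci
 44:       5000          0   PCI-MSI-edge      nvidia
"""

print(interrupt_mode(SAMPLE, "nvidia"))  # {'44': 'MSI'}
```

If the card turns out to use MSI, booting with the pci=nomsi kernel parameter disables MSI system-wide, which is one way to try the "traditional interrupts" advice; whether that fixes this particular report is not established in the thread.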