Bug 8472

Summary: atl1 module APIC error when MSI enabled in kernel 2.6.21
Product: Drivers
Component: Network
Reporter: Gregory Krzystek (ninex)
Assignee: Tejun Heo (htejun)
Status: CLOSED CODE_FIX
Severity: high
Priority: P2
CC: dj, jacliburn
Hardware: i386
OS: Linux
Kernel Version: 2.6.21
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: .config, dmesg, lspci, lsmod

Description Gregory Krzystek 2007-05-12 14:46:13 UTC
Distribution: Gentoo/amd64
Problem description: when I try to bring up eth0, the kernel produces errors.

After /etc/init.d/net.eth0 start, I can't establish any connection.
In /var/log/messages I see:
May 13 19:16:49 localhost APIC error on CPU0: 04(08)
May 13 19:16:49 localhost atl1: eth0 link is up 100 Mbps full duplex
May 13 19:16:49 localhost APIC error on CPU0: 08(08)
May 13 19:16:49 localhost APIC error on CPU0: 08(08)
...
Sometimes the system reboots without any error in the logs.
There is a workaround: add pci=nomsi to the kernel boot command line.
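After rebooting with the workaround, it is worth confirming that the option actually reached the running kernel. Below is a minimal Python sketch. It assumes the standard `/proc/cmdline` interface and that `pci=` accepts a comma-separated list of options, as described in the kernel-parameters documentation. The function name is illustrative only.

```python
from pathlib import Path


def msi_disabled_on_cmdline(cmdline: str) -> bool:
    """Return True if the kernel command line includes pci=nomsi."""
    for token in cmdline.split():
        if token.startswith("pci="):
            # pci= may carry several comma-separated options, e.g. pci=noacpi,nomsi
            if "nomsi" in token[len("pci="):].split(","):
                return True
    return False


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")  # present on Linux systems
    if cmdline_file.exists():
        print("pci=nomsi active:", msi_disabled_on_cmdline(cmdline_file.read_text()))
```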
Comment 1 Gregory Krzystek 2007-05-12 14:47:56 UTC
Created attachment 11484
.config
Comment 2 Gregory Krzystek 2007-05-12 14:48:32 UTC
Created attachment 11485
dmesg
Comment 3 Gregory Krzystek 2007-05-12 14:49:22 UTC
Created attachment 11486
lspci
Comment 4 Gregory Krzystek 2007-05-12 14:50:27 UTC
Created attachment 11487
lsmod
lsmod
Comment 5 Gregory Krzystek 2007-05-12 14:51:19 UTC
My motherboard is an Asus M2V.
Comment 6 Eric W. Biederman 2007-05-13 00:00:43 UTC
Andrew Morton <akpm@linux-foundation.org> writes:

> On Sat, 12 May 2007 14:46:17 -0700 bugme-daemon@bugzilla.kernel.org wrote:
>
>> http://bugzilla.kernel.org/show_bug.cgi?id=8472
>> 
>>            Summary: atl1 module APIC error when MSI enabled in kernel 2.6.21
>>     Kernel Version: 2.6.21
>>             Status: NEW
>>           Severity: high
>>              Owner: jgarzik@pobox.com
>>          Submitter: ninex@o2.pl
>> 
>> 
>> [...]

Interesting...

> argh, MSI again.  Seems more trouble than it's worth.


> Eric, Andi: do we have anything in the pipeline which is likely to
> address this?

Not exactly.  I actually haven't heard of this symptom before.  If
I can ever log into bugzilla.kernel.org and get the full bug report,
I'll see if I can understand what hardware is at work.

The practical problem (I think) is that we assume that MSI works and
then go about blacklisting things, instead of just enabling MSI when
the hardware supports it and is set up properly.

That said, this instance actually looks like MSI is at least half
working, so it may be something generic going wrong.

I'm really confused, because at least in Intel's documentation of
the local APIC, the reported error only ever happens if you are using
the old serial APIC bus, which nothing has used since the P4 and the
K8 were introduced.  All current products are front-side-bus based.
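The two hex values in "APIC error on CPU0: 04(08)" are local APIC Error Status Register (ESR) contents. As far as I can tell, the i386 handler prints the ESR read before and after the write that latches it. Below is a minimal Python sketch that decodes them using the ESR bit layout from Intel's SDM. That document marks bits 0-3 as applying only to P6-family and Pentium processors (the serial APIC bus), which is exactly the puzzle described above. The function names are illustrative.

```python
# Local APIC Error Status Register bits, following Intel's SDM layout.
# Bits 0-3 are documented as P6-family/Pentium only (serial APIC bus).
ESR_BITS = {
    0: "Send checksum error",
    1: "Receive checksum error",
    2: "Send accept error",
    3: "Receive accept error",
    4: "Redirectable IPI",  # reserved on older parts
    5: "Send illegal vector",
    6: "Received illegal vector",
    7: "Illegal register address",
}


def decode_esr(value: int) -> list[str]:
    """Names of the ESR bits set in `value`, lowest bit first."""
    return [name for bit, name in ESR_BITS.items() if value & (1 << bit)]


def decode_apic_error_line(line: str) -> tuple[list[str], list[str]]:
    """Decode '... APIC error on CPU0: 04(08)' into the bits of both values."""
    payload = line.rsplit(":", 1)[1].strip()  # e.g. "04(08)"
    first, second = payload.rstrip(")").split("(")
    return decode_esr(int(first, 16)), decode_esr(int(second, 16))


if __name__ == "__main__":
    for msg in ("APIC error on CPU0: 04(08)", "APIC error on CPU0: 08(08)"):
        print(msg, "->", decode_apic_error_line(msg))
```

Under this layout, 04 decodes to a send accept error and 08 to a receive accept error. Both are serial-APIC-bus conditions.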

Eric

Comment 7 Eric W. Biederman 2007-05-13 01:27:28 UTC
Andrew Morton <akpm@linux-foundation.org> writes:

> On Sun, 13 May 2007 00:59:10 -0600 ebiederm@xmission.com (Eric W. Biederman)
> wrote:
>
>> If
>> I can ever log into bugzilla.kernel.org and get the full bug report I
>> might have to see if I can understand what hardware is at work.
>
> I think the DNS thingy broke.  bugme.osdl.org works.

Thanks.

Of the full boot trace, this appears to be the interesting bit.

> Attansic L1 Ethernet Network Driver - version 2.0.7
> Copyright(c) 2005-2006 Attansic Corporation.
> ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 36 (level, low) -> IRQ 36
> PCI: Setting latency timer of device 0000:04:00.0 to 64
> APIC error on CPU0: 04(08)
> APIC error on CPU0: 08(08)
> APIC error on CPU0: 08(08)
> APIC error on CPU0: 08(08)
> APIC error on CPU0: 08(08)
> APIC error on CPU0: 08(08)
> APIC error on CPU0: 08(08)
> APIC error on CPU0: 08(08)
> APIC error on CPU0: 08(08)
> APIC error on CPU0: 08(08)
> do_IRQ: 0.213 No irq handler for vector
> Uhhuh. NMI received for unknown reason 3c.
> Do you have a strange power saving mode enabled?
> Dazed and confused, but trying to continue
> APIC error on CPU0: 08(08)

The "No irq handler for vector" message is worrisome.

The atl1 network driver appears to be new in 2.6.21.  Has anyone gotten MSI
working on this network adapter, and in which situations?

The chipset is a VIA HyperTransport chipset, and it does have an MSI
mapping capability.  If I have decoded the bus layout properly, we
don't have an MSI-to-HyperTransport interrupt mapping capability on the
direct path from this PCI Express NIC to the HyperTransport bus.
We seem to have the mapping only for buses 0x80 and 0x07, which
could explain why MSI doesn't work on this hardware.
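The path argument above can be sketched as a walk from the NIC up through its parent bridges, asking whether any of them exposes an enabled MSI-to-HyperTransport mapping. Below is a minimal Python sketch of that walk. The device-to-bridge topology and the set of bridges carrying a mapping are hypothetical. They are only loosely modelled on the description (NIC at 0000:04:00.0, mappings only for buses 0x80 and 0x07) and are not decoded from the reporter's lspci.

```python
def bridges_to_root(device: str, parent: dict[str, str]) -> list[str]:
    """Return the upstream bridges of `device`, nearest first."""
    path = []
    node = parent.get(device)
    while node is not None:
        path.append(node)
        node = parent.get(node)
    return path


def msi_mapping_on_path(device: str, parent: dict[str, str], has_mapping: set[str]) -> bool:
    """True if any upstream bridge of `device` has an enabled MSI mapping."""
    return any(bridge in has_mapping for bridge in bridges_to_root(device, parent))


# Hypothetical topology (device -> upstream bridge); not taken from the real lspci.
PARENT = {
    "0000:04:00.0": "0000:00:03.0",  # the NIC behind a PCI Express port
    "0000:00:03.0": "0000:00:00.0",  # that port sits below the host bridge
    "0000:07:00.0": "0000:00:02.0",  # something on bus 0x07
    "0000:00:02.0": "0000:00:00.0",
    "0000:80:00.0": "0000:00:01.0",  # something on bus 0x80
    "0000:00:01.0": "0000:00:00.0",
}
# Hypothetical: only the bridges leading to buses 0x07 and 0x80 carry a mapping.
HAS_MAPPING = {"0000:00:01.0", "0000:00:02.0"}

if __name__ == "__main__":
    for dev in ("0000:04:00.0", "0000:07:00.0", "0000:80:00.0"):
        print(dev, "MSI mapping on path:", msi_mapping_on_path(dev, PARENT, HAS_MAPPING))
```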

IRQ 36 is our non-MSI IRQ number.
We don't print the MSI IRQ number.

Since I can't see how an MSI would be transferred properly, I wonder if VIA
saw that their MSI mapping capability was borked and disabled it,
and we are seeing how it is borked when we try to use it
unconditionally.

Does this system have a different pci-express slot that could
correspond to the mysterious bus 7?

Eric

Comment 8 Jay Cliburn 2007-05-13 05:19:59 UTC
Andrew Morton wrote:
> (add Jay to cc)

Thanks Andrew.

I've been whining about this problem for some time now.

http://marc.info/?l=linux-netdev&m=117469855329113&w=2

I'm *really* happy to have some help!  :)  Thank you

> 
> On Sun, 13 May 2007 02:25:56 -0600 ebiederm@xmission.com (Eric W. Biederman) wrote:
> 
>> Andrew Morton <akpm@linux-foundation.org> writes:
>>
>>> On Sun, 13 May 2007 00:59:10 -0600 ebiederm@xmission.com (Eric W. Biederman)
>>> wrote:
>>>
>>>> If
>>>> I can ever log into bugzilla.kernel.org and get the full bug report I
>>>> might have to see if I can understand what hardware is at work.
>>> I think the DNS thingy broke.  bugme.osdl.org works.
>> Thanks.
>>
>> Of the full boot trace this appears to be the interesting bit.
>>
>>> [...]
>> The no irq handler for vector is worrisome, 
>>
>> The atl1 network driver appears new in 2.6.21.  Has anyone gotten msi
>> working on this network adapter, and in which situations?

Yes, MSI on this network adapter works flawlessly on my Intel-based ASUS P5B-MX 
board.  Luca Tattamanti has a P5B-E board, and MSI works for him there, too.  I 
also have an Asus M2V (like the reporter in the instant case), and MSI works for 
me there so long as I stick with a 64-bit kernel.  I get the APIC error only 
when I run a 32-bit kernel.  However, I do have one user report of the error on 
the M2V under a Debian x86_64 kernel, so I might just be lucky in having avoided 
it for so long.

>>
>> The chipset is a via hypertransport chipset and it does have a msi
>> mapping capability.  If I have decoded the bus layout properly we
>> don't have a msi to hypertransport interrupt mapping capability on the
>> direct path from this pci-express nic to the hypertransport bus.
>> We seem to have the mapping only for bus 0x80 and 0x07.  Which
>> could explain why msi doesn't work on this hardware.
>>
>> IRQ 32 is our non msi irq number.
>> We don't print the msi irq number.
>>
>> Since I can't see how a msi would properly transfer I wonder if VIA
>> saw that their msi mapping capability was borked, and disabled it,
>> and we are seeing how it is borked when we try and use it
>> unconditionally.
>>
>> Does this system have a different pci-express slot that could
>> correspond to the mysterious bus 7?

The only other PCIe device on this board is the video adapter.

>>
>> Eric
> 
> 

I've pretty much concluded that MSI is broken on this board, and I tested a pci 
quirk patch last night that simply turns it off globally when it discovers the 
VIA VT3351 bridge.  I was going to submit it atop Tejun's similar patch here: 
http://lkml.org/lkml/2007/5/9/213.
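The blacklist approach described here amounts to "scan the PCI devices, and if a VIA VT3351 bridge is present, disable MSI for every device". Below is a minimal Python model of that decision. It is not the actual kernel patch, which would be a PCI fixup written in C. Vendor ID 0x1106 is VIA's. The value 0x0351 for the VT3351 device ID is an assumption to check against `lspci -nn` on the board. `PciDevice` and `msi_globally_allowed` are hypothetical names.

```python
from dataclasses import dataclass

VIA_VENDOR_ID = 0x1106
VT3351_DEVICE_ID = 0x0351  # assumption: confirm with `lspci -nn`


@dataclass(frozen=True)
class PciDevice:
    bdf: str  # domain:bus:device.function, e.g. "0000:00:00.0"
    vendor: int
    device: int


# Blacklist: any of these (vendor, device) pairs disables MSI system-wide.
MSI_BLACKLIST = {(VIA_VENDOR_ID, VT3351_DEVICE_ID)}


def msi_globally_allowed(devices: list[PciDevice]) -> bool:
    """Blacklist policy: MSI stays on unless a known-bad bridge is present."""
    return not any((d.vendor, d.device) in MSI_BLACKLIST for d in devices)


if __name__ == "__main__":
    # Hypothetical device lists; 0x1234 is a placeholder device ID.
    intel_board = [PciDevice("0000:00:00.0", 0x8086, 0x1234)]
    m2v_board = [PciDevice("0000:00:00.0", VIA_VENDOR_ID, VT3351_DEVICE_ID)]
    print("Intel board, MSI allowed:", msi_globally_allowed(intel_board))
    print("M2V board, MSI allowed:", msi_globally_allowed(m2v_board))
```

The weakness of this approach follows directly from the model: any board not yet listed in `MSI_BLACKLIST` gets MSI by default.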

If there's something else I need to try, please let me know.

Jay




Comment 9 Anonymous Emailer 2007-05-13 05:33:17 UTC
Reply-To: ninex@NineX.eu.org



Jay Cliburn writes:
> [...]
> I've pretty much concluded that MSI is broken on this board, and I
> tested a pci quirk patch last night that simply turns it off globally
> when it discovers the VIA VT3351 bridge.  I was going to submit it
> atop Tejun's similar patch here: http://lkml.org/lkml/2007/5/9/213.
>
> If there's something else I need to try, please let me know.
Hmm, but I see this bug when running a 64-bit kernel...

-- 



GRZEGORZ {NineX} KRZYSTEK
NineX Inc.
ninex@ninex.eu.org
Kraków
Comment 10 Jay Cliburn 2007-05-13 05:44:52 UTC
Grzegorz Krzystek wrote:

> hmm  but i see this bug when running 64bit kernel....

Then you're the second person to see it under x86_64.  Congratulations!  :)

I vote for quirking MSI off on this board, unless Andi, Eric, and Andrew have 
alternate ideas.

Comment 11 Eric W. Biederman 2007-05-13 08:27:19 UTC
Jay Cliburn <jacliburn@bellsouth.net> writes:

> Grzegorz Krzystek wrote:
>
>> hmm  but i see this bug when running 64bit kernel....
>
> Then you're the second person to see it under x86_64.  Congratulations!  :)

Interesting.  Is the failure mode really apic errors on all kernels?

The apic error feels like we tried to send an invalid apic interrupt
to ourselves, and things croaked.

> I vote for quirking MSI off on this board, unless Andi, Eric, and Andrew have
> alternate ideas.

I have a hypothesis that the appropriate MSI mapping capability is
simply programmed wrong.  If that is the case, we can really fix
this issue.

On the non-failing instances of this board, can we use lspci
to find the working MSI mapping capability that is on the path
between the PCI Express bus and the upstream HyperTransport bus?
Can we then get a complete register dump of the MSI mapping capability?

Can we then please repeat the process on a failing instance of
the board in question?

Can we also please compare the PCI revision fields in the chipset
between working and non-working versions of this chipset?
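Both requests can be met from a raw dump of each bridge's configuration space. One source is `lspci -xxx`. Another is the sysfs file `/sys/bus/pci/devices/<domain:bus:dev.fn>/config`, which should be read as root because unprivileged reads typically return only the first 64 bytes.

The minimal Python sketch below does two things:

- It extracts the revision ID at config offset 0x08. Plain `lspci` also shows this value as "(rev NN)".
- It walks the capability list looking for HyperTransport MSI mapping capabilities (capability ID 0x08, type 0xA8 in the upper five bits of byte 3). For each one it reports the Enable and Fixed flags and, when the mapping is not fixed, the programmed address.

The offsets follow the kernel's `pci_regs.h` definitions as I understand them, so double-check them against the HyperTransport specification.

```python
import struct
import sys
from pathlib import Path

PCI_STATUS = 0x06
PCI_STATUS_CAP_LIST = 0x10  # status bit: capability list present
PCI_REVISION_ID = 0x08
PCI_CAPABILITY_LIST = 0x34  # pointer to the first capability
PCI_CAP_ID_HT = 0x08  # HyperTransport capability ID
HT_CAPTYPE_MSI_MAPPING = 0xA8  # compared against the upper 5 bits of cap + 3
HT_MSI_FLAGS_ENABLE = 0x01
HT_MSI_FLAGS_FIXED = 0x02


def capabilities(cfg: bytes):
    """Yield (offset, capability ID) for each entry in the capability list."""
    if len(cfg) < 64 or not (cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST):
        return
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()
    while pos and pos not in seen and pos + 1 < len(cfg):
        seen.add(pos)  # guard against malformed, looping lists
        yield pos, cfg[pos]
        pos = cfg[pos + 1] & 0xFC


def summarize(cfg: bytes) -> dict:
    """Report IDs, revision and any HT MSI mapping capabilities."""
    vendor, device = struct.unpack_from("<HH", cfg, 0)
    info = {"vendor": vendor, "device": device,
            "revision": cfg[PCI_REVISION_ID], "msi_mappings": []}
    for pos, cap_id in capabilities(cfg):
        if cap_id != PCI_CAP_ID_HT or pos + 12 > len(cfg):
            continue
        if (cfg[pos + 3] & 0xF8) != HT_CAPTYPE_MSI_MAPPING:
            continue
        flags = cfg[pos + 2]
        mapping = {"offset": pos,
                   "enabled": bool(flags & HT_MSI_FLAGS_ENABLE),
                   "fixed": bool(flags & HT_MSI_FLAGS_FIXED),
                   "raw": cfg[pos:pos + 12].hex()}  # the full 12-byte register dump
        if not mapping["fixed"]:
            lo, hi = struct.unpack_from("<II", cfg, pos + 4)
            mapping["address"] = (hi << 32) | (lo & 0xFFF00000)
        info["msi_mappings"].append(mapping)
    return info


if __name__ == "__main__":
    for name in sys.argv[1:]:  # paths to config-space dumps
        print(name, summarize(Path(name).read_bytes()))
```

Running it on the same bridges of a working and a failing board gives the side-by-side comparison of revisions and mapping registers requested above.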

I'm also curious about these APIC errors, and the "no irq for vector"
error that happened on x86_64.  I'm even curious about the bit about
triggering an NMI.  This suggests we may actually have bad hardware
somewhere in the mix that we need to weed out.

So getting details on other similar failures would also be
interesting.

If we are actually going to spend some time on this, I am inclined to
figure out how our quirk layer works for x86 MSI and turn it inside
out: default it to off, with one quirk that turns it on for Intel
PCI Express chipsets, and another that turns it on when there is a
properly set up MSI mapping capability.  It can't be that hard, and
at least then it will be safe to enable MSI support in the kernel by default.

The practical question for me is: can we use sweeping generalizations
(like the presence of the standard HyperTransport MSI mapping
capability) when turning things on, or do we need to go chipset by
chipset?  Or can some combination of both work?
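The inside-out policy proposed above (MSI off by default, turned on only by positive quirks) can be modelled in a few lines. Below is a minimal Python sketch. `HostBridge`, `msi_allowed` and the `mapping_on_path` flag are hypothetical names. The flag stands for the result of a separate path check, and a real kernel would express each rule as its own PCI quirk rather than as one function.

```python
from dataclasses import dataclass

INTEL_VENDOR_ID = 0x8086


@dataclass(frozen=True)
class HostBridge:
    vendor: int
    is_pcie: bool  # hypothetical flag: chipset exposes PCI Express


def msi_allowed(host: HostBridge, mapping_on_path: bool) -> bool:
    """Whitelist policy: MSI is off unless a positive rule turns it on."""
    # Rule 1: Intel PCI Express chipsets.
    if host.vendor == INTEL_VENDOR_ID and host.is_pcie:
        return True
    # Rule 2: a properly set up HT MSI mapping capability sits between
    # the device and the HyperTransport link (determined elsewhere).
    if mapping_on_path:
        return True
    # Default: unknown configurations do not get MSI.
    return False


if __name__ == "__main__":
    via = HostBridge(vendor=0x1106, is_pcie=True)
    print("VIA, no mapping on path:", msi_allowed(via, mapping_on_path=False))
```

The design choice is the inverse of a blacklist: a board nobody has examined ends up with MSI off, which is the safe failure mode.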

Andrew, my current feeling on this part of the MSI code is that we
are violating our current best practices for enabling a hardware
capability.  We try to use MSI even without knowing it works,
instead of enabling MSI only in known-good configurations.
As far as I can tell, we have never seriously tried an MSI whitelist
approach.  Now that the MSI core is pretty solid, I think I can spare a
cycle or two so we can straighten out the MSI enable criteria, at
which point MSI should be much less hassle.

Eric

Comment 12 Jay Cliburn 2007-05-13 18:45:24 UTC
On Sun, 13 May 2007 09:25:41 -0600
ebiederm@xmission.com (Eric W. Biederman) wrote:

> Jay Cliburn <jacliburn@bellsouth.net> writes:
> 
> > Grzegorz Krzystek wrote:
> >
> >> hmm  but i see this bug when running 64bit kernel....
> >
> > Then you're the second person to see it under x86_64.
> > Congratulations!  :)
> 
> Interesting.  Is the failure mode really apic errors on all kernels?

I have results from a working x86_64 kernel.  Again, unlike Grzegorz, it
works for me under x86_64, but fails under i386 kernels.

> On the non-failing instances of this board can we use lspci
> to find the working msi mapping capability that is on the path
> between the pci-express bus and the upstream hypertransport bus.

Here's a pointer to the files for working and non-working instances:

x86_64 with working MSI:
ftp://ftp.hogchain.net/pub/linux/m2v/apic-problem/m2v-x86_64-msi-working/
m2v-x86_64-dmesg.txt
m2v-x86_64-dmidecode.txt
m2v-x86_64-lsmod.txt
m2v-x86_64-lspci.txt
m2v-x86_64-ping.txt
m2v-x86_64-proc-interrupts.txt
m2v-x86_64-uname.txt

i386 with failing MSI:
ftp://ftp.hogchain.net/pub/linux/m2v/apic-problem/m2v-i386-msi-failing/
m2v-i386-dmesg.txt
m2v-i386-dmidecode.txt
m2v-i386-lsmod.txt
m2v-i386-lspci.txt
m2v-i386-proc-interrupts.txt
m2v-i386-uname.txt

> Can we then get a complete register dump of the msi-mapping
> capability.

How do I obtain a register dump?

> 
> Can we then please repeat the process on a failing instance of
> the board in question.
> 
> Can we also please compare the pci revision fields in the chipset
> between working and non-working versions of this chipset.

How do I obtain the pci revision fields?

> 
> I'm also curious about these apic errors, and the "no irq for vector"
> error that happened on x86_64.  Even the bit about trigger an NMI 
> I'm curious about.  This suggests we may actually have bad hardware
> somewhere in the mix that we need to weed out.
> 
> So getting details on other similar failures would also be
> interesting.

Specifically, what additional info do you need?  I'll be glad to get it.

One interesting tidbit here is the comparison of /proc/interrupts
between the x86_64 (working) kernel and the i386 (non-working) kernel.
The x86_64 version shows the MSI mapping, but the i386 doesn't; eth0
doesn't show up at all under i386 (with MSI enabled). The atl1
module is definitely loaded, but the network isn't started (because it
kills the box).

[jcliburn@osprey ~]$ cat m2v-x86_64-proc-interrupts.txt 
           CPU0       CPU1       
  0:     391098          0   IO-APIC-edge      timer
  1:         27        371   IO-APIC-edge      i8042
  6:          5          0   IO-APIC-edge      floppy
  7:          0          0   IO-APIC-edge      parport0
  8:          0          0   IO-APIC-edge      rtc
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          4          0   IO-APIC-edge      i8042
 14:       2521      16075   IO-APIC-edge      libata
 15:          0          0   IO-APIC-edge      libata
 17:        195          0   IO-APIC-fasteoi   HDA Intel
 20:          0          0   IO-APIC-fasteoi   uhci_hcd:usb1
 21:        138        104   IO-APIC-fasteoi   libata, ehci_hcd:usb2, uhci_hcd:usb4
 22:         33        263   IO-APIC-fasteoi   uhci_hcd:usb3
 23:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
2298:         68        120   PCI-MSI-edge      eth0
NMI:          0          0 
LOC:     390899     390790 
ERR:          0

[jcliburn@osprey ~]$ cat m2v-i386-proc-interrupts.txt 
           CPU0       CPU1       
  0:        260          0   IO-APIC-edge      timer
  1:          5        692   IO-APIC-edge      i8042
  4:          0         10   IO-APIC-edge      serial
  6:          1          4   IO-APIC-edge      floppy
  7:          0          0   IO-APIC-edge      parport0
  8:          0          1   IO-APIC-edge      rtc
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          1          3   IO-APIC-edge      i8042
 14:       1981         47   IO-APIC-edge      ide0
 21:          0          0   IO-APIC-fasteoi   uhci_hcd:usb1
 22:        489         30   IO-APIC-fasteoi   uhci_hcd:usb2
 23:         21      13873   IO-APIC-fasteoi   uhci_hcd:usb3, ehci_hcd:usb5, sata_via
 24:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
 26:          0        166   IO-APIC-fasteoi   HDA Intel
NMI:          0          0 
LOC:     253317     249720 
ERR:          0
MIS:          0

Jay
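The comparison above boils down to "which interrupt, with which handler type, is eth0 on?". Below is a minimal Python sketch that extracts that from /proc/interrupts text in the format shown: a header row of CPU columns, then one row per IRQ. Summary rows such as NMI, LOC, ERR and MIS are skipped. The column layout varies between kernel versions, so treat the parser as illustrative. `interrupt_sources` and `uses_msi` are hypothetical names.

```python
from pathlib import Path


def interrupt_sources(text: str) -> dict[str, list[tuple[str, str]]]:
    """Map each device name in /proc/interrupts text to its (irq, handler) pairs."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    sources: dict[str, list[tuple[str, str]]] = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        irq, rest = line.split(":", 1)  # names such as ehci_hcd:usb2 keep their colon
        fields = rest.split()
        if len(fields) <= ncpus:  # NMI/LOC/ERR/MIS rows: counts only
            continue
        handler = fields[ncpus]
        for name in " ".join(fields[ncpus + 1:]).split(","):
            if name.strip():
                sources.setdefault(name.strip(), []).append((irq.strip(), handler))
    return sources


def uses_msi(text: str, device: str) -> bool:
    """True if `device` sits on an interrupt whose handler type mentions MSI."""
    return any("MSI" in handler for _, handler in interrupt_sources(text).get(device, []))


if __name__ == "__main__":
    proc = Path("/proc/interrupts")
    if proc.exists():
        print("eth0 on MSI:", uses_msi(proc.read_text(), "eth0"))
```

Applied to the two listings above, eth0 appears as IRQ 2298 with a PCI-MSI-edge handler on x86_64, and does not appear at all on i386.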

Comment 13 Jay Cliburn 2007-05-13 20:04:57 UTC
On Sun, 13 May 2007 20:44:07 -0500
Jay Cliburn <jacliburn@bellsouth.net> wrote:

> One interesting tidbit here is the comparison of /proc/interrupts
> between the x86_64 (working) kernel and the i386 (non-working) kernel.

Here's another interesting thing...  The attached png file shows the
lspci outputs for the L1 driver side-by-side.  The working version is on
the left, and the non-working version is on the right.

Note the Address line; it's all zeroes on the non-working side.  That
doesn't seem right.
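The observation about the Address line can be made concrete. On x86, an MSI is a memory write into the local APIC's MSI window, so the Address field should look like 0xFEExxxxx. An all-zero address means the device's interrupt writes go to physical address 0 instead of reaching any CPU, which would at least be consistent with eth0 never getting interrupts.

Below is a minimal Python sketch that pulls the Address values out of `lspci -vv` text for the NIC and checks them. The regular expression assumes an "Address: <hex>" line like the one described. lspci's exact formatting varies between pciutils versions, and the function names are hypothetical.

```python
import re
import sys
from pathlib import Path

ADDRESS_RE = re.compile(r"Address:\s*([0-9A-Fa-f]+)")


def msi_addresses(lspci_text: str) -> list[int]:
    """Return every hex value that follows 'Address:' in lspci -vv style text."""
    return [int(m.group(1), 16) for m in ADDRESS_RE.finditer(lspci_text)]


def targets_local_apic(addr: int) -> bool:
    """x86 MSI writes reach a local APIC only if bits 31:20 are 0xFEE
    (and, for a 64-bit capability, the upper 32 bits are zero)."""
    return (addr >> 20) == 0xFEE


def destination_apic_id(addr: int) -> int:
    """Destination APIC ID, carried in bits 19:12 of the MSI address."""
    return (addr >> 12) & 0xFF


if __name__ == "__main__":
    for name in sys.argv[1:]:  # files holding `lspci -vv -s <slot>` output
        for addr in msi_addresses(Path(name).read_text()):
            verdict = "ok" if targets_local_apic(addr) else "NOT a local APIC address"
            print(f"{name}: {addr:#018x} {verdict}")
```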
Comment 14 Anonymous Emailer 2007-05-13 22:59:08 UTC
Reply-To: ninex@NineX.eu.org



Jay Cliburn writes:
> [...]
> I have results from a working x86_64 kernel.  Again, unlike Grzegorz, it
> works for me under x86_64, but fails under i386 kernels.
>   
So maybe there is something in the BIOS settings/version?
I have the latest BIOS version on my board.
Can you make a BIOS settings profile dump via the ASUS a.o.c profile?
If so, I will try to use your profile and check whether that works...

Comment 15 Anonymous Emailer 2007-05-14 00:16:26 UTC
Reply-To: ninex@NineX.eu.org



bugme-daemon@bugzilla.kernel.org writes:
> http://bugzilla.kernel.org/show_bug.cgi?id=8472
> [...]
>> Here's a pointer to the files for working and non-working instances:
>>
>> x86_64 with working MSI:
>> ftp://ftp.hogchain.net/pub/linux/m2v/apic-problem/m2v-x86_64-msi-working/
>> m2v-x86_64-dmesg.txt
>> m2v-x86_64-dmidecode.txt
>> m2v-x86_64-lsmod.txt
>> m2v-x86_64-lspci.txt
>> m2v-x86_64-ping.txt
>> m2v-x86_64-proc-interrupts.txt
>> m2v-x86_64-uname.txt
>>     
And take into consideration that the MSI/APIC error appears when you try to bring up
the eth interface, not when the driver is loading... so the person who created
these logs should boot the x86_64 kernel, try to bring up the interface, and
look at the logs to see what happens...


Comment 16 Anonymous Emailer 2007-05-14 00:25:46 UTC
Reply-To: ninex@NineX.eu.org


>> Here's a pointer to the files for working and non-working instances:
>>
>> x86_64 with working MSI:
>> ftp://ftp.hogchain.net/pub/linux/m2v/apic-problem/m2v-x86_64-msi-working/
>> m2v-x86_64-dmesg.txt
>> m2v-x86_64-dmidecode.txt
>> m2v-x86_64-lsmod.txt
>> m2v-x86_64-lspci.txt
>> m2v-x86_64-ping.txt
>> m2v-x86_64-proc-interrupts.txt
>> m2v-x86_64-uname.txt
>>
>> i386 with failing MSI:
>> ftp://ftp.hogchain.net/pub/linux/m2v/apic-problem/m2v-i386-msi-failing/
>> m2v-i386-dmesg.txt
>> m2v-i386-dmidecode.txt
>> m2v-i386-lsmod.txt
>> m2v-i386-lspci.txt
>> m2v-i386-proc-interrupts.txt
>> m2v-i386-uname.txt
>>
>>   
Please take into consideration that, in my case, the MSI bug appears not at boot
but when I try to bring up the interface. Could someone who created the logs on
the x86_64 kernel try to bring up the interface, ping some host, and look at
the log again?
Sorry for the format of my last message; my Thunderbird freaked out ;)

Comment 17 Jay Cliburn 2007-05-14 03:35:26 UTC
On Mon, 14 May 2007 09:24:21 +0200
Grzegorz Krzystek <ninex@NineX.eu.org> wrote:

> Please take into consideration that, in my case, the MSI bug appears
> not at boot but when I try to bring up the interface. Could someone
> who created the logs on the x86_64 kernel try to bring up the
> interface, ping some host, and look at the log again?
> Sorry for the format of my last message; my Thunderbird freaked out ;)

I see the same behavior; the APIC errors start when the network
interface is brought up. The x86_64 logs were captured on my M2V system
with the interface up, and I included a ping command's output in the
list of files.

Comment 18 Eric W. Biederman 2007-05-14 04:27:43 UTC
Jay Cliburn <jacliburn@bellsouth.net> writes:

> On Mon, 14 May 2007 09:24:21 +0200
> Grzegorz Krzystek <ninex@NineX.eu.org> wrote:
>
>> Please take into consideration that, in my case, the MSI bug appears
>> not at boot but when I try to bring up the interface. Could someone
>> who created the logs on the x86_64 kernel try to bring up the
>> interface, ping some host, and look at the log again?
>> Sorry for the format of my last message; my Thunderbird freaked out ;)
>
> I see the same behavior; the APIC errors start when the network
> interface is brought up. The x86_64 logs were captured on my M2V system
> with the interface up, and I included a ping command's output in the
> list of files.

When you bring up the interface is when we call pci_enable_msi
and allocate the msi.

Eric

Comment 19 Jay Cliburn 2007-05-14 07:39:49 UTC
On Mon, 14 May 2007 05:26:00 -0600
ebiederm@xmission.com (Eric W. Biederman) wrote:

> When you bring up the interface is when we call pci_enable_msi
> and allocate the msi.

Based upon Eric's statement here, I went back to the failing case and
recaptured some files while the interface was up.  My previous data
collections were when the interface was down.

The new files are at:

ftp://ftp.hogchain.net/pub/linux/m2v/apic-problem/m2v-i386-msi-failing

In particular, /proc/interrupts now shows an interrupt assigned to the
device, whereas it didn't before because the driver wasn't started.

This is from the failing i386 instance.  APIC errors are pouring out of
the system when this file is captured.

           CPU0       CPU1       
  0:        260          0   IO-APIC-edge      timer
  1:         41        532   IO-APIC-edge      i8042
  4:          0         10   IO-APIC-edge      serial
  6:          0          5   IO-APIC-edge      floppy
  7:          0          0   IO-APIC-edge      parport0
  8:          0          1   IO-APIC-edge      rtc
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          1          3   IO-APIC-edge      i8042
 14:       1477         47   IO-APIC-edge      ide0
 21:          0          0   IO-APIC-fasteoi   uhci_hcd:usb1
 22:       1210         28   IO-APIC-fasteoi   uhci_hcd:usb2
 23:         18      10754   IO-APIC-fasteoi   uhci_hcd:usb3, ehci_hcd:usb5, sata_via
 24:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
 26:          0        167   IO-APIC-fasteoi   HDA Intel
218:          0          0   PCI-MSI-edge      eth0
NMI:          0          0 
LOC:     195484     191531 
ERR:        234
MIS:          0

Also, please ignore the previous png file; the atl1 driver wasn't
started, so lspci showed zeros for the MSI address.

Jay

Comment 20 Eric W. Biederman 2007-05-14 09:40:57 UTC
Jay Cliburn <jacliburn@bellsouth.net> writes:

> On Mon, 14 May 2007 05:26:00 -0600
> ebiederm@xmission.com (Eric W. Biederman) wrote:
>
>> When you bring up the interface is when we call pci_enable_msi
>> and allocate the msi.
>
> Based upon Eric's statement here, I went back to the failing case and
> recaptured some files while the interface was up.  My previous data
> collections were when the interface was down.
>
> The new files are at:
>
> ftp://ftp.hogchain.net/pub/linux/m2v/apic-problem/m2v-i386-msi-failing

Hmm.  Your 64bit working case is from a 2.6.20 kernel.


Interesting: we are using a different delivery mode in
the 32bit and the 64bit kernels.
64bit
Address: 00000000fee01000  Data: 40d9
fixed delivery mode.
dest: 

32bit
Address: 00000000fee0300c  Data: 416a
lowest priority delivery mode

On your 32bit system, could you try the patch below?  I want to see if
things work properly when you are not in lowest priority delivery
mode.

The other truly odd thing is that the two MSI mapping capabilities lspci
found were not enabled.  So I am puzzled how things are working in this
case.  I'm guessing it is chipset-internal magic not using the standard
capabilities.

Thanks,
Eric


diff --git a/include/asm-i386/mach-default/mach_apic.h b/include/asm-i386/mach-default/mach_apic.h
index 6db1c3b..f72c307 100644
--- a/include/asm-i386/mach-default/mach_apic.h
+++ b/include/asm-i386/mach-default/mach_apic.h
@@ -19,8 +19,8 @@ static inline cpumask_t target_cpus(void)
 #define NO_BALANCE_IRQ (0)
 #define esr_disable (0)
 
-#define INT_DELIVERY_MODE dest_LowestPrio
-#define INT_DEST_MODE 1     /* logical delivery broadcast to all procs */
+#define INT_DELIVERY_MODE (dest_Fixed)
+#define INT_DEST_MODE (0)	/* phys delivery to target proc */
 
 static inline unsigned long check_apicid_used(physid_mask_t bitmap, int apicid)
 {

Comment 21 Jay Cliburn 2007-05-14 10:13:34 UTC
On Mon, 14 May 2007 10:38:01 -0600
ebiederm@xmission.com (Eric W. Biederman) wrote:

> Hmm.  Your 64bit working case is from a 2.6.20 kernel.
Actually, it's a 2.6.21-rcX kernel, unless I'm mistaken.  It's the
Fedora 7 Test 4 Live version, and Fedora's kernel numbers don't track
with the vanilla kernel numbering scheme.

> 
> 
> Interesting: we are using a different delivery mode in
> the 32bit and the 64bit kernels.
> 64bit
> Address: 00000000fee01000  Data: 40d9
> fixed delivery mode.
> dest: 
> 
> 32bit
> Address: 00000000fee0300c  Data: 416a
> lowest priority delivery mode
> 
> On your 32bit system, could you try the patch below?  I want to see if
> things work properly when you are not in lowest priority delivery
> mode.

I'll get back to you soon with the result...

> 
> The other truly odd thing is that the two MSI mapping capabilities
> lspci found were not enabled.  So I am puzzled how things are working
> in this case.  I'm guessing it is chipset-internal magic not using
> the standard capabilities.
> 
> Thanks,
> Eric
> 
> 
> diff --git a/include/asm-i386/mach-default/mach_apic.h b/include/asm-i386/mach-default/mach_apic.h
> index 6db1c3b..f72c307 100644
> --- a/include/asm-i386/mach-default/mach_apic.h
> +++ b/include/asm-i386/mach-default/mach_apic.h
> @@ -19,8 +19,8 @@ static inline cpumask_t target_cpus(void)
>  #define NO_BALANCE_IRQ (0)
>  #define esr_disable (0)
>  
> -#define INT_DELIVERY_MODE dest_LowestPrio
> -#define INT_DEST_MODE 1     /* logical delivery broadcast to all procs */
> +#define INT_DELIVERY_MODE (dest_Fixed)
> +#define INT_DEST_MODE (0)	/* phys delivery to target proc */
>  
>  static inline unsigned long check_apicid_used(physid_mask_t bitmap, int apicid)
>  {
> 

Comment 22 Jay Cliburn 2007-05-14 10:47:32 UTC
On Mon, 14 May 2007 10:38:01 -0600
ebiederm@xmission.com (Eric W. Biederman) wrote:


> On your 32bit system could you try the patch below.  


> diff --git a/include/asm-i386/mach-default/mach_apic.h b/include/asm-i386/mach-default/mach_apic.h
> index 6db1c3b..f72c307 100644
> --- a/include/asm-i386/mach-default/mach_apic.h
> +++ b/include/asm-i386/mach-default/mach_apic.h
> @@ -19,8 +19,8 @@ static inline cpumask_t target_cpus(void)
>  #define NO_BALANCE_IRQ (0)
>  #define esr_disable (0)
>  
> -#define INT_DELIVERY_MODE dest_LowestPrio
> -#define INT_DEST_MODE 1     /* logical delivery broadcast to all procs */
> +#define INT_DELIVERY_MODE (dest_Fixed)
> +#define INT_DEST_MODE (0)	/* phys delivery to target proc */
>  
>  static inline unsigned long check_apicid_used(physid_mask_t bitmap, int apicid)
>  {
> 


The patched kernel panics before I can even grab serial console output.  JPEG
attached...
Comment 23 Jay Cliburn 2007-05-14 14:02:44 UTC
On Mon, 14 May 2007 10:38:01 -0600
ebiederm@xmission.com (Eric W. Biederman) wrote:

> The other truly odd thing is that the two MSI mapping capabilities
> lspci found were not enabled.  So I am puzzled how things are working
> in this case.  I'm guessing it is chipset-internal magic not using
> the standard capabilities.

I pulled from Linus' tree today, built a current git 2.6.22-rc1 kernel,
booted with apic=debug, started the atl1 driver, and produced the
attached dmesg.  This is under the failing i386 instance.

I've figured out that if I leave the network cable disconnected, I can
start the driver with MSI enabled without crashing the system. That's
probably because the NIC doesn't generate any interrupts so long as the
cable is disconnected.

Hope the attached dmesg helps.  If there's anything else I can provide,
please let me know.

Jay
Comment 24 Eric W. Biederman 2007-05-14 14:42:24 UTC
Jay Cliburn <jacliburn@bellsouth.net> writes:


> The patched kernel panics before I can even grab serial console output.  JPEG
> attached...

Bother, then I missed something in the test patch.  I will take a look a
little later and see if I can come up with something that actually works.

Eric

Comment 25 Eric W. Biederman 2007-05-15 05:31:56 UTC
Jay Cliburn <jacliburn@bellsouth.net> writes:

> On Mon, 14 May 2007 10:38:01 -0600
> ebiederm@xmission.com (Eric W. Biederman) wrote:
>
>> The other truly odd thing is that the two MSI mapping capabilities
>> lspci found were not enabled.  So I am puzzled how things are working
>> in this case.  I'm guessing it is chipset-internal magic not using
>> the standard capabilities.
>
> I pulled from Linus' tree today, built a current git 2.6.22-rc1 kernel,
> booted with apic=debug, started the atl1 driver, and produced the
> attached dmesg.  This is under the failing i386 instance.
>
> I've figured out that if I leave the network cable disconnected, I can
> start the driver with MSI enabled without crashing the system. That's
> probably because the NIC doesn't generate any interrupts so long as the
> cable is disconnected.
>
> Hope the attached dmesg helps.  If there's anything else I can provide,
> please let me know.

I'm still trying to figure out (without trying too hard) whether this
is a case where MSI interrupts only work in physical mode, and
not in lowest priority delivery mode.

Since I can't easily switch the i386 kernel to use physical
mode, here is my attempt to break your 64bit kernel with
lowest priority delivery mode.

Think you could try this and tell me if MSI continues to
work with this patch applied on your 64bit kernel.

All this patch does is override the selection logic in genapic.c

diff --git a/arch/x86_64/kernel/genapic.c b/arch/x86_64/kernel/genapic.c
index 47496a4..92f4925 100644
--- a/arch/x86_64/kernel/genapic.c
+++ b/arch/x86_64/kernel/genapic.c
@@ -55,6 +55,10 @@ void __init setup_apic_routing(void)
 	else
 		genapic = &apic_physflat;
 
+#if 1
+	genapic = &apic_flat;
+#endif
+
 	printk(KERN_INFO "Setting APIC routing to %s\n", genapic->name);
 }
 


Eric

Comment 26 Jay Cliburn 2007-05-15 05:54:16 UTC
Eric W. Biederman wrote:
> Jay Cliburn <jacliburn@bellsouth.net> writes:
> 
>> On Mon, 14 May 2007 10:38:01 -0600
>> ebiederm@xmission.com (Eric W. Biederman) wrote:
>>
>>> The other truly odd thing is that the two MSI mapping capabilities
>>> lspci found were not enabled.  So I am puzzled how things are working
>>> in this case.  I'm guessing it is chipset-internal magic not using
>>> the standard capabilities.
>> I pulled from Linus' tree today, built a current git 2.6.22-rc1 kernel,
>> booted with apic=debug, started the atl1 driver, and produced the
>> attached dmesg.  This is under the failing i386 instance.
>>
>> I've figured out that if I leave the network cable disconnected, I can
>> start the driver with MSI enabled without crashing the system. That's
>> probably because the NIC doesn't generate any interrupts so long as the
>> cable is disconnected.
>>
>> Hope the attached dmesg helps.  If there's anything else I can provide,
>> please let me know.
> 
> I'm still trying to figure out (without trying too hard) whether this
> is a case where MSI interrupts only work in physical mode, and
> not in lowest priority delivery mode.
> 
> Since I can't easily switch the i386 kernel to use physical
> mode, here is my attempt to break your 64bit kernel with
> lowest priority delivery mode.
> 
> Think you could try this and tell me if MSI continues to
> work with this patch applied on your 64bit kernel.
> 
> All this patch does is override the selection logic in genapic.c
> 
> diff --git a/arch/x86_64/kernel/genapic.c b/arch/x86_64/kernel/genapic.c
> index 47496a4..92f4925 100644
> --- a/arch/x86_64/kernel/genapic.c
> +++ b/arch/x86_64/kernel/genapic.c
> @@ -55,6 +55,10 @@ void __init setup_apic_routing(void)
>  	else
>  		genapic = &apic_physflat;
>  
> +#if 1
> +	genapic = &apic_flat;
> +#endif
> +
>  	printk(KERN_INFO "Setting APIC routing to %s\n", genapic->name);
>  }

Unfortunately, my day job has gotten in the way of my netdev hacking hobby. 
I'll try your patch as soon as I can, probably in the next couple of nights.

Jay

Comment 27 Eric W. Biederman 2007-05-15 06:05:16 UTC
Jay Cliburn <jacliburn@bellsouth.net> writes:

> Unfortunately, my day job has gotten in the way of my netdev hacking hobby. I'll
> try your patch as soon as I can, probably in the next couple of nights.

No problem.

Eric

Comment 28 Jay Cliburn 2007-05-15 19:25:17 UTC
On Tue, 15 May 2007 06:28:39 -0600
ebiederm@xmission.com (Eric W. Biederman) wrote:


> Think you could try this and tell me if MSI continues to
> work with this patch applied on your 64bit kernel.
> 
> All this patch does is override the selection logic in genapic.c
> 
> diff --git a/arch/x86_64/kernel/genapic.c b/arch/x86_64/kernel/genapic.c
> index 47496a4..92f4925 100644
> --- a/arch/x86_64/kernel/genapic.c
> +++ b/arch/x86_64/kernel/genapic.c
> @@ -55,6 +55,10 @@ void __init setup_apic_routing(void)
>  	else
>  		genapic = &apic_physflat;
>  
> +#if 1
> +	genapic = &apic_flat;
> +#endif
> +
>  	printk(KERN_INFO "Setting APIC routing to %s\n", genapic->name);
>  }

I installed Fedora 7 Test 4 x86_64 to a hard disk this evening (for the
past few weeks I've been using a Fedora Live x86_64 distribution
with the pci=msi kernel command line option to enable MSI).  I cloned
Linus' git tree and built a current 64-bit kernel. The resulting
kernel spews APIC errors when the network driver is started, just
like the 32-bit kernel does.  This is different from prior behavior (at
least for me), and I can't explain it.

No need to apply your patch Eric.  Neither 64-bit nor 32-bit kernels
work reliably with MSI on this board, apparently.

Jay

Comment 29 Eric W. Biederman 2007-05-16 11:01:16 UTC
Jay Cliburn <jacliburn@bellsouth.net> writes:

> On Tue, 15 May 2007 06:28:39 -0600
> ebiederm@xmission.com (Eric W. Biederman) wrote:
>
>
>> Think you could try this and tell me if MSI continues to
>> work with this patch applied on your 64bit kernel.
>> 
>> All this patch does is override the selection logic in genapic.c
>> 
>> diff --git a/arch/x86_64/kernel/genapic.c b/arch/x86_64/kernel/genapic.c
>> index 47496a4..92f4925 100644
>> --- a/arch/x86_64/kernel/genapic.c
>> +++ b/arch/x86_64/kernel/genapic.c
>> @@ -55,6 +55,10 @@ void __init setup_apic_routing(void)
>>  	else
>>  		genapic = &apic_physflat;
>>  
>> +#if 1
>> +	genapic = &apic_flat;
>> +#endif
>> +
>>  	printk(KERN_INFO "Setting APIC routing to %s\n", genapic->name);
>>  }
>
> I installed Fedora 7 Test 4 x86_64 to a hard disk this evening (for the
> past few weeks I've been using a Fedora Live x86_64 distribution
> with the pci=msi kernel command line option to enable MSI).  I cloned
> Linus' git tree and built a current 64-bit kernel. The resulting
> kernel spews APIC errors when the network driver is started, just
> like the 32-bit kernel does.  This is different from prior behavior (at
> least for me), and I can't explain it.
>
> No need to apply your patch Eric.  Neither 64-bit nor 32-bit kernels
> work reliably with MSI on this board, apparently.

Jay can you please try the opposite of my patch.

genapic = &apic_physflat.

I am still curious to know if the apic mode makes the working versus
non-working difference.

I am concerned that this may be an Opteron thing and not a chipset thing.
I seem to recall Ingo having some problem with Opterons in lowest
priority delivery mode.  

Thanks,
Eric

Comment 30 Jay Cliburn 2007-05-16 13:14:47 UTC
Eric W. Biederman wrote:

> Jay can you please try the opposite of my patch.
> 
> genapic = &apic_physflat.
> 
> I am still curious to know if the apic mode makes the working versus
> non-working difference.
> 
> I am concerned that this may be an Opteron thing and not a chipset thing.
> I seem to recall Ingo having some problem with Opterons in lowest
> priority delivery mode.  

I'll try and do it tonight.  Be advised I'm using Athlons, not Opterons.

Comment 31 Eric W. Biederman 2007-05-16 13:44:02 UTC
Jay Cliburn <jacliburn@bellsouth.net> writes:

> I'll try and do it tonight.  Be advised I'm using Athlons, not Opterons.

Interesting.  A dual core Athlon64.

Regardless my practical curiosity is if the delivery mode affects how
well this works.


Eric

Comment 32 Jay Cliburn 2007-05-16 13:46:57 UTC
Eric W. Biederman wrote:
> Jay Cliburn <jacliburn@bellsouth.net> writes:
> 
>> I'll try and do it tonight.  Be advised I'm using Athlons, not Opterons.
> 
> Interesting.  A dual core Athlon64.

One socket, dual core.  Socket AM2.

Comment 33 Jay Cliburn 2007-05-16 16:55:44 UTC
On Wed, 16 May 2007 11:53:24 -0600
ebiederm@xmission.com (Eric W. Biederman) wrote:

> Jay can you please try the opposite of my patch.
> 
> genapic = &apic_physflat.
> 
> I am still curious to know if the apic mode makes the working versus
> non-working difference.

Applying this patch makes it work -- no apic errors.  Grzegorz, can you
try it?

I've attached /proc/interrupts and dmesg, too.  What next?  (By the way
Eric, thanks a lot for your help.)

diff --git a/arch/x86_64/kernel/genapic.c b/arch/x86_64/kernel/genapic.c
index 47496a4..82c5340 100644
--- a/arch/x86_64/kernel/genapic.c
+++ b/arch/x86_64/kernel/genapic.c
@@ -54,6 +54,9 @@ void __init setup_apic_routing(void)
 		genapic = &apic_flat;
 	else
 		genapic = &apic_physflat;
+#if 1
+	genapic = &apic_physflat;
+#endif
 
 	printk(KERN_INFO "Setting APIC routing to %s\n", genapic->name);
 }
Comment 34 Eric W. Biederman 2007-05-16 17:35:19 UTC
Jay Cliburn <jacliburn@bellsouth.net> writes:

> On Wed, 16 May 2007 11:53:24 -0600
> ebiederm@xmission.com (Eric W. Biederman) wrote:
>
>> Jay can you please try the opposite of my patch.
>> 
>> genapic = &apic_physflat.
>> 
>> I am still curious to know if the apic mode makes the working versus
>> non-working difference.
>
> Applying this patch makes it work -- no apic errors.  Grzegorz, can you
> try it?
>
> I've attached /proc/interrupts and dmesg, too.  What next?  (By the way
> Eric, thanks a lot for your help.)

OK.  So it looks like we have a problem with the combination of lowest
priority delivery mode, MSI, and your chipset.

Your chipset does not have an active HyperTransport MSI mapping.

So I think it is time to step back and see if we can come up with
a reasonably maintainable set of MSI enable quirks that will be usable.

Eric

Comment 35 Anonymous Emailer 2007-05-17 00:38:11 UTC
Reply-To: ninex@NineX.eu.org



Jay Cliburn writes:
> On Wed, 16 May 2007 11:53:24 -0600
> ebiederm@xmission.com (Eric W. Biederman) wrote:
>
>   
>> Jay can you please try the opposite of my patch.
>>
>> genapic = &apic_physflat.
>>
>> I am still curious to know if the apic mode makes the working versus
>> non-working difference.
>>     
>
> Applying this patch makes it work -- no apic errors.  Grzegorz, can you
> try it?
>
>   
Sure! :)
I will test this today when I get back home from work,
and report the results.
> I've attached /proc/interrupts and dmesg, too.  What next?  (By the way
> Eric, thanks a lot for your help.)
>
> diff --git a/arch/x86_64/kernel/genapic.c b/arch/x86_64/kernel/genapic.c
> index 47496a4..82c5340 100644
> --- a/arch/x86_64/kernel/genapic.c
> +++ b/arch/x86_64/kernel/genapic.c
> @@ -54,6 +54,9 @@ void __init setup_apic_routing(void)
>  		genapic = &apic_flat;
>  	else
>  		genapic = &apic_physflat;
> +#if 1
> +	genapic = &apic_physflat;
> +#endif
>  
>  	printk(KERN_INFO "Setting APIC routing to %s\n", genapic->name);
>  }



Comment 36 Anonymous Emailer 2007-05-17 12:32:38 UTC
Reply-To: ninex@NineX.eu.org

Jay Cliburn writes:
> On Wed, 16 May 2007 11:53:24 -0600
> ebiederm@xmission.com (Eric W. Biederman) wrote:
>
>   
>> Jay can you please try the opposite of my patch.
>>
>> genapic = &apic_physflat.
>>
>> I am still curious to know if the apic mode makes the working versus
>> non-working difference.
>>     
>
> Applying this patch makes it work -- no apic errors.  Grzegorz, can you
> try it?
>
> I've attached /proc/interrupts and dmesg, too.  What next?  (By the way
> Eric, thanks a lot for your help.)
>
> diff --git a/arch/x86_64/kernel/genapic.c b/arch/x86_64/kernel/genapic.c
> index 47496a4..82c5340 100644
> --- a/arch/x86_64/kernel/genapic.c
> +++ b/arch/x86_64/kernel/genapic.c
> @@ -54,6 +54,9 @@ void __init setup_apic_routing(void)
>  		genapic = &apic_flat;
>  	else
>  		genapic = &apic_physflat;
> +#if 1
> +	genapic = &apic_physflat;
> +#endif
>  
>  	printk(KERN_INFO "Setting APIC routing to %s\n", genapic->name);
>  }
This patch doesn't work with my 2.6.21.1 kernel :(


Comment 37 Eric W. Biederman 2007-05-17 15:02:06 UTC
Grzegorz Krzystek <ninex@NineX.eu.org> writes:

> This patch doesn't work with my 2.6.21.1 kernel :(

It doesn't apply or it doesn't fix your apic errors?

Eric

Comment 38 Anonymous Emailer 2007-05-18 00:23:35 UTC
Reply-To: ninex@NineX.eu.org

It doesn't apply, because the code in my tree is:

print:
	printk(KERN_INFO "Setting APIC routing to %s\n", genapic->name);
}

I fixed up the patch for this, but it doesn't fix the APIC problem.
> It doesn't apply or it doesn't fix your apic errors?
>
> Eric


Comment 39 huang xiong 2007-05-20 06:11:13 UTC
As far as I know, the L1 does not support MSI, so the original driver was
wrong to enable MSI.

Could somebody remove the pci_enable_msi() call and retry?

best regards
xiong
Comment 40 Anonymous Emailer 2007-05-20 09:22:54 UTC
Reply-To: ninex@NineX.eu.org

Can you prepare a patch?
I'm not a programmer...
but I found this in atl1_main.c:

err = pci_enable_msi(adapter->pdev);
if (err) {
	dev_info(&adapter->pdev->dev,
		"Unable to enable MSI: %d\n", err);
	irq_flags |= IRQF_SHARED;
}

How should I modify it?
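One way to express "remove pci_enable_msi and retry" while keeping the fallback path is sketched below as a user-space model, not kernel code. The switch `atl1_disable_msi`, the stub functions, and `MODEL_IRQF_SHARED` are hypothetical stand-ins for illustration; only the decision logic mirrors the atl1_main.c snippet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Model-only stand-in for the kernel's IRQF_SHARED; the value is arbitrary. */
#define MODEL_IRQF_SHARED 0x80u

/* Hypothetical switch (not part of atl1): skip MSI entirely. */
static bool atl1_disable_msi;

/* Counts calls to the pci_enable_msi() stand-ins below. */
static int msi_calls;

/* Stand-ins for pci_enable_msi(): 0 on success, negative errno on failure. */
static int stub_msi_ok(void)   { msi_calls++; return 0; }
static int stub_msi_fail(void) { msi_calls++; return -22; /* -EINVAL */ }

/*
 * Same decision as the atl1_main.c snippet: try MSI unless it is
 * disabled; if MSI is not in use, mark the legacy INTx line as shared,
 * because INTx lines can be shared between devices while an MSI vector
 * is not. Returns the flags that would be handed to request_irq().
 */
static unsigned int atl1_pick_irq_flags(int (*enable_msi)(void))
{
	unsigned int irq_flags = 0;
	int err;

	assert(enable_msi != NULL);
	if (atl1_disable_msi)
		return irq_flags | MODEL_IRQF_SHARED; /* MSI never attempted */

	err = enable_msi();
	if (err) {
		printf("Unable to enable MSI: %d\n", err);
		irq_flags |= MODEL_IRQF_SHARED;
	}
	return irq_flags;
}
```

Setting `atl1_disable_msi` models deleting the call outright. As I understand it, booting with pci=nomsi reaches the same end state differently: pci_enable_msi() is still called but fails, so the existing fallback branch runs.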




Comment 41 huang xiong 2007-05-21 04:55:41 UTC
Really sorry, I made a mistake.

Today I watched the TLPs with a protocol analyzer: the L1 does support MSI.

best regards
xiong
Comment 42 Jay Cliburn 2007-05-22 18:44:08 UTC
On Wed, 16 May 2007 18:31:37 -0600
ebiederm@xmission.com (Eric W. Biederman) wrote:

> Ok.  So it looks like we have a problem with lowest priority delivery
> mode, MSI, and your chipset.
>
> Your chipset does not have an active HyperTransport MSI mapping.
>
> So I think it is time to step back and see if we can come up with
> a reasonably maintainable set of MSI enable quirks that will be usable.

Where are we on this?  I need advice from the more experienced kernel
developers here.

Recap:

- MSI is enabled by default in the atl1 network device driver.

- The atl1 driver with MSI enabled results in debilitating APIC errors
on the Asus M2V mainboard (VIA K8T890 chipset, VT3351 host
bridge/ioapic).

- Under vanilla 2.6.22 x86_64, forcing apic routing to physflat seems to
fix it for me, but does /not/ fix it for the OP under gentoo 2.6.21
x86_64.

- Booting with pci=nomsi provides a workaround for the APIC errors.

- The atl1 driver with MSI enabled works fine on Intel chipset
mainboards.


Should I:

(a) sit back, shut up, and wait for a fix from Eric et al.;
(b) propose a quirk in drivers/pci/quirks.c that disables MSI
altogether when it sees this chipset;
(c) remove MSI from the atl1 driver;
(d) other?

I'd really like to avoid foisting this APIC error stuff on an
unsuspecting user base.  And this driver is in -stable now.

Please.  Guide me.  I'm pretty new at the netdev maintenance thing.

Jay
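Eric's remark quoted above ties the errors to lowest priority delivery mode, and the recap notes that forcing physflat routing helped on one board. The sketch below shows why the genapic choice reaches MSI at all: on x86 the routing mode decides the destination mode and delivery mode encoded in every MSI message. It assumes, as I understand the x86_64 code of that era, that flat routing uses logical destinations with lowest priority delivery and physflat uses physical destinations with fixed delivery. The bit layout follows the x86 MSI address/data format in Intel's manuals; `compose_msi()` and the macro names are illustrative, not the kernel's own msi_compose_msg().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 MSI address/data fields (illustrative names). */
#define MSI_ADDR_BASE            0xFEE00000u
#define MSI_ADDR_DEST_ID(id)     (((uint32_t)(id) & 0xFFu) << 12)
#define MSI_ADDR_REDIR_LOWPRI    (1u << 3)  /* redirection hint */
#define MSI_ADDR_DEST_LOGICAL    (1u << 2)  /* clear = physical destination */
#define MSI_DATA_DELIVERY_FIXED  (0u << 8)
#define MSI_DATA_DELIVERY_LOWPRI (1u << 8)
#define MSI_DATA_LEVEL_ASSERT    (1u << 14) /* edge trigger: bit 15 clear */
#define MSI_DATA_VECTOR(v)       ((uint32_t)(v) & 0xFFu)

struct msi_msg {
	uint32_t address_lo;
	uint32_t data;
};

/* Assumed mapping: flat -> logical + lowest priority, physflat -> physical + fixed. */
static struct msi_msg compose_msi(bool physflat, uint8_t dest, uint8_t vector)
{
	struct msi_msg msg;

	msg.address_lo = MSI_ADDR_BASE | MSI_ADDR_DEST_ID(dest);
	msg.data = MSI_DATA_LEVEL_ASSERT | MSI_DATA_VECTOR(vector);
	if (physflat) {
		/* dest is one physical APIC ID; fixed delivery to that CPU */
		msg.data |= MSI_DATA_DELIVERY_FIXED;
	} else {
		/* dest is a logical CPU bitmask; delivered to the
		 * lowest-priority CPU in that set */
		msg.address_lo |= MSI_ADDR_DEST_LOGICAL | MSI_ADDR_REDIR_LOWPRI;
		msg.data |= MSI_DATA_DELIVERY_LOWPRI;
	}
	return msg;
}
```

For destination 1 and vector 0x41, the flat encoding is address 0xFEE0100C / data 0x4141 and the physflat encoding is 0xFEE01000 / 0x4041. If the VT3351 mishandles the logical, lowest-priority form, that would be consistent with physflat helping on Jay's board, though it apparently did not help on the reporter's.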

Comment 43 Eric W. Biederman 2007-05-24 12:52:06 UTC
Jay Cliburn <jacliburn@bellsouth.net> writes:

> Should I:
>
> (a) sit back, shutup, and wait for a fix from Eric et al.;
> (b) propose a quirk in drivers/pci/quirks.c that disables MSI
> altogether when it sees this chipset;
> (c) remove MSI from the atl1 driver;
> (d) other?
>
> I'd really like to avoid foisting this APIC error stuff on an
> unsuspecting user base.  And this driver is in -stable now.
>
> Please.  Guide me.  I'm pretty new at the netdev maintenance thing.

Thanks for asking, and sorry for not replying sooner; I missed this
in the deluge in my inbox.

So the practical problem is how to avoid foisting the APIC errors
and other non-functioning MSI problems on an unsuspecting user base.

I am kind of Mr. MSI, more by default than by choice.

So, from the information that I have available, we need to figure out
how to disable MSI by default on everything at the bus level, and
then we can add quirks to turn MSI on as appropriate.  Anything
else seems to be just looking for trouble.

My hypothesis that there would be an MSI IRQ mapping capability
appears to be incorrect, so even in the HyperTransport case it appears
that you will have to know the chipset to be able to enable MSI safely.
That said, if you have an MSI mapping capability and it is enabled,
that seems to be a reasonable default.

So it looks like we need to go down the "only enable MSI on known
chipsets" path to make this safe.  Hmm... This looks like a fairly
small and simple patch.  I will see if I can post a patchset later
today.

Eric
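Eric's proposal amounts to a default-deny policy: MSI stays off unless the chipset is known to be good or, as his suggested reasonable default, the bridge has an enabled HyperTransport MSI mapping capability. A minimal user-space sketch of that decision follows; the struct, the function name, and the whitelist entry are all hypothetical, and real kernel code would read the IDs and the capability from PCI config space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct host_bridge {
	uint16_t vendor;
	uint16_t device;
	bool ht_msi_map_enabled; /* enabled HyperTransport MSI mapping capability */
};

/* Hypothetical whitelist; the entry is a placeholder, not a real chipset. */
static const struct { uint16_t vendor, device; } msi_known_good[] = {
	{ 0xABCD, 0x0001 },
};

/* Default-deny: MSI only for whitelisted bridges or an enabled HT MSI map. */
static bool msi_allowed(const struct host_bridge *hb)
{
	size_t n = sizeof(msi_known_good) / sizeof(msi_known_good[0]);

	assert(hb != NULL);
	if (hb->ht_msi_map_enabled)
		return true;
	for (size_t i = 0; i < n; i++) {
		if (msi_known_good[i].vendor == hb->vendor &&
		    msi_known_good[i].device == hb->device)
			return true;
	}
	return false;
}
```

Under this policy the VIA board in this report would get MSI disabled without anyone having to identify it first, at the cost of also disabling MSI on good chipsets until someone adds them to the list.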

Comment 44 Jay Cliburn 2007-06-24 11:31:46 UTC
Subject: Re: [Bugme-new]  New: atl1 module APIC error when MSI
 enabled in kernel 2.6.21

On Thu, 24 May 2007 13:48:14 -0600
ebiederm@xmission.com (Eric W. Biederman) wrote:

> Jay Cliburn <jacliburn@bellsouth.net> writes:

> > Should I:
> >
> > (a) sit back, shut up, and wait for a fix from Eric et al.;
> > (b) propose a quirk in drivers/pci/quirks.c that disables MSI
> > altogether when it sees this chipset;
> > (c) remove MSI from the atl1 driver;
> > (d) other?
> >
> > I'd really like to avoid foisting this APIC error stuff on an
> > unsuspecting user base.  And this driver is in -stable now.

Option (b) was implemented.

Please close bugzilla 8472 owing to the following commit:

commit 184b812f7da6726d7ea4ca409c7a8762ff6c6df6
Author: Jay Cliburn <jacliburn@bellsouth.net>
Date:   Sat May 26 17:01:04 2007 -0500

    PCI: quirk disable MSI on via vt3351
    
    The Via VT3351 APIC does not play well with MSI and unleashes a flood
    of APIC errors when MSI is used to deliver interrupts.  The problem
    was recently exposed when the atl1 network device driver, which enables
    MSI by default, stimulated APIC errors on an Asus M2V mainboard, which
    employs the Via VT3351.
    See http://bugzilla.kernel.org/show_bug.cgi?id=8472 for additional
    details on this bug.
    
    Signed-off-by: Jay Cliburn <jacliburn@bellsouth.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Jay
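The commit takes the opposite, default-allow route from Eric's proposal: MSI stays on unless a known-bad host bridge is found. Below is a user-space sketch of that blacklist check, loosely modeled on a PCI fixup that disables MSI globally. 0x1106 is VIA's PCI vendor ID, but the VT3351 device ID used here (0x0351) is an assumption to verify against include/linux/pci_ids.h, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VENDOR_ID_VIA    0x1106 /* VIA Technologies */
#define DEVICE_ID_VT3351 0x0351 /* assumed; verify against pci_ids.h */

struct pci_id { uint16_t vendor, device; };

/* Set once any blacklisted bridge is seen; similar in spirit to pci_no_msi(). */
static bool msi_globally_disabled;

/* Host bridges whose presence disables MSI system-wide. */
static const struct pci_id msi_blacklist[] = {
	{ VENDOR_ID_VIA, DEVICE_ID_VT3351 },
};

/* Run once per enumerated device, like a PCI fixup hook. */
static void apply_msi_quirks(struct pci_id dev)
{
	size_t n = sizeof(msi_blacklist) / sizeof(msi_blacklist[0]);

	for (size_t i = 0; i < n; i++) {
		if (msi_blacklist[i].vendor == dev.vendor &&
		    msi_blacklist[i].device == dev.device)
			msi_globally_disabled = true;
	}
}

/* Stand-in for pci_enable_msi(): fails once MSI is globally disabled. */
static int model_enable_msi(void)
{
	return msi_globally_disabled ? -22 /* -EINVAL, assumed */ : 0;
}
```

Once the flag is set, atl1's existing fallback branch requests a shared legacy interrupt, with no change to the driver.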
Comment 45 Tejun Heo 2007-06-24 19:40:26 UTC
Alright, thanks.