Bug 4774
Summary: e1000 driver works on UP, but not SMP x86_64
Product: Drivers
Component: Network
Status: REJECTED INVALID
Severity: normal
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.7, ..., 2.6.12
Reporter: Alexey Dobriyan (adobriyan)
Assignee: Jeff Garzik (jgarzik)
CC: akpm, dlang, greg, nacc
Subsystem:
Regression: ---
Bisected commit-id:
Attachments:
  UP .config
  Debugging patch
  SMP .config
  cat /proc/interrupts
  2.6.7 .config that works
  2.6.12.3 .config
  working config with ACPI
Description
Alexey Dobriyan
2005-06-21 13:27:41 UTC
Created attachment 5200 [details]
UP .config
From: David Lang <david.lang@digitalinsight.com>

Ganesh Venkatesan <ganesh.venkatesan@gmail.com> wrote:
> Does this not happen with kernels earlier than 2.6.7 or you have not
> tried them? After you ifconfig the fourth port what does
> /proc/interrupts look like? Any additional info in syslog?

I haven't tried with kernels earlier than 2.6.7. When I ran into the problem there, I tried upgrading to see if it fixed it, and it didn't.

Here is /proc/interrupts with eth0 and eth1 in use:

happy1-p:~# cat /proc/interrupts
           CPU0       CPU1
  0:     651218      31813    IO-APIC-edge  timer
  2:          0          0          XT-PIC  cascade
  3:        208          1    IO-APIC-edge  serial
  4:         76          1    IO-APIC-edge  serial
  8:          2          1    IO-APIC-edge  rtc
 14:          0         13    IO-APIC-edge  ide0
 25:       7909          1   IO-APIC-level  eth0
 26:       1595          1   IO-APIC-level  eth1
 29:       1698         87   IO-APIC-level  ioc0
NMI:          0          0
LOC:     674604     677630
ERR:          0
MIS:          0

Here is the output after doing ifconfig:

cat /proc/interrupts
           CPU0       CPU1
  0:     760722      31813    IO-APIC-edge  timer
  2:          0          0          XT-PIC  cascade
  3:        555          1    IO-APIC-edge  serial
  4:         76          1    IO-APIC-edge  serial
  8:          2          1    IO-APIC-edge  rtc
 14:          0         13    IO-APIC-edge  ide0
 25:       9146          1   IO-APIC-level  eth0
 26:       1830          1   IO-APIC-level  eth1
 29:       1736         87   IO-APIC-level  ioc0
NMI:          0          0
LOC:     784112     787138
ERR:          0
MIS:          0

syslog shows:

Jun 24 13:37:12 happy1-p kernel: e1000: eth11: e1000_up: Unable to allocate interrupt Error: -22

Created attachment 5211 [details]
Debugging patch
That e1000 error message indicates an EINVAL error code, which is from this
code:
    if ((irqflags & SA_SHIRQ) && !dev_id)
        return -EINVAL;
    if (irq >= NR_IRQS)
        return -EINVAL;
    if (!handler)
        return -EINVAL;
I don't think it's the last one, because e1000_intr (which is sent in to
request_irq() from e1000) is prototyped/defined. I spun up a patch to spit out
some debugging here which simply inserts some printks (if the only driver which
gets this warning is e1000, then it shouldn't flood your logs) -- basically
narrowing down which error condition is causing the failure. I'm guessing it's
probably the first case, but let's be sure.
Thanks,
Nish
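
The debugging approach described above can be sketched in userspace as follows. This is only a model of the three quoted checks, not the attached patch and not kernel code: the names demo_request_irq, demo_intr and demo_self_check are hypothetical, and the values of DEMO_NR_IRQS and DEMO_SA_SHIRQ are assumptions chosen for illustration. Each failing check prints which condition fired before returning -EINVAL, which is the narrowing-down the printks are meant to provide.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative stand-ins, NOT taken from kernel headers. */
#define DEMO_NR_IRQS  224u          /* assumed limit for this sketch */
#define DEMO_SA_SHIRQ 0x04000000ul  /* assumed "shared IRQ" flag bit */

typedef void (*demo_handler_t)(int irq, void *dev_id);

/*
 * Userspace model of the three argument checks quoted above.
 * Every failing check reports its own message, so the log shows
 * which condition produced the -EINVAL.
 */
int demo_request_irq(unsigned int irq, demo_handler_t handler,
                     unsigned long irqflags, void *dev_id)
{
    if ((irqflags & DEMO_SA_SHIRQ) && !dev_id) {
        fprintf(stderr, "request_irq: shared IRQ requested without dev_id\n");
        return -EINVAL;
    }
    if (irq >= DEMO_NR_IRQS) {
        fprintf(stderr, "request_irq: IRQ %u is not below the limit %u\n",
                irq, DEMO_NR_IRQS);
        return -EINVAL;
    }
    if (!handler) {
        fprintf(stderr, "request_irq: no handler supplied\n");
        return -EINVAL;
    }
    return 0;
}

/* Dummy handler so the "handler present" path can be exercised. */
void demo_intr(int irq, void *dev_id)
{
    (void)irq;
    (void)dev_id;
}

/* Usage example: exercises the success path and each failing check once. */
void demo_self_check(void)
{
    int cookie = 0;

    assert(demo_request_irq(25, demo_intr, DEMO_SA_SHIRQ, &cookie) == 0);
    assert(demo_request_irq(25, demo_intr, DEMO_SA_SHIRQ, NULL) == -EINVAL);
    assert(demo_request_irq(DEMO_NR_IRQS, demo_intr, 0, NULL) == -EINVAL);
    assert(demo_request_irq(25, NULL, 0, NULL) == -EINVAL);
}
```

All three failures return the same code, so without the per-branch message the caller cannot tell them apart.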
From: David Lang <david.lang@digitalinsight.com>

happy1-p:~# ifconfig eth11 192.168.255.1
SIOCSIFFLAGS: Invalid argument
happy1-p:~# tail -3 /var/log/messages
Jun 24 15:14:01 happy1-p /USR/SBIN/CRON[392]: (root) CMD (touch /tmp/.crond_running >/dev/null 2>/dev/null)
Jun 24 15:14:22 happy1-p kernel: request_irq: IRQ requested is greater than NR_IRQs!
Jun 24 15:14:22 happy1-p kernel: e1000: eth11: e1000_up: Unable to allocate interrupt Error: -22

Hrm, that means that the corresponding PCI device (adapter->pdev->irq) is requesting an IRQ greater than 224? Could you also attach the SMP .config? I assume all you did was enable SMP, run make oldconfig & rebuild? Do you know of any kernel that *does* work?

Thanks,
Nish

From: David Lang <david.lang@digitalinsight.com>

I have a dual Athlon system which I have used with one of these cards without a problem. Monday I'll try pulling some of the cards out of a machine and see if that eliminates the error.

I had the SMP config, tested it, then disabled SMP and recompiled to generate the .config I sent earlier. Attached is the config I used with the debug test.

Created attachment 5225 [details]
SMP .config
config David used with the debug test
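
The two error reports in the exchange above, "SIOCSIFFLAGS: Invalid argument" from ifconfig and "Error: -22" from the driver, describe the same EINVAL condition seen from different sides. A minimal sketch of that mapping, assuming a Linux userspace where EINVAL has the value 22; the helper name demo_describe_error is hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Returns the C library's description for a negative, kernel-style
 * return code such as the -22 printed by the driver.
 */
const char *demo_describe_error(int kernel_ret)
{
    assert(kernel_ret < 0);        /* kernel-style errors are negative */
    return strerror(-kernel_ret);  /* look up the positive errno value */
}
```

Calling demo_describe_error(-EINVAL) yields the same text the C library reports for EINVAL, which is what a userspace tool prints when the ioctl fails with that errno.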
Created attachment 5226 [details]
cat /proc/interrupts
I verified that this bug happens with anywhere from one to four quad cards installed. I've attached
what /proc/interrupts looks like after attempting to bring up all 16 interfaces.
From: David Lang <david.lang@digitalinsight.com>

This is the 2.6.7 dual Athlon config that is working for me. Today I'm going to update this box to 2.6.12.1 to make it easier to diff the working vs. non-working config. (We did try putting this kernel on the dual Opteron box; it booted and generated the same error, but this kernel doesn't have K8 or generic 386 support.)

Created attachment 5240 [details]
2.6.7 .config that works
From: David Lang <david.lang@digitalinsight.com>

I just tried 2.6.12.3 with the attached config on a dual Athlon box, where it ran with no problem. On the dual Opteron box it produces the same error.

What additional testing do you need me to do?

Created attachment 5355 [details]
2.6.12.3 .config
Hm, ACPI is not enabled in your .config, any reason why? Try enabling it and see if that helps things work better. If not, care to attach your early boot messages?

Can we have an update on this one please? Retesting 2.6.13-rc4 would help.

Reply to Greg: my understanding was that ACPI was primarily power management, not something needed for these servers (the only reason any power management is enabled is to allow the systems to power themselves down on shutdown). I'll give it a shot.

No, ACPI is not primarily power management at all :)

Then can you move ACPI out of the power management section of menuconfig so poor folks like me don't make this mistake :-)

First test with 2.6.13-pre4 and ACPI works, but it was compiled differently (the system I had been using blew up on me last week), so now the next step will be to try it again, compiling it in a chroot sandbox with the old tools. The new setup is Debian Woody; the old one was Slackware 10.0.

Recompiling with the Slackware toolchain worked as well. Going to 2.6.12.3 with ACPI turned on appears to work as well (working config to be uploaded shortly).

Created attachment 5466 [details]
working config with ACPI
Could someone please confirm that the only relevant change is the ACPI change?
David, please confirm that enabling ACPI fixed this problem. If so then please close off this bug, thanks.

Enabling ACPI appears to have cleared up the issue, all the way back to 2.6.7. I do not have the authority to close this ticket, but consider it solved.

Well it's nice to see ACPI actually fixing something for once ;) Thanks, David.
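
For anyone comparing a working and a non-working .config for this one option, the check can be sketched as below. It relies on the kernel .config convention that a built-in option appears as a line such as CONFIG_ACPI=y, while a disabled one appears as "# CONFIG_ACPI is not set". The function name demo_config_enabled is hypothetical, and the sketch only recognizes "=y" (not "=m").

```c
#include <assert.h>
#include <string.h>

/*
 * Returns 1 if config_text (the contents of a kernel .config) contains a
 * line that is exactly "<option>=y", and 0 otherwise.
 */
int demo_config_enabled(const char *config_text, const char *option)
{
    size_t len;
    const char *line = config_text;

    assert(config_text != NULL && option != NULL);
    len = strlen(option);

    while (*line != '\0') {
        /* Match the option name, then "=y", then end of line or text. */
        if (strncmp(line, option, len) == 0 &&
            strncmp(line + len, "=y", 2) == 0 &&
            (line[len + 2] == '\n' || line[len + 2] == '\0'))
            return 1;

        line = strchr(line, '\n');
        if (line == NULL)
            break;
        line++; /* step past the newline to the start of the next line */
    }
    return 0;
}
```

Running it over both configs with "CONFIG_ACPI" as the option shows whether that setting differs; it says nothing about any other differences between the two files.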