Bug 4774

Summary: e1000 driver works on UP, but not SMP x86_64
Product: Drivers
Reporter: Alexey Dobriyan (adobriyan)
Component: Network
Assignee: Jeff Garzik (jgarzik)
Status: REJECTED INVALID
Severity: normal
CC: akpm, dlang, greg, nacc
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.7, ..., 2.6.12
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: UP .config
Debugging patch
SMP .config
cat /proc/interrupts
2.6.7 .config that works
2.6.12.3 .config
working config with ACPI

Description Alexey Dobriyan 2005-06-21 13:27:41 UTC
From: David Lang <david.lang@digitalinsight.com>
http://marc.theaimsgroup.com/?t=111922556200001&r=1&w=2

I have some systems with three Intel quad gig-E cards in them that 
function with the attached UP config, but port 4 of each card doesn't work 
properly with an SMP kernel (otherwise the same config).

On an SMP kernel, when I run ifconfig on the fourth port I get the 
following error:
SIOCSIFFLAGS: Function not implemented

Afterwards, the ifconfig output for the interface looks correct, but no 
network route is added.

I first ran into this problem with a 2.6.7 kernel and tried several 
kernels from there up to 2.6.12, all of which showed the same problem with 
SMP kernels. The problem happens with the driver both built in and built 
as a module.

The systems are dual Opteron 246 machines with 2 GB of RAM and MPT Fusion 
SCSI drives.
Comment 1 Alexey Dobriyan 2005-06-21 13:30:03 UTC
Created attachment 5200 [details]
UP .config
Comment 2 Alexey Dobriyan 2005-06-24 13:51:24 UTC
From: David Lang <david.lang@digitalinsight.com>

Ganesh Venkatesan <ganesh.venkatesan@gmail.com> wrote:
> Does this not happen with kernels earlier than 2.6.7 or you have not
> tried them? After you ifconfig the fourth port what does
> /proc/interrupts look like? Any additional info in syslog?

I haven't tried kernels earlier than 2.6.7. When I ran into the problem 
there, I tried upgrading to see if that fixed it, and it didn't.

Here is /proc/interrupts with eth0 and eth1 in use:
happy1-p:~# cat /proc/interrupts
            CPU0       CPU1
   0:     651218      31813    IO-APIC-edge  timer
   2:          0          0          XT-PIC  cascade
   3:        208          1    IO-APIC-edge  serial
   4:         76          1    IO-APIC-edge  serial
   8:          2          1    IO-APIC-edge  rtc
  14:          0         13    IO-APIC-edge  ide0
  25:       7909          1   IO-APIC-level  eth0
  26:       1595          1   IO-APIC-level  eth1
  29:       1698         87   IO-APIC-level  ioc0
NMI:          0          0
LOC:     674604     677630
ERR:          0
MIS:          0

Here it is after running ifconfig on the fourth port:

  cat /proc/interrupts
            CPU0       CPU1
   0:     760722      31813    IO-APIC-edge  timer
   2:          0          0          XT-PIC  cascade
   3:        555          1    IO-APIC-edge  serial
   4:         76          1    IO-APIC-edge  serial
   8:          2          1    IO-APIC-edge  rtc
  14:          0         13    IO-APIC-edge  ide0
  25:       9146          1   IO-APIC-level  eth0
  26:       1830          1   IO-APIC-level  eth1
  29:       1736         87   IO-APIC-level  ioc0
NMI:          0          0
LOC:     784112     787138
ERR:          0
MIS:          0

Syslog shows:
Jun 24 13:37:12 happy1-p kernel: e1000: eth11: e1000_up: Unable to allocate interrupt Error: -22
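Comparing the two snapshots above by eye is error-prone. The following is a minimal C sketch, assuming the two-CPU layout shown here, that splits one numbered /proc/interrupts row into its fields; `struct irq_line`, `parse_irq_line()` and `irq_total()` are hypothetical names, not part of any kernel or procps API. Applied to both snapshots, it shows eth0 and eth1 still taking interrupts while no row for the fourth port appears in either, which fits the syslog message that the interrupt could not be allocated.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One numbered row of /proc/interrupts, assuming exactly two CPU columns
 * as in the snapshots above (hypothetical type, for illustration). */
struct irq_line {
    unsigned int irq;        /* IRQ number before the colon */
    unsigned long count[2];  /* interrupt counts on CPU0 and CPU1 */
    char type[32];           /* controller type, e.g. "IO-APIC-level" */
    char name[32];           /* first token of the device name, e.g. "eth0" */
};

/* Parse a row such as "  25:       7909          1   IO-APIC-level  eth0".
 * Returns false for the CPU header and for summary rows such as "NMI:" or
 * "LOC:", whose label is not a number. Only the first whitespace-separated
 * token of the device name is kept. */
static bool parse_irq_line(const char *line, struct irq_line *out)
{
    memset(out, 0, sizeof *out);
    return sscanf(line, " %u: %lu %lu %31s %31s",
                  &out->irq, &out->count[0], &out->count[1],
                  out->type, out->name) == 5;
}

/* Total interrupts delivered on this IRQ across both CPUs. */
static unsigned long irq_total(const struct irq_line *l)
{
    return l->count[0] + l->count[1];
}
```

For example, the eth0 rows before and after the ifconfig give totals of 7910 and 9147, a difference of 1237 interrupts, so the working ports are unaffected.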
Comment 3 Nishanth Aravamudan 2005-06-24 14:27:54 UTC
Created attachment 5211 [details]
Debugging patch

That e1000 error message indicates an EINVAL error code (-22), which
request_irq() returns from this code:

	if ((irqflags & SA_SHIRQ) && !dev_id)
		return -EINVAL;
	if (irq >= NR_IRQS)
		return -EINVAL;
	if (!handler)
		return -EINVAL;

I don't think it's the last one, because e1000_intr (which e1000 passes to
request_irq()) is prototyped and defined. I put together a patch that adds
some debugging output here by simply inserting some printks (if e1000 is the
only driver that hits this warning, it shouldn't flood your logs), basically
narrowing down which error condition is causing the failure. I'm guessing it's
the first case, but let's be sure.

Thanks,
Nish
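To make the narrowing-down concrete, here is a minimal userspace sketch, not the contents of attachment 5211, that mirrors the three checks quoted above in the same order and prints which one fires, roughly what adding a printk before each `return -EINVAL` achieves. The function name `check_irq_request()`, the `SHIRQ_FLAG` stand-in for SA_SHIRQ, the handler signature, and NR_IRQS = 224 (the value cited later in this report) are all assumptions for illustration.

```c
#include <errno.h>
#include <stdio.h>

#define NR_IRQS    224U     /* assumed limit, matching the value discussed in this report */
#define SHIRQ_FLAG 0x1UL    /* hypothetical stand-in for the kernel's SA_SHIRQ */

/* Hypothetical handler type; the real kernel prototype differs. */
typedef int (*irq_handler_fn)(int irq, void *dev_id);

/* Mirrors the order of the request_irq() sanity checks quoted above and
 * reports which one rejected the request. Returns 0 or -EINVAL. */
static int check_irq_request(unsigned int irq, irq_handler_fn handler,
                             unsigned long irqflags, void *dev_id)
{
    if ((irqflags & SHIRQ_FLAG) && !dev_id) {
        fprintf(stderr, "check: shared IRQ %u requested without dev_id\n", irq);
        return -EINVAL;
    }
    if (irq >= NR_IRQS) {
        fprintf(stderr, "check: IRQ %u is >= NR_IRQS (%u)\n", irq, NR_IRQS);
        return -EINVAL;
    }
    if (!handler) {
        fprintf(stderr, "check: IRQ %u requested with NULL handler\n", irq);
        return -EINVAL;
    }
    return 0;
}

/* Dummy handler standing in for e1000_intr (hypothetical). */
static int dummy_intr(int irq, void *dev_id)
{
    (void)irq;
    (void)dev_id;
    return 1;
}
```

A request for IRQ 224 or above fails the second check even when the handler and dev_id are valid, which is the case the debug output in the next comment points at.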
Comment 4 Alexey Dobriyan 2005-06-24 15:41:46 UTC
From: David Lang <david.lang@digitalinsight.com>

happy1-p:~# ifconfig eth11 192.168.255.1
SIOCSIFFLAGS: Invalid argument
happy1-p:~# tail -3 /var/log/messages
Jun 24 15:14:01 happy1-p /USR/SBIN/CRON[392]: (root) CMD (touch /tmp/.crond_running >/dev/null 2>/dev/null)
Jun 24 15:14:22 happy1-p kernel: request_irq: IRQ requested is greater than NR_IRQs!
Jun 24 15:14:22 happy1-p kernel: e1000: eth11: e1000_up: Unable to allocate interrupt Error: -22
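The two messages above describe the same failure from both sides: the kernel logs the negative errno -22, and ifconfig prints the C library's strerror() text for EINVAL, which is 22 on Linux. A minimal C sketch of that mapping; `describe_kernel_error()` is a hypothetical helper name.

```c
#include <errno.h>
#include <string.h>

/* Kernel code returns negative errno values such as -22 (-EINVAL);
 * userspace sees the positive value in errno. Flip the sign and
 * return the C library's description of it. */
static const char *describe_kernel_error(int kernel_ret)
{
    return strerror(-kernel_ret);
}
```

The same mapping turns ENOSYS into "Function not implemented", the text shown in the original report.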
Comment 5 Nishanth Aravamudan 2005-06-25 10:55:44 UTC
Hrm, so the IRQ of the corresponding PCI device (adapter->pdev->irq) is at or
above NR_IRQS, i.e. 224 or higher? Could you also attach the SMP .config? I
assume all you did was enable SMP, run make oldconfig and rebuild? Do you know
of any kernel that *does* work?

Thanks,
Nish
Comment 6 Alexey Dobriyan 2005-06-27 08:53:29 UTC
From: David Lang <david.lang@digitalinsight.com>

I have a dual Athlon system that I have used with one of these cards 
without a problem. On Monday I'll try pulling some of the cards out of a 
machine to see if that eliminates the error.

I had the SMP config, tested it, then disabled SMP and recompiled to 
generate the .config I sent earlier.

Attached is the config I used with the debug test.
Comment 7 Alexey Dobriyan 2005-06-27 08:55:58 UTC
Created attachment 5225 [details]
SMP .config

config David used with the debug test
Comment 8 David Lang 2005-06-27 15:16:43 UTC
Created attachment 5226 [details]
cat /proc/interrupts

I verified that this bug happens with anywhere from one to four quad cards
installed. I've attached what /proc/interrupts looks like after attempting to
bring up all 16 interfaces.
Comment 9 Alexey Dobriyan 2005-06-29 14:00:55 UTC
From: David Lang <david.lang@digitalinsight.com>

This is the 2.6.7 dual Athlon config that is working for me. Today I'm
going to update this box to 2.6.12.1 to make it easier to diff the working
and non-working configs. (We did try putting this kernel on the dual Opteron
box; it booted and generated the same error, but this kernel doesn't
have K8 or generic 386 support.)
Comment 10 Alexey Dobriyan 2005-06-29 14:02:24 UTC
Created attachment 5240 [details]
2.6.7 .config that works
Comment 11 Alexey Dobriyan 2005-07-21 02:13:35 UTC
From: David Lang <david.lang@digitalinsight.com>

I just tried 2.6.12.3 with the attached config: on a dual Athlon box it ran
with no problem, but on the dual Opteron box it produces the same error.

What additional testing do you need me to do?
Comment 12 Alexey Dobriyan 2005-07-21 02:14:56 UTC
Created attachment 5355 [details]
2.6.12.3 .config
Comment 13 Greg Kroah-Hartman 2005-07-25 16:32:32 UTC
Hm, ACPI is not enabled in your .config, any reason why?  Try enabling it
and see if that helps things work better.  If not, care to attach your 
early boot messages?
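Checking a .config for the option Greg mentions is easy by eye, but it can also be scripted. A minimal C sketch, assuming the usual .config line forms `CONFIG_FOO=y` (or `=m`) and `# CONFIG_FOO is not set`; `config_value()` is a hypothetical helper, not part of the kernel's kconfig tools.

```c
#include <string.h>

/* Look up OPTION (e.g. "CONFIG_ACPI") in the text of a .config file.
 * On a match, copies the value after '=' (e.g. "y" or "m") into VAL,
 * truncated to fit VAL_LEN (which must be at least 1), and returns 1.
 * Returns 0 if the option is absent or written as "# ... is not set".
 * Only exact names match, so longer names that merely start with
 * CONFIG_ACPI are skipped. */
static int config_value(const char *config, const char *option,
                        char *val, size_t val_len)
{
    size_t opt_len = strlen(option);
    const char *line = config;

    while (line && *line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        if (len > opt_len && strncmp(line, option, opt_len) == 0 &&
            line[opt_len] == '=') {
            size_t vlen = len - opt_len - 1;
            if (vlen >= val_len)
                vlen = val_len - 1;
            memcpy(val, line + opt_len + 1, vlen);
            val[vlen] = '\0';
            return 1;
        }
        line = eol ? eol + 1 : NULL;   /* advance to the next line */
    }
    return 0;
}
```

For a config with ACPI disabled, `config_value(text, "CONFIG_ACPI", buf, sizeof buf)` returns 0; after enabling it, it returns 1 with `buf` holding "y".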
Comment 14 Andrew Morton 2005-07-28 22:01:16 UTC
Can we have an update on this one please?  Retesting
2.6.13-rc4 would help.
Comment 15 David Lang 2005-08-01 16:30:12 UTC
Reply to Greg:
My understanding was that ACPI was primarily power management, not something
needed for these servers (the only reason any power management is enabled is to
allow the systems to power themselves down on shutdown). I'll give it a shot.
Comment 16 Greg Kroah-Hartman 2005-08-01 16:42:02 UTC
No, ACPI is not primarily power management at all :)
Comment 17 David Lang 2005-08-01 17:43:04 UTC
Then can you move ACPI out of the power management section of menuconfig so poor
folks like me don't make this mistake? :-)

The first test with 2.6.13-rc4 and ACPI works, but it was compiled differently
(the system I had been using blew up on me last week), so the next step will be
to try it again, compiling it in a chroot sandbox with the old tools.

The new setup is Debian Woody; the old one was Slackware 10.0.
Comment 18 David Lang 2005-08-01 18:34:26 UTC
Recompiling with the Slackware toolchain worked as well, and going to 2.6.12.3
with ACPI turned on appears to work too (working config to be uploaded shortly).
Comment 19 David Lang 2005-08-01 18:35:30 UTC
Created attachment 5466 [details]
working config with ACPI

Could someone please confirm that the only relevant change is the ACPI change?
Comment 20 Andrew Morton 2005-08-04 13:06:30 UTC
David, please confirm that enabling ACPI fixed this problem.

If so then please close off this bug, thanks.
Comment 21 David Lang 2005-08-05 14:09:00 UTC
Enabling ACPI appears to have cleared up the issue, all the way back to 2.6.7.

I do not have the authority to close this ticket, but consider it solved.
Comment 22 Andrew Morton 2005-08-05 14:17:42 UTC
Well it's nice to see ACPI actually fixing something for once ;)

Thanks, David.