Bug 5080 - bonding related oops on boot
Summary: bonding related oops on boot
Status: REJECTED INSUFFICIENT_DATA
Alias: None
Product: Networking
Classification: Unclassified
Component: Other
Hardware: i386 Linux
Importance: P2 normal
Assignee: Arnaldo Carvalho de Melo
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-08-17 07:20 UTC by Stefan Praszalowicz
Modified: 2007-09-07 15:14 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.13-rc6
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
kernel .config (27.80 KB, text/plain)
2005-08-17 07:22 UTC, Stefan Praszalowicz

Description Stefan Praszalowicz 2005-08-17 07:20:29 UTC
Distribution: debian pure64
Hardware Environment: 4 way x86_64
Software Environment: linux 2.6.12, 2.6.13-rc6 
Problem Description:

I have a bond with two slave interfaces, both connected.

On boot, when the bond gets initialized, I get the following oops:

[  136.773164] Ethernet Channel Bonding Driver: v2.6.3 (June 8, 2005)
[  136.773266] bonding: In ALB mode you might experience client disconnections upon reconnection of a link if the bonding module updelay parameter (15000 msec) is incompatible with the forwarding delay time of the switch
[  136.773427] bonding: MII link monitoring set to 100 ms
[  137.122235] bonding: bond0: enslaving eth0 as an active interface with a down link.
[  137.353781] bonding: bond0: enslaving eth1 as an active interface with a down link.
[  137.397579] e100: eth2: e100_watchdog: link up, 100Mbps, full-duplex
[  138.823615] NET: Registered protocol family 10
[  138.824319] IPv6 over IPv4 tunneling driver
[  142.995176] tg3: eth0: Link is up at 1000 Mbps, full duplex.
[  142.995238] tg3: eth0: Flow control is on for TX and on for RX.
[  142.995294] bonding: bond0: link status up for interface eth0, enabling it in 15000 ms.
[  144.226482] tg3: eth1: Link is up at 1000 Mbps, full duplex.
[  144.226543] tg3: eth1: Flow control is on for TX and on for RX.
[  144.293858] bonding: bond0: link status up for interface eth1, enabling it in 15000 ms.
[  149.051679] eth0: no IPv6 routers present
[  149.311570] bond0: no IPv6 routers present
[  149.661411] eth1: no IPv6 routers present
[  149.781362] eth2: no IPv6 routers present
[  157.987577] bonding: bond0: link status definitely up for interface eth0.

[  157.987642] bonding: bond0: making interface eth0 the new active one.
[  158.023763] RTNL: assertion failed at net/ipv4/devinet.c (962)
[  158.023819]
[  158.023820] Call Trace: <IRQ> <ffffffff80273c03>{rt_run_flush+48}
<ffffffff8029d6c8>{inetdev_event+116}
[  158.023964]        <ffffffff80273c4e>{rt_run_flush+123}
<ffffffff8013f087>{notifier_call_chain+31}
[  158.024094]        <ffffffff802610f8>{dev_set_mac_address+84}
<ffffffff88035740>{:bonding:alb_set_slave_mac_addr+76}
[  158.024233]        <ffffffff8803580f>{:bonding:alb_swap_mac_addr+170}
<ffffffff8802f0b8>{:bonding:bond_change_active_slave+546}
[  158.024373]        <ffffffff8802f997>{:bonding:bond_mii_monitor+1012}
<ffffffff8802f5a3>{:bonding:bond_mii_monitor+0}
[  158.024507]        <ffffffff8013a86a>{run_timer_softirq+384}
<ffffffff80136d42>{__do_softirq+110}
[  158.024631]        <ffffffff8010ec07>{call_softirq+31}
<ffffffff801106a1>{do_softirq+54}
[  158.024752]        <ffffffff8010e3b6>{apic_timer_interrupt+98}  <EOI>
<ffffffff801f7cbc>{acpi_walk_namespace+117}
[  158.024886]        <ffffffff8010bf4b>{mwait_idle+86}
<ffffffff80207694>{acpi_processor_idle+298}
[  158.025011]        <ffffffff8010bedb>{cpu_idle+76}
<ffffffff803fc708>{start_kernel+372}
[  158.025130]        <ffffffff803fc216>{_sinittext+534}
[  158.061376] RTNL: assertion failed at net/ipv4/devinet.c (962)
[  158.061431]
[  158.061432] Call Trace: <IRQ> <ffffffff80273c32>{rt_run_flush+95}
<ffffffff8029d6c8>{inetdev_event+116}
[  158.061569]        <ffffffff80273c4e>{rt_run_flush+123}
<ffffffff8013f087>{notifier_call_chain+31}
[  158.061693]        <ffffffff802610f8>{dev_set_mac_address+84}
<ffffffff88035740>{:bonding:alb_set_slave_mac_addr+76}
[  158.061826]        <ffffffff88035821>{:bonding:alb_swap_mac_addr+188}
<ffffffff8802f0b8>{:bonding:bond_change_active_slave+546}
[  158.061963]        <ffffffff8802f997>{:bonding:bond_mii_monitor+1012}
<ffffffff8802f5a3>{:bonding:bond_mii_monitor+0}
[  158.062096]        <ffffffff8013a86a>{run_timer_softirq+384}
<ffffffff80136d42>{__do_softirq+110}
[  158.062219]        <ffffffff8010ec07>{call_softirq+31}
<ffffffff801106a1>{do_softirq+54}
[  158.062338]        <ffffffff8010e3b6>{apic_timer_interrupt+98}  <EOI>
<ffffffff801f7cbc>{acpi_walk_namespace+117}
[  158.062470]        <ffffffff8010bf4b>{mwait_idle+86}
<ffffffff80207694>{acpi_processor_idle+298}
[  158.062593]        <ffffffff8010bedb>{cpu_idle+76}
<ffffffff803fc708>{start_kernel+372}
[  158.062711]        <ffffffff803fc216>{_sinittext+534}
[  159.366930] bonding: bond0: link status definitely up for interface eth1.

The bond has the following options:
 options bonding mode=6 miimon=100 updelay=15000 max_bonds=2

I know this doesn't appear with 2.6.11.12, and that it does appear with some
2.6.12 release, although I don't know precisely which one :(

I'll attach my .config

Steps to reproduce: just boot :p
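
For reference, the module options above can be decoded with a short script. This is only a sketch: the names BOND_MODES and parse_options are made up for illustration, and the mode table is the numbering given in the bonding driver documentation (mode 6 being balance-alb agrees with the "In ALB mode" line in the log). The last lines compare the configured updelay with the gap between the "link status up" and "link status definitely up" timestamps for eth0.

```python
# Mode numbers as listed in the bonding driver documentation.
BOND_MODES = {
    0: "balance-rr",
    1: "active-backup",
    2: "balance-xor",
    3: "broadcast",
    4: "802.3ad",
    5: "balance-tlb",
    6: "balance-alb",
}

def parse_options(line):
    """Split a modprobe 'options <module> key=value ...' line (hypothetical helper)."""
    tokens = line.split()
    # tokens[0] is the literal word "options", tokens[1] is the module name.
    params = dict(tok.split("=", 1) for tok in tokens[2:])
    return tokens[1], {key: int(value) for key, value in params.items()}

module, opts = parse_options(
    "options bonding mode=6 miimon=100 updelay=15000 max_bonds=2"
)
mode_name = BOND_MODES[opts["mode"]]

# updelay is expressed in ms; with miimon=100 it spans this many link polls.
polls = opts["updelay"] // opts["miimon"]

# Timestamps taken from the eth0 lines in the log above (seconds since boot).
observed_delay_s = 157.987577 - 142.995294

print(module, mode_name, polls, round(observed_delay_s, 3))
```

The observed delay is within a few milliseconds of the configured 15000 ms, so the failed assertions come right at the moment the bond promotes eth0 to active slave.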
Comment 1 Stefan Praszalowicz 2005-08-17 07:22:06 UTC
Created attachment 5656
kernel .config
Comment 2 Andrew Morton 2005-08-17 09:20:42 UTC

Begin forwarded message:

Date: Wed, 17 Aug 2005 07:20:36 -0700
From: bugme-daemon@kernel-bugs.osdl.org
To: bugme-new@lists.osdl.org
Subject: [Bugme-new] [Bug 5080] New: bonding related oops on boot


http://bugzilla.kernel.org/show_bug.cgi?id=5080

           Summary: bonding related oops on boot
    Kernel Version: 2.6.13-rc6
            Status: NEW
          Severity: normal
             Owner: acme@conectiva.com.br
         Submitter: stefan@avedya.com




Comment 3 Herbert Xu 2005-08-17 19:27:42 UTC
Andrew Morton <akpm@osdl.org> wrote:
> [  158.023763] RTNL: assertion failed at net/ipv4/devinet.c (962)
> [  158.023819]
> [  158.023820] Call Trace: <IRQ> <ffffffff80273c03>{rt_run_flush+48}
> <ffffffff8029d6c8>{inetdev_event+116}
> [  158.023964]        <ffffffff80273c4e>{rt_run_flush+123}
> <ffffffff8013f087>{notifier_call_chain+31}
> [  158.024094]        <ffffffff802610f8>{dev_set_mac_address+84}
> <ffffffff88035740>{:bonding:alb_set_slave_mac_addr+76}
> [  158.024233]        <ffffffff8803580f>{:bonding:alb_swap_mac_addr+170}
> <ffffffff8802f0b8>{:bonding:bond_change_active_slave+546}
> [  158.024373]        <ffffffff8802f997>{:bonding:bond_mii_monitor+1012}
> <ffffffff8802f5a3>{:bonding:bond_mii_monitor+0}
> [  158.024507]        <ffffffff8013a86a>{run_timer_softirq+384}
> <ffffffff80136d42>{__do_softirq+110}

Yes this is a known problem.  The MAC address manipulations need to
be moved to process context and put under the protection of the RTNL.

Jay, have you had any luck in rewriting this stuff?

Thanks,
Comment 4 Anonymous Emailer 2005-08-17 23:54:07 UTC
Reply-To: fubar@us.ibm.com

Herbert Xu <herbert@gondor.apana.org.au> wrote:

>Andrew Morton <akpm@osdl.org> wrote:
>> <ffffffff88035740>{:bonding:alb_set_slave_mac_addr+76}
>> [  158.024233]        <ffffffff8803580f>{:bonding:alb_swap_mac_addr+170}
>> <ffffffff8802f0b8>{:bonding:bond_change_active_slave+546}
>> [  158.024373]        <ffffffff8802f997>{:bonding:bond_mii_monitor+1012}
>> <ffffffff8802f5a3>{:bonding:bond_mii_monitor+0}
>> [  158.024507]        <ffffffff8013a86a>{run_timer_softirq+384}
>> <ffffffff80136d42>{__do_softirq+110}
>
>Yes this is a known problem.  The MAC address manipulations need to
>be moved to process context and put under the protection of the RTNL.
>
>Jay, have you had any luck in rewriting this stuff?

	Some; I've been working on it over time since we discussed it
last time, and have gone through a couple of prototypes that didn't work
out.  I think I've got an overall reworking of things (link monitoring,
that yucky ioctl call thing, enslave, deslave, change active, linkwatch)
that will permit the needed locking rearrangement.  If all else stays
calm, I was planning to float some test patches hopefully next week
(among other things, I need to merge up to the current mainline).

	I hadn't seen that particular RTNL assert failure before, but
the s390 qeth driver hits this problem as well because it (like the USB
drivers) can sleep in places that most PCI drivers don't.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

Comment 5 Herbert Xu 2005-08-17 23:59:14 UTC
On Wed, Aug 17, 2005 at 11:53:31PM -0700, Jay Vosburgh wrote:
> 
> 	Some; I've been working on it over time since we discussed it
> last time, and have gone through a couple of prototypes that didn't work
> out.  I think I've got an overall reworking of things (link monitoring,
> that yucky ioctl call thing, enslave, deslave, change active, linkwatch)
> that will permit the needed locking rearrangement.  If all else stays
> calm, I was planning to float some test patches hopefully next week
> (among other things, I need to merge up to the current mainline).

Sounds good.

> 	I hadn't seen that particular RTNL assert failure before, but
> the s390 qeth driver hits this problem as well because it (like the USB
> drivers) can sleep in places that most PCI drivers don't.

The RTNL assertion failure is due to the CHANGEADDR notification which
was added around the end of May.

Cheers,
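
For readers following the thread, the problem described in comments 3 to 5 can be modelled in userspace. The sketch below is a toy Python model, not kernel code: `rtnl`, `assert_rtnl`, `address_notifier`, `set_mac_address`, `swap_from_timer` and `swap_deferred` are all made-up stand-ins, and a worker thread is assumed as a rough analogue of process context. It shows why an address change made from a timer callback without the lock trips the check in a notifier, and why deferring the same change to a context that takes the lock first does not.

```python
import threading

rtnl = threading.Lock()   # stand-in for the RTNL; not the kernel's lock
failures = []             # one entry per failed check

def assert_rtnl(where):
    """Warn and carry on when nobody holds the lock, as in the log above.

    Lock.locked() only reports whether the lock is held by anyone, which is
    enough for this single-worker toy model.
    """
    if not rtnl.locked():
        failures.append(where)
        print(f"RTNL: assertion failed at {where}")

def address_notifier(dev):
    """Hypothetical listener that expects its caller to hold the lock."""
    assert_rtnl("address_notifier")

def set_mac_address(dev, mac):
    """Change the address, then tell listeners about the change."""
    dev["mac"] = mac
    address_notifier(dev)

def swap_from_timer(a, b):
    """Timer-style path: no lock is taken, so each notification warns."""
    mac_a, mac_b = a["mac"], b["mac"]
    set_mac_address(a, mac_b)
    set_mac_address(b, mac_a)

def swap_deferred(a, b):
    """Deferred path: a worker takes the lock before touching addresses."""
    def work():
        with rtnl:
            mac_a, mac_b = a["mac"], b["mac"]
            set_mac_address(a, mac_b)
            set_mac_address(b, mac_a)
    worker = threading.Thread(target=work)
    worker.start()
    worker.join()

eth0 = {"name": "eth0", "mac": "00:00:00:00:00:01"}
eth1 = {"name": "eth1", "mac": "00:00:00:00:00:02"}

swap_from_timer(eth0, eth1)          # swaps the addresses, warns twice
timer_failures = len(failures)

failures.clear()
swap_deferred(eth0, eth1)            # swaps them back, no warnings
deferred_failures = len(failures)
```

In the model the unlocked swap produces two warnings, one per address change, which is the same count as the pair of traces in the original report. The real rework discussed in the thread involves far more than this (link monitoring, enslave, deslave and so on), so the sketch only illustrates the locking idea.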
Comment 6 Stefan Praszalowicz 2006-05-07 06:42:54 UTC
FYI, this is still here

[  169.880990] RTNL: assertion failed at net/ipv4/devinet.c (985)
[  169.881098]
[  169.881099] Call Trace: <IRQ> <ffffffff802aee6f>{_spin_lock_bh+22}
[  169.881446]        <ffffffff80299369>{inetdev_event+95}
<ffffffff802710b6>{rt_run_flush+151}
[  169.881830]        <ffffffff801385e1>{notifier_call_chain+33}
<ffffffff8025c4bb>{dev_set_mac_address+83}
[  169.882213]        <ffffffff880b4fd7>{:bonding:alb_set_slave_mac_addr+74}
[  169.882448]        <ffffffff880b5428>{:bonding:alb_swap_mac_addr+139}
<ffffffff802603ad>{dev_mc_add+266}
[  169.882828]        <ffffffff880b01f5>{:bonding:bond_change_active_slave+491}
[  169.883057]        <ffffffff880b1d34>{:bonding:bond_mii_monitor+976}
<ffffffff880b1964>{:bonding:bond_mii_monitor+0}
[  169.883446]        <ffffffff801343e7>{run_timer_softirq+343}
<ffffffff80130bd1>{__do_softirq+92}
[  169.883820]        <ffffffff8010bc8a>{call_softirq+30}
<ffffffff8010d3cc>{do_softirq+44}
[  169.884195]        <ffffffff80109b3d>{mwait_idle+0}
<ffffffff8010b62e>{apic_timer_interrupt+98} <EOI>
[  169.884628]        <ffffffff80109b73>{mwait_idle+54}
<ffffffff80109b1e>{cpu_idle+107}
[  169.885000]        <ffffffff803c7416>{start_secondary+1235}
[  169.917795] RTNL: assertion failed at net/ipv4/devinet.c (985)
[  169.917901]
[  169.917902] Call Trace: <IRQ> <ffffffff80271069>{rt_run_flush+74}
[  169.918250]        <ffffffff80299369>{inetdev_event+95}
<ffffffff802710b6>{rt_run_flush+151}
[  169.918624]        <ffffffff801385e1>{notifier_call_chain+33}
<ffffffff8025c4bb>{dev_set_mac_address+83}
[  169.919000]        <ffffffff880b4fd7>{:bonding:alb_set_slave_mac_addr+74}
[  169.919231]        <ffffffff880b543a>{:bonding:alb_swap_mac_addr+157}
<ffffffff802603ad>{dev_mc_add+266}
[  169.919610]        <ffffffff880b01f5>{:bonding:bond_change_active_slave+491}
[  169.919838]        <ffffffff880b1d34>{:bonding:bond_mii_monitor+976}
<ffffffff880b1964>{:bonding:bond_mii_monitor+0}
[  169.920227]        <ffffffff801343e7>{run_timer_softirq+343}
<ffffffff80130bd1>{__do_softirq+92}
[  169.920601]        <ffffffff8010bc8a>{call_softirq+30}
<ffffffff8010d3cc>{do_softirq+44}
[  169.920970]        <ffffffff80109b3d>{mwait_idle+0}
<ffffffff8010b62e>{apic_timer_interrupt+98} <EOI>
[  169.921402]        <ffffffff80109b73>{mwait_idle+54}
<ffffffff80109b1e>{cpu_idle+107}
[  169.921770]        <ffffffff803c7416>{start_secondary+1235}
Comment 7 Natalie Protasevich 2007-07-18 16:45:40 UTC
Any update on this problem? Is the bug still present in 2.6.22+?
Thanks.
Comment 8 Adrian Bunk 2007-09-07 15:14:28 UTC
Please reopen this bug if it's still present with kernel 2.6.22.
