Bug 5080
Summary: | bonding related oops on boot | ||
---|---|---|---|
Product: | Networking | Reporter: | Stefan Praszalowicz (stefan) |
Component: | Other | Assignee: | Arnaldo Carvalho de Melo (acme) |
Status: | REJECTED INSUFFICIENT_DATA | ||
Severity: | normal | CC: | marcus.gustafsson, protasnb |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.13-rc6 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: | kernel .config |
Description
Stefan Praszalowicz
2005-08-17 07:20:29 UTC
Created attachment 5656 (kernel .config)
Begin forwarded message:

Date: Wed, 17 Aug 2005 07:20:36 -0700
From: bugme-daemon@kernel-bugs.osdl.org
To: bugme-new@lists.osdl.org
Subject: [Bugme-new] [Bug 5080] New: bonding related oops on boot

http://bugzilla.kernel.org/show_bug.cgi?id=5080

Summary: bonding related oops on boot
Kernel Version: 2.6.13-rc6
Status: NEW
Severity: normal
Owner: acme@conectiva.com.br
Submitter: stefan@avedya.com

Distribution: debian pure64
Hardware Environment: 4 way x86_64
Software Environment: linux 2.6.12, 2.6.13-rc6
Problem Description:

I have a bond with two slave interfaces, both connected. On boot, when the bond gets initialized, I get the following oops:

[ 136.773164] Ethernet Channel Bonding Driver: v2.6.3 (June 8, 2005)
[ 136.773266] bonding: In ALB mode you might experience client disconnections upon reconnection of a link if the bonding module updelay parameter (15000 msec) is incompatible with the forwarding delay time of the switch
[ 136.773427] bonding: MII link monitoring set to 100 ms
[ 137.122235] bonding: bond0: enslaving eth0 as an active interface with a down link.
[ 137.353781] bonding: bond0: enslaving eth1 as an active interface with a down link.
[ 137.397579] e100: eth2: e100_watchdog: link up, 100Mbps, full-duplex
[ 138.823615] NET: Registered protocol family 10
[ 138.824319] IPv6 over IPv4 tunneling driver
[ 142.995176] tg3: eth0: Link is up at 1000 Mbps, full duplex.
[ 142.995238] tg3: eth0: Flow control is on for TX and on for RX.
[ 142.995294] bonding: bond0: link status up for interface eth0, enabling it in 15000 ms.
[ 144.226482] tg3: eth1: Link is up at 1000 Mbps, full duplex.
[ 144.226543] tg3: eth1: Flow control is on for TX and on for RX.
[ 144.293858] bonding: bond0: link status up for interface eth1, enabling it in 15000 ms.
[ 149.051679] eth0: no IPv6 routers present
[ 149.311570] bond0: no IPv6 routers present
[ 149.661411] eth1: no IPv6 routers present
[ 149.781362] eth2: no IPv6 routers present
[ 157.987577] bonding: bond0: link status definitely up for interface eth0.
[ 157.987642] bonding: bond0: making interface eth0 the new active one.
[ 158.023763] RTNL: assertion failed at net/ipv4/devinet.c (962)
[ 158.023819]
[ 158.023820] Call Trace: <IRQ> <ffffffff80273c03>{rt_run_flush+48} <ffffffff8029d6c8>{inetdev_event+116}
[ 158.023964] <ffffffff80273c4e>{rt_run_flush+123} <ffffffff8013f087>{notifier_call_chain+31}
[ 158.024094] <ffffffff802610f8>{dev_set_mac_address+84} <ffffffff88035740>{:bonding:alb_set_slave_mac_addr+76}
[ 158.024233] <ffffffff8803580f>{:bonding:alb_swap_mac_addr+170} <ffffffff8802f0b8>{:bonding:bond_change_active_slave+546}
[ 158.024373] <ffffffff8802f997>{:bonding:bond_mii_monitor+1012} <ffffffff8802f5a3>{:bonding:bond_mii_monitor+0}
[ 158.024507] <ffffffff8013a86a>{run_timer_softirq+384} <ffffffff80136d42>{__do_softirq+110}
[ 158.024631] <ffffffff8010ec07>{call_softirq+31} <ffffffff801106a1>{do_softirq+54}
[ 158.024752] <ffffffff8010e3b6>{apic_timer_interrupt+98} <EOI> <ffffffff801f7cbc>{acpi_walk_namespace+117}
[ 158.024886] <ffffffff8010bf4b>{mwait_idle+86} <ffffffff80207694>{acpi_processor_idle+298}
[ 158.025011] <ffffffff8010bedb>{cpu_idle+76} <ffffffff803fc708>{start_kernel+372}
[ 158.025130] <ffffffff803fc216>{_sinittext+534}
[ 158.061376] RTNL: assertion failed at net/ipv4/devinet.c (962)
[ 158.061431]
[ 158.061432] Call Trace: <IRQ> <ffffffff80273c32>{rt_run_flush+95} <ffffffff8029d6c8>{inetdev_event+116}
[ 158.061569] <ffffffff80273c4e>{rt_run_flush+123} <ffffffff8013f087>{notifier_call_chain+31}
[ 158.061693] <ffffffff802610f8>{dev_set_mac_address+84} <ffffffff88035740>{:bonding:alb_set_slave_mac_addr+76}
[ 158.061826] <ffffffff88035821>{:bonding:alb_swap_mac_addr+188} <ffffffff8802f0b8>{:bonding:bond_change_active_slave+546}
[ 158.061963] <ffffffff8802f997>{:bonding:bond_mii_monitor+1012} <ffffffff8802f5a3>{:bonding:bond_mii_monitor+0}
[ 158.062096] <ffffffff8013a86a>{run_timer_softirq+384} <ffffffff80136d42>{__do_softirq+110}
[ 158.062219] <ffffffff8010ec07>{call_softirq+31} <ffffffff801106a1>{do_softirq+54}
[ 158.062338] <ffffffff8010e3b6>{apic_timer_interrupt+98} <EOI> <ffffffff801f7cbc>{acpi_walk_namespace+117}
[ 158.062470] <ffffffff8010bf4b>{mwait_idle+86} <ffffffff80207694>{acpi_processor_idle+298}
[ 158.062593] <ffffffff8010bedb>{cpu_idle+76} <ffffffff803fc708>{start_kernel+372}
[ 158.062711] <ffffffff803fc216>{_sinittext+534}
[ 159.366930] bonding: bond0: link status definitely up for interface eth1.

The bond has the following options:

options bonding mode=6 miimon=100 updelay=15000 max_bonds=2

I know this doesn't happen with 2.6.11.12, and that it did happen with a 2.6.12 kernel, although I don't know which one precisely :(

I'll attach my .config.

Steps to reproduce: just boot :p

Andrew Morton <akpm@osdl.org> wrote:
> [ 158.023763] RTNL: assertion failed at net/ipv4/devinet.c (962)
> [...]

Yes this is a known problem. The MAC address manipulations need to be moved to process context and put under the protection of the RTNL.
Jay, have you had any luck in rewriting this stuff?

Thanks,

Reply-To: fubar@us.ibm.com

Herbert Xu <herbert@gondor.apana.org.au> wrote:

>Andrew Morton <akpm@osdl.org> wrote:
>> [...]
>
>Yes this is a known problem. The MAC address manipulations need to
>be moved to process context and put under the protection of the RTNL.
>
>Jay, have you had any luck in rewriting this stuff?

Some; I've been working on it over time since we discussed it last time, and have gone through a couple of prototypes that didn't work out. I think I've got an overall reworking of things (link monitoring, that yucky ioctl call thing, enslave, deslave, change active, linkwatch) that will permit the needed locking rearrangement. If all else stays calm, I was planning to float some test patches hopefully next week (among other things, I need to merge up to the current mainline).

I hadn't seen that particular RTNL assert failure before, but the s390 qeth driver hits this problem as well because it (like the USB drivers) can sleep in places that most PCI drivers don't.

-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

On Wed, Aug 17, 2005 at 11:53:31PM -0700, Jay Vosburgh wrote:
>
> Some; I've been working on it over time since we discussed it
> last time, and have gone through a couple of prototypes that didn't work
> out. I think I've got an overall reworking of things (link monitoring,
> that yucky ioctl call thing, enslave, deslave, change active, linkwatch)
> that will permit the needed locking rearrangement. If all else stays
> calm, I was planning to float some test patches hopefully next week
> (among other things, I need to merge up to the current mainline).

Sounds good.

> I hadn't seen that particular RTNL assert failure before, but
> the s390 qeth driver hits this problem as well because it (like the USB
> drivers) can sleep in places that most PCI drivers don't.

The RTNL assertion failure is due to the CHANGEADDR notification, which was added around the end of May.

Cheers,

FYI, this is still here:

[ 169.880990] RTNL: assertion failed at net/ipv4/devinet.c (985)
[ 169.881098]
[ 169.881099] Call Trace: <IRQ> <ffffffff802aee6f>{_spin_lock_bh+22}
[ 169.881446] <ffffffff80299369>{inetdev_event+95} <ffffffff802710b6>{rt_run_flush+151}
[ 169.881830] <ffffffff801385e1>{notifier_call_chain+33} <ffffffff8025c4bb>{dev_set_mac_address+83}
[ 169.882213] <ffffffff880b4fd7>{:bonding:alb_set_slave_mac_addr+74}
[ 169.882448] <ffffffff880b5428>{:bonding:alb_swap_mac_addr+139} <ffffffff802603ad>{dev_mc_add+266}
[ 169.882828] <ffffffff880b01f5>{:bonding:bond_change_active_slave+491}
[ 169.883057] <ffffffff880b1d34>{:bonding:bond_mii_monitor+976} <ffffffff880b1964>{:bonding:bond_mii_monitor+0}
[ 169.883446] <ffffffff801343e7>{run_timer_softirq+343} <ffffffff80130bd1>{__do_softirq+92}
[ 169.883820] <ffffffff8010bc8a>{call_softirq+30} <ffffffff8010d3cc>{do_softirq+44}
[ 169.884195] <ffffffff80109b3d>{mwait_idle+0} <ffffffff8010b62e>{apic_timer_interrupt+98} <EOI>
[ 169.884628] <ffffffff80109b73>{mwait_idle+54} <ffffffff80109b1e>{cpu_idle+107}
[ 169.885000] <ffffffff803c7416>{start_secondary+1235}
[ 169.917795] RTNL: assertion failed at net/ipv4/devinet.c (985)
[ 169.917901]
[ 169.917902] Call Trace: <IRQ> <ffffffff80271069>{rt_run_flush+74}
[ 169.918250] <ffffffff80299369>{inetdev_event+95} <ffffffff802710b6>{rt_run_flush+151}
[ 169.918624] <ffffffff801385e1>{notifier_call_chain+33} <ffffffff8025c4bb>{dev_set_mac_address+83}
[ 169.919000] <ffffffff880b4fd7>{:bonding:alb_set_slave_mac_addr+74}
[ 169.919231] <ffffffff880b543a>{:bonding:alb_swap_mac_addr+157} <ffffffff802603ad>{dev_mc_add+266}
[ 169.919610] <ffffffff880b01f5>{:bonding:bond_change_active_slave+491}
[ 169.919838] <ffffffff880b1d34>{:bonding:bond_mii_monitor+976} <ffffffff880b1964>{:bonding:bond_mii_monitor+0}
[ 169.920227] <ffffffff801343e7>{run_timer_softirq+343} <ffffffff80130bd1>{__do_softirq+92}
[ 169.920601] <ffffffff8010bc8a>{call_softirq+30} <ffffffff8010d3cc>{do_softirq+44}
[ 169.920970] <ffffffff80109b3d>{mwait_idle+0} <ffffffff8010b62e>{apic_timer_interrupt+98} <EOI>
[ 169.921402] <ffffffff80109b73>{mwait_idle+54} <ffffffff80109b1e>{cpu_idle+107}
[ 169.921770] <ffffffff803c7416>{start_secondary+1235}

Any update on this problem? Is the bug still present in 2.6.22+?

Thanks.

Please reopen this bug if it's still present with kernel 2.6.22.
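Herbert Xu's diagnosis in this thread (the MAC address changes made from the MII monitor timer need to move to process context and run under the RTNL) can be illustrated with a small userspace model. This is a hypothetical sketch, not the bonding patch. The RTNL is modelled as a Python `threading.Lock`, softirq and process context are just two call paths in a single thread, and `set_mac_address`, `swap_mac_addr`, `mii_monitor_tick` and `run_work_queue` are made-up names that only loosely mirror `dev_set_mac_address`, `alb_swap_mac_addr` and `bond_mii_monitor` from the traces. As in the traces, the model's RTNL check warns and carries on rather than stopping.

```python
import threading
from collections import deque

# Userspace model only: none of these names are kernel APIs.
rtnl = threading.Lock()   # stands in for the RTNL
warnings = []             # one entry per "RTNL: assertion failed" message
work_queue = deque()      # stands in for a workqueue run in process context


def set_mac_address(slave, mac):
    """Loosely mirrors dev_set_mac_address(): its notifier expects the RTNL.

    Like the check in the traces, it only warns and then carries on.
    """
    if not rtnl.locked():
        warnings.append(f"RTNL: assertion failed ({slave['name']})")
    slave["mac"] = mac


def swap_mac_addr(a, b):
    """Loosely mirrors alb_swap_mac_addr(): one address change per slave."""
    a_mac, b_mac = a["mac"], b["mac"]
    set_mac_address(a, b_mac)
    set_mac_address(b, a_mac)


def mii_monitor_tick(a, b, defer):
    """Timer (softirq) path: must not sleep, so it cannot take the RTNL."""
    if defer:
        work_queue.append((a, b))  # proposed fix: hand the swap to a worker
    else:
        swap_mac_addr(a, b)        # behaviour in the traces: no RTNL held


def run_work_queue():
    """Process-context worker: may sleep, so it holds the RTNL for each swap."""
    while work_queue:
        a, b = work_queue.popleft()
        with rtnl:
            swap_mac_addr(a, b)


eth0 = {"name": "eth0", "mac": "00:11:22:33:44:00"}
eth1 = {"name": "eth1", "mac": "00:11:22:33:44:01"}

mii_monitor_tick(eth0, eth1, defer=False)
print(len(warnings))  # 2: one per slave, like the two failures per event above

warnings.clear()
mii_monitor_tick(eth0, eth1, defer=True)
run_work_queue()
print(len(warnings))  # 0
```

The timer path never blocks; it only queues the swap, and the worker, which is allowed to sleep, does the locking. That split is the point of the proposed fix, since `rtnl_lock()` takes a sleeping lock and so cannot be called from the timer softirq that runs `bond_mii_monitor` in these traces.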