Bug 89541 - bonding status file in /proc not removed when using bond-device as a slave
Summary: bonding status file in /proc not removed when using bond-device as a slave
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_network@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-12-10 19:17 UTC by Andy Gospodarek
Modified: 2016-03-19 17:24 UTC
CC List: 2 users

See Also:
Kernel Version: 3.14.26
Subsystem:
Regression: No
Bisected commit-id:


Description Andy Gospodarek 2014-12-10 19:17:22 UTC
(reported by Peter Schmitt)

I want to create an active-backup bond that has two LACP bonds as slaves. The
two LACP bonds should be connected to different switches so that I keep
connectivity even if one switch fails.
While experimenting with this setup on the recent LTS kernel 3.14.26, I found the following behaviour:
When I add a bond device (an LACP bond, mode 4, which is the default with the
module options below) to an active-backup bond (mode 1), then release it and
delete it, the status file for the deleted bond remains in /proc/net/bonding/.
Running cat on this file either crashes the machine or produces a general
protection fault.

The following script reproduces the problem:

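#!/bin/bash
# Create bond1 and switch it to active-backup (mode 1).
echo +bond1 > /sys/class/net/bonding_masters
echo 1 > /sys/class/net/bond1/bonding/mode
# Create bond2 (mode 4, 802.3ad, from the module options below).
echo +bond2 > /sys/class/net/bonding_masters
# Enslave bond2 to bond1, then release it again.
echo +bond2 > /sys/class/net/bond1/bonding/slaves
echo -bond2 > /sys/class/net/bond1/bonding/slaves
# Delete bond2; /proc/net/bonding/bond2 is left behind.
echo -bond2 > /sys/class/net/bonding_masters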

After running it, /proc/net/bonding/bond2 still exists even though
/sys/class/net/bonding_masters lists only bond1:

> ls -lah /proc/net/bonding/bond*
-r--r--r-- 1 root root 0 Dec  8 16:53 /proc/net/bonding/bond1
-r--r--r-- 1 root root 0 Dec  8 16:53 /proc/net/bonding/bond2

> cat /sys/class/net/bonding_masters
bond1

If I now run cat on the file in /proc, I get a general protection fault or,
even worse, the machine crashes, becomes unresponsive, and can only be
recovered with a power cycle.

> uname -a
Linux bondingtest 3.14.26-x86 #1 SMP Sun Dec 7 11:29:36 CET 2014 i686 GNU/Linux

The bonding module is loaded with the following options:
modprobe bonding miimon=100 max_bonds=0 mode=4 lacp_rate=1 xmit_hash_policy=layer2+3


> cat /proc/net/bonding/bond2
general protection fault: 0000 [#1] SMP
Modules linked in: w83627hf hwmon_vid coretemp hwmon ip_set iptable_nat nf_nat_ipv4 ipt_REJECT nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_nat nf_conntrack_ftp msr ipmi_devintf ipmi_msghandler ip_gre gre bonding pcspkr i3200_edac edac_core uhci_hcd ehci_pci ehci_hcd lpc_ich mfd_core pata_acpi ata_generic shpchp e1000e ptp pps_core [last unloaded: cpuid]
CPU: 0 PID: 31747 Comm: cat Not tainted 3.14.26-x86_64 #1
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080015  06/29/2009
task: ffff8800d61d77b0 ti: ffff8800d6068000 task.ti: ffff8800d6068000
RIP: 0010:[<ffffffffc00f7ef0>]  [<ffffffffc00f7ef0>] bond_info_seq_show+0x2d0/0x5e0 [bonding]
RSP: 0018:ffff8800d6069e08  EFLAGS: 00010212
RAX: 5f7367705f656c62 RBX: ffff880198570380 RCX: 0000000000000001
RDX: ffffffffc00f9547 RSI: ffffffffc00f954f RDI: ffff880198570380
RBP: ffff8800d6069e48 R08: ffffffff9413ed60 R09: ffff8800dba15cd4
R10: 0000000000000001 R11: 0000000000000000 R12: ffff8800d60917c0
R13: 656b636f6c6e756d R14: ffff880197d469c0 R15: ffff8800d6069e90
FS:  0000000000000000(0000) GS:ffff88019fc00000(0063) knlGS:00000000f761c8d0
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 000000000804ce10 CR3: 00000000db8cd000 CR4: 00000000000407f0
Stack:
ffff8800d6069e48 ffffffffc00f8318 ffff8801986ba9f0 ffff880197d469c0
ffff880198570380 0000000000000001 ffff880197d469c0 ffff8800d6069e90
ffff8800d6069ec8 ffffffff9413eed1 ffff880198289a58 0000000008213000
Call Trace:
[<ffffffffc00f8318>] ? bond_info_seq_start+0x28/0xa8 [bonding]
[<ffffffff9413eed1>] seq_read+0x171/0x3f0
[<ffffffff94175a7e>] proc_reg_read+0x3e/0x70
[<ffffffff9411e0e1>] vfs_read+0xa1/0x160
[<ffffffff9411e281>] SyS_read+0x51/0xc0
[<ffffffff94039c9c>] ? do_page_fault+0xc/0x10
[<ffffffff94516cdf>] sysenter_dispatch+0x7/0x1e
Code: 04 49 8b 55 00 48 c7 c6 2a 95 0f c0 48 89 df 31 c0 e8 d5 6c 04 d4 49 8b 04 24 48 c7 c2 47 95 0f c0 48 c7 c6 4f 95 0f c0 48 89 df <48> 8b 40 48 a8 04 48 c7 c0 4c 95 0f c0 48 0f 44 d0 31 c0 e8 a8
RIP  [<ffffffffc00f7ef0>] bond_info_seq_show+0x2d0/0x5e0 [bonding]
RSP <ffff8800d6069e08>
---[ end trace 96fae3d9de6068c7 ]---
Segmentation fault

ver_linux:
Linux bondingtest 3.14.26-x86 #1 SMP Sun Dec 7 11:29:36 CET 2014 i686 GNU/Linux

Gnu C                  4.4.3
Gnu make               3.81
binutils               2.20.1
util-linux             2.17.2
mount                  support
module-init-tools      found
e2fsprogs              1.42.11
PPP                    2.4.5
Linux C Library        2.11.1
Dynamic linker (ldd)   2.11.1
Procps                 3.2.8
Net-tools              1.60
Kbd                    1.15
Sh-utils               found
Modules Loaded         xt_TPROXY xt_set xt_socket nf_defrag_ipv6 xt_REDIRECT ip_set_hash_ip hwmon_vid hwmon bridge ip_set iptable_nat nf_nat_ipv4 ipt_REJECT nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_nat nf_conntrack_ftp msr ipmi_devintf ipmi_msghandler ip_gre gre bonding pcspkr shpchp uhci_hcd ehci_pci ehci_hcd lpc_ich mfd_core pata_acpi ata_generic

If you have any questions or need more information or tests, I will gladly help you with that.

Thank you in advance.

Best regards,
Peter Schmitt
Comment 1 Andy Gospodarek 2014-12-10 19:18:50 UTC
We should probably either prevent stacked bonds or fix the panic when they are stacked.
Comment 2 Alexey Dobriyan 2015-02-20 20:53:05 UTC
Reproduced.
Comment 3 Alexey Dobriyan 2015-02-20 21:56:48 UTC
Technically, this happens because the interface no longer has IFF_BONDING set
in priv_flags after bond2 is released from bond1:

static int bond_netdev_event(struct notifier_block *this,
                             unsigned long event, void *ptr)
{
        struct net_device *event_dev = netdev_notifier_info_to_dev(ptr);

        netdev_dbg(event_dev, "event: %lx\n", event);

        if (!(event_dev->priv_flags & IFF_BONDING))
                return NOTIFY_DONE;
 ...

so the NETDEV_UNREGISTER event is ignored and the proc entry is not removed.
Comment 4 Alexey Dobriyan 2015-02-20 21:58:07 UTC
static int __bond_release_one(struct net_device *bond_dev,
                              struct net_device *slave_dev,
                              bool all)
{

...

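        /* slave_dev may itself be a bond master (bond2 in this report);
         * clearing IFF_BONDING here makes bond_netdev_event() ignore its
         * later NETDEV_UNREGISTER. */
        slave_dev->priv_flags &= ~IFF_BONDING;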

        bond_free_slave(slave);

        return 0;
}
