Bug 13006

Summary: Bonding: Holes in arp_ip_target list cause ARP replies to be ignored
Product: Drivers Reporter: Steve Howard (steve)
Component: NetworkAssignee: drivers_network (drivers_network)
Status: RESOLVED OBSOLETE    
Severity: normal CC: alan, kirr
Priority: P1    
Hardware: All   
OS: Linux   
URL: www.astutenetworks.com
Kernel Version: 2.6.25 Subsystem:
Regression: No Bisected commit-id:

Description Steve Howard 2009-04-03 20:17:37 UTC
I know there have been changes to bonding driver since 2.6.25, but
this bug is still there.

To Recreate:

Set up a bond as follows:
  mode active-backup
  arp_interval 1000
  arp_ip_target 1.1.1.1 2.2.2.2 (address 2 points to a reachable target)

/sys/class/net/bond0/bonding/active_slave contains an interface name

Now delete the first target (cat -1.1.1.1 >arp_ip_target)

The first target is deleted, but active_slave is now empty.

The proglem is in line 2611 of bond_main.c, in bond_validate_arp()

	for (i = 0; (i < BOND_MAX_ARP_TARGETS) && targets[i]; i++) {

This is the loop that searches the arp_ip_targets array for a match
with an incomming ARP.  The problem is that when we deleted the first
entry, it was set to zero.  This "for" loop exits before comparing with
the remaining entries in targets[].

One fix is to delete the " && targets[i]" from the termination conditions.
Then the loop will always search the full list.

A better fix is to make sure there are no empty holes in the target[]
array, beceause there may other code which expects no holes. 

This can be done by modifying this loop in bond_sysfs.c at line 792,
which is where entries are deleted in bonding_store_arp_targets().
This change deletes the hole just created by removing an entry.
 
		for (i = 0; (i < BOND_MAX_ARP_TARGETS); i++) {
			if (targets[i] == newtarget) {
				printk(KERN_INFO DRV_NAME
				       ": %s: removing ARP target %d.%d.%d.%d.\n",
				       bond->dev->name, NIPQUAD(newtarget));
				targets[i] = 0;
				done = 1;

				// Move any entries after this one into the newly
				// empty slot, so there are no holes.  Be careful not to access
				// more than BOND_MAX_ARP_TARGETS in the array.
				for (j = i; (j < (BOND_MAX_ARP_TARGETS-1)) && targets[j + 1]; j++) {
					targets[j] = targets[j + 1];
					targets[j + 1] = 0;
				}
			}
		}1


Here is the output of ver_linux:

 Linux caspianB012a 2.6.25-astute-v15 #19 SMP Fri Apr 3 09:32:28 PDT 2009 i686 GNU/Linux
 Gnu C                  18:
 /stim2/steve/work/acp/host/caspian/pd/kernel/scripts/ver_linux: line 24: ld: command not found
 binutils               
 util-linux             2.13
 mount                  2.13
 module-init-tools      3.2-pre1
 e2fsprogs              1.40
 Linux C Library        2.5.90
 /stim2/steve/work/acp/host/caspian/pd/kernel/scripts/ver_linux: line 69: ldd: command not found
 Procps                 3.2.7
 Net-tools              1.60
 Kbd                    82:
 Sh-utils               5.97
 udev                   105
 Modules Loaded         nfs lockd sunrpc pegasus iSCSITarget ScsiGatewayShim ScsiGateway an2000_l2pthru amod linux_user_bde linux_kernel_bde astute_hostapi_mod megaraid_sas an2k afpga e1000e ipmi_poweroff ipmi_watchdog ipmi_devintf ipmi_si ipmi_msghandler
Comment 1 Andrew Morton 2009-04-09 21:22:32 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Fri, 3 Apr 2009 20:17:37 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=13006
> 
>                URL: www.astutenetworks.com
>            Summary: Bonding: Holes in arp_ip_target list cause ARP replies
>                     to be ignored
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 2.6.25
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Network
>         AssignedTo: drivers_network@kernel-bugs.osdl.org
>         ReportedBy: steve@astutenetworks.com
>         Regression: No
> 
> 
> I know there have been changes to bonding driver since 2.6.25, but
> this bug is still there.
> 
> To Recreate:
> 
> Set up a bond as follows:
>   mode active-backup
>   arp_interval 1000
>   arp_ip_target 1.1.1.1 2.2.2.2 (address 2 points to a reachable target)
> 
> /sys/class/net/bond0/bonding/active_slave contains an interface name
> 
> Now delete the first target (cat -1.1.1.1 >arp_ip_target)
> 
> The first target is deleted, but active_slave is now empty.
> 
> The proglem is in line 2611 of bond_main.c, in bond_validate_arp()
> 
>     for (i = 0; (i < BOND_MAX_ARP_TARGETS) && targets[i]; i++) {
> 
> This is the loop that searches the arp_ip_targets array for a match
> with an incomming ARP.  The problem is that when we deleted the first
> entry, it was set to zero.  This "for" loop exits before comparing with
> the remaining entries in targets[].
> 
> One fix is to delete the " && targets[i]" from the termination conditions.
> Then the loop will always search the full list.
> 
> A better fix is to make sure there are no empty holes in the target[]
> array, beceause there may other code which expects no holes. 
> 
> This can be done by modifying this loop in bond_sysfs.c at line 792,
> which is where entries are deleted in bonding_store_arp_targets().
> This change deletes the hole just created by removing an entry.
> 
>         for (i = 0; (i < BOND_MAX_ARP_TARGETS); i++) {
>             if (targets[i] == newtarget) {
>                 printk(KERN_INFO DRV_NAME
>                        ": %s: removing ARP target %d.%d.%d.%d.\n",
>                        bond->dev->name, NIPQUAD(newtarget));
>                 targets[i] = 0;
>                 done = 1;
> 
>                 // Move any entries after this one into the newly
>                 // empty slot, so there are no holes.  Be careful not to
>                 access
>                 // more than BOND_MAX_ARP_TARGETS in the array.
>                 for (j = i; (j < (BOND_MAX_ARP_TARGETS-1)) && targets[j + 1];
> j++) {
>                     targets[j] = targets[j + 1];
>                     targets[j + 1] = 0;
>                 }
>             }
>         }1
> 
> 
> Here is the output of ver_linux:
> 
>  Linux caspianB012a 2.6.25-astute-v15 #19 SMP Fri Apr 3 09:32:28 PDT 2009
>  i686
> GNU/Linux
>  Gnu C                  18:
>  /stim2/steve/work/acp/host/caspian/pd/kernel/scripts/ver_linux: line 24: ld:
> command not found
>  binutils               
>  util-linux             2.13
>  mount                  2.13
>  module-init-tools      3.2-pre1
>  e2fsprogs              1.40
>  Linux C Library        2.5.90
>  /stim2/steve/work/acp/host/caspian/pd/kernel/scripts/ver_linux: line 69:
>  ldd:
> command not found
>  Procps                 3.2.7
>  Net-tools              1.60
>  Kbd                    82:
>  Sh-utils               5.97
>  udev                   105
>  Modules Loaded         nfs lockd sunrpc pegasus iSCSITarget ScsiGatewayShim
> ScsiGateway an2000_l2pthru amod linux_user_bde linux_kernel_bde
> astute_hostapi_mod megaraid_sas an2k afpga e1000e ipmi_poweroff ipmi_watchdog
> ipmi_devintf ipmi_si ipmi_msghandler
Comment 2 Alan 2012-10-30 16:46:17 UTC
If this is still seen on modern kernels then please re-open/update