Bug 9173 - BUG: soft lockup detected on CPU#0 - maybe related to TCP_MD5SIG
Summary: BUG: soft lockup detected on CPU#0 - maybe related to TCP_MD5SIG
Status: RESOLVED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4
Hardware: All Linux
Importance: P1 normal
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-10-17 07:34 UTC by Tore Anderson
Modified: 2007-10-29 22:52 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.20-9-server-lp2
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
Call traces (66.80 KB, text/x-log)
2007-10-17 07:35 UTC, Tore Anderson

Description Tore Anderson 2007-10-17 07:34:49 UTC
Most recent kernel where this bug did not occur: 2.6.12 (with TCP_MD5SIG-implementation from http://hasso.linux.ee/doku.php/english:network:rfc2385)
Distribution: Ubuntu 6.06.1 LTS
Hardware Environment: Sun X4100 (x86_64, SMP)
Software Environment: Ubuntu 2.6.20-9-server-lp2 (-lp2 because it's recompiled with TCP_MD5SIG enabled), 64-bit userspace
Problem Description:

The server is a border router running Quagga for BGP and OSPF, and usually forwards 400-500 Mbps of traffic between around 80 VLAN interfaces.  It has four network interfaces, bonded pairwise, and three BGP sessions with MD5 signatures enabled.

The server has an identical twin (for failover) which has also locked up like this, although it happens much more frequently on the active one (no matter which one is active, unfortunately).  We've got lots of these servers, but only the border routers have had these lockups.

Once in a while (say, once every four to six weeks) it will flood the console with "BUG: soft lockup detected on CPU#0!" and shortly afterwards fail completely.  This time I had increased the default printk level one notch and got the back traces too.  I'm not used to reading those, but the md5sig stuff seems to stand out...

I'll try to attach the trace somehow (I got an error message about the bug being too large when attempting to include it here).

Steps to reproduce:  It happens completely out of the blue, so I don't know how.

Tore
Comment 1 Tore Anderson 2007-10-17 07:35:49 UTC
Created attachment 13187
Call traces

The traces printed to the console when the server locks up
Comment 2 Stephen Hemminger 2007-10-29 22:52:22 UTC
This bug was just fixed.

commit 2c4f6219aca5939b57596278ea8b014275d4917b
Author: David S. Miller <davem@sunset.davemloft.net>
Date:   Tue Feb 20 23:51:47 2007 -0800

    [TCP]: Fix MD5 signature pool locking.
    
    The locking calls assumed that these code paths were only
    invoked in software interrupt context, but that isn't true.
    
    Therefore we need to use spin_{lock,unlock}_bh() throughout.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
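
To illustrate why the commit message asks for the _bh variants, here is a minimal userspace sketch. It is a toy single-CPU model, not kernel code and not the actual patch: every toy_* name is hypothetical, and the "softirq" is just a function call that pretends to interrupt the lock holder. The assumption it encodes is that a plain spin_lock() taken in process context leaves softirqs enabled, so a softirq arriving on the same CPU can try to take the lock its interrupted owner already holds and spin forever, whereas spin_lock_bh() defers the softirq until the lock is released.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model. All names are hypothetical, not kernel APIs. */
struct toy_cpu {
    bool lock_held;       /* the MD5 pool lock (non-recursive) */
    int  bh_disabled;     /* nesting count, standing in for disabled softirqs */
    bool softirq_pending; /* a softirq was raised while softirqs were off */
    int  softirqs_run;    /* softirq handlers that ran to completion */
    bool locked_up;       /* a handler would spin forever on lock_held */
};

/* Softirq handler: needs the pool lock, as the receive path would. */
static void toy_softirq(struct toy_cpu *c)
{
    if (c->lock_held) {
        /* The owner is the code we interrupted on this CPU, so it can
         * never run again to release the lock: a soft lockup. */
        c->locked_up = true;
        return;
    }
    c->lock_held = true;  /* take the lock, use the pool ... */
    c->lock_held = false; /* ... and release it */
    c->softirqs_run++;
}

/* An interrupt arrives and wants to run the softirq right away. */
static void toy_raise_softirq(struct toy_cpu *c)
{
    if (c->bh_disabled > 0)
        c->softirq_pending = true; /* deferred until softirqs are re-enabled */
    else
        toy_softirq(c);
}

static void toy_spin_lock(struct toy_cpu *c)
{
    assert(!c->lock_held);
    c->lock_held = true;
}

static void toy_spin_unlock(struct toy_cpu *c)
{
    assert(c->lock_held);
    c->lock_held = false;
}

/* Models spin_lock_bh(): disable softirqs first, then take the lock. */
static void toy_spin_lock_bh(struct toy_cpu *c)
{
    c->bh_disabled++;
    toy_spin_lock(c);
}

/* Models spin_unlock_bh(): drop the lock, then run any deferred softirq. */
static void toy_spin_unlock_bh(struct toy_cpu *c)
{
    toy_spin_unlock(c);
    if (--c->bh_disabled == 0 && c->softirq_pending) {
        c->softirq_pending = false;
        toy_softirq(c);
    }
}

/* Process-context path that is interrupted while holding the lock.
 * Returns true if the modelled CPU locks up. */
static bool toy_process_path(struct toy_cpu *c, bool use_bh)
{
    if (use_bh)
        toy_spin_lock_bh(c);
    else
        toy_spin_lock(c);

    toy_raise_softirq(c); /* packet arrives while the lock is held */

    if (use_bh)
        toy_spin_unlock_bh(c);
    else
        toy_spin_unlock(c);

    return c->locked_up;
}
```

In this model, calling toy_process_path() with use_bh set to false reports a lockup, while use_bh set to true lets the deferred softirq run once after the unlock. The real kernel behaviour involves multiple CPUs and preemption, which this sketch deliberately leaves out.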
