Most recent kernel where this bug did not occur: 2.6.12 (with TCP_MD5SIG-implementation from http://hasso.linux.ee/doku.php/english:network:rfc2385)
Distribution: Ubuntu 6.06.1 LTS
Hardware Environment: Sun X4100 (x86_64, SMP)
Software Environment: Ubuntu 2.6.20-9-server-lp2 (-lp2 because it's recompiled with TCP_MD5SIG enabled), 64-bit userspace

Problem Description:
The server is a border router running Quagga for BGP and OSPF, and usually forwards 400-500 Mbps worth of traffic between around 80 VLAN interfaces. It has four network interfaces, bonded pairwise, and three BGP sessions with MD5 signatures enabled.

The server has an identical twin (for failover) which has also locked up like this, although it happens much more frequently on the active one (no matter which one is active, unfortunately). We've got lots of these servers, but only the border routers have had these lockups.

Once in a while (say, once every four to six weeks) it will flood the console with

BUG: soft lockup detected on CPU#0!

and shortly after fail completely. This time I had increased the default printk level one notch and got the back traces too. I'm not used to reading those, but the md5sig stuff seems to stand out... I'll try to attach the trace somehow (got an error message about the bug being too large when attempting to include it here).

Steps to reproduce:
It happens completely out of the blue, so I don't know how.

Tore
Created attachment 13187: Call traces

The traces printed to the console when the server locks up.
This bug was just fixed.

commit 2c4f6219aca5939b57596278ea8b014275d4917b
Author: David S. Miller <davem@sunset.davemloft.net>
Date:   Tue Feb 20 23:51:47 2007 -0800

    [TCP]: Fix MD5 signature pool locking.

    The locking calls assumed that these code paths were only invoked
    in software interrupt context, but that isn't true.  Therefore
    we need to use spin_{lock,unlock}_bh() throughout.

    Signed-off-by: David S. Miller <davem@davemloft.net>
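For readers wondering why the missing _bh matters: if a process-context path takes a plain spinlock and a softirq that needs the same lock then runs on that CPU, the softirq spins forever, which would match the soft lockup messages in the report. The following is only a user-space toy model of that general pattern, not the kernel code touched by the commit. Every name in it (toy_cpu, toy_softirq, toy_interrupt, toy_process_path) is made up for illustration, and the "deadlock" is recorded in a flag instead of actually spinning.

```c
#include <stdbool.h>

/* Toy single-CPU model. None of these names are kernel APIs. */
struct toy_cpu {
    bool lock_held;    /* stands in for the MD5 signature pool spinlock */
    bool bh_disabled;  /* stands in for bottom halves being disabled */
    bool deadlocked;   /* set when the softirq would spin forever */
};

/* Hypothetical softirq handler that needs the same lock. */
static void toy_softirq(struct toy_cpu *cpu)
{
    if (cpu->lock_held) {
        /* A real spinlock would spin here forever: the holder is the
         * interrupted code on this same CPU and can never release it. */
        cpu->deadlocked = true;
        return;
    }
    cpu->lock_held = true;   /* take the lock */
    cpu->lock_held = false;  /* ...do the work, then release it */
}

/* An interrupt arrives; softirqs run afterwards unless BH is disabled. */
static void toy_interrupt(struct toy_cpu *cpu)
{
    if (!cpu->bh_disabled)
        toy_softirq(cpu);
}

/*
 * Process-context path that takes the lock.
 * use_bh_variant == false models a plain lock/unlock pair,
 * use_bh_variant == true models the _bh lock/unlock pair.
 * Returns true if the model detected a deadlock.
 */
static bool toy_process_path(bool use_bh_variant)
{
    struct toy_cpu cpu = { false, false, false };

    if (use_bh_variant)
        cpu.bh_disabled = true;   /* the _bh variant disables BH first */
    cpu.lock_held = true;

    toy_interrupt(&cpu);          /* interrupt inside the critical section */

    cpu.lock_held = false;
    if (use_bh_variant)
        cpu.bh_disabled = false;  /* re-enable BH after unlocking */

    return cpu.deadlocked;
}
```

In this model, toy_process_path(false) reports a deadlock and toy_process_path(true) does not, because the softirq is held off until the lock has been released.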