Distribution: debian Hardware Environment: Pentium IV Problem Description: Disabling the SO_TIMESTAMP sockopt on AF_PACKET sockets has no effect; the packets continue to have timestamps added to them, which means calling do_gettimeofday for every packet. Oprofile has shown this to be the dominant cost in using AF_PACKET sockets. Steps to reproduce: * Open an AF_PACKET socket. * Disable the SO_TIMESTAMP sockopt * Observe the timestamps as you read packets from it. The follwing patch was against 2.6.8 (the same code is in 2.6.12) and fixes this for the tpacket_rcv case: diff -urN kernel-source-2.6.8/net/packet/af_packet.c kernel-source-2.6.8-new/net/packet/af_packet.c --- kernel-source-2.6.8/net/packet/af_packet.c 2004-08-14 15:37:25.000000000 +1000 +++ kernel-source-2.6.8-new/net/packet/af_packet.c 2005-06-22 09:37:16.704601489 +1000 @@ -625,7 +625,7 @@ h->tp_snaplen = snaplen; h->tp_mac = macoff; h->tp_net = netoff; - if (skb->stamp.tv_sec == 0) { + if (sk->sk_rcvtstamp && skb->stamp.tv_sec == 0) { do_gettimeofday(&skb->stamp); sock_enable_timestamp(sk); } I'm not sufficiently familiar with the codepath that packet_rcv uses, so I'm not sure if the same bug exists there, or how to fix it (there's no direct do_gettimeofday call in that version).
Created attachment 5350 [details] Patch to disable timestamps when requested Does this patch fix the problem? Please keep in mind that if _any_ socket on the system requests timestamps, or you use ip_queue or anything else that requests timestamps, do_gettimeofday will still be called for all packets.
I haven't tested your patch, but my strong suspicion is that it will not fix the problem. The issue was that the AF_PACKET handler was ignoring the sk_rcvtstamp value in the sk structure; not that the value itself was being set incorrectly. In a brief inspection of the packet_recvmsg function in af_packet.c, an unguarded call to sock_recv_timestamp will inspect the sk->sk_rcvtstamp and then do a do_gettimeofday if that value is true. The equivalent code in tpacket_rcv works differently however; rather than calling sock_recv_timestamp it will stamp the packet itself, without inspecing the sk->sk_rcvtstamp value. I suppose the real fix is to make tpacket_rcv call sock_recv_timestamp. I appreciate that there are many use cases for packet timestamps, however my profiling showed that the cost of fetching those timestamps is huge (on my test system, around 80% of the cpu time was spent in get_offset_pmtmr when running memory-mapped tcpdump to capture packets off the wire. My patch plus disabling SO_TIMESTAMP on the pcap socket eliminated that time expenditure). If making a sockopt operate as documented will have the effect of allowing some use cases that don't need timestamps to avoid incurring the cost of fetching them, I see that as a win. What would be really cool is a way to set the accuracy of the timestamps; rarely do you need the accuracy that gettimeofday provides in networking code (and I think calling it actually takes longer than the microsecond accuracy it allegedly provides), but that's probably a separate discussion. The other thing to note here is that your patch suggests that the code to handle that sockopt has changed recently. On a 2.6.8 tree that I've got lying around, the sock_setsockopt function in net/core/sock.c uses the following code to handle SO_TIMESTAMP: case SO_TIMESTAMP: sk->sk_rcvtstamp = valbool; if (valbool) sock_enable_timestamp(sk); break; Which looks perfectly sane to me. Is it possible that a regression has been introduced here?
You're right, there seem to be two seperate bugs. I'll have a closer look tomorrow.
Patrick, what is the status of these issues?
Patrick?
A fix is heading to 2.6.24 to reduce the overhead by only stamping packets after the socket filter has run. Since the timestamp is part of the ABI, we really can't turn it off easily.
Hi Stephen, Which ABI is it a part of? Having an empty timestamp value surely doesn't affect the ABI? Surely it's equally bad to ignore the will of the socket's creator. Will the new behaviour you have suggested fix the case I initially pointed out as an example of why this is bad? Will tcpdump operate with more reasonable CPU usage?
Created attachment 12815 [details] Patch to disable timestamps if requested for net-2.6.24
Closing out old stale bugs