Most recent kernel where this bug did not occur: 2.6.14-rc5 , 2.6.14-rc4-mm1 Distribution: Debian GNU/Linux unstable Hardware Environment: Dual PIV Xeon HT (HP Proliant DL360 G4) Software Environment: ii libc6 2.3.5-7 GNU C Library: Shared libraries and Timezone ii libc6-dev 2.3.5-7 GNU C Library: Development Libraries and Hea ii libc6-i686 2.3.5-7 GNU C Library: Shared libraries [i686 optimi ii bin86 0.16.14-1.2 16-bit x86 assembler and loader ii gcc-4.0 4.0.2-3 The GNU C compiler ii gcc-4.0-base 4.0.2-3 The GNU Compiler Collection (base package) ns1:~# cc -v Using built-in specs. Target: i486-linux-gnu Configured with: ../src/configure -v --enable-languages=c,c++,java,f95,objc,ada,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --program-suffix=-4.0 --enable-__cxa_atexit --enable-libstdcxx-allocator=mt --enable-clocale=gnu --enable-libstdcxx-debug --enable-java-gc=boehm --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-4.0-1.4.2.0/jre --enable-mpfr --disable-werror --enable-checking=release i486-linux-gnu Thread model: posix gcc version 4.0.3 20051023 (prerelease) (Debian 4.0.2-3) Problem Description: BIND9 receives errno 14 and is unable to answer further queries. Oct 26 15:38:53 ns1 named[2197]: errno2result.c:109: unexpected error: Oct 26 15:38:53 ns1 named[2197]: unable to convert errno to isc_result: 14: Bad address Oct 26 15:38:53 ns1 named[2197]: UDP client handler shutting down due to fatal receive error: unexpected error I got this with BIND9 version 9.3.2beta1 which is running smoothly on my other machines and with 9.3.2beta2 and the debian bind9 package which provides 9.3.1 . I tried with and without CONFIG_PACKET_MMAP=y. The same BIND9 binary runs on 2.6.13.4 for 3 hours without problems now, on the abovementioned other two 2.6 kernels the problem occurred from 10m to 1h after BIND9 startup. I had no problems whatsoever with 2.6.14-rc4-mm1 running about a day of distributed.net's dnetc client. The machine is brand new and passes all BIOS checks. DNS-Workload of the machine is ~10 queries per second. It is holding 40k zones as secondary nameserver. I have machines running BIND 9.3.1beta1 recursive-only that are seeing ~1000 queries per second - i tell you this to explain why i tend to believe BIND9 is not the problem here.
Date: Mon, 31 Oct 2005 11:04:54 +0100 From: Stefan Schmidt <zaphodb@zaphods.net> To: Ian McDonald <imcdnzl@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>, Andrew Morton <akpm@osdl.org>, Herbert Xu <herbert@gondor.apana.org.au>, Arnaldo Carvalho de Melo <acme@conectiva.com.br>, "David S. Miller" <davem@davemloft.net>, netdev@vger.kernel.org, Linus Torvalds <torvalds@osdl.org> Subject: Re: [Bug 5505] New: BIND9 getting errno 14: Bad address On Fri, Oct 28, 2005 at 12:18:27AM +0200, Stefan Schmidt wrote: > > > > Stefan, are you able to obtain the `strace -f' output from a failur= e? > > > Please. =2E.. > I will try to get hold of another machine and load it with the same > installation and zone-data in order to reproduce the problem. > Getting the same machine model will be difficult as we are low on pool > machines but i'll try to aquire a PIV Xeon HT machine. It seems my strategy to forget about the week through some pints of guinness was a full success. I just remembered i haven't sent you my results yet this very morning, sorry: I was able to reproduce the bug with on another DL360 and kernel 2.6.14-rc5 and i have a full strace for you. This is the interesting part of it: 2539 18:31:57.648544 <... futex resumed> ) =3D 0 2539 18:31:57.648624 futex(0x818ab08, FUTEX_WAKE, 1) =3D 0 2539 18:31:57.648722 gettimeofday({1130517117, 648752}, NULL) =3D 0 2539 18:31:57.648797 recvmsg(20, {msg_name(16)=3D{sa_family=3DAF_INET, sin= _port=3Dhtons(4269), sin_addr=3Dinet_addr("222.255.121.130")}, msg_iov(1)= =3D[{"+\236\0\0\0\1\0\0\0\0\0\1\rextreme-event\2de\0\0\17"..., 4096}], msg_= con trollen=3D20, {cmsg_len=3D20, cmsg_level=3DSOL_SOCKET, cmsg_type=3D0x1d /* = SCM_??? */, ...}, msg_flags=3D0}, 0) =3D 45 2539 18:31:57.648930 recvmsg(20, 0xb764e3c8, 0) =3D -1 EFAULT (Bad address) 2539 18:31:57.648993 futex(0x8186eac, FUTEX_WAKE, 2147483647) =3D 0 2539 18:31:57.649086 time([1130517117]) =3D 1130517117 2539 18:31:57.649152 open("/etc/localtime", O_RDONLY) =3D -1 ENOENT (No su= ch file or directory) 2539 18:31:57.649233 open("/etc/localtime", O_RDONLY) =3D -1 ENOENT (No su= ch file or directory) 2539 18:31:57.649305 open("/etc/localtime", O_RDONLY) =3D -1 ENOENT (No su= ch file or directory) 2539 18:31:57.649391 send(3, "<27>Oct 28 16:31:57 named[2537]:"..., 70, MS= G_NOSIGNAL) =3D 70 2539 18:31:57.649478 time([1130517117]) =3D 1130517117 2539 18:31:57.649544 open("/etc/localtime", O_RDONLY) =3D -1 ENOENT (No su= ch file or directory) 2539 18:31:57.649617 open("/etc/localtime", O_RDONLY) =3D -1 ENOENT (No su= ch file or directory) 2539 18:31:57.649688 open("/etc/localtime", O_RDONLY) =3D -1 ENOENT (No su= ch file or directory) 2539 18:31:57.649769 send(3, "<27>Oct 28 16:31:57 named[2537]:"..., 87, MS= G_NOSIGNAL) =3D 87 2539 18:31:57.648930 recvmsg(20, 0xb764e3c8, 0) =3D -1 EFAULT (Bad address) 2539 18:31:57.648993 futex(0x8186eac, FUTEX_WAKE, 2147483647) =3D 0 2539 18:31:57.649086 time([1130517117]) =3D 1130517117 2539 18:31:57.649152 open("/etc/localtime", O_RDONLY) =3D -1 ENOENT (No su= ch file or directory) 2539 18:31:57.649233 open("/etc/localtime", O_RDONLY) =3D -1 ENOENT (No su= ch file or directory) 2539 18:31:57.649305 open("/etc/localtime", O_RDONLY) =3D -1 ENOENT (No su= ch file or directory) 2539 18:31:57.649391 send(3, "<27>Oct 28 16:31:57 named[2537]:"..., 70, MS= G_NOSIGNAL) =3D 70 2539 18:31:57.649478 time([1130517117]) =3D 1130517117 2539 18:31:57.649544 open("/etc/localtime", O_RDONLY) =3D -1 ENOENT (No su= ch file or directory) 2539 18:31:57.649617 open("/etc/localtime", O_RDONLY) =3D -1 ENOENT (No su= ch file or directory) 2539 18:31:57.649688 open("/etc/localtime", O_RDONLY) =3D -1 ENOENT (No su= ch file or directory) 2539 18:31:57.649769 send(3, "<27>Oct 28 16:31:57 named[2537]:"..., 87, MS= G_NOSIGNAL) =3D 87 2539 18:31:57.649852 futex(0x818ab40, FUTEX_WAKE, 1) =3D 1 2539 18:31:57.649954 recvmsg(20, <unfinished ...> 2539 18:31:57.650021 <... recvmsg resumed> 0xb764e3c8, 0) =3D -1 EAGAIN (R= esource temporarily unavailable) 2539 18:31:57.650088 write(8, "\24\0\0\0\375\377\377\377", 8 <unfinished .= =2E.> 2539 18:31:57.650209 <... write resumed> ) =3D 8 2539 18:31:57.650459 sendmsg(20, {msg_name(16)=3D{sa_family=3DAF_INET, sin= _port=3Dhtons(4269), sin_addr=3Dinet_addr("222.255.121.130")}, msg_iov(1)= =3D[{"+\236\204\0\0\1\0\1\0\2\0\3\rextreme-event\2de\0\0\17"..., 152}], msg= _co ntrollen=3D0, msg_flags=3D0}, 0 <unfinished ...> 2539 18:31:57.650625 <... sendmsg resumed> ) =3D 152 2539 18:31:57.650764 recvmsg(20, <unfinished ...> 2539 18:31:57.650880 <... recvmsg resumed> 0xb764cdc8, 0) =3D -1 EAGAIN (R= esource temporarily unavailable) 2539 18:31:57.650964 futex(0x818ab40, FUTEX_WAIT, 3815, NULL <unfinished .= =2E.> I only provided you with the output from pid 2539 here. Find the full strace -f -tt here: -> http://www.zaphods.net/~zaphodb/bigger.named.strace.bz2 but be warned: -rw-r--r-- 1 root root 256M 2005-10-28 18:38 bigger.named.strace -rw-r--r-- 1 root root 19M 2005-10-28 18:38 bigger.named.strace.bz2 > > I would also add are you able to use git-bisect to track down the > > changeset causing this. I've used this to find obscure bugs already > > and it is easy to use. > I will google on what git-bisect is and then tell you whether or not i am > able to. ;) I will have to give back the machine soon, i'll see what i can do. Stefan PS: tcpreplay rocks.
I had this error twice now on my Gentoo Linux System with - bind-9.3.1 - kernel 2.6.14 Since i upgraded both Kernel and Bind on the same day, i can't really say who's fault this is. Bind 9.3.1 has been out for quite some time, however... I will go back to 2.6.13 tonight and within a week i'll know if this has fixed it :-) Oct 30 19:04:08 [named] errno2result.c:109: unexpected error: Oct 30 19:04:08 [named] unable to convert errno to isc_result: 14: Bad address Oct 30 19:04:08 [named] UDP client handler shutting down due to fatal receive error: unexpected error Oct 31 13:44:44 [named] errno2result.c:109: unexpected error: Oct 31 13:44:44 [named] unable to convert errno to isc_result: 14: Bad address Oct 31 13:44:44 [named] UDP client handler shutting down due to fatal receive error: unexpected error
Created attachment 6453 [details] Patch to fix zero-length packet reception This patch fixes zero-length packet reception which currently results in EFAULT due to the rewrite of skb_copy_datagram_iovec.