Bluewhite64 13.0 Athlon 64 3000 in 64 bits mode Acer Aspire 1511LMi laptop Since 2.6.32-rc1 compiled with memcheck, I got the following error: b43-phy0: Loading firmware version 410.2160 (2007-05-26 15:32:10) ADDRCONF(NETDEV_UP): eth1: link is not ready WARNING: kmemcheck: Caught 8-bit read from freed memory (ffff8800429bb738) 0000000000000000000000000000000000000000000000001f20010000000000 f f f f f f f f f f f f f f f f f f f f f f f f f f f f f f f f ^ Modules linked in: Pid: 5, comm: events/0 Not tainted 2.6.32-rc2 #3 Aspire 1510 RIP: 0010:[<ffffffff8133c7ce>] [<ffffffff8133c7ce>] b43_dma_tx+0x64e/0xbc0 RSP: 0018:ffff88004bbd7ca0 EFLAGS: 00010202 RAX: 0000000040000000 RBX: 0000000000c14020 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 000000006000002a RDI: ffff88004f2d9930 RBP: ffff88004bbd7da0 R08: 0000000040c14020 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000020 R13: ffff88004255c720 R14: ffff8800429bbd00 R15: ffff8800429bb700 FS: 0000000001b1c860(0000) GS:ffffffff81783000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: ffff88004f832550 CR3: 0000000040dec000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400 [<ffffffff813242ce>] b43_tx_work+0x5e/0x90 [<ffffffff8104dd42>] worker_thread+0x1d2/0x320 [<ffffffff81051f9e>] kthread+0x8e/0xa0 [<ffffffff8100c1ca>] child_rip+0xa/0x20 [<ffffffffffffffff>] 0xffffffffffffffff eth1: deauthenticating by local choice (reason=3) eth1: direct probe to AP 00:07:cb:55:12:b0 (try 1) eth1: direct probe to AP 00:07:cb:55:12:b0 (try 1) eth1 direct probe responded eth1: authenticate with AP 00:07:cb:55:12:b0 (try 1) eth1: authenticated eth1: associate with AP 00:07:cb:55:12:b0 (try 1) eth1: RX AssocResp from 00:07:cb:55:12:b0 (capab=0x411 status=0 aid=2) eth1: associated ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready After that, everything works nice.
A little bit more information is required. Does this happen on interface up/down only or does it happen in operation? Is it possible for you to track down the b43_dma_tx+0x64e/0xbc0 to the line of C code where this happens?
It's just at the first association. After that, the warning doesn't occur anymore (the variable may be initialized). I added some printk and found the faulty line was the last test in b43_dma_tx, that is: printk(KERN_INFO "b43_dma_tx 9\n"); ring->nr_tx_packets++; printk(KERN_INFO "b43_dma_tx A\n"); **** This line **** if ((free_slots(ring) < TX_SLOTS_PER_FRAME) || should_inject_overflow(ring)) { **** End of line **** printk(KERN_INFO "b43_dma_tx A1\n"); /* This TX ring is full. */ ieee80211_stop_queue(dev->wl->hw, skb_get_queue_mapping(skb)); printk(KERN_INFO "b43_dma_tx A2\n"); ring->stopped = 1; printk(KERN_INFO "b43_dma_tx A3\n"); if (b43_debug(dev, B43_DBG_DMAVERBOSE)) { printk(KERN_INFO "b43_dma_tx A4\n"); b43dbg(dev->wl, "Stopped TX ring %d\n", ring->index); printk(KERN_INFO "b43_dma_tx A5\n"); } } printk(KERN_INFO "b43_dma_tx B\n"); out: In my logs, I get: b43_dma_tx A b43_dma_tx B but no A{1-n} so ring may be the faulty variable
Are you sure this isn't a bug in kmemcheck? It's reporting an 8bit-read from feed memory, but ring is only freed on interface shutdown. And free_slots() checks two 32bit variables, so I think this can't trigger an 8bit read error. Do you have CONFIG_B43_DEBUG enabled? If not, should_inject_overflow() compiles to nothing. But even if you have CONFIG_B43_DEBUG defined, I don't see an 8bit read in that function. How does kmemcheck detect whether it's an 8bit read?
Uhm, wait a second (I just read your message again), what did you actually check in your test? The if ((free_slots(ring) < TX_SLOTS_PER_FRAME) || should_inject_overflow(ring)) { check is _not_ supposed to trigger often. It's an unlikely branch. You won't hit it in practice very often, so it's expected that you don't see a A1/2/3/4/5 message. You need to check what triggers the 8-bit read error.
I'm not sure of anything about kmemcheck. It may have bugs of course, I don't know how it works: I only activated it and tried to see which line was faulty. I was looking for the line which was triggering the problem. I added printk before and after each line of b43_dma_tx, and found the problem was in the test (I didn't cut in two subtests to see if free_slots or should_inject_overflow was responsible of the problem). Indeed, I added A1 etc. in the if branch only to see if the warning was issued by a line in the test. As you said it, the flow doesn't go inside the block controlled by the if, since A1 doesn't occur. So we know the warning is preceded by A and followed by B (which is after the test in the execution flow). So this is really the code in the test that triggers the problem. Now, I don't know what it does, however the problem occurs between the call to free_slots and the end of shoud_inject (since this is an "or"). These two functions may do anything they do, and I don't know what ring is. However, something annoys me semantically speaking: if "free_slots" is supposed to free something in ring from its name, shoud_inject_overflow should'nt use ring anymore as it is executed second. That is coherent with kmemcheck warning. But of course, once again, I don't know at all the b43 code, just trying to help.
(In reply to comment #5) > However, something annoys me semantically speaking: if "free_slots" is > supposed > to free something in ring No it isn't. It returns the number of free slots in the ring. It's pretty obvious, if you look at the trivial implementation. > from its name, shoud_inject_overflow should'nt use > ring anymore as it is executed second. ??? should_inject_overflow is a debugging function that is completely optimized out, if you disable debugging. > That is coherent with kmemcheck warning. > But of course, once again, I don't know at all the b43 code, just trying to > help. I'm pretty sure the b43 code is ok.
OK, so I'll redo this test with laters -rc, to see if this is a kmemcheck false positive or something that get fixed elsewhere, since I cannot do anything more for this. For should_inject_overflow, if I remove kernel debugging (at least kmemcheck) I cannot see the problem ;-) That's the goal of my checks! The worse that could happen is a block freed erratically by another subsystem (and in this case other people will experience real troubles).
(In reply to comment #7) > For should_inject_overflow, if I remove kernel debugging (at least kmemcheck) > I > cannot see the problem ;-) That's the goal of my checks! Well, my suggestion was to disable b43-debugging, not kmemcheck debugging. b43-debugging is CONFIG_B43_DEBUG. If the oops still happens with b43-debugging disabled (and kmemcheck enabled), it narrows it down to two 32-bit reads. However, I can't see how they could be freed from within full operation, except of course, if another unrelated driver is freeing dangling pointers... The two 32bit integers are members of the ring structure. If we'd free that in operation, we'd have _much_ bigger problems than an 8bit read ;)
CONFIG_B43_DEBUG is already set to off in my .config. The only debug option I use when I check a new kernel are in the kernel hacking menu, so it's not should_inject_overflow. I'll try to narrow the problem in free_slots, but I've just finished a bisect on another one and I'm tired. Will check on next -rc now I think, next week.
How do you narrow it down, btw? I'm not sure if kmemcheck is synchronous. Does it work through trapping each mem access? Or is it some asynchronous timer based mechanism? I don't know.
I had the feeling it was synchronous, so putting some printk should be sufficient. I guess kmemcheck use the paging mechanisms to trap memory problems, like electric fence for user space software (or gflags under windows, to which I'm more accustomed at work). So the warning must be issued by a signal, handled by the VMM layer, and I'm quite sure paging this is synchronous (otherwise I don't understand how it can work reliably). But all this is still guess, kernel dev is not my party at all. I guess it catch only once the error by default, but there is a kernel option that seems to make it catch everytime (at each access). However when I tried this option, I didn't see any difference, so either I haven't understood what it was for, or the problem occurs only once (at boot apparently). When I tried to reassociate, I couldn't reproduce the problem: I get it only at boot (I should check with module loading/unloading too I think).
Can you provide a new warning (since you've changed the kernel) along with disassembly of the matching version of the function in question?
After some more tests, I'm quite confused and think also it's a kmemcheck bug. First, when trying to reproduce, I noticed I misinterpreted my printks results: the warning occurs ***after*** B, that is, the faulty line is the "out:" label! This is not the access to ring, but something else the compiler generated, or a bug in kmemcheck. To be sure ring is OK, I redid the test with the following code on -rc3: My printk are now the following at the end of the function: printk(KERN_INFO "b43_dma_tx 9\n"); ring->nr_tx_packets++; printk(KERN_INFO "b43_dma_tx A\n"); /* int nr_slots = ring->nr_slots; printk(KERN_INFO "b43_dma_tx A01\n"); int used_slots = ring->used_slots; printk(KERN_INFO "b43_dma_tx A02\n"); int ifs = free_slots(ring); printk(KERN_INFO "b43_dma_tx A02, %d=%d-%d\n", ifs, nr_slots, used_slots); */ if ((free_slots(ring) < TX_SLOTS_PER_FRAME) || should_inject_overflow(ring)) { printk(KERN_INFO "b43_dma_tx A1\n"); /* This TX ring is full. */ ieee80211_stop_queue(dev->wl->hw, skb_get_queue_mapping(skb)); ring->stopped = 1; if (b43_debug(dev, B43_DBG_DMAVERBOSE)) { b43dbg(dev->wl, "Stopped TX ring %d\n", ring->index); } } printk(KERN_INFO "b43_dma_tx B\n"); out: return err; I still get A, then B: b43-phy0: Loading firmware version 410.2160 (2007-05-26 15:32:10) ADDRCONF(NETDEV_UP): eth1: link is not ready b43_dma_tx 9 b43_dma_tx A b43_dma_tx B WARNING: kmemcheck: Caught 8-bit read from freed memory (ffff88004e754e38) 0000000000000000000000000000000000000000000000001f20010000000000 f f f f f f f f f f f f f f f f f f f f f f f f f f f f f f f f ^ Modules linked in: Pid: 5, comm: events/0 Not tainted 2.6.32-rc3 #4 Aspire 1510 RIP: 0010:[<ffffffff813366ab>] [<ffffffff813366ab>] b43_dma_tx+0x57b/0x930 RSP: 0018:ffff88004e961cd0 EFLAGS: 00010202 RAX: 0000000040000000 RBX: 0000000000c14020 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 000000006000002a RDI: ffff88004fa02130 etc. If I uncomment the lines (OK, this is in C++ declaration style...), I get this: b43-phy0: Loading firmware version 410.2160 (2007-05-26 15:32:10) ADDRCONF(NETDEV_UP): eth1: link is not ready b43_dma_tx 9 b43_dma_tx A b43_dma_tx A01 b43_dma_tx A02 b43_dma_tx A02, 254=256-2 b43_dma_tx B WARNING: kmemcheck: Caught 8-bit read from freed memory (ffff88004d516838) 0000000000000000000000000000000000000000000000001f20010000000000 f f f f f f f f f f f f f f f f f f f f f f f f f f f f f f f f ^ Modules linked in: Pid: 3117, comm: irq/19-b43 Not tainted 2.6.32-rc3 #5 Aspire 1510 RIP: 0010:[<ffffffff813366ab>] [<ffffffff813366ab>] b43_dma_tx+0x57b/0x960 RSP: 0018:ffff88004e961cd0 EFLAGS: 00010202 RAX: 0000000040000000 RBX: 0000000000c14020 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 000000006000002a RDI: ffff88004fa02130 RBP: ffff88004e961da0 R08: 0000000040c14020 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000000 R12: ffff880000c14020 R13: ffff88004e7db240 R14: ffff88004d516100 R15: ffff88004d516800 FS: 0000000002142860(0000) GS:ffffffff8176d000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b etc. So I always get "B" before the warning, and there is stricly no code after B. I tried to disassemble dma.o but I'm not sure it is the right solution. Therefore I prefer giving to you the dma.o file (with and without the commented code portion), you may be more used to get faulty asm code. Ah, since this can be a bad code generation, I'm on Bluewhite64 13, that uses: ../gcc-4.3.3/configure --prefix=/usr --libdir=/usr/lib --enable-shared --enable-bootstrap --enable-languages=ada,c,c++,fortran,java,objc --enable-threads=posix --enable-checking=release --with-system-zlib --disable-libunwind-exceptions --enable-__cxa_atexit --enable-libssp --with-gnu-ld --verbose --disable-multilib --target=x86_64-pc-linux --build=x86_64-pc-linux --host=x86_64-pc-linux and: GNU ld (Linux/GNU Binutils) 2.18.50.0.9.20080822 Quite strange bug indeed.
Created attachment 23318 [details] dma.o object file (code without the commented check on ring)
Created attachment 23319 [details] dma.o object file (code with the check on ring)
Created attachment 23320 [details] config file (for debug options I selected)
Created attachment 23539 [details] possible fix Please test
This fix seems to work. I've not managed to get the warning after several consecutive reboots. Thanks
fixed in 32-rc7
Closing.