Bug 35532 - Caught 64-bit read from uninitialized memory in process_one_work
Summary: Caught 64-bit read from uninitialized memory in process_one_work
Status: CLOSED CODE_FIX
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Page Allocator (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Tejun Heo
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-05-21 08:52 UTC by Christian Casteyde
Modified: 2011-09-01 17:01 UTC (History)
1 user (show)

See Also:
Kernel Version: 2.6.39
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Full dmesg output for 2.6.39 (39.28 KB, text/plain)
2011-05-21 08:53 UTC, Christian Casteyde
Details
lockdep-clear.patch (533 bytes, patch)
2011-07-14 06:17 UTC, Tejun Heo
Details | Diff

Description Christian Casteyde 2011-05-21 08:52:47 UTC
Acer Aspire 7750G
Slackware64 13.37
Core i7 2630QM in 64bits
6GB RAM

With Kernel 2.6.39 (and even some -rc before), but not tested with previous versions, I get the following warning at boot with kmemcheck and other hacking options activated:

...
pci 0000:00:16.0: PME# disabled
pci 0000:00:1a.0: [8086:1c2d] type 0 class 0x000c03
pci 0000:00:1a.0: reg 10: [mem 0xc130a000-0xc130a3ff]
WARNING: kmemcheck: Caught 64-bit read from uninitialized memory (ffff8801c67697d8)
0000000000000000b5e0a481ffffffff00000000000000000000000000000000
 i i i i i i i i i i i i i i i i i i i i u u u u u u u u u u u u
                                                 ^

Pid: 1, comm: swapper Not tainted 2.6.39 #1 Acer Aspire 7750G/JE70_HR
RIP: 0010:[<ffffffff810900e3>]  [<ffffffff810900e3>] process_one_work+0x83/0x450
RSP: 0018:ffff8801c651dd80  EFLAGS: 00010006
RAX: 0000000000000000 RBX: ffff8801c64ea240 RCX: 0000000000000000
RDX: ffff8801c79d6c00 RSI: ffff8801c6769790 RDI: ffff8801c64ea240
RBP: ffff8801c651de10 R08: 0000000000000400 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801c6769790
R13: ffff8801c79d6c00 R14: ffff8801c780ddc0 R15: 000000000000000f
FS:  0000000000000000(0000) GS:ffff8801c7800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff8801c741faf8 CR3: 0000000001c03000 CR4: 00000000000406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
 [<ffffffff8109348c>] worker_thread+0x15c/0x330
 [<ffffffff81097ff6>] kthread+0xa6/0xb0
 [<ffffffff817a14d4>] kernel_thread_helper+0x4/0x10
 [<ffffffffffffffff>] 0xffffffffffffffff
pci 0000:00:1a.0: PME# supported from D0 D3hot D3cold
pci 0000:00:1a.0: PME# disabled
pci 0000:00:1b.0: [8086:1c20] type 0 class 0x000403
pci 0000:00:1b.0: reg 10: [mem 0xc1300000-0xc1303fff 64bit]
...
Other than that, everything works fine.
Full dmesg appended.

This may be the same as this bug I reported on another older laptop, since the call stack begins the same, but on my new laptop I only get the above:
https://bugzilla.kernel.org/show_bug.cgi?id=34882
Comment 1 Christian Casteyde 2011-05-21 08:53:55 UTC
Created attachment 58862 [details]
Full dmesg output for 2.6.39
Comment 2 Andrew Morton 2011-06-01 21:27:42 UTC
I think I'll assign this one to Tejun then run away :)
Comment 3 Christian Casteyde 2011-06-02 13:12:58 UTC
Update: still present in 3.0-rc1.
Comment 4 Tejun Heo 2011-06-10 12:18:37 UTC
Hmmm... I can't reproduce the problem here. Can you please determine which line the RIP corresponds to using gdb?  ie. build the kernel with debug info, run gdb on vmlinux and run "l *0xffffffff810900e3".  Also attaching disassembly of the offending function (process_one_work) would be helpful.

Thank you.
Comment 5 Christian Casteyde 2011-06-11 15:29:08 UTC
I've checked with 3.0-rc2, the problem is still there.
I've got several occurrences of this warning.
For this one:
PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
WARNING: kmemcheck: Caught 64-bit read from uninitialized memory (ffff8801c6783c58)
0000000000000000ee0caa81ffffffff00000000000000000000000000000000
 i i i i i i i i i i i i i i i i i i i i u u u u u u u u u u u u
                                                 ^

Pid: 1, comm: swapper Not tainted 3.0.0-rc2 #8 Acer Aspire 7750G/JE70_HR
RIP: 0010:[<ffffffff8108f3b3>]  [<ffffffff8108f3b3>] process_one_work+0x83/0x450
RSP: 0018:ffff8801c6507d80  EFLAGS: 00010002
RAX: 0000000000000000 RBX: ffff8801c7418300 RCX: 0000000000000000
RDX: ffff8801c79d5800 RSI: ffff8801c6783c10 RDI: ffff8801c7418300
RBP: ffff8801c6507e10 R08: ffff8801c64f8058 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801c6783c10
R13: ffff8801c79d5800 R14: ffff8801c780ca00 R15: 000000000000003b
FS:  0000000000000000(0000) GS:ffff8801c7800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff8801c7430364 CR3: 0000000001c03000 CR4: 00000000000406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
 [<ffffffff810927cc>] worker_thread+0x15c/0x330
 [<ffffffff81097396>] kthread+0xa6/0xb0
 [<ffffffff817b3c54>] kernel_thread_helper+0x4/0x10
 [<ffffffffffffffff>] 0xffffffffffffffff
\_SB_.PCI0:_OSC invalid UUID
_OSC request data:1 8 1f

I get this:
gdb vmlinux
GNU gdb (GDB) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-slackware-linux".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/src/linux-2.6.39/vmlinux...rundone.
(gdb) l *0xffffffff8108f3b3
0xffffffff8108f3b3 is in process_one_work (kernel/workqueue.c:1815).
1810             * inside the function that is called from it, this we need to
1811             * take into account for lockdep too.  To avoid bogus "held
1812             * lock freed" warnings as well as problems when looking into
1813             * work->lockdep_map, make a copy and use that here.
1814             */
1815            struct lockdep_map lockdep_map = work->lockdep_map;
1816    #endif
1817            /*
1818             * A single work shouldn't be executed concurrently by
1819             * multiple workers on a single cpu.  Check whether anyone is
(gdb) disass 0xffffffff8108f3b3
Dump of assembler code for function process_one_work:
   0xffffffff8108f330 <+0>:     push   %rbp
   0xffffffff8108f331 <+1>:     mov    %rsp,%rbp
   0xffffffff8108f334 <+4>:     push   %r15
   0xffffffff8108f336 <+6>:     mov    %rsi,%r15
   0xffffffff8108f339 <+9>:     push   %r14
   0xffffffff8108f33b <+11>:    push   %r13
   0xffffffff8108f33d <+13>:    xor    %r13d,%r13d
   0xffffffff8108f340 <+16>:    push   %r12
   0xffffffff8108f342 <+18>:    mov    %rsi,%r12
   0xffffffff8108f345 <+21>:    push   %rbx
   0xffffffff8108f346 <+22>:    mov    %rdi,%rbx
   0xffffffff8108f349 <+25>:    sub    $0x68,%rsp
   0xffffffff8108f34d <+29>:    mov    (%rsi),%rax
   0xffffffff8108f350 <+32>:    mov    %rax,%rdx
   0xffffffff8108f353 <+35>:    and    $0xfffffffffffffe00,%rdx
   0xffffffff8108f35a <+42>:    test   $0x4,%al
   0xffffffff8108f35c <+44>:    mov    %rsi,%rax
   0xffffffff8108f35f <+47>:    cmovne %rdx,%r13
   0xffffffff8108f363 <+51>:    shr    $0x6,%rax
   0xffffffff8108f367 <+55>:    mov    0x0(%r13),%r14
   0xffffffff8108f36b <+59>:    shr    $0xc,%r15
   0xffffffff8108f36f <+63>:    add    %rax,%r15
   0xffffffff8108f372 <+66>:    mov    0x8(%r13),%rax
   0xffffffff8108f376 <+70>:    and    $0x3f,%r15d
   0xffffffff8108f37a <+74>:    mov    (%rax),%eax
   0xffffffff8108f37c <+76>:    add    $0x8,%r15
   0xffffffff8108f380 <+80>:    mov    %eax,-0x64(%rbp)
   0xffffffff8108f383 <+83>:    mov    0x18(%rsi),%rax
   0xffffffff8108f387 <+87>:    mov    %rax,-0x78(%rbp)
   0xffffffff8108f38b <+91>:    mov    0x20(%rsi),%rax
   0xffffffff8108f38f <+95>:    mov    %rax,-0x60(%rbp)
   0xffffffff8108f393 <+99>:    mov    0x28(%rsi),%rax
   0xffffffff8108f397 <+103>:   mov    %rax,-0x58(%rbp)
   0xffffffff8108f39b <+107>:   mov    0x30(%rsi),%rax
   0xffffffff8108f39f <+111>:   mov    %rax,-0x50(%rbp)
   0xffffffff8108f3a3 <+115>:   mov    0x38(%rsi),%rax
   0xffffffff8108f3a7 <+119>:   mov    %rax,-0x48(%rbp)
   0xffffffff8108f3ab <+123>:   mov    0x40(%rsi),%rax
   0xffffffff8108f3af <+127>:   mov    %rax,-0x40(%rbp)
   0xffffffff8108f3b3 <+131>:   mov    0x48(%rsi),%rax
   0xffffffff8108f3b7 <+135>:   mov    0x38(%r14,%r15,8),%rsi
   0xffffffff8108f3bc <+140>:   mov    %rax,-0x38(%rbp)
   0xffffffff8108f3c0 <+144>:   test   %rsi,%rsi
   0xffffffff8108f3c3 <+147>:   jne    0xffffffff8108f3d8 <process_one_work+168>
   0xffffffff8108f3c5 <+149>:   jmp    0xffffffff8108f400 <process_one_work+208>
   0xffffffff8108f3c7 <+151>:   nopw   0x0(%rax,%rax,1)
   0xffffffff8108f3d0 <+160>:   mov    (%rsi),%rsi
   0xffffffff8108f3d3 <+163>:   test   %rsi,%rsi
   0xffffffff8108f3d6 <+166>:   je     0xffffffff8108f400 <process_one_work+208>
   0xffffffff8108f3d8 <+168>:   cmp    0x10(%rsi),%r12
   0xffffffff8108f3dc <+172>:   jne    0xffffffff8108f3d0 <process_one_work+160>
   0xffffffff8108f3de <+174>:   mov    %r12,%rdi
   0xffffffff8108f3e1 <+177>:   add    $0x20,%rsi
   0xffffffff8108f3e5 <+181>:   xor    %edx,%edx
   0xffffffff8108f3e7 <+183>:   callq  0xffffffff8108ee80 <move_linked_works>
   0xffffffff8108f3ec <+188>:   add    $0x68,%rsp
   0xffffffff8108f3f0 <+192>:   pop    %rbx
   0xffffffff8108f3f1 <+193>:   pop    %r12
   0xffffffff8108f3f3 <+195>:   pop    %r13
   0xffffffff8108f3f5 <+197>:   pop    %r14
   0xffffffff8108f3f7 <+199>:   pop    %r15
   0xffffffff8108f3f9 <+201>:   leaveq 
   0xffffffff8108f3fa <+202>:   retq

The others are exactly at the same RIP.
Comment 6 Christian Casteyde 2011-07-13 18:46:15 UTC
Update: Still present in 3.0-rc7
Comment 7 Tejun Heo 2011-07-14 06:17:41 UTC
Created attachment 65552 [details]
lockdep-clear.patch

Sorry, missed the last post. Can you please test whether the attached patch makes the problem go away?

Thanks.
Comment 8 Christian Casteyde 2011-07-14 10:56:06 UTC
Yes, it works well.
Thanks
Comment 9 Tejun Heo 2011-07-14 13:20:54 UTC
Awesome. Patch posted upstream.

  http://article.gmane.org/gmane.linux.kernel/1167447

Thank you.
Comment 10 Florian Mickler 2011-08-08 08:17:24 UTC
A patch referencing this bug report has been merged in Linux v3.1-rc1:

commit f59de8992aa6dc85e81aadc26b0f69e17809721d
Author: Tejun Heo <tj@kernel.org>
Date:   Thu Jul 14 15:19:09 2011 +0200

    lockdep: Clear whole lockdep_map on initialization
Comment 11 Christian Casteyde 2011-09-01 17:01:32 UTC
Indeed I cannot reproduce since 3.1.
Closing

Note You need to log in before you can comment on or make changes to this bug.