Bug 13556

Summary: Random oopses
Product: File System
Reporter: Michael Uleysky (uleysky)
Component: ReiserFS
Assignee: ReiseFS developers team (reiserfs-devel)
Status: RESOLVED CODE_FIX
Severity: high
CC: akpm, devzero, rjw
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.30
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Example kernel log, oopses and information about machine.
Trial patch

Description Michael Uleysky 2009-06-17 08:05:25 UTC
Created attachment 21959
Example kernel log, oopses and information about machine.

On my machine the kernel constantly produces oopses. This usually happens at random moments, even when nobody is using the machine. But I can always trigger an oops simply by compiling the kernel: make allyesconfig && make. The oops happens within a minute or two if the kernel directory is located on disk. Sometimes the kernel just freezes with no information in the logs or on netconsole. I have attached an archive with an example kernel log, some oopses and information about my machine. I have seen this problem since at least linux-2.6.28. The kernel is vanilla and compiled without module support.

Kernel command-line: /boot/linux-2.6.30-netconsole root=/dev/sdb3 resume=/dev/sdb2 nmi_watchdog=1

cat /proc/version
Linux version 2.6.30-netconsole (root@Poincare) (gcc version 4.3.3 (GCC) ) #1 SMP Wed Jun 17 14:57:04 VLAST 2009

Output of ver_linux script
Gnu C                  4.3.3
Gnu make               3.81
binutils               2.17
util-linux             2.12r
mount                  2.12r
module-init-tools      3.2.2
e2fsprogs              1.39
reiserfsprogs          3.6.21
Linux C Library        2.5
Dynamic linker (ldd)   2.5
Linux C++ Library      6.0.10
Procps                 3.2.6
Kbd                    1.12
Sh-utils               6.7
udev                   105

mount
rootfs on / type rootfs (rw)
/dev/root on / type ext2 (ro,noatime,errors=panic)
none on /tmp type tmpfs (rw,nosuid,nodev,relatime,size=2097152k)
proc on /proc type proc (rw,relatime)
none on /dev type tmpfs (rw,nosuid,relatime,size=1024k)
none on /sys type sysfs (rw,relatime)
none on /dev/pts type devpts (rw,nosuid,noexec,relatime,mode=620)
usbfs on /proc/bus/usb type usbfs (rw,relatime)
none on /proc/fs/nfsd type nfsd (rw,relatime)
/dev/sdb6 on /usr type reiserfs (ro,nodev,noatime)
/dev/sdb5 on /var type reiserfs (rw,nosuid,nodev,relatime,data=journal)
/dev/sdb8 on /home type reiserfs (rw,nosuid,nodev,relatime,data=journal)
/dev/sdb7 on /home/common/cluster type ext2 (rw,noatime,errors=continue)
/dev/sdb8 on /home/common/culc_p type reiserfs (rw,nosuid,nodev,relatime,data=journal)
configfs on /config type configfs (rw,relatime)
Comment 1 Andrew Morton 2009-06-23 22:18:28 UTC
It looks like a reiserfs regression:

Jun 17 15:04:37 poincare kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
Jun 17 15:04:37 poincare kernel: IP: [<ffffffff805c46ea>] _spin_lock_irq+0xa/0x20
Jun 17 15:04:37 poincare kernel: PGD 259926067 PUD 25987f067 PMD 0
Jun 17 15:04:37 poincare kernel: Oops: 0002 [#1] SMP
Jun 17 15:04:37 poincare kernel: last sysfs file: /sys/kernel/uevent_seqnum
Jun 17 15:04:37 poincare kernel: CPU 0
Jun 17 15:04:37 poincare kernel: Pid: 8270, comm: rm Not tainted 2.6.30-netconsole #1 EP45T-DS3
Jun 17 15:04:37 poincare kernel: RIP: 0010:[<ffffffff805c46ea>]  [<ffffffff805c46ea>] _spin_lock_irq+0xa/0x20
Jun 17 15:04:37 poincare kernel: RSP: 0018:ffff88025bf7b918  EFLAGS: 00010086
Jun 17 15:04:37 poincare kernel: RAX: 0000000000000100 RBX: ffffe20008127300 RCX: 0000000000000001
Jun 17 15:04:37 poincare kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000018
Jun 17 15:04:37 poincare kernel: RBP: ffff88025bf7b918 R08: ffff88025f2e1800 R09: 0000000002d296f9
Jun 17 15:04:37 poincare kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Jun 17 15:04:37 poincare kernel: R13: 0000000000000000 R14: 0000000000000018 R15: ffff88025f2e1800
Jun 17 15:04:37 poincare kernel: FS:  00002b72db9186f0(0000) GS:ffff880028034000(0000) knlGS:0000000000000000
Jun 17 15:04:37 poincare kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jun 17 15:04:37 poincare kernel: CR2: 0000000000000018 CR3: 0000000259874000 CR4: 00000000000406e0
Jun 17 15:04:37 poincare kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 17 15:04:37 poincare kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun 17 15:04:37 poincare kernel: Process rm (pid: 8270, threadinfo ffff88025bf7a000, task ffff88025bef9640)
Jun 17 15:04:37 poincare kernel: Stack:
Jun 17 15:04:37 poincare kernel:  ffff88025bf7b948 ffffffff802c1a40 ffff88024da158c0 0000000000000000
Jun 17 15:04:37 poincare kernel:  ffff88025bfc1e40 0000000000000000 ffff88025bf7b968 ffffffff802c1b36
Jun 17 15:04:37 poincare kernel:  0000000000000000 ffffc200098ee208 ffff88025bf7b9e8 ffffffff80316ab3
Jun 17 15:04:37 poincare kernel: Call Trace:
Jun 17 15:04:37 poincare kernel:  [<ffffffff802c1a40>] __set_page_dirty+0x30/0xd0
Jun 17 15:04:37 poincare kernel:  [<ffffffff802c1b36>] mark_buffer_dirty+0x56/0xa0
Jun 17 15:04:37 poincare kernel:  [<ffffffff80316ab3>] flush_commit_list+0x713/0x720
Jun 17 15:04:37 poincare kernel:  [<ffffffff80317ac6>] flush_journal_list+0x166/0x7e0
Jun 17 15:04:37 poincare kernel:  [<ffffffff80317d32>] flush_journal_list+0x3d2/0x7e0
Jun 17 15:04:37 poincare kernel:  [<ffffffff80318206>] flush_used_journal_lists+0xc6/0xe0
Jun 17 15:04:38 poincare kernel:  [<ffffffff8024c083>] ? queue_delayed_work_on+0xa3/0xd0
Jun 17 15:04:38 poincare kernel:  [<ffffffff80319181>] do_journal_end+0xf61/0x1100
Jun 17 15:04:38 poincare kernel:  [<ffffffff80300c82>] ? reiserfs_update_sd_size+0x2b2/0x2e0
Jun 17 15:04:38 poincare kernel:  [<ffffffff80318527>] ? do_journal_end+0x307/0x1100
Jun 17 15:04:38 poincare kernel:  [<ffffffff80319919>] do_journal_begin_r+0x289/0x340
Jun 17 15:04:38 poincare kernel:  [<ffffffff8024fa20>] ? autoremove_wake_function+0x0/0x40
Jun 17 15:04:38 poincare kernel:  [<ffffffff80319b86>] journal_begin+0x96/0x170
Jun 17 15:04:38 poincare kernel:  [<ffffffff80314102>] reiserfs_do_truncate+0x2b2/0x530
Jun 17 15:04:38 poincare kernel:  [<ffffffff803143ab>] reiserfs_delete_object+0x2b/0x80
Jun 17 15:04:38 poincare kernel:  [<ffffffff80303aca>] reiserfs_delete_inode+0xaa/0xf0
Jun 17 15:04:38 poincare kernel:  [<ffffffff80303a20>] ? reiserfs_delete_inode+0x0/0xf0
Jun 17 15:04:38 poincare kernel:  [<ffffffff802b2bb9>] generic_delete_inode+0x89/0x130
Jun 17 15:04:38 poincare kernel:  [<ffffffff802b2ce5>] generic_drop_inode+0x85/0x210
Jun 17 15:04:38 poincare kernel:  [<ffffffff802b1c6d>] iput+0x5d/0x70
Jun 17 15:04:38 poincare kernel:  [<ffffffff802a9bbf>] do_unlinkat+0x11f/0x1d0
Jun 17 15:04:38 poincare kernel:  [<ffffffff80253649>] ? up_read+0x9/0x10
Jun 17 15:04:38 poincare kernel:  [<ffffffff802272d7>] ? do_page_fault+0x167/0x280
Jun 17 15:04:38 poincare kernel:  [<ffffffff802a9dcd>] sys_unlinkat+0x1d/0x40
Jun 17 15:04:38 poincare kernel:  [<ffffffff8020b36b>] system_call_fastpath+0x16/0x1b
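The faulting address 0000000000000018 also appears in RDI (the first argument register on x86-64) when _spin_lock_irq faults, called from __set_page_dirty. That pattern is what taking a lock embedded in a structure through a NULL pointer produces: the CPU touches 0 + offset of the lock field. The sketch below assumes, as in 2.6.30's fs/buffer.c, that __set_page_dirty() does spin_lock_irq(&mapping->tree_lock), and it uses a simplified, hypothetical ctypes mirror of the start of struct address_space on a 64-bit host. It is an illustration of the arithmetic, not kernel code.

```python
import ctypes

# Hypothetical mirror of struct radix_tree_root (2.6.30 layout, assumed).
class RadixTreeRoot(ctypes.Structure):
    _fields_ = [
        ("height", ctypes.c_uint),
        ("gfp_mask", ctypes.c_uint),
        ("rnode", ctypes.c_void_p),
    ]

# Hypothetical mirror of the first fields of struct address_space:
# host pointer, page_tree, then tree_lock (assumed 4-byte spinlock_t).
class AddressSpaceHead(ctypes.Structure):
    _fields_ = [
        ("host", ctypes.c_void_p),
        ("page_tree", RadixTreeRoot),
        ("tree_lock", ctypes.c_uint),
    ]

def null_field_address(struct_type, field_name, base=0):
    """Address touched for &base->field_name; base=0 models a NULL pointer."""
    return base + getattr(struct_type, field_name).offset

print(hex(null_field_address(AddressSpaceHead, "tree_lock")))  # 0x18 on a 64-bit host
```

On a 64-bit host the 8-byte host pointer plus the 16-byte page_tree put tree_lock at offset 0x18, matching both CR2 and RDI in the oops, which is consistent with a NULL mapping being passed down from mark_buffer_dirty().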
Comment 2 Roland Kletzing 2009-08-11 11:52:19 UTC
Can you reproduce the problem by doing a kernel compile on any of your reiserfs filesystems, or does it only happen on a specific one?
Comment 3 Michael Uleysky 2009-08-20 09:00:00 UTC
I have three reiserfs mounts (I can add more if needed) on /usr, /var and /home.
This is the output of mount:
rootfs on / type rootfs (rw)
/dev/root on / type ext2 (ro,relatime,errors=continue)
none on /tmp type tmpfs (rw,nosuid,nodev,relatime,size=2097152k)
proc on /proc type proc (rw,relatime)
none on /dev type tmpfs (rw,nosuid,relatime,size=1024k)
none on /sys type sysfs (rw,relatime)
none on /dev/pts type devpts (rw,nosuid,noexec,relatime,mode=620)
usbfs on /proc/bus/usb type usbfs (rw,relatime)
none on /proc/fs/nfsd type nfsd (rw,relatime)
/dev/sdb6 on /usr type reiserfs (ro,nodev,noatime)
/dev/sdb5 on /var type reiserfs (rw,nosuid,nodev,relatime,data=journal)
/dev/sdb8 on /home type reiserfs (rw,nosuid,nodev,relatime,data=journal)
/dev/sdb7 on /home/common/cluster type ext2 (rw,noatime,errors=continue)
configfs on /config type configfs (rw,relatime)

I ran a kernel compilation on all three filesystems. On /var and /home a crash always happens after some time of compiling, but on /usr the kernel compiles normally. The source directories were {/home/root/Kernels,/usr/local,/var}/linux-2.6.30. The running kernel was 2.6.30.5.
Comment 4 Roland Kletzing 2009-08-20 10:28:09 UTC
So, if you change /usr to have the same mount options (I suspect atime/noatime or data=journal), can you reproduce the crash on /usr, too?

If so, can you please work out which specific mount option is causing it?
Comment 5 Michael Uleysky 2009-08-20 12:42:52 UTC
I created another reiserfs filesystem and tested it with different mount options:

rw,nosuid,nodev,relatime,data=journal    -   Crash
rw,nosuid,nodev,relatime                 -   Ok
rw,nodev,noatime                         -   Ok
rw,nodev,noatime,data=journal            -   Crash

So the problem is most probably in the data=journal option.
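The elimination in the table above can be written out as a small set computation: keep the options present in every crashing run, then drop any option that also appeared in a passing run. The sketch below encodes the four runs by hand (the RESULTS list and suspect_options() are hypothetical names, not part of any tool).

```python
# The four test runs above: (mount options, crashed?).
RESULTS = [
    ({"rw", "nosuid", "nodev", "relatime", "data=journal"}, True),
    ({"rw", "nosuid", "nodev", "relatime"}, False),
    ({"rw", "nodev", "noatime"}, False),
    ({"rw", "nodev", "noatime", "data=journal"}, True),
]

def suspect_options(results):
    """Options present in every crashing run and absent from every passing run."""
    crashing = [opts for opts, crashed in results if crashed]
    passing = [opts for opts, crashed in results if not crashed]
    if not crashing:
        return set()
    common = set.intersection(*crashing)   # shared by all crashes
    seen_ok = set().union(*passing)        # appeared at least once without a crash
    return common - seen_ok

print(sorted(suspect_options(RESULTS)))  # ['data=journal']
```

With only four runs this narrows the suspect list rather than proving causation, which matches the "most probably" above.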
Comment 6 Roland Kletzing 2009-08-20 14:28:32 UTC
>So the problem is most probably in the data=journal option.

Yes, very likely.

Possibly related:
http://marc.info/?t=124680365400002&r=1&w=2

That user is also using "data=journal".
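To check whether a machine has mounts in the configuration implicated here (reiserfs with data=journal), the output of mount can be scanned line by line. This is a minimal sketch assuming the "<device> on <mountpoint> type <fstype> (<options>)" line format shown in this report and mount points without spaces; journaled_reiserfs_mounts() is a hypothetical helper name.

```python
import re

# One line of `mount` output: "<device> on <mountpoint> type <fstype> (<options>)".
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \(([^)]*)\)$")

def journaled_reiserfs_mounts(mount_output):
    """Return mount points of reiserfs filesystems mounted with data=journal."""
    hits = []
    for line in mount_output.splitlines():
        m = MOUNT_LINE.match(line.strip())
        if not m:
            continue  # skip blank or unrecognised lines
        _device, mountpoint, fstype, options = m.groups()
        if fstype == "reiserfs" and "data=journal" in options.split(","):
            hits.append(mountpoint)
    return hits

# Lines taken from the mount output in comment 3.
SAMPLE = """/dev/sdb6 on /usr type reiserfs (ro,nodev,noatime)
/dev/sdb5 on /var type reiserfs (rw,nosuid,nodev,relatime,data=journal)
/dev/sdb8 on /home type reiserfs (rw,nosuid,nodev,relatime,data=journal)
/dev/sdb7 on /home/common/cluster type ext2 (rw,noatime,errors=continue)
"""

print(journaled_reiserfs_mounts(SAMPLE))  # ['/var', '/home']
```

Splitting the option string on commas avoids false matches on options that merely contain the substring.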
Comment 7 Roland Kletzing 2009-08-20 15:30:45 UTC
Probably a duplicate: http://bugzilla.kernel.org/show_bug.cgi?id=13876
Comment 8 Linus Torvalds 2009-08-21 19:54:26 UTC
Created attachment 22800
Trial patch

Hmm. Does this fix it?
Comment 9 Michael Uleysky 2009-08-22 01:01:48 UTC
Yes, that fixes it. Thanks! I'm closing this bug. Today is a beer day!