Bug 5046

Summary: Switching elevator on-the-fly causes NULL pointer dereference oops
Product: IO/Storage Reporter: Kerin Millar (kfm)
Component: Block LayerAssignee: Jens Axboe (axboe)
Status: CLOSED CODE_FIX    
Severity: high CC: akpm, cp
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.13-rc6 Subsystem:
Regression: --- Bisected commit-id:
Attachments: dmesg-with-oops.log
dotconfig
lspci-vvv
fdisk-l
mount
mdstat

Description Kerin Millar 2005-08-11 10:40:28 UTC
Distribution: Gentoo Linux (base system v1.6.13, stable "x86" except for the use
of gcc-3.4.4), kernel 2.6.13-rc6
Hardware Environment: Compaq Proliant ML370 G3
Software Environment: LVM2 / RAID5 (md) | amavisd-new, apache2, bind, clamd,
courier-imapd, isc-dhcp, hylafax, mysql, (open)ntpd, postfix, samba, squid,
(open)sshd, syslog-ng, vixie-cron, zope

Problem Description:

Firstly, note that the kernel was a stock 2.6.13-rc6 kernel except that
PAGE_OFFSET was modified to be 0xB0000000(UL) instead of 0xC0000000(UL) in
accordance with my usual practice (the machine has exactly 1G of RAM).

The machine currently has 4 SCSI hard disks where sda/sdb/sdc are participating
in a RAID5 array and sdd is independent. The kernel is set to boot with
"deadline" as the default elevator. It had been running for several days
whereupon I decided that I wanted to test the CFQ elevator. So I switched each
SCSI disk sequentially to use CFQ via the sysfs filesystem. Thereafter it ran
with CFQ for about 1.5 days then I decided to switch them back to the deadline
elevator using the same method. At that point the ooops occurred. I believe it
happened after changing the final device (/dev/sdd) although I cannot recall
exactly. It was done via ssh and I only noticed initially that I was kicked out
of root mode back to my regular account from which I had su'd. Then I realised
that things were amiss, checked dmesg and noticed the oops:

---
Unable to handle kernel NULL pointer dereference at virtual address 00000008
 printing eip:
b023424f
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: iptable_filter ip_tables hangcheck_timer
CPU:    1
EIP:    0060:[<b023424f>]    Not tainted VLI
EFLAGS: 00010286   (2.6.13-rc6)
EIP is at deadline_exit_queue+0x7/0x40
eax: 00000008   ebx: 00000000   ecx: b022fd40   edx: b0234248
esi: b5ef9700   edi: b03da080   ebp: b03da4e0   esp: d93cdef4
ds: 007b   es: 007b   ss: 0068
Process bash (pid: 32727, threadinfo=d93cc000 task=ef26a020)
Stack: efccb0c0 b022a0a6 efcde344 b022a797 b0229ed3 00000000 efccb0c0 d93cdf30
       d93cdf31 b03da335 efcde344 b022a8b1 00000000 00000000 00000004 00716663
       00000000 00000000 00000000 b022a80d efcde344 c1ee0000 b03930a0 b022ea2e
Call Trace:
 [<b022a0a6>] elevator_exit+0x27/0x3e
 [<b022a797>] elevator_switch+0xa3/0x119
 [<b0229ed3>] elevator_find+0x60/0x67
 [<b022a8b1>] elv_iosched_store+0xa4/0xe5
 [<b022a80d>] elv_iosched_store+0x0/0xe5
 [<b022ea2e>] queue_attr_store+0x2c/0x3b
 [<b018ddd5>] flush_write_buffer+0x2e/0x3e
 [<b018de1d>] sysfs_write_file+0x38/0x5f
 [<b015434c>] vfs_write+0xd7/0x178
 [<b0154498>] sys_write+0x41/0x6a
 [<b0102b67>] sysenter_past_esp+0x54/0x75
Code: 25 95 fa ff 31 d2 85 c0 74 03 8b 50 14 89 d0 c3 8b 42 2c e8 db 94 fa ff 31
d2 85 c0 74 03 8b 50 14 89 d0 c3 53 8b 58 04 8d 43 08 <39> 43 08 75 20 8d 43 10
39 43 10 75 22 8b 43 48 e8 21 71 f0 ff
---

I rebooted the machine (over ssh) but it did come up successfully.
Unfortunately, the machine was meddled with before I physically attended to it
today but I know that:

(1) Some minor filesystem corruption resulted with root and /home filesystems
(2) The root filesystem corruption (at least) caused problems with
/etc/init.d/local (corrected by touch /forcefsck and reboot)
(3) The home filesystem corruption took mail services out of action (again,
corrected by touch /forcefsck and reboot)

With regard to (3), here's the final entry in /var/log/mail.log generated by
postfix that hints at the kind of corruption caused (this was occurring
consistently for all delivery attempts until fsck fixed things up):

---
Aug 11 14:31:42 serve postfix/local[7129]: 9D3721001A:
to=<sysadmin@recruit2recruit.net>, relay=local, delay=0, status=bounced (maildir
delivery failed: link to
/home/sysadmin/.maildir/new/1123767102.Vfd01I11051c.serve.r2r.local: Invalid
cross-device link)
---

Here is how the filesystem mounts were at the time of the oops:

---
/dev/sda3 on / type ext3 (rw,errors=remount-ro)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
udev on /dev type tmpfs (rw,nosuid)
devpts on /dev/pts type devpts (rw)
/dev/sda1 on /boot type ext3 (rw)
/dev/vg/usr on /usr type ext3 (rw,nodev,noatime,barrier=1)
/dev/vg/home on /home type ext3 (rw,nosuid,nodev,noatime,barrier=1)
/dev/vg/shares on /shares type ext3 (rw,noexec,nosuid,nodev,noatime,barrier=1)
/dev/vg/opt on /opt type ext3 (rw,nosuid,nodev,noatime,barrier=1)
/dev/vg/var on /var type ext3 (rw,nodev,noatime,barrier=1)
/dev/vg/tmp on /tmp type ext3 (rw,nosuid,nodev,barrier=1)
/dev/vg/squid on /var/cache/squid type ext3
(rw,noexec,nosuid,nodev,noatime,data=writeback,barrier=1)
/dev/vg/backup on /mnt/backup type ext3 (rw,nosuid,nodev,noatime,barrier=1)
none on /dev/shm type tmpfs (rw)
usbfs on /proc/bus/usb type usbfs (rw,devmode=0664,devgid=85)
/dev on /mnt/backup/chroot/dev type none (rw,bind)
/usr/portage on /mnt/backup/chroot/usr/portage type none (rw,bind)
none on /mnt/backup/chroot/proc type proc (rw)
---


Steps to reproduce:

Obviously, I'm not keen to attempt to reproduce this! But the circumstances that
led to this bug were:

(1) Boot with elevator=deadline and run for a while.
(2) Switch block devices to CFQ via sysfs and run for a while.
(3) Switch block devices back to deadline ...
(4) Oops and minor filesystem corruption.

I'll attach information concerning my system to this bug.
Comment 1 Kerin Millar 2005-08-11 10:49:52 UTC
Created attachment 5604 [details]
dmesg-with-oops.log

Complete dmesg log (oops visible at tail end).
Comment 2 Kerin Millar 2005-08-11 10:51:04 UTC
Created attachment 5605 [details]
dotconfig

The .config file used for the 2.6.13-rc6 sources.
Comment 3 Kerin Millar 2005-08-11 10:51:30 UTC
Created attachment 5606 [details]
lspci-vvv

Output from the lspci -vvv command.
Comment 4 Kerin Millar 2005-08-11 10:52:21 UTC
Created attachment 5607 [details]
fdisk-l

Output from the fdisk -l command for /dev/sda, /dev/sdb, /dev/sdc and /dev/sdd.
Comment 5 Kerin Millar 2005-08-11 10:53:05 UTC
Created attachment 5608 [details]
mount

Active mounts (correct at time of oops).
Comment 6 Kerin Millar 2005-08-11 10:53:36 UTC
Created attachment 5609 [details]
mdstat

Output of /proc/mdstat.
Comment 7 Kerin Millar 2005-08-11 10:55:06 UTC
The other thing I want to say at this point is that, although the machine is
reasonably high powered and runs a lot of software, it is seldom put under any
significant load. Certainly, there would have been practically zero load and
minimal I/O activity at the time that this oops was triggered.
Comment 8 Kerin Millar 2005-08-11 11:05:34 UTC
Corrections:

The move was from deadline -> cfq -> deadline -> cfq -> deadline. So the switch
was made twice within the timeframe; I apologise for not being 100% clear but it
may be that the oops occurred on the final switch to cfq (prior to trying to
switch back to deadline). I know only that the oops occurred on either the final
_or_ the penultimate switch process. Also, I meant to say that the machine
*didn't* come up after trying to remotely reboot.
Comment 9 Cal Peake 2005-08-16 23:51:09 UTC
Jens,

I've tracked down the cause of this. It effects both the AS and the Deadline
schedulers. The problem is that e->elevator_data is null when entering the
*_exit_queue functions. Fixing it is beyond my realm but hopefully this helps
you some.
Comment 10 Kerin Millar 2005-11-12 10:49:50 UTC
Well, surely this is now resolved as of the 2.6.14.2 release?

http://www.kernel.org/git/?p=linux/kernel/git/chrisw/stable-queue.git;a=blob;h=1e59c36347ca8d3321abbcfa56e242c274dc6a3b;hb=1876efb4444c275f3aabbc79541edac189e57d91;f=queue/cfq-io-sched-fix.patch

In view of this, I would request closure.
Comment 11 Jens Axboe 2005-11-13 12:20:34 UTC
Yep indeed, 2.6.15-rc1 has it as well.