Bug 14143

Summary: OOPS when setting nr_requests for md devices
Product: IO/Storage
Reporter: aCaB (acab)
Component: Block Layer
Assignee: Jens Axboe (axboe)
Status: CLOSED CODE_FIX
Severity: normal
CC: akpm, edwin+bugs, rjw
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.31-rc9-git
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 13615
Attachments: Don't allow storing of nr_requests on stacked device

Description aCaB 2009-09-08 08:48:07 UTC
Hi,

Echoing any value to /sys/block/mdX/queue/nr_requests causes the kernel to OOPS and often to panic within 5-20 seconds.

Doing the same for the underlying block device works fine.

Reproduces easily on all the boxes I've tried.

Example bt:
[ 6336.680154] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 6336.680159] IP: [<ffffffff8102c8fc>] __wake_up_common+0x26/0x72
[ 6336.680167] PGD 8d450067 PUD 8c91e067 PMD 0
[ 6336.680171] Oops: 0000 [#1] PREEMPT SMP
[ 6336.680174] last sysfs file: /sys/devices/virtual/block/md3/queue/nr_requests
[ 6336.680177] CPU 1
[ 6336.680179] Modules linked in: xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables fglrx(P) ipv6 cpufreq_conservative cpufreq_powersave cpufreq_stats cpufreq_userspace cpufreq_ondemand microcode kvm_intel kvm fuse acpi_cpufreq freq_table coretemp it87 hwmon_vid loop snd_hda_codec_atihdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_oss snd_seq_midi_event hid_a4tech snd_seq snd_timer snd_seq_device usbhid i2c_i801 evdev i2c_core snd soundcore button processor snd_page_alloc sr_mod cdrom uhci_hcd ehci_hcd floppy usbcore nls_base thermal fan thermal_sys [last unloaded: scsi_wait_scan]
[ 6336.680225] Pid: 13570, comm: bash Tainted: P           2.6.31-rc9 #3 EP45T-DS3
[ 6336.680227] RIP: 0010:[<ffffffff8102c8fc>]  [<ffffffff8102c8fc>] __wake_up_common+0x26/0x72
[ 6336.680232] RSP: 0018:ffff880072c59de8  EFLAGS: 00010096
[ 6336.680234] RAX: 0000000000000000 RBX: ffff88012d0d0058 RCX: 0000000000000000
[ 6336.680237] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff88012d0d0058
[ 6336.680239] RBP: 0000000000000001 R08: ffffffffffffffe8 R09: 000000000000000a
[ 6336.680242] R10: 0000000000066f6b R11: 0000000000000002 R12: 0000000000000000
[ 6336.680244] R13: ffff88012d0d0060 R14: 0000000000000000 R15: 0000000000000000
[ 6336.680247] FS:  00007f0137db96f0(0000) GS:ffff88002803e000(0000) knlGS:0000000000000000
[ 6336.680250] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6336.680252] CR2: 0000000000000000 CR3: 0000000072c5c000 CR4: 00000000000026e0
[ 6336.680255] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6336.680257] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 6336.680260] Process bash (pid: 13570, threadinfo ffff880072c58000, task ffff88008d430000)
[ 6336.680262] Stack:
[ 6336.680263]  00000003816fb4c0 ffff88012d0d0058 ffff88012d0d0000 0000000000000000
[ 6336.680267] <0> 0000000000000001 0000000000000046 0000000000000003 ffffffff8102ebe5
[ 6336.680271] <0> 0000000000002000 0000000000000005 ffffffff8165e170 0000000000000005
[ 6336.680276] Call Trace:
[ 6336.680280]  [<ffffffff8102ebe5>] ? __wake_up+0x30/0x44
[ 6336.680285]  [<ffffffff811d2c67>] ? queue_requests_store+0x183/0x2aa
[ 6336.680288]  [<ffffffff811d2df2>] ? queue_attr_store+0x64/0x7f
[ 6336.680292]  [<ffffffff81116130>] ? sysfs_write_file+0xd0/0x107
[ 6336.680296]  [<ffffffff810c4f87>] ? vfs_write+0xad/0x169
[ 6336.680299]  [<ffffffff810c50ff>] ? sys_write+0x45/0x6e
[ 6336.680303]  [<ffffffff8100ba2b>] ? system_call_fastpath+0x16/0x1b
[ 6336.680305] Code: 01 00 00 00 c3 41 57 41 89 cf 41 56 4d 89 c6 41 55 4c 8d 6f 08 41 54 55 89 d5 53 48 83 ec 08 89 74 24 04 48 8b 47 08 4c 8d 40 e8 <49> 8b 40 18 48 8d 58 e8 eb 2d 45 8b 20 4c 89 f1 44 89 fa 8b 74
[ 6336.680337] RIP  [<ffffffff8102c8fc>] __wake_up_common+0x26/0x72
[ 6336.680340]  RSP <ffff880072c59de8>
[ 6336.680342] CR2: 0000000000000000
[ 6336.680344] ---[ end trace d5292aec4d1f06a9 ]---
[ 6336.680347] note: bash[13570] exited with preempt_count 2
Comment 1 Andrew Morton 2009-09-11 19:51:59 UTC
I reassigned this to the block layer (might be wrong) and marked it as a regression.
Comment 2 Jens Axboe 2009-09-11 20:43:05 UTC
Created attachment 23078
Don't allow storing of nr_requests on stacked device

This should resolve it. The issue is that stacked devices don't have a request list, and as such exporting this value does more harm than good. We could make it apply there as well; it would actually make sense if dm/md/etc did that for throttling reasons. But then we'd need to modify the store function as well, since it currently assumes that it has a request list backing it.
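
For illustration only, the idea of refusing the store when the queue has no request list can be sketched as a small userspace model. This is not the attached patch or the kernel's code: `struct model_queue`, `struct model_request_list` and `model_requests_store()` are hypothetical names invented for this sketch, and a NULL `rl` pointer is an assumed stand-in for "stacked device without a request list".

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical stand-in for the request list behind a queue. */
struct model_request_list {
    unsigned long nr_requests;
};

/*
 * Hypothetical model of a block queue. In this sketch a stacked
 * device (md/dm) is represented by rl == NULL; that representation
 * is an assumption, not how the kernel lays out its structures.
 */
struct model_queue {
    struct model_request_list *rl;
};

/*
 * Model of a sysfs-style store handler: parse a decimal value from
 * `page` and store it, but reject the write up front when the queue
 * has no request list instead of touching state that does not exist.
 * Returns `count` on success or -EINVAL on rejection.
 */
static ssize_t model_requests_store(struct model_queue *q,
                                    const char *page, size_t count)
{
    char *end;
    unsigned long nr;

    /* Guard: no request list backing this queue, so refuse the store. */
    if (q->rl == NULL)
        return -EINVAL;

    errno = 0;
    nr = strtoul(page, &end, 10);
    if (end == page || errno != 0)
        return -EINVAL; /* no digits parsed, or value out of range */

    q->rl->nr_requests = nr;
    return (ssize_t)count;
}
```

With a guard of this kind, a write to the stacked device's attribute fails with an error, while the same write against a queue that does have a request list still succeeds.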
Comment 3 Rafael J. Wysocki 2009-10-12 21:28:21 UTC
On Monday 12 October 2009, Chuck Ebbert wrote:
> On Mon, 12 Oct 2009 01:01:05 +0200 (CEST)
> "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.30 and 2.6.31.
> > 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.30 and 2.6.31.  Please verify if it still should
> > be listed and let me know (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=14143
> > Subject             : OOPS when setting nr_requests for md devices
> > Submitter   : aCaB <acab@clamav.net>
> > Date                : 2009-09-08 08:48 (34 days old)
> > 
> 
> Fixed in 2.6.32 by commit b8a9ae77 ("block: don't assume device has a
> request list backing in nr_requests store")
> 
> Also in 2.6.31.1