Bug 12582 - btrfs: __btree_submit_bio_start WARNINGs when running fsstress
Summary: btrfs: __btree_submit_bio_start WARNINGs when running fsstress
Status: RESOLVED OBSOLETE
Alias: None
Product: File System
Classification: Unclassified
Component: btrfs (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Chris Mason
URL:
Keywords:
: 12850 (view as bug list)
Depends on:
Blocks:
 
Reported: 2009-01-30 09:59 UTC by Eric Whitney
Modified: 2013-04-30 15:49 UTC (History)
6 users (show)

See Also:
Kernel Version: 2.6.38.2
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Debugging patch (849 bytes, patch)
2009-01-30 11:40 UTC, Chris Mason
Details | Diff
Stack trace taken from test system running 2.6.31-rc6 (1.37 KB, application/octet-stream)
2009-08-14 23:04 UTC, Eric Whitney
Details

Description Eric Whitney 2009-01-30 09:59:08 UTC
Latest working kernel version: Warning not observed prior to mainline merge
Earliest failing kernel version:  2.6.29-rc1

Distribution:  uname -a:
Linux bl465cb.lnx.usa.hp.com 2.6.29-rc1-custom #1 SMP Mon Jan 12 18:41:32 EST 2009 x86_64 GNU/Linux

Hardware Environment: dual socket quad core x86_64 AMD system with backplane RAID supplying six volumes for filesystem

Software Environment:  Autotest client code modified to run fstress on btrfs
Filesystem mounted as:  /dev/cciss/c1d5 on /mnt type btrfs (rw)

Problem Description:  Running fstress against a btrfs filesystem will occasionally result in a WARNING.  This behavior is hard to reproduce, but has been observed on both 2.6.29-rc1 and btrfs-unstable (2.6.29-rc3 based, 29 Jan 09) while running against multi-device filesystems.  It has not yet been observed while running against single device filesystems.  No options were specified while making or mounting the filesystems used.

The warning backtraces are different, but they were both triggered by the same line of code.

For 2.6.29-rc1:

[ 3853.883872] ------------[ cut here ]------------
[ 3853.885805] WARNING: at fs/btrfs/disk-io.c:312 __btree_submit_bio_start+0x180/0x260()
[ 3853.889468] Hardware name: ProLiant BL465c G5  
[ 3853.891258] Modules linked in: iptable_filter ip_tables x_tables parport_pc lp parport loop ipmi_devintf serio_raw i2c_piix4 pcspkr ipmi_si i2c_core psmouse shpchp pci_hotplug ipmi_msghandler container button ipv6 evdev ext3 jbd mbcache usbhid hid cciss scsi_mod ohci_hcd bnx2 ehci_hcd uhci_hcd usbcore thermal processor fan thermal_sys fuse
[ 3853.905006] Pid: 5241, comm: btrfs-worker-6 Not tainted 2.6.29-rc1-custom #1
[ 3853.908931] Call Trace:
[ 3853.909962]  [<ffffffff8023d5a2>] warn_slowpath+0xf2/0x130
[ 3853.912823]  [<ffffffff802bf33c>] get_partial_node+0x1c/0x80
[ 3853.915025]  [<ffffffff802c1641>] __slab_alloc+0x2c1/0x440
[ 3853.917222]  [<ffffffff802930b3>] mempool_alloc+0x53/0x130
[ 3853.919632]  [<ffffffff8022f512>] target_load+0x32/0x70
[ 3853.923496]  [<ffffffff8023050f>] update_curr+0x7f/0xc0
[ 3853.925752]  [<ffffffff803a690c>] chksum_update+0xc/0x20
[ 3853.927857]  [<ffffffff803c5a8d>] crc32c+0x4d/0x70
[ 3853.929926]  [<ffffffff803424d0>] btree_get_extent+0x0/0x190
[ 3853.932260]  [<ffffffff803424d0>] btree_get_extent+0x0/0x190
[ 3853.936287]  [<ffffffff80340395>] btree_read_extent_buffer_pages+0x65/0xb0
[ 3853.944112]  [<ffffffff80342100>] __btree_submit_bio_start+0x180/0x260
[ 3853.948414]  [<ffffffff80368311>] worker_loop+0x61/0x160
[ 3853.950581]  [<ffffffff803682b0>] worker_loop+0x0/0x160
[ 3853.953395]  [<ffffffff802524cb>] kthread+0x4b/0x80
[ 3853.956479]  [<ffffffff8020d23a>] child_rip+0xa/0x20
[ 3853.959237]  [<ffffffff80292f80>] mempool_free_slab+0x0/0x10
[ 3853.961657]  [<ffffffff80252480>] kthread+0x0/0x80
[ 3853.963474]  [<ffffffff8020d230>] child_rip+0x0/0x20
[ 3853.965668] ---[ end trace 22a4abd0eab852d1 ]---


For btrfs-unstable, commit 92d7351f0df96ce5b754837cc01b28c694072b22:

[ 4292.650662] ------------[ cut here ]------------
[ 4292.654298] WARNING: at fs/btrfs/disk-io.c:311 __btree_submit_bio_start+0x184/0x260()
[ 4292.658078] Hardware name: ProLiant BL460c G1
[ 4292.659816] Modules linked in: iptable_filter ip_tables x_tables parport_pc lp parport loop ipmi_devintf ipmi_si iTCO_wdt ipmi_msghandler iTCO_vendor_support button container pcspkr i5k_amb i5000_edac psmouse serio_raw edac_core shpchp pci_hotplug ipv6 evdev ext3 jbd mbcache usbhid hid ehci_hcd uhci_hcd usbcore cciss bnx2 scsi_mod thermal processor fan thermal_sys fuse
[ 4292.673572] Pid: 5491, comm: btrfs-worker-4 Not tainted 2.6.29-rc3-btrfs-unstable #1
[ 4292.676965] Call Trace:
[ 4292.677888]  [<ffffffff8023ecdc>] warn_slowpath+0xdc/0x110
[ 4292.681546]  [<ffffffff802326bf>] ? enqueue_task_fair+0x3f/0x110
[ 4292.683870]  [<ffffffff80295a41>] ? mempool_alloc_slab+0x11/0x20
[ 4292.686649]  [<ffffffff803619f2>] ? test_range_bit+0x42/0x100
[ 4292.689109]  [<ffffffff803ab6b0>] ? chksum_update+0x10/0x20
[ 4292.691263]  [<ffffffff803646af>] ? read_extent_buffer_pages+0x6f/0x370
[ 4292.695621]  [<ffffffff80344cab>] ? btree_read_extent_buffer_pages+0x6b/0xb0
[ 4292.699344]  [<ffffffff80346d94>] __btree_submit_bio_start+0x184/0x260
[ 4292.701985]  [<ffffffff8034751c>] run_one_async_start+0x1c/0x20
[ 4292.705868]  [<ffffffff8036c9e0>] worker_loop+0x120/0x1e0
[ 4292.708850]  [<ffffffff8036c8c0>] ? worker_loop+0x0/0x1e0
[ 4292.710843]  [<ffffffff8025436d>] kthread+0x4d/0x80
[ 4292.712791]  [<ffffffff8020d33a>] child_rip+0xa/0x20
[ 4292.715394]  [<ffffffff80254320>] ? kthread+0x0/0x80
[ 4292.717223]  [<ffffffff8020d330>] ? child_rip+0x0/0x20
[ 4292.719231] ---[ end trace 4185b3aed6b8c095 ]---


Steps to reproduce:
1) mkfs a multi-device filesystem
2) mount w/o options
3) run fstress as found in autotest
Comment 1 Chris Mason 2009-01-30 11:10:44 UTC
Memory corruption is the most likely cause of this one, could you please try with CONFIG_SLAB_DEBUG (or slub debug) enabled?
Comment 2 Chris Mason 2009-01-30 11:40:32 UTC
Created attachment 20045 [details]
Debugging patch

Outside of memory corruption, the most likely cause of this is the code that is supposed to mark metadata as written.  Could you please run with this debugging patch and see if the printk triggers?
Comment 3 Eric Whitney 2009-02-10 15:11:50 UTC
Numerous runs with SLUB_DEBUG enabled and separate runs with the debugging patch using two btrfs-unstable kernels between -rc3 and -rc4 have failed to catch this warning.  I'll set up some special stress testing with -rc4 to see if it's still possible to trigger it.
Comment 4 Luca Bonissi 2009-03-03 05:35:31 UTC
I have the same problem with 2.6.29-rc6: every time something is written, I have the following warning:

[64750.358129] ------------[ cut here ]------------
[64750.358136] WARNING: at fs/btrfs/disk-io.c:860 clean_tree_block+0xb2/0xc0()
[64750.358142] Hardware name: HP Vectra
[64750.358146] Modules linked in: i810 drm snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss nfsd lockd auth_rpcgss sunrpc exportfs ipv6 sg fuse 3c59x mii container parport_pc parport rtc_cmos rtc_core rtc_lib thermal processor thermal_sys snd_intel8x0 button hwmon snd_ac97_codec ac97_bus evdev snd_pcm snd_timer psmouse i2c_i801 snd serio_raw soundcore snd_page_alloc i2c_core intel_agp intel_rng uhci_hcd shpchp agpgart iTCO_wdt iTCO_vendor_support
[64750.358220] Pid: 5509, comm: touch Tainted: G        W  2.6.29-rc6d #3
[64750.358226] Call Trace:
[64750.358236]  [<c0123796>] warn_slowpath+0x86/0xa0
[64750.358247]  [<c02a9ba4>] ? set_extent_bit+0x354/0x3d0
[64750.358257]  [<c027c0e2>] ? btrfs_lookup_block_group+0x12/0x20
[64750.358267]  [<c027c90e>] ? update_block_group+0x3e/0x210
[64750.358278]  [<c02acdd1>] ? alloc_extent_buffer+0x31/0x2f0
[64750.358288]  [<c028ebd2>] clean_tree_block+0xb2/0xc0
[64750.358298]  [<c027cfb1>] btrfs_init_new_buffer+0x71/0x120
[64750.358308]  [<c0281e39>] btrfs_alloc_free_block+0xd9/0xf0
[64750.358319]  [<c0275d1e>] __btrfs_cow_block+0x21e/0x960
[64750.358330]  [<c02a8d49>] ? map_extent_buffer+0xb9/0xc0
[64750.358340]  [<c0276aa8>] btrfs_cow_block+0x178/0x240
[64750.358350]  [<c0279cf7>] btrfs_search_slot+0x197/0x8d0
[64750.358359]  [<c02a8f43>] ? free_extent_state+0x33/0x60
[64750.358369]  [<c02a9403>] ? merge_state+0x63/0xf0
[64750.358378]  [<c0274552>] ? btrfs_alloc_path+0x12/0x20
[64750.358388]  [<c02837d2>] __btrfs_inc_extent_ref+0x72/0x210
[64750.358404]  [<c029f708>] ? btrfs_file_extent_disk_bytenr+0xa8/0xd0
[64750.358415]  [<c0286bb9>] btrfs_inc_ref+0x3b9/0x4f0
[64750.358426]  [<c0283760>] ? __btrfs_inc_extent_ref+0x0/0x210
[64750.358437]  [<c0275ed5>] __btrfs_cow_block+0x3d5/0x960
[64750.358447]  [<c02a8d49>] ? map_extent_buffer+0xb9/0xc0
[64750.358457]  [<c0276aa8>] btrfs_cow_block+0x178/0x240
[64750.358467]  [<c0279cf7>] btrfs_search_slot+0x197/0x8d0
[64750.358479]  [<c028b6c6>] btrfs_lookup_inode+0x36/0xb0
[64750.358488]  [<c0274552>] ? btrfs_alloc_path+0x12/0x20
[64750.358498]  [<c0295575>] btrfs_update_inode+0x45/0xd0
[64750.358507]  [<c02965de>] btrfs_dirty_inode+0x4e/0x70
[64750.358516]  [<c019964c>] __mark_inode_dirty+0x2c/0x150
[64750.358526]  [<c019231b>] inode_setattr+0x8b/0x180
[64750.358535]  [<c029446e>] btrfs_setattr+0x2e/0x90
[64750.358544]  [<c0192549>] notify_change+0x139/0x310
[64750.358554]  [<c019c077>] utimes_common+0xb7/0x160
[64750.358564]  [<c019c1e3>] do_utimes+0xc3/0xe0
[64750.358573]  [<c017efee>] ? do_sys_open+0xbe/0xe0
[64750.358582]  [<c019c2d1>] sys_utimensat+0x21/0x70
[64750.358591]  [<c0103ada>] syscall_call+0x7/0xb
[64750.358597] ---[ end trace 9d85e88a01847619 ]---

I mounted with "noatime" option, otherwise the syslog become too big.

There are 2 (single) devices mounted with btrfs, compression is enabled:

/dev/mapper/ST-home    4096000   2951092   1144908  73% /home
/dev/mapper/ST-usr     4825088   2461564   2363524  52% /usr
Comment 5 Chris Mason 2009-03-04 10:48:24 UTC
(In reply to comment #4)
> I have the same problem with 2.6.29-rc6: every time something is written, I
> have the following warning:

This is a different problem, if you compile with SMP support it will go away.  I have a patch queued here to send to Linus to fix it for the next release.
Comment 6 Luca Bonissi 2009-03-06 07:12:08 UTC
Yes, you are right. Warnings disappear when compiling with SMP support.

Thank you,
  Luca
Comment 7 Alan 2009-03-17 08:11:24 UTC
*** Bug 12850 has been marked as a duplicate of this bug. ***
Comment 8 Eric Whitney 2009-08-14 23:04:23 UTC
Created attachment 22719 [details]
Stack trace taken from test system running 2.6.31-rc6

It's finally resurfaced.  I suddenly saw it during 2.6.31-rc5 stress testing, and now in my initial -rc6 testing pass with fsstress on a filesystem mounted with the nodatasum option (also seen with other and no options on -rc5).  Frequency of occurrence seems somewhat higher than it did way back in 2.6.29-rc1.

This was a six device btrfs on an eight core dual socket AMD blade.  2.6.31-rc6 kernel, commit 64f1607ffbbc772685733ea63e6f7f4183df1b16

I'll try to work up some more diagnostic info.  fsstress eventually completed, as have all the other -rc5 instances I've seen.
Comment 9 keith mannthey 2011-04-14 16:24:53 UTC
I am runnin on 2.6.38.2 with btrfs. 

I think this is the same section of code as this bug. 

The system is 2 socket Intel (quad core) with a SAN.  There are 18 separate BTRFS file systems on 18 luns in this configuration.  

This was seen while doing a database restore. 



Apr 13 13:32:56 elm3c46 kernel: ------------[ cut here ]------------
Apr 13 13:32:56 elm3c46 kernel: WARNING: at fs/btrfs/disk-io.c:381 __btree_submit_bio_start+0x2c7/0x300 [btrfs]()
Apr 13 13:32:56 elm3c46 kernel: Hardware name: System x3550 M2 -[7946AC1]-
Apr 13 13:32:56 elm3c46 kernel: Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf xt_physdev ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 btrfs zlib_deflate libcrc32c ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm_intel kvm serio_raw pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support cdc_ether usbnet mii shpchp ioatdma dca i7core_edac edac_core sg bnx2 ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptsas mptscsih mptbase scsi_transport_sas lpfc scsi_transport_fc scsi_tgt [last unloaded: microcode]
Apr 13 13:32:56 elm3c46 kernel: Pid: 5083, comm: btrfs-worker-2 Not tainted 2.6.38.2 #2
Apr 13 13:32:56 elm3c46 kernel: Call Trace:
Apr 13 13:32:56 elm3c46 kernel: [<ffffffff8106157f>] ? warn_slowpath_common+0x7f/0xc0
Apr 13 13:32:56 elm3c46 kernel: [<ffffffff810615da>] ? warn_slowpath_null+0x1a/0x20
Apr 13 13:32:56 elm3c46 kernel: [<ffffffffa033a267>] ? __btree_submit_bio_start+0x2c7/0x300 [btrfs]
Apr 13 13:32:56 elm3c46 kernel: [<ffffffffa03348d5>] ? run_one_async_start+0x25/0x30 [btrfs]
Apr 13 13:32:56 elm3c46 kernel: [<ffffffffa036887e>] ? worker_loop+0x13e/0x540 [btrfs]
Apr 13 13:32:56 elm3c46 kernel: [<ffffffffa0368740>] ? worker_loop+0x0/0x540 [btrfs]
Apr 13 13:32:56 elm3c46 kernel: [<ffffffffa0368740>] ? worker_loop+0x0/0x540 [btrfs]
Apr 13 13:32:56 elm3c46 kernel: [<ffffffff810826f6>] ? kthread+0x96/0xa0
Apr 13 13:32:56 elm3c46 kernel: [<ffffffff8100cdc4>] ? kernel_thread_helper+0x4/0x10
Apr 13 13:32:56 elm3c46 kernel: [<ffffffff81082660>] ? kthread+0x0/0xa0
Apr 13 13:32:56 elm3c46 kernel: [<ffffffff8100cdc0>] ? kernel_thread_helper+0x0/0x10
Apr 13 13:32:56 elm3c46 kernel: ---[ end trace 7a40dabd588f3795 ]---
Comment 10 Josef Bacik 2013-04-30 15:49:42 UTC
Closing, please reopen if this problem is still affecting you.

Note You need to log in before you can comment on or make changes to this bug.