Latest working kernel version: Warning not observed prior to mainline merge Earliest failing kernel version: 2.6.29-rc1 Distribution: uname -a: Linux bl465cb.lnx.usa.hp.com 2.6.29-rc1-custom #1 SMP Mon Jan 12 18:41:32 EST 2009 x86_64 GNU/Linux Hardware Environment: dual socket quad core x86_64 AMD system with backplane RAID supplying six volumes for filesystem Software Environment: Autotest client code modified to run fstress on btrfs Filesystem mounted as: /dev/cciss/c1d5 on /mnt type btrfs (rw) Problem Description: Running fstress against a btrfs filesystem will occasionally result in a WARNING. This behavior is hard to reproduce, but has been observed on both 2.6.29-rc1 and btrfs-unstable (2.6.29-rc3 based, 29 Jan 09) while running against multi-device filesystems. It has not yet been observed while running against single device filesystems. No options were specified while making or mounting the filesystems used. The warning backtraces are different, but they were both triggered by the same line of code. For 2.6.29-rc1: [ 3853.883872] ------------[ cut here ]------------ [ 3853.885805] WARNING: at fs/btrfs/disk-io.c:312 __btree_submit_bio_start+0x180/0x260() [ 3853.889468] Hardware name: ProLiant BL465c G5 [ 3853.891258] Modules linked in: iptable_filter ip_tables x_tables parport_pc lp parport loop ipmi_devintf serio_raw i2c_piix4 pcspkr ipmi_si i2c_core psmouse shpchp pci_hotplug ipmi_msghandler container button ipv6 evdev ext3 jbd mbcache usbhid hid cciss scsi_mod ohci_hcd bnx2 ehci_hcd uhci_hcd usbcore thermal processor fan thermal_sys fuse [ 3853.905006] Pid: 5241, comm: btrfs-worker-6 Not tainted 2.6.29-rc1-custom #1 [ 3853.908931] Call Trace: [ 3853.909962] [<ffffffff8023d5a2>] warn_slowpath+0xf2/0x130 [ 3853.912823] [<ffffffff802bf33c>] get_partial_node+0x1c/0x80 [ 3853.915025] [<ffffffff802c1641>] __slab_alloc+0x2c1/0x440 [ 3853.917222] [<ffffffff802930b3>] mempool_alloc+0x53/0x130 [ 3853.919632] [<ffffffff8022f512>] target_load+0x32/0x70 [ 3853.923496] [<ffffffff8023050f>] update_curr+0x7f/0xc0 [ 3853.925752] [<ffffffff803a690c>] chksum_update+0xc/0x20 [ 3853.927857] [<ffffffff803c5a8d>] crc32c+0x4d/0x70 [ 3853.929926] [<ffffffff803424d0>] btree_get_extent+0x0/0x190 [ 3853.932260] [<ffffffff803424d0>] btree_get_extent+0x0/0x190 [ 3853.936287] [<ffffffff80340395>] btree_read_extent_buffer_pages+0x65/0xb0 [ 3853.944112] [<ffffffff80342100>] __btree_submit_bio_start+0x180/0x260 [ 3853.948414] [<ffffffff80368311>] worker_loop+0x61/0x160 [ 3853.950581] [<ffffffff803682b0>] worker_loop+0x0/0x160 [ 3853.953395] [<ffffffff802524cb>] kthread+0x4b/0x80 [ 3853.956479] [<ffffffff8020d23a>] child_rip+0xa/0x20 [ 3853.959237] [<ffffffff80292f80>] mempool_free_slab+0x0/0x10 [ 3853.961657] [<ffffffff80252480>] kthread+0x0/0x80 [ 3853.963474] [<ffffffff8020d230>] child_rip+0x0/0x20 [ 3853.965668] ---[ end trace 22a4abd0eab852d1 ]--- For btrfs-unstable, commit 92d7351f0df96ce5b754837cc01b28c694072b22: [ 4292.650662] ------------[ cut here ]------------ [ 4292.654298] WARNING: at fs/btrfs/disk-io.c:311 __btree_submit_bio_start+0x184/0x260() [ 4292.658078] Hardware name: ProLiant BL460c G1 [ 4292.659816] Modules linked in: iptable_filter ip_tables x_tables parport_pc lp parport loop ipmi_devintf ipmi_si iTCO_wdt ipmi_msghandler iTCO_vendor_support button container pcspkr i5k_amb i5000_edac psmouse serio_raw edac_core shpchp pci_hotplug ipv6 evdev ext3 jbd mbcache usbhid hid ehci_hcd uhci_hcd usbcore cciss bnx2 scsi_mod thermal processor fan thermal_sys fuse [ 4292.673572] Pid: 5491, comm: btrfs-worker-4 Not tainted 2.6.29-rc3-btrfs-unstable #1 [ 4292.676965] Call Trace: [ 4292.677888] [<ffffffff8023ecdc>] warn_slowpath+0xdc/0x110 [ 4292.681546] [<ffffffff802326bf>] ? enqueue_task_fair+0x3f/0x110 [ 4292.683870] [<ffffffff80295a41>] ? mempool_alloc_slab+0x11/0x20 [ 4292.686649] [<ffffffff803619f2>] ? test_range_bit+0x42/0x100 [ 4292.689109] [<ffffffff803ab6b0>] ? chksum_update+0x10/0x20 [ 4292.691263] [<ffffffff803646af>] ? read_extent_buffer_pages+0x6f/0x370 [ 4292.695621] [<ffffffff80344cab>] ? btree_read_extent_buffer_pages+0x6b/0xb0 [ 4292.699344] [<ffffffff80346d94>] __btree_submit_bio_start+0x184/0x260 [ 4292.701985] [<ffffffff8034751c>] run_one_async_start+0x1c/0x20 [ 4292.705868] [<ffffffff8036c9e0>] worker_loop+0x120/0x1e0 [ 4292.708850] [<ffffffff8036c8c0>] ? worker_loop+0x0/0x1e0 [ 4292.710843] [<ffffffff8025436d>] kthread+0x4d/0x80 [ 4292.712791] [<ffffffff8020d33a>] child_rip+0xa/0x20 [ 4292.715394] [<ffffffff80254320>] ? kthread+0x0/0x80 [ 4292.717223] [<ffffffff8020d330>] ? child_rip+0x0/0x20 [ 4292.719231] ---[ end trace 4185b3aed6b8c095 ]--- Steps to reproduce: 1) mkfs a multi-device filesystem 2) mount w/o options 3) run fstress as found in autotest
Memory corruption is the most likely cause of this one, could you please try with CONFIG_SLAB_DEBUG (or slub debug) enabled?
Created attachment 20045 [details] Debugging patch Outside of memory corruption, the most likely cause of this is the code that is supposed to mark metadata as written. Could you please run with this debugging patch and see if the printk triggers?
Numerous runs with SLUB_DEBUG enabled and separate runs with the debugging patch using two btrfs-unstable kernels between -rc3 and -rc4 have failed to catch this warning. I'll set up some special stress testing with -rc4 to see if it's still possible to trigger it.
I have the same problem with 2.6.29-rc6: every time something is written, I have the following warning: [64750.358129] ------------[ cut here ]------------ [64750.358136] WARNING: at fs/btrfs/disk-io.c:860 clean_tree_block+0xb2/0xc0() [64750.358142] Hardware name: HP Vectra [64750.358146] Modules linked in: i810 drm snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss nfsd lockd auth_rpcgss sunrpc exportfs ipv6 sg fuse 3c59x mii container parport_pc parport rtc_cmos rtc_core rtc_lib thermal processor thermal_sys snd_intel8x0 button hwmon snd_ac97_codec ac97_bus evdev snd_pcm snd_timer psmouse i2c_i801 snd serio_raw soundcore snd_page_alloc i2c_core intel_agp intel_rng uhci_hcd shpchp agpgart iTCO_wdt iTCO_vendor_support [64750.358220] Pid: 5509, comm: touch Tainted: G W 2.6.29-rc6d #3 [64750.358226] Call Trace: [64750.358236] [<c0123796>] warn_slowpath+0x86/0xa0 [64750.358247] [<c02a9ba4>] ? set_extent_bit+0x354/0x3d0 [64750.358257] [<c027c0e2>] ? btrfs_lookup_block_group+0x12/0x20 [64750.358267] [<c027c90e>] ? update_block_group+0x3e/0x210 [64750.358278] [<c02acdd1>] ? alloc_extent_buffer+0x31/0x2f0 [64750.358288] [<c028ebd2>] clean_tree_block+0xb2/0xc0 [64750.358298] [<c027cfb1>] btrfs_init_new_buffer+0x71/0x120 [64750.358308] [<c0281e39>] btrfs_alloc_free_block+0xd9/0xf0 [64750.358319] [<c0275d1e>] __btrfs_cow_block+0x21e/0x960 [64750.358330] [<c02a8d49>] ? map_extent_buffer+0xb9/0xc0 [64750.358340] [<c0276aa8>] btrfs_cow_block+0x178/0x240 [64750.358350] [<c0279cf7>] btrfs_search_slot+0x197/0x8d0 [64750.358359] [<c02a8f43>] ? free_extent_state+0x33/0x60 [64750.358369] [<c02a9403>] ? merge_state+0x63/0xf0 [64750.358378] [<c0274552>] ? btrfs_alloc_path+0x12/0x20 [64750.358388] [<c02837d2>] __btrfs_inc_extent_ref+0x72/0x210 [64750.358404] [<c029f708>] ? btrfs_file_extent_disk_bytenr+0xa8/0xd0 [64750.358415] [<c0286bb9>] btrfs_inc_ref+0x3b9/0x4f0 [64750.358426] [<c0283760>] ? __btrfs_inc_extent_ref+0x0/0x210 [64750.358437] [<c0275ed5>] __btrfs_cow_block+0x3d5/0x960 [64750.358447] [<c02a8d49>] ? map_extent_buffer+0xb9/0xc0 [64750.358457] [<c0276aa8>] btrfs_cow_block+0x178/0x240 [64750.358467] [<c0279cf7>] btrfs_search_slot+0x197/0x8d0 [64750.358479] [<c028b6c6>] btrfs_lookup_inode+0x36/0xb0 [64750.358488] [<c0274552>] ? btrfs_alloc_path+0x12/0x20 [64750.358498] [<c0295575>] btrfs_update_inode+0x45/0xd0 [64750.358507] [<c02965de>] btrfs_dirty_inode+0x4e/0x70 [64750.358516] [<c019964c>] __mark_inode_dirty+0x2c/0x150 [64750.358526] [<c019231b>] inode_setattr+0x8b/0x180 [64750.358535] [<c029446e>] btrfs_setattr+0x2e/0x90 [64750.358544] [<c0192549>] notify_change+0x139/0x310 [64750.358554] [<c019c077>] utimes_common+0xb7/0x160 [64750.358564] [<c019c1e3>] do_utimes+0xc3/0xe0 [64750.358573] [<c017efee>] ? do_sys_open+0xbe/0xe0 [64750.358582] [<c019c2d1>] sys_utimensat+0x21/0x70 [64750.358591] [<c0103ada>] syscall_call+0x7/0xb [64750.358597] ---[ end trace 9d85e88a01847619 ]--- I mounted with "noatime" option, otherwise the syslog become too big. There are 2 (single) devices mounted with btrfs, compression is enabled: /dev/mapper/ST-home 4096000 2951092 1144908 73% /home /dev/mapper/ST-usr 4825088 2461564 2363524 52% /usr
(In reply to comment #4) > I have the same problem with 2.6.29-rc6: every time something is written, I > have the following warning: This is a different problem, if you compile with SMP support it will go away. I have a patch queued here to send to Linus to fix it for the next release.
Yes, you are right. Warnings disappear when compiling with SMP support. Thank you, Luca
*** Bug 12850 has been marked as a duplicate of this bug. ***
Created attachment 22719 [details] Stack trace taken from test system running 2.6.31-rc6 It's finally resurfaced. I suddenly saw it during 2.6.31-rc5 stress testing, and now in my initial -rc6 testing pass with fsstress on a filesystem mounted with the nodatasum option (also seen with other and no options on -rc5). Frequency of occurrence seems somewhat higher than it did way back in 2.6.29-rc1. This was a six device btrfs on an eight core dual socket AMD blade. 2.6.31-rc6 kernel, commit 64f1607ffbbc772685733ea63e6f7f4183df1b16 I'll try to work up some more diagnostic info. fsstress eventually completed, as have all the other -rc5 instances I've seen.
I am runnin on 2.6.38.2 with btrfs. I think this is the same section of code as this bug. The system is 2 socket Intel (quad core) with a SAN. There are 18 separate BTRFS file systems on 18 luns in this configuration. This was seen while doing a database restore. Apr 13 13:32:56 elm3c46 kernel: ------------[ cut here ]------------ Apr 13 13:32:56 elm3c46 kernel: WARNING: at fs/btrfs/disk-io.c:381 __btree_submit_bio_start+0x2c7/0x300 [btrfs]() Apr 13 13:32:56 elm3c46 kernel: Hardware name: System x3550 M2 -[7946AC1]- Apr 13 13:32:56 elm3c46 kernel: Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf xt_physdev ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 btrfs zlib_deflate libcrc32c ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm_intel kvm serio_raw pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support cdc_ether usbnet mii shpchp ioatdma dca i7core_edac edac_core sg bnx2 ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptsas mptscsih mptbase scsi_transport_sas lpfc scsi_transport_fc scsi_tgt [last unloaded: microcode] Apr 13 13:32:56 elm3c46 kernel: Pid: 5083, comm: btrfs-worker-2 Not tainted 2.6.38.2 #2 Apr 13 13:32:56 elm3c46 kernel: Call Trace: Apr 13 13:32:56 elm3c46 kernel: [<ffffffff8106157f>] ? warn_slowpath_common+0x7f/0xc0 Apr 13 13:32:56 elm3c46 kernel: [<ffffffff810615da>] ? warn_slowpath_null+0x1a/0x20 Apr 13 13:32:56 elm3c46 kernel: [<ffffffffa033a267>] ? __btree_submit_bio_start+0x2c7/0x300 [btrfs] Apr 13 13:32:56 elm3c46 kernel: [<ffffffffa03348d5>] ? run_one_async_start+0x25/0x30 [btrfs] Apr 13 13:32:56 elm3c46 kernel: [<ffffffffa036887e>] ? worker_loop+0x13e/0x540 [btrfs] Apr 13 13:32:56 elm3c46 kernel: [<ffffffffa0368740>] ? worker_loop+0x0/0x540 [btrfs] Apr 13 13:32:56 elm3c46 kernel: [<ffffffffa0368740>] ? worker_loop+0x0/0x540 [btrfs] Apr 13 13:32:56 elm3c46 kernel: [<ffffffff810826f6>] ? kthread+0x96/0xa0 Apr 13 13:32:56 elm3c46 kernel: [<ffffffff8100cdc4>] ? kernel_thread_helper+0x4/0x10 Apr 13 13:32:56 elm3c46 kernel: [<ffffffff81082660>] ? kthread+0x0/0xa0 Apr 13 13:32:56 elm3c46 kernel: [<ffffffff8100cdc0>] ? kernel_thread_helper+0x0/0x10 Apr 13 13:32:56 elm3c46 kernel: ---[ end trace 7a40dabd588f3795 ]---
Closing, please reopen if this problem is still affecting you.