Bug 60834 - General protection fault in btrfs kernel module
Summary: General protection fault in btrfs kernel module
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: btrfs (show other bugs)
Hardware: x86-64 Linux
: P1 blocking
Assignee: Josef Bacik
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-09-02 13:30 UTC by moritz.stoetter
Modified: 2013-10-31 20:51 UTC (History)
3 users (show)

See Also:
Kernel Version: 3.11.0-031100-generic
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description moritz.stoetter 2013-09-02 13:30:11 UTC
When sending incremental btrfs snapshots for backups between two machines with 

 ssh root@remote "btrfs send -p OLDSNAPSHOT SNAPSHOT" | btrfs receive SNAPSHOT

we get the following error message in our kern.log:

general protection fault: 0000 [#1] SMP 
Modules linked in: vesafb(F) kvm_amd(F) ttm(F) drm_kms_helper(F) kvm(F) drm(F) ghash_clmulni_intel(F) aesni_intel(F) ablk_helper(F) cryptd(F) i2c_algo_bit(F) sysimgblt(F) joydev(F) lrw(F) aes_x86_64(F) xts(F) gf128mul(F) amd64_edac_mod(F) sysfillrect(F) hid_generic(F) sp5100_tco(F) edac_core(F) psmouse(F) i2c_piix4(F) syscopyarea(F) edac_mce_amd(F) usbhid(F) fam15h_power(F) serio_raw(F) hid(F) microcode(F) k10temp(F) mac_hid(F) nfsd(F) nfs_acl(F) auth_rpcgss(F) lp(F) nfs(F) fscache(F) parport(F) lockd(F) sunrpc(F) btrfs(F) zlib_deflate(F) libcrc32c(F) e1000e(F) arcmsr(F)
CPU 10 
Pid: 10464, comm: btrfs Tainted: GF            3.8.0-29-generic #42~precise1-Ubuntu empty empty/S8230
RIP: 0010:[<ffffffff813602d2>]  [<ffffffff813602d2>] memcpy+0x12/0x110
RSP: 0018:ffff8800a5603ab0  EFLAGS: 00010202
RAX: ffff8800a5603b26 RBX: ffff8817aa2446b8 RCX: 0000000000000002
RDX: 0000000000000002 RSI: 0005080000000000 RDI: ffff8800a5603b26
RBP: ffff8800a5603af8 R08: ffff8800a5603b10 R09: ffff8800a5603b18
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
R13: 0000000000000002 R14: ffff8800a5603b28 R15: 0000000000000010
FS:  00007f20c3c0c740(0000) GS:ffff88201ec80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f2225a06430 CR3: 0000001238882000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process btrfs (pid: 10464, threadinfo ffff8800a5602000, task ffff881ff30c45c0)
Stack:
 ffffffffa00a28ec 0000000000000000 ffff8817f4839800 ffff8817aa2446b8
 0000000000000000 ffff8817aa2446b8 000000000000002d 0000000000000fcd
 0000000000000ffa ffff8800a5603b58 ffffffffa0098dcc ffff880511d2e000
Call Trace:
 [<ffffffffa00a28ec>] ? read_extent_buffer+0xbc/0x120 [btrfs]
 [<ffffffffa0098dcc>] btrfs_get_token_64+0xdc/0xf0 [btrfs]
 [<ffffffffa00e2946>] is_extent_unchanged.isra.24+0x276/0x3d0 [btrfs]
 [<ffffffffa007cb3f>] ? btree_read_extent_buffer_pages.constprop.121+0xcf/0x120 [btrfs]
 [<ffffffffa00e49ff>] process_extent+0xbf/0xf0 [btrfs]
 [<ffffffffa00e50a0>] changed_cb+0xc0/0x100 [btrfs]
 [<ffffffffa0065455>] btrfs_compare_trees+0x895/0x8f0 [btrfs]
 [<ffffffffa00e2b0e>] ? write_buf+0x6e/0xc0 [btrfs]
 [<ffffffff811846d2>] ? kmem_cache_free+0x42/0x160
 [<ffffffffa00e4fe0>] ? changed_inode+0x4a0/0x4a0 [btrfs]
 [<ffffffffa00e587b>] send_subvol+0x9b/0x150 [btrfs]
 [<ffffffffa00e5b4c>] btrfs_ioctl_send+0x21c/0x430 [btrfs]
 [<ffffffff8103ed0e>] ? physflat_send_IPI_mask+0xe/0x10
 [<ffffffff8108ebc6>] ? resched_task+0x76/0x80
 [<ffffffffa00b583d>] btrfs_ioctl+0x69d/0x860 [btrfs]
 [<ffffffff81091e9a>] ? wake_up_new_task+0xfa/0x190
 [<ffffffff811ad38a>] do_vfs_ioctl+0x8a/0x340
 [<ffffffff811ad6d1>] sys_ioctl+0x91/0xb0
 [<ffffffff81058f96>] ? sys_clone+0x16/0x20
 [<ffffffff816fcc39>] ? stub_clone+0x69/0x90
 [<ffffffff816fc8dd>] system_call_fastpath+0x1a/0x1f
Code: 4e 48 83 c4 08 5b 5d c3 90 e8 fb fd ff ff eb e6 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 <f3> a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 8b 5e 18 48 8d 
RIP  [<ffffffff813602d2>] memcpy+0x12/0x110
 RSP <ffff8800a5603ab0>
---[ end trace e0e80779c2fbf034 ]---


This causes the nfs-kernel-server to hang. It cannot be stopped or restarted. So far a hard reset is the only way we figured out to solve this.

We are using Ubuntu 12.04.3 LTS and btrfs-tools v0.20-rc1-358-g194aa4a.

The volumes are mounted with the following options: defaults,space_cache

The volumes are 49T and 45T large, about half of which is used.
Comment 1 Josef Bacik 2013-09-03 17:15:32 UTC
That kernel is too old, please either try btrfs-next or 3.11 and let me know if you can still reproduce.
Comment 2 moritz.stoetter 2013-09-04 10:38:52 UTC
We updated the kernel to 3.11.0-031100-generic. The error still occurs. Help is greatly appreciated.
Comment 3 moritz.stoetter 2013-09-04 11:34:32 UTC
The trace for kernel version 3.11 seems to be slightly different:

[ 1395.489447] general protection fault: 0000 [#1] SMP 
[ 1395.489462] Modules linked in: joydev hid_generic ttm drm_kms_helper drm usbhid i2c_algo_bit sysimgblt sp5100_tco psmouse hid sysfillrect ohci_pci amd64_edac_mod i2c_piix4 k10temp fam15h_power serio_raw syscopyarea edac_core edac_mce_amd mac_hid nfsd nfs_acl lp auth_rpcgss nfs parport fscache lockd sunrpc btrfs raid6_pq zlib_deflate xor libcrc32c e1000e ptp arcmsr pps_core
[ 1395.489551] CPU: 3 PID: 7830 Comm: btrfs Not tainted 3.11.0-031100-generic #201309021735
[ 1395.489562] Hardware name: empty empty/S8230, BIOS 'V2.05' 01/17/2013
[ 1395.489572] task: ffff881fed431770 ti: ffff881ff23a4000 task.ti: ffff881ff23a4000
[ 1395.489582] RIP: 0010:[<ffffffff81375412>]  [<ffffffff81375412>] memcpy+0x12/0x110
[ 1395.489600] RSP: 0018:ffff881ff23a5ab0  EFLAGS: 00010202
[ 1395.489608] RAX: ffff881ff23a5b26 RBX: 0000000000000002 RCX: 0000000000000002
[ 1395.489617] RDX: 0000000000000002 RSI: 0005080000000000 RDI: ffff881ff23a5b26
[ 1395.489626] RBP: ffff881ff23a5af8 R08: ffff881ff23a5b10 R09: ffff881ff23a5b18
[ 1395.489635] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
[ 1395.489644] R13: ffff881ff23a5b28 R14: ffff88077b8ea8d0 R15: 0000160000000000
[ 1395.489653] FS:  00007f93a962e740(0000) GS:ffff8807ffcc0000(0000) knlGS:0000000000000000
[ 1395.489663] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1395.489672] CR2: 00007effb9756c62 CR3: 00000017d318c000 CR4: 00000000000407e0
[ 1395.489682] Stack:
[ 1395.489687]  ffffffffa00e951a 0000000000000000 ffff881ff2a77000 ffff88077b8ea800
[ 1395.489702]  0000000000000000 ffff88077b8ea800 000000000000002d 0000000000000fcd
[ 1395.489716]  0000000000000ffa ffff881ff23a5b58 ffffffffa00df49c ffff8807c6efb000
[ 1395.489731] Call Trace:
[ 1395.489779]  [<ffffffffa00e951a>] ? read_extent_buffer+0xba/0x120 [btrfs]
[ 1395.489817]  [<ffffffffa00df49c>] btrfs_get_token_64+0xdc/0xf0 [btrfs]
[ 1395.489854]  [<ffffffffa012ba26>] is_extent_unchanged.isra.21+0x276/0x3c0 [btrfs]
[ 1395.489892]  [<ffffffffa012cfbf>] process_extent+0xbf/0xf0 [btrfs]
[ 1395.489926]  [<ffffffffa012e5f0>] changed_cb+0xc0/0x100 [btrfs]
[ 1395.489954]  [<ffffffffa00a8ea5>] btrfs_compare_trees+0x885/0x8e0 [btrfs]
[ 1395.489989]  [<ffffffffa012867e>] ? write_buf+0x6e/0xc0 [btrfs]
[ 1395.490023]  [<ffffffffa012e530>] ? changed_inode+0x4a0/0x4a0 [btrfs]
[ 1395.490059]  [<ffffffffa012e964>] send_subvol+0xa4/0x160 [btrfs]
[ 1395.490093]  [<ffffffffa012ed5b>] btrfs_ioctl_send+0x33b/0x510 [btrfs]
[ 1395.490130]  [<ffffffffa00fd14b>] btrfs_ioctl+0x73b/0xa90 [btrfs]
[ 1395.490144]  [<ffffffff8109caf6>] ? account_user_time+0xa6/0xc0
[ 1395.490157]  [<ffffffff8109d134>] ? vtime_account_user+0x74/0x90
[ 1395.490169]  [<ffffffff811c47cc>] do_vfs_ioctl+0x7c/0x2f0
[ 1395.490182]  [<ffffffff810210a9>] ? syscall_trace_enter+0x29/0x270
[ 1395.490194]  [<ffffffff811c4ad1>] SyS_ioctl+0x91/0xb0
[ 1395.490206]  [<ffffffff81735cef>] tracesys+0xe1/0xe6
[ 1395.490214] Code: 4e 48 83 c4 08 5b 5d c3 90 e8 fb fd ff ff eb e6 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 <f3> a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 8b 5e 18 48 8d 
[ 1395.490305] RIP  [<ffffffff81375412>] memcpy+0x12/0x110
[ 1395.490317]  RSP <ffff881ff23a5ab0>
[ 1395.490340] ---[ end trace 9783e77851f4a46f ]---
Comment 4 Josef Bacik 2013-09-04 15:30:50 UTC
can you run btrfsck on this fs and put the output in this bugzilla please?  You'll need the latest from git

git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git
Comment 5 moritz.stoetter 2013-09-05 06:10:35 UTC
Is btrfsck save to use? I am afraid we can't afford to damage the filesystem.
Comment 6 moritz.stoetter 2013-09-06 11:16:43 UTC
The output of btrfsck is as follows:

checking extents
checking free space cache
checking fs roots
root 260 inode 7976661 errors 400
root 46386 inode 7976661 errors 400
Checking filesystem on /dev/sdc
UUID: 4b2a3fb3-6228-41ec-83af-71efedb30c9d
cache and super generation don't match, space cache will be invalidated
found 3351429134818 bytes used err is 1
total csum bytes: 22055692088
total tree bytes: 114229071872
total fs tree bytes: 79489994752
total extent tree bytes: 4553723904
btree space waste bytes: 29677376916
file data blocks allocated: 33948690792448
 referenced 23165541474304
Btrfs v0.20-rc1-358-g194aa4a
Comment 7 moritz.stoetter 2013-09-06 11:18:50 UTC
Btw. This was written to kern.log upon unmounting the btrfs device. Maybe it's related?

[166484.338420] ------------[ cut here ]------------
[166484.338472] WARNING: CPU: 14 PID: 9187 at /home/apw/COD/linux/fs/btrfs/disk-io.c:3423 free_fs_root+0x99/0xa0 [btrfs]()
[166484.338476] Modules linked in: joydev hid_generic sp5100_tco ttm drm_kms_helper psmouse usbhid drm hid amd64_edac_mod i2c_algo_bit edac_core edac_mce_amd serio_raw sysimgblt sysfillrect k10temp fam15h_power i2c_piix4 syscopyarea ohci_pci mac_hid nfsd nfs_acl auth_rpcgss lp nfs fscache lockd parport sunrpc btrfs raid6_pq zlib_deflate xor libcrc32c e1000e ptp pps_core arcmsr
[166484.338529] CPU: 14 PID: 9187 Comm: umount Not tainted 3.11.0-031100-generic #201309021735
[166484.338534] Hardware name: empty empty/S8230, BIOS 'V2.05' 01/17/2013
[166484.338538]  0000000000000d5f ffff881fedda1cb8 ffffffff81720fca 0000000000000007
[166484.338545]  0000000000000000 ffff881fedda1cf8 ffffffff8106534c ffff881fedda1d08
[166484.338551]  ffff880fecf3f800 ffff880fecf3f800 ffff880ff4228000 ffff881fedda1d90
[166484.338557] Call Trace:
[166484.338571]  [<ffffffff81720fca>] dump_stack+0x46/0x58
[166484.338581]  [<ffffffff8106534c>] warn_slowpath_common+0x8c/0xc0
[166484.338587]  [<ffffffff8106539a>] warn_slowpath_null+0x1a/0x20
[166484.338612]  [<ffffffffa00bfa09>] free_fs_root+0x99/0xa0 [btrfs]
[166484.338637]  [<ffffffffa00c3603>] btrfs_drop_and_free_fs_root+0x93/0xc0 [btrfs]
[166484.338664]  [<ffffffffa00c36ff>] del_fs_roots+0xcf/0x130 [btrfs]
[166484.338690]  [<ffffffffa00c3a76>] close_ctree+0x146/0x270 [btrfs]
[166484.338700]  [<ffffffff811cd69e>] ? evict_inodes+0xce/0x130
[166484.338721]  [<ffffffffa009ad99>] btrfs_put_super+0x19/0x20 [btrfs]
[166484.338729]  [<ffffffff811b4bd2>] generic_shutdown_super+0x62/0xf0
[166484.338737]  [<ffffffff811b4cf6>] kill_anon_super+0x16/0x30
[166484.338756]  [<ffffffffa009d50a>] btrfs_kill_super+0x1a/0x90 [btrfs]
[166484.338763]  [<ffffffff811b551d>] deactivate_locked_super+0x4d/0x80
[166484.338769]  [<ffffffff811b5b9e>] deactivate_super+0x4e/0x70
[166484.338776]  [<ffffffff811d16b6>] mntput_no_expire+0x106/0x160
[166484.338781]  [<ffffffff811d2fc9>] SyS_umount+0xa9/0xf0
[166484.338789]  [<ffffffff81735cef>] tracesys+0xe1/0xe6
[166484.338793] ---[ end trace 1196686f16f1bff6 ]---
[166493.452882] VFS: Busy inodes after unmount of sdc. Self-destruct in 5 seconds.  Have a nice day...
Comment 8 Josef Bacik 2013-09-09 18:12:05 UTC
Can you do

gdb btrfs.ko

and when that comes up do

list *(is_extent_unchanged+0x276)

and post that to this bz.
Comment 9 sebastian.erhart 2013-09-21 11:26:19 UTC
(In reply to Josef Bacik from comment #8)
> Can you do
> 
> gdb btrfs.ko
> 
> and when that comes up do
> 
> list *(is_extent_unchanged+0x276)
> 
> and post that to this bz.

We've been sending the incremental snapshots now for a few days without failure. Yesterday we ran into the error again believing our debug tools to be set up properly to get some more information. Unfortunately the debug symbols we used did not match the running kernel. 

To overcome this fact, we compiled a 3.11 mainline kernel with the Ubuntu specific patches applied from [1], enabling the debug symbols via CONIFG_DEBUG_INFO=y and booting our file server with this kernel. 

Given my very limited experience in kernel debugging, could you please just confirm that there is nothing strange about the fact that all symbols in the call trace of comment 3 can now be found in /proc/kallsyms *except* the two symbols is_extent_unchanged and send_subvol?

We'll provide the requested output of 

 gdb /lib/modules/3.11.0-debug/kernel/fs/btrfs/btrfs.ko

as soon as we encounter the failure again.

[1] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.11-saucy/
Comment 10 David Sterba 2013-09-23 11:47:21 UTC
(In reply to sebastian.erhart from comment #9)
> Given my very limited experience in kernel debugging, could you please just
> confirm that there is nothing strange about the fact that all symbols in the
> call trace of comment 3 can now be found in /proc/kallsyms *except* the two
> symbols is_extent_unchanged and send_subvol?

That's fine, the symbols are declared static and not visible outside of the compiled unit and therefore not listed in kallsyms. The debugger is able to see all symbols, when you run the gdb command bouth should be there.
Comment 11 sebastian.erhart 2013-09-23 14:49:37 UTC
Thank you for your explanations concerning the symbol visibility. 

With the self-built 3.11 kernel (LOCALVERSION=debug), the kernel.log entry upon the failure looks again slightly different:

general protection fault: 0000 [#1] SMP
Modules linked in: nfsd sp5100_tco ttm nfs_acl auth_rpcgss drm_kms_helper drm nfs joydev hid_generic psmouse usbhid amd64_edac_mod i2c_algo_bit hid edac_core fscache sysimgblt i2c_piix4 fam15h_power sysfillrect k10temp ohci_pci serio_raw syscopyarea edac_mce_amd mac_hid lockd lp parport sunrpc btrfs raid6_pq zlib_deflate xor libcrc32c e1000e ptp arcmsr pps_core
CPU: 4 PID: 5181 Comm: btrfs Not tainted 3.11.0-debug #1
Hardware name: empty empty/S8230, BIOS 'V2.05' 01/17/2013
task: ffff881fbb3a1770 ti: ffff881e76ccc000 task.ti: ffff881e76ccc000
RIP: 0010:[<ffffffff81371fb2>]  [<ffffffff81371fb2>] memcpy+0x12/0x110
RSP: 0018:ffff881e76ccd980  EFLAGS: 00010202
RAX: ffff881e76ccd9f6 RBX: 0000000000000002 RCX: 0000000000000002
RDX: 0000000000000002 RSI: 0005080000000000 RDI: ffff881e76ccd9f6
RBP: ffff881e76ccd9c8 R08: ffff881e76ccd9e0 R09: ffff881e76ccd9e8
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
R13: ffff881e76ccd9f8 R14: ffff881c2d1efa50 R15: 0000160000000000
FS:  00007fb7843a4740(0000) GS:ffff880fffc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fd0f1811aa0 CR3: 0000000fa7c10000 CR4: 00000000000407e0
Stack:
 ffffffffa00dff4a ffff881e76ccdab0 0000000000000000 ffff881c2d1ef980
 0000000000000000 ffff881c2d1ef980 000000000000002d 0000000000000fcd
 0000000000000ffa ffff881e76ccda28 ffffffffa00d5f6c ffff880f177e9000
Call Trace:
 [<ffffffffa00dff4a>] ? read_extent_buffer+0xba/0x120 [btrfs]
 [<ffffffffa00d5f6c>] btrfs_get_token_64+0xdc/0xf0 [btrfs]
 [<ffffffffa0121d80>] process_extent+0x960/0x1050 [btrfs]
 [<ffffffffa0123853>] changed_cb+0xe3/0x930 [btrfs]
 [<ffffffffa0121080>] ? __record_deleted_ref+0xe0/0xe0 [btrfs]
 [<ffffffffa00d5d04>] ? btrfs_get_token_32+0x64/0xf0 [btrfs]
 [<ffffffffa009fbc9>] btrfs_compare_trees+0x899/0x9b0 [btrfs]
 [<ffffffffa0123770>] ? process_all_refs+0x190/0x190 [btrfs]
 [<ffffffffa0124963>] btrfs_ioctl_send+0x8c3/0xc50 [btrfs]
 [<ffffffff81364676>] ? cpumask_next_and+0x36/0x50
 [<ffffffffa00f33dc>] btrfs_ioctl+0x13fc/0x1a40 [btrfs]
 [<ffffffff81048d7e>] ? physflat_send_IPI_mask+0xe/0x10
 [<ffffffff811c029b>] do_vfs_ioctl+0x8b/0x4f0
 [<ffffffff8109b664>] ? vtime_account_user+0x74/0x90
 [<ffffffff811c0791>] SyS_ioctl+0x91/0xb0
 [<ffffffff8171a72f>] tracesys+0xe1/0xe6
Code: 4e 48 83 c4 08 5b 5d c3 90 e8 eb fb ff ff eb e6 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 <f3> a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 8b 5e 18 48 8d
RIP  [<ffffffff81371fb2>] memcpy+0x12/0x110
 RSP <ffff881e76ccd980>
---[ end trace 498bf4eeaf0e6bff ]---

Note that the symbol is_extent_unchanged does not occur anymore in the call trace. The output of 

gdb /lib/modules/3.11.0-debug/kernel/fs/btrfs/btrfs.ko

for the symbols mentioned in the call trace reads as follows:

list *(read_extent_buffer+0xba)
0x50f4a is in read_extent_buffer (fs/btrfs/extent_io.c:4776).
4771    in fs/btrfs/extent_io.c
list *(btrfs_get_token_64+0xdc)
0x46f6c is in btrfs_get_token_64 (fs/btrfs/struct-funcs.c:134).
129     fs/btrfs/struct-funcs.c: No such file or directory.
list *(process_extent+0x960)
0x92d80 is in process_extent (fs/btrfs/ctree.h:2046).
2041    fs/btrfs/ctree.h: No such file or directory.
list *(changed_cb+0xe3)
0x94853 is in changed_cb (fs/btrfs/send.c:4357).
4352    fs/btrfs/send.c: No such file or directory.
list *(__record_deleted_ref+0xe0)
0x92080 is in __record_changed_deleted_ref (fs/btrfs/send.c:3182).
3177    in fs/btrfs/send.c
list *(btrfs_get_token_32+0x64)
0x46d04 is in btrfs_get_token_32 (fs/btrfs/struct-funcs.c:133).
128     fs/btrfs/struct-funcs.c: No such file or directory.
list *(btrfs_compare_trees+0x899)
0x10bc9 is in btrfs_compare_trees (fs/btrfs/ctree.c:5342).
5337    fs/btrfs/ctree.c: No such file or directory.
list *(process_all_refs+0x190)
0x94770 is in changed_cb (fs/btrfs/send.c:4375).
4370    fs/btrfs/send.c: No such file or directory.
list *(btrfs_ioctl_send+0x8c3)
0x95963 is in btrfs_ioctl_send (fs/btrfs/send.c:4533).
4528    in fs/btrfs/send.c
list *(cpumask_next_and+0x36)
No symbol "cpumask_next_and" in current context.

We further encountered a different behaviour of the btrfs kernel objects concerning a "nm btrfs.ko | grep is_extent_unchanged". For 3.11.0-031100-generic we have the output "00000000000937b0 t is_extent_unchanged.isra.21" while for 3.11.0-debug this symbol does not appear. We used "git checkout v3.11" to compile the 3.11 mainline. Do you have any idea what could cause the difference in the btrfs module?
Comment 12 David Sterba 2013-09-24 17:10:16 UTC
$ cd fs/btrfs/
$ gdb btrfs.ko

I'm suspecting some random corruption that crashes inside memcpy when wrong data get there.

We have seen that in 3.11 bluetooth module causes random memory corruptions that lead to similar crash (overwriting random items in inode data), though your first report is from 3.8.

comment 6:
> root 260 inode 7976661 errors 400
> root 46386 inode 7976661 errors 400

Please locate the file with inode number 7976661, either by 'btrfs inspect ino' or via slower 'find /mnt -inum 7976661', and attach 'stat' output (filename is not important).

However, if it's something like /mnt/@/var/lib/bluetooth/64:27:37:90:1E:7E/lastseen then, we know who to blame.
Comment 13 sebastian.erhart 2013-09-25 07:25:16 UTC
The file with the inode number 7976661 is a plain text file in one of our user's home folder. The output of stat is as follows:

$ stat /mnt/home/user/folder/file.txt

File: `/mnt/home/user/folder/file.txt'
  Size: 303             Blocks: 8          IO Block: 4096   regular file
Device: 1ch/28d Inode: 7976661     Links: 1
Access: (0600/-rw-------)  Uid: ( 2706/ user)   Gid: (  500/   staff)
Access: 2013-04-22 04:28:20.083286219 +0200
Modify: 2013-09-12 18:10:19.747026874 +0200
Change: 2013-09-12 18:10:19.746663849 +0200
 Birth: -

Any suggestions on the further proceeding in such a case are very welcome.
Comment 14 sebastian.erhart 2013-10-10 16:11:46 UTC
We deleted the file with the inode number 7976661 and ran into the same issue again when sending the incremental backup. Please find below the kernel trace and the corresponding output of > gdb btrfs.ko <. Meanwhile we changed the mount options of the btrfs volume to > defaults,noatime,space_cache <. Scrubbing the volume finishes without any errors. Do you have any further idea on what goes wrong here? 

general protection fault: 0000 [#1] SMP 
Modules linked in: nfsd nfs_acl auth_rpcgss ttm sp5100_tco nfs drm_kms_helper drm fscache joydev psmouse hid_generic usbhid amd64_edac_mod edac_core i2c_algo_bit lockd hid sysimgblt sysfillrect i2c_piix4 fam15h_power syscopyarea ohci_pci edac_mce_amd serio_raw k10temp mac_hid lp parport sunrpc btrfs raid6_pq zlib_deflate xor libcrc32c e1000e ptp arcmsr pps_core
CPU: 8 PID: 11207 Comm: btrfs Not tainted 3.11.0-debug #1
Hardware name: empty empty/S8230, BIOS 'V2.05' 01/17/2013
task: ffff881dd8220000 ti: ffff881df4b7a000 task.ti: ffff881df4b7a000
RIP: 0010:[<ffffffff81371fb2>]  [<ffffffff81371fb2>] memcpy+0x12/0x110
RSP: 0018:ffff881df4b7b980  EFLAGS: 00010202
RAX: ffff881df4b7b9f6 RBX: 0000000000000002 RCX: 0000000000000002
RDX: 0000000000000002 RSI: 0005080000000000 RDI: ffff881df4b7b9f6
RBP: ffff881df4b7b9c8 R08: ffff881df4b7b9e0 R09: ffff881df4b7b9e8
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
R13: ffff881df4b7b9f8 R14: ffff880d103aa150 R15: 0000160000000000
FS:  00007fe4edc1e740(0000) GS:ffff88201ec00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000061cd64 CR3: 00000017a8ed7000 CR4: 00000000000407e0
Stack:
 ffffffffa00dff4a ffff881df4b7bab0 0000000000000000 ffff880d103aa080
 0000000000000000 ffff880d103aa080 000000000000002d 0000000000000fcd
 0000000000000ffa ffff881df4b7ba28 ffffffffa00d5f6c ffff880bdfa47000
Call Trace:
 [<ffffffffa00dff4a>] ? read_extent_buffer+0xba/0x120 [btrfs]
 [<ffffffffa00d5f6c>] btrfs_get_token_64+0xdc/0xf0 [btrfs]
 [<ffffffffa0121d80>] process_extent+0x960/0x1050 [btrfs]
 [<ffffffffa0123853>] changed_cb+0xe3/0x930 [btrfs]
 [<ffffffffa0121080>] ? __record_deleted_ref+0xe0/0xe0 [btrfs]
 [<ffffffffa00d5d04>] ? btrfs_get_token_32+0x64/0xf0 [btrfs]
 [<ffffffffa009fbc9>] btrfs_compare_trees+0x899/0x9b0 [btrfs]
 [<ffffffffa0123770>] ? process_all_refs+0x190/0x190 [btrfs]
 [<ffffffffa0124963>] btrfs_ioctl_send+0x8c3/0xc50 [btrfs]
 [<ffffffff81364676>] ? cpumask_next_and+0x36/0x50
 [<ffffffffa00f33dc>] btrfs_ioctl+0x13fc/0x1a40 [btrfs]
 [<ffffffff81048d7e>] ? physflat_send_IPI_mask+0xe/0x10
 [<ffffffff811c029b>] do_vfs_ioctl+0x8b/0x4f0
 [<ffffffff8109b664>] ? vtime_account_user+0x74/0x90
 [<ffffffff811c0791>] SyS_ioctl+0x91/0xb0
 [<ffffffff8171a72f>] tracesys+0xe1/0xe6
Code: 4e 48 83 c4 08 5b 5d c3 90 e8 eb fb ff ff eb e6 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 <f3> a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 8b 5e 18 48 8d 
RIP  [<ffffffff81371fb2>] memcpy+0x12/0x110
 RSP <ffff881df4b7b980>
---[ end trace a0ef2243ebd5cf3d ]---



list *(read_extent_buffer+0xba)
0x50f4a is in read_extent_buffer (fs/btrfs/extent_io.c:4776).
4771         WARN_ON(start > eb->len);
4772         WARN_ON(start + len > eb->start + eb->len);
4773 
4774         offset = (start_offset + start) & ((unsigned long)PAGE_CACHE_SIZE - 1);
4775 
4776         while (len > 0) {
4777                 page = extent_buffer_page(eb, i);
4778 
4779                 cur = min(len, (PAGE_CACHE_SIZE - offset));
4780                 kaddr = page_address(page);

list *(changed_cb+0xe3)
0x94853 is in changed_cb (fs/btrfs/send.c:4357).
4352 
4353         BUG_ON(sctx->cur_ino != sctx->cmp_key->objectid);
4354 
4355         if (!sctx->cur_inode_new_gen && !sctx->cur_inode_deleted) {
4356                 if (result != BTRFS_COMPARE_TREE_DELETED)
4357                         ret = process_extent(sctx, sctx->left_path,
4358                                         sctx->cmp_key);
4359         }
4360 
4361         return ret;
Comment 15 Josef Bacik 2013-10-31 20:51:13 UTC
Please test this patch

[PATCH] Btrfs: check file extent type before anything else

and verify it fixes your problem.  If it does not, please re-open this bugzilla.

Note You need to log in before you can comment on or make changes to this bug.