Hi, I don't actually know that this is a virtual machine problem, but so far I've only seen it in a VM of mine, specifically when I try to do an xfsdump. Essentially 100% of the time when I do an xfsdump (without changing any kernel parameters from the Fedora 23 defaults), xfsdump fails and I get the errors pasted below in the log. I don't know whether this is a regression, but it occurs with kernel 4.4.4 and kernel 4.4.3. I just booted with the Fedora rescue kernel, which is 4.2.8. The problem does not seem to happen with the rescue kernel, either because the issue is a regression or because there are fewer modules doing things in the rescue kernel (the cifs module, for instance, isn't loaded in the rescue kernel). I have also noticed that if I set vm.swappiness to 1 or 0, the frequency of the problem is reduced, but it is not outright solved. If I have posted to the wrong place then my apologies, but please tell me where I really should be sending this bug report. Thank you for your time.

Michael Walton

Here are the log entries:

Mar 11 22:29:11 ourdeploy kernel: XFS (vdb1): Mounting V5 Filesystem
Mar 11 22:29:11 ourdeploy kernel: XFS (vdb1): Starting recovery (logdev: internal)
Mar 11 22:29:11 ourdeploy kernel: XFS (vdb1): Ending recovery (logdev: internal)
Mar 11 22:29:17 ourdeploy audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=rolekit comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Mar 11 22:37:14 ourdeploy kernel: page:ffffea00000267c0 count:0 mapcount:-127 mapping: (null) index:0x0
Mar 11 22:37:14 ourdeploy kernel: flags: 0x1ffff800000000()
Mar 11 22:37:14 ourdeploy kernel: page dumped because: VM_BUG_ON_PAGE(atomic_read(&page->_count) == 0)
Mar 11 22:37:14 ourdeploy kernel: ------------[ cut here ]------------
Mar 11 22:37:14 ourdeploy kernel: kernel BUG at include/linux/mm.h:342!
Mar 11 22:37:14 ourdeploy kernel: invalid opcode: 0000 [#1] SMP
Mar 11 22:37:14 ourdeploy kernel: Modules linked in: nls_utf8 cifs dns_resolver fscache nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_tftp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_broute bridge stp llc ebtable_filter ebtable_nat ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_raw ip6table_mangle ip6table_security ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw iptable_mangle iptable_security snd_hda_codec_generic snd_hda_intel snd_hda_codec ppdev snd_hda_core iosf_mbi crct10dif_pclmul snd_hwdep crc32_pclmul snd_pcm joydev snd_timer virtio_balloon parport_pc snd parport acpi_cpufreq i2c_piix4 soundcore tpm_tis tpm nfsd nfs_acl lockd grace auth_rpcgss sunrpc xfs libcrc32c virtio_console
Mar 11 22:37:14 ourdeploy kernel: virtio_net virtio_blk qxl drm_kms_helper ttm drm crc32c_intel serio_raw virtio_pci ata_generic virtio_ring virtio pata_acpi
Mar 11 22:37:14 ourdeploy kernel: CPU: 0 PID: 1458 Comm: xfsdump Not tainted 4.4.4-301.fc23.x86_64 #1
Mar 11 22:37:14 ourdeploy kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014
Mar 11 22:37:14 ourdeploy kernel: task: ffff88003e1d9e00 ti: ffff880028ccc000 task.ti: ffff880028ccc000
Mar 11 22:37:14 ourdeploy kernel: RIP: 0010:[<ffffffff811b5008>] [<ffffffff811b5008>] __free_pages+0x38/0x40
Mar 11 22:37:14 ourdeploy kernel: RSP: 0018:ffff880028ccfa48 EFLAGS: 00010246
Mar 11 22:37:14 ourdeploy kernel: RAX: 0000000000000044 RBX: ffff88003f547900 RCX: 0000000000000006
Mar 11 22:37:14 ourdeploy kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880042c0dff0
Mar 11 22:37:14 ourdeploy kernel: RBP: ffff880028ccfa48 R08: 0000000000000000 R09: 0000000000000254
Mar 11 22:37:14 ourdeploy kernel: R10: 0000000000000001 R11: 0000000000000254 R12: 0000000000000001
Mar 11 22:37:14 ourdeploy kernel: R13: ffffffffa0180d4d R14: ffff880028ccfb38 R15: 0000000000000001
Mar 11 22:37:14 ourdeploy kernel: FS: 00007f57b020a780(0000) GS:ffff880042c00000(0000) knlGS:0000000000000000
Mar 11 22:37:14 ourdeploy kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 11 22:37:14 ourdeploy kernel: CR2: 00007f21861ec024 CR3: 0000000030025000 CR4: 00000000000406f0
Mar 11 22:37:14 ourdeploy kernel: Stack:
Mar 11 22:37:14 ourdeploy kernel: ffff880028ccfa70 ffffffffa017fed3 ffff880036d073c0 0000000000000000
Mar 11 22:37:14 ourdeploy kernel: 0000000000010015 ffff880028ccfab0 ffffffffa0180d4d ffff88003f547900
Mar 11 22:37:14 ourdeploy kernel: 0000000000010015 0000000000000001 0000000000010014 ffff880036d073c0
Mar 11 22:37:14 ourdeploy kernel: Call Trace:
Mar 11 22:37:14 ourdeploy kernel: [<ffffffffa017fed3>] xfs_buf_free+0x73/0x130 [xfs]
Mar 11 22:37:14 ourdeploy kernel: [<ffffffffa0180d4d>] xfs_buf_get_map+0x22d/0x280 [xfs]
Mar 11 22:37:14 ourdeploy kernel: [<ffffffffa018187d>] xfs_buf_read_map+0x2d/0x180 [xfs]
Mar 11 22:37:14 ourdeploy kernel: [<ffffffffa0181a22>] xfs_buf_readahead_map+0x52/0x70 [xfs]
Mar 11 22:37:14 ourdeploy kernel: [<ffffffffa0158bc1>] xfs_btree_reada_bufs+0x61/0x80 [xfs]
Mar 11 22:37:14 ourdeploy kernel: [<ffffffffa01962d8>] xfs_bulkstat_ichunk_ra.isra.4+0xe8/0x140 [xfs]
Mar 11 22:37:14 ourdeploy kernel: [<ffffffffa016e8b3>] ? xfs_inobt_get_rec+0x33/0xc0 [xfs]
Mar 11 22:37:14 ourdeploy kernel: [<ffffffffa0196973>] xfs_bulkstat+0x303/0x670 [xfs]
Mar 11 22:37:14 ourdeploy kernel: [<ffffffff81244c3d>] ? __dentry_kill+0x13d/0x1b0
Mar 11 22:37:14 ourdeploy kernel: [<ffffffffa0196640>] ? xfs_bulkstat_one_int+0x310/0x310 [xfs]
Created attachment 209201 [details] Test Patch
It seems your BUG may be triggered by the page count being set to zero when certain functions are called, causing the virtual memory management subsystem to complain when this happens. If you can, build a kernel with the patch I just created in order to make sure that this is the underlying issue.
(In reply to nickkrause from comment #2)
> Seems your warn may be triggered due the page count being set to zero when
> certain functions are called thus causing a the virtual memory management
> subsystem to complain when this happens. If you can build a kernel with the
> patch I just created in order to make sure that this is the underlying
> issue.

Hello nickkrause, I compiled the kernel with your proposed patch. Sadly, I don't believe it helped (except to exclude one angle of attack from consideration). During the entire boot I only ever got "This page has a refcount of 1" and never any other number.
Created attachment 209481 [details] Journal entries with stack trace
Created attachment 209501 [details] Test Patch
OK, that's good for now; at least I know there was a refcount on the pages allocated by xfs. Let's see if this patch does anything. It seems there may be a double free lower down, with: _xfs_buf_free_pages(bp);
Michael, do not run with that second patch. Nick is not knowledgeable about XFS, and is likely to do you more harm than good.
Eric, rather than making assumptions about my work, why don't you actually read the conversation between me and Michael? Grow up and stop making assumptions about my work; this is getting very annoying, and you and others are acting like high school girls feeding the community gossip about me. But I guess it's too hard for you to stop your assumptions and grow up.
Nick, I read the conversation and I read your patch. It will cause a memory leak, and nothing else.
No, you're absolutely right about that patch. I misread and thought a _free_page call was happening there :). On the other hand, it's clear we're hitting a bug from a page count of zero during a call to __free_pages. P.S. Sorry about losing my temper.
Eric, tell me if this makes more sense:

error = xfs_buf_allocate_memory(new_bp, flags);
if (error) {
	xfs_buf_free(new_bp);
	return NULL;
}

It seems this could be an error: if the allocation fails, the page count should be zero. This means that calling xfs_buf_free here double-frees the buffer's pages.
Hi again everyone, I reverted to kernel 4.2.3 and xfsdump worked six times in a row, so I now consider this most definitely to be a regression.
Does your 4.2.3 tree have the git commit with id 12d0714d4bdd591d9cd3bce692c831da2c2a0cfc?
Hi Nick, before I contemplate questions like the one you're asking, I want to figure out exactly at which kernel the regression occurs, so later today or tomorrow I'll try out kernels one by one until I establish exactly when the regression occurred. The reason I chose 4.2.3 for the initial test is that it's the base kernel of Fedora 23. I'm guessing that 4.3.1 is going to be the point of regression, but we'll see.
Thanks Michael - we haven't really had time to dig into this one yet, so doing more bisecting on your end will be helpful to narrow down where things changed.
I agree with Eric. Can you try to bisect this issue, as it seems to be a regression?
Hello again kernel pros, I have tried out a bunch of different kernels now. To make a long story short, I installed 4.3.6 and encountered no bug, and when I installed kernel 4.4, I did encounter the bug. I installed kernel 4.3.6 from https://kojipkgs.fedoraproject.org/packages/kernel/4.3.6/201.fc22/ and kernel 4.4 from https://kojipkgs.fedoraproject.org/packages/kernel/4.4.0/1.fc24/ (neither one was specifically an f23 kernel, but I don't think that matters). Therefore, in my humble opinion, the problem was introduced in kernel 4.4. BTW, thanks for looking into this.
Try getting a mainline kernel and reverting commit b79f4a1c68bb99152d0785ee4ea3ab4396cdacc6.
Hi Nick, I'm not particularly familiar with git and bisecting kernels, so if I'm saying something idiotic then I'll admit it in advance... At any rate, I looked at commit b79f4a1c68bb99152d0785ee4ea3ab4396cdacc6 and it's tagged as 4.5, so I don't understand why I would revert this particular one.
Basically, a tag means the commit is in that version, not when it was committed. I looked at the commit logs and lined them up with your sort-of bisect plus the trace to find what seemed like a possible commit causing your regression.
Hi all, I'm still working on bisecting. Hopefully I'll report something very concrete soon.
Created attachment 210291 [details] Git bisect log
Hi again kernel wizards, I used git bisect on the development kernel (or tried to, anyway). Supposing that I haven't screwed up, the commit where things break is:

f77cf4e4cc9d40310a7224a1a67c733aeec78836

This isn't an xfs commit, so either that commit has a bug, or it exposes a bug in xfs, or at the very least it subtly breaks xfs somehow (if I haven't screwed up my bisection, of course). Thanks, I'm hoping that you guys can look into this now.
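For anyone following along who, like me, wasn't familiar with bisecting: the workflow boils down to the sketch below, shown here on a throwaway repository rather than a kernel tree (all names and the planted "BUG" are made up for the demo; with a real kernel you would build, boot, and run xfsdump at each step instead of grepping a file).

```shell
# Toy demonstration of the git-bisect workflow: eight commits, one of
# which (commit 5) plants a "BUG" line; `git bisect run` finds it.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
for i in 1 2 3 4 5 6 7 8; do
    echo "change $i" >> file.txt
    if [ "$i" -eq 5 ]; then echo "BUG" >> file.txt; fi  # commit 5 is bad
    git add file.txt
    git commit -qm "commit $i"
done
first=$(git rev-list --max-parents=0 HEAD)  # oldest commit, known good
git bisect start HEAD "$first" >/dev/null
# Exit 0 = good, nonzero = bad; a kernel test script would go here instead.
git bisect run sh -c '! grep -q BUG file.txt' >/dev/null 2>&1
bad=$(git bisect log | sed -n 's/^# first bad commit: \[\([0-9a-f]*\)\].*/\1/p')
git log -1 --format='first bad: %s' "$bad"   # prints: first bad: commit 5
git bisect reset >/dev/null 2>&1
```

The only manual judgment is the good/bad test at each step; `git bisect run` automates it once that test is a command's exit status.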
Guys, I apologise: I was wrong. I was doing seven xfsdumps in a row as a test of goodness, but it turns out that seven in a row isn't enough. Sorry for posting too soon :(
Well, I kept at it. Here's where things stand now. Two repeat bisection attempts have landed me at:

d0164adc89f6bb374d304ffcc375c6d2652fe67d

I really can reproduce the bug here, and I really cannot reproduce it at the previous commit (11 xfsdumps with swappiness at 0 and no kernel errors).
Created attachment 210581 [details] Revert Commit
See whether reverting the commit with the patch I just attached fixes the issue.
Michael, it's probably obvious, but don't bother with Nick's patch, because a) you already tested with that patch properly removed, and b) the attached patch quite obviously won't compile.

Thanks for the bisect; I'm on vacation this week so I really don't have time to look very closely, but that's an interesting failure point. It might be obvious to others why it would cause a problem, but not to me, yet.

Michael, can you tell from your backtraces & your binary which of the 3 calls to xfs_buf_free in xfs_buf_get_map led to the BUG?
Well.... I wanted to take Eric's advice first, just because I consider Eric to be the authority here. (BTW, Eric, I don't understand why you are being almost abusive toward Nick, who hasn't demonstrated one iota of mean-spiritedness, but perhaps there is back-story that I am not familiar with...)

So, not being a kernel hacker and not really being familiar with modern usage of C, I inserted the following code:

#pragma GCC push_options
#pragma GCC optimize ("O0")
void xfs_buf_free1(xfs_buf_t *bp)
{
	xfs_buf_free(bp);
}
void xfs_buf_free2(xfs_buf_t *bp)
{
	xfs_buf_free(bp);
}
void xfs_buf_free3(xfs_buf_t *bp)
{
	xfs_buf_free(bp);
}
#pragma GCC pop_options

That way I could call a specific one from each site in xfs_buf_get_map and hopefully see which one got called, without the functions getting optimised out. Well, this is what happened: I couldn't produce the bug any more... I've done a total of 19 xfsdumps of the root file system in a row now. This leads me to speculate that underlying all this may be some sort of compiler optimization problem affecting xfs_buf.c (at least for code running on VM processors). Anyway, this is all just speculation by an idiot. I don't know what's going on, but if things work, they work, so I'll consider this a provisional fix until I know better.
Argh, sorry again. I posted findings too early again :(. I kept on doing xfsdumps and it finally went wrong at the 25th dump... I guess this makes my whole bisection suspect now. Anyway, it was the third call to xfs_buf_free that did it, since xfs_buf_free3 appeared in the stack trace.
Michael, thanks for the update. Knowing that it's the 3rd call definitely narrows down the issue. In this case we have searched for a cached buffer, failed to find one, allocated a new one, looked up the same range again, this time found a cached buffer which appeared in the interim, freed the one we just allocated, and hit your BUG_ON when freeing its associated pages. This seems a bit odd, because on this path nothing should have happened with the pages in question; they were simply allocated and almost immediately freed, as far as I can tell. I'll think on this some more.

(It's not my intention to be hostile to Nick, only to save you the wasted time that many others have suffered as a result of Nick's unwillingness to even compile-test patches before he proposes them for subsystems which he does not understand. If there is more than a whiff of annoyance, it is only because the behavior has gone on for a very long time, despite reasonable politeness and attempts at mentoring in the beginning. I won't spend more time on this, but a bit of googling will turn up the back story.)
Michael, one other thing that might be interesting, since you are running Fedora kernels, is to grab the kernel-debug variant of a failing version, which has more memory debugging turned on and will better detect a use after free etc.
Doing more tests. It's not only the third call to xfs_buf_free; it turns out that most of the time it's the first (I haven't seen a failure on the second call yet). I've attached another stack trace. It's probably not that useful. (Note there are some functions that end in a 1. These are functions that I added that call the identically-named ones without the 1, just to make sure I get a complete stack trace.)
Created attachment 210781 [details] Stack trace of fail on first call of xfs_buf_free1
Created attachment 210921 [details] New trace
Hi again, I'm running the upstream kernel for debugging purposes, but with a whole bunch of memory debug flags enabled. I think I caught one extra thing this time round; I don't think I've seen this exactly before:

invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC

Doing a bit of googling leads me to think that maybe turning off C-states on the host might help. Does anyone think that might be useful? I have to be physically present at the server to do it, so all I can do is speculate right now.
Turning off hyperthreading is the other thing to try.
OK, no luck playing with BIOS settings or messing with C-state kernel parameters. I'm relenting and switching to weekly tar archives for backing up my VMs. I'll just keep my eyes peeled and hope that there's some kind of answer to this in the future. BTW, I don't necessarily believe that this is an XFS bug now (it could be a hardware issue), but xfsdump is exposing whatever it is in a big way.
If you'd like to provide an xfs_metadump image of the fs, that might be a first step to trying to reproduce it here. Can provide it on a side-channel if you'd rather not attach it to the bug. With that, your xfsdump invocation, and some machine details, we might be able to reproduce it and make more progress. Thanks, -Eric
Created attachment 211331 [details] hardware_info
Created attachment 211341 [details] xml dump of vm
(In reply to Eric Sandeen from comment #39)
> If you'd like to provide an xfs_metadump image of the fs, that might be a
> first step to trying to reproduce it here.
>
> Can provide it on a side-channel if you'd rather not attach it to the bug.
> With that, your xfsdump invocation, and some machine details, we might be
> able to reproduce it and make more progress.
>
> Thanks,
> -Eric

Hi folks, I am happy to provide any info you want. I would have submitted the xfs_metadump, but it's way bigger than the maximum allowable attachment size. So far I have given you the VM spec from virsh xmldump and I've given you my hardware info.

Note that for a file system I've got two SATA drives turned into striping RAID. The RAID array is then made the sole physical volume of a logical volume group. The VM's backing store is one of those logical volumes, and the VM sees it as a hard drive, which it uses as a physical volume in its own logical volume group. The VM's root file system occupies a logical volume in the VM's volume group.

I have only seen this issue in the VM; xfsdump is working fine on the host. Please tell me where I can send the metadump and I will happily oblige. BTW, I am also happy to send along the entire root file system as an xfsdump file (xfsdump from the gparted disk always worked 100%) after I exclude the sensitive files (private keys and keytab files) from the dump.
The metadump should compress well, but you might want to provide it on a side channel. There shouldn't be any info leaks with current xfsprogs, but an abundance of caution makes sense. Something like dropbox or google drive might be a decent temporary place to host the compressed metadump image. Only failing in the VM is ... odd.
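As a rough illustration of why a metadump compresses so well (filesystem metadata is highly repetitive), here is a sketch using a fabricated stand-in file; `metadump.img` below is synthetic filler, not a real image, and a real one would come from something like `xfs_metadump /dev/vdb1 metadump.img` on the unmounted filesystem.

```shell
# Fabricated stand-in for a metadump: metadata-like repetitive content.
set -e
dir=$(mktemp -d)
cd "$dir"
for i in $(seq 1 5000); do
    printf 'inode %08d flags 0x1ffff800000000 count 0\n' "$i"
done > metadump.img
orig=$(wc -c < metadump.img)
xz -9 -k metadump.img            # -k keeps the original for comparison
comp=$(wc -c < metadump.img.xz)
echo "original: $orig bytes, compressed: $comp bytes"
[ "$comp" -lt $((orig / 10)) ]   # repetitive data shrinks dramatically
```

The resulting .xz file is what you would upload to a temporary host such as Dropbox or Google Drive, as suggested above.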
(In reply to Eric Sandeen from comment #43)
> The metadump should compress well, but you might want to provide it on a
> side channel. There shouldn't be any info leaks with current xfsprogs, but
> an abundance of caution makes sense. Something like dropbox or google drive
> might be a decent temporary place to host the compressed metadump image.
>
> Only failing in the VM is ... odd.

I've shared the metadump with you on Google Drive.
Thanks, I grabbed it. Can you please also share the xfsdump command line you used? I will try to get to this eventually, but it would be good to collect all the info.
(In reply to Eric Sandeen from comment #45)
> Thanks, I grabbed it. Can you please also share the xfsdump commandline you
> used. I will try to get to this eventually but would be good to collect all
> info.

Hi Eric, specifically, the command I used was:

xfsdump -f /mnt/spillover/dumptest2 /

For testing purposes I added a second LV as a backing store to the VM. Then, inside the VM, I partitioned it with a single partition, put an xfs file system on the partition, and mounted it at /mnt/spillover in the VM. Note that I initially encountered the bug when I was trying to dump to a file on an external CIFS file system, so the choice of where to dump to doesn't seem to be a factor. I started testing dumping to local storage because I wanted to keep my testing relatively uncomplicated.