Upgrading a Fedora system using bcache to Fedora 30 results in massive filestem corruption. This is caused specifically caused by kernels built for Fedora 30, while the same kernels built for Fedora 29 work just fine. A notable difference between Fedora 29 and Fedora 30 is the fact that Fedora 29 uses gcc8 and Fedora uses gcc9.
More details can be found in the linked Fedora bug report.
Same issue for me.
I can confirm this issue
Please look at https://bugzilla.redhat.com/show_bug.cgi?id=1708315 to see the progresses that have been made (Thahks to Rolf). In summary : The problem occurs only if the kernel has been compiled with gcc 9.x.y. The problem doesn't occur in "none" mode, but in writethrough or in writeback mode. This shows that the frontend metadata is not read properly (driver returns a wrong block if the block data is present in the frontend).
I confirm the issue kernel 5.1.3 built with gcc 9.1.0 causes data loss when bcache has cache device. Without attached cache device it works.
Created attachment 283141 [details] preprocessed output of bch_btree_insert_key() I believe I may have identified the issue. The problematic file for me was bset.o (when bset.o was compiled by GCC 8.3.0 and the rest by GCC 9.1.0, the issue disappeared). The bug was present even when bset.o was compiled by GCC 9.1.0 with -O0 (I had to comment out pr_debug() in btree_mergesort() to make it compile). Looking at the assembly of bset.o, the only obvious changes were in btree_mergesort() and bch_btree_insert_key(). I then generated the assembly output of bset.c by both GCC versions. When the code of bch_btree_insert_key() generated by GCC 9 was replaced by the code from GCC 8, the issue disappeared again. Further examination of the assembly revealed that the uninitialized space of `iter` is now shared by another temporary (whereas previously they occupied two separate locations, at least in the assembly compiled with -O0). As far as I can see, the real issue is in the PRECEDING_KEY() macro; it returns an address of a temporary which goes out of scope at the end of the compound statement expresion (see the attached preprocessed version of bch_btree_insert_key()). As a result, the arguments `iter` and `search` of bch_btree_iter_init() point to overlapping memory locations. When bch_btree_iter_init() writes to `iter`, `search` is clobbered. I did only a very limited testing in a virtual machine, with the cache and the backing device both being loop devices backed by files on tmpfs. The actual test consisted of formatting /dev/bcache0 to Btrfs and mounting it: # mkfs.btrfs -f /dev/bcache0 # mount /dev/bcache0 /mnt # <-- The failure was here. # umount /mnt There wasn't any success reproducing it with the fio script provided in Red Hat's Bugzilla.
Congratulations Lukãs, Could you send a pacth, such a way we could test it ? Thank you
Created attachment 283145 [details] quick patch for the issue I was just initializing `iter` to zero in bch_btree_insert_key() to force the compiler to not reuse the memory location (it worked for -O0) like so: struct btree_iter iter = {0}; Anyway, you can test the attached patch I made just now.
This doesn't remove the problem of keeping a pointer to a temporary for too long - the pointer on line 444 is dangling when used on line 446. I'm working on a patch that removes all uses of &KEY(...) because I find the exact rules for when it is OK confusing.
I don't think that's the case, the KEY() macro does not use compound statement expression (it's not in its own block, this should be clearly visible from the preprocessed source), so the temporary from it should last until the end of the inline function.
You are right, compound literals last until the end of the block they are created in. Anyway, I've tested your patch, and things looked fine. I will do more testing tomorrow.
Here I post my reply on linux-bcache@mailing list to here. Thanks to Rolf Fokkens <rolf@rolffokkens.nl> to notice me for this bugzilla entry. Hi folks, From the above bugzilla.kernel.org links, it seems people are all on the same/correct track :-) I don't analyze the assembly code, because I can stably reproduce the problem: two adjacent keys don't merge. And I find the problem is from macro PRECEDING_KEY(), which misleading bch_btree_insert_key() when it calls bch_btree_iter_init(). Misleading means returned 'm' always indicates the inserting key is the first key in the btree node because prev in the while-loop is NULL. For the second adjacent key, obviously prev point should not be NULL, so the suspicious point is PRECEDING_KEY(), its code is, 437 #define PRECEDING_KEY(_k) \ 438 ({ \ 439 struct bkey *_ret = NULL; \ 440 \ 441 if (KEY_INODE(_k) || KEY_OFFSET(_k)) { \ 442 _ret = &KEY(KEY_INODE(_k), KEY_OFFSET(_k), 0); \ 443 \ 444 if (!_ret->low) \ 445 _ret->high--; \ 446 _ret->low--; \ 447 } \ 448 \ 449 _ret; \ 450 }) The problem is at line 442, KEY() announced a on-stack temporary variable in struct bkey {high = KEY_INODE(_k), low = KEY_OFFSET(_k), ptr = NULL}. But its life range is in line 442-447. When address of this on-stack variable returns from PRECEDING_KEY() and sent into bch_btree_iter_init(), this stack space of this on-stack variable is overwritten by stackframe of bch_btree_iter_init(). To confirm the above suspicion, I modify bch_btree_insert_key() to not use PRECEDING_KEY(), and store the preceding key in a local variable of bch_btree_insert_key() like this, @@ -884,27 +894,78 @@ unsigned int bch_btree_insert_key(struct btree_keys *b, struct bkey *k, struct bset *i = bset_tree_last(b)->data; struct bkey *m, *prev = NULL; struct btree_iter iter; + struct bkey preceding_key = ZERO_KEY; + struct bkey *preceding_key_p = NULL; BUG_ON(b->ops->is_extents && !KEY_SIZE(k)); - m = bch_btree_iter_init(b, &iter, b->ops->is_extents - ? PRECEDING_KEY(&START_KEY(k)) - : PRECEDING_KEY(k)); + + if (b->ops->is_extents) { + struct bkey tmp_k; + tmp_k = START_KEY(k); + if (KEY_INODE(&tmp_k) || KEY_OFFSET(&tmp_k)) { + preceding_key = KEY(KEY_INODE(&tmp_k), KEY_OFFSET(&tmp_k), 0); + + if (!preceding_key.low) + preceding_key.high--; + preceding_key.low--; + } + preceding_key_p = &preceding_key; + } else { + if (KEY_INODE(k) || KEY_OFFSET(k)) { + preceding_key = KEY(KEY_INODE(k), KEY_OFFSET(k), 0); + if (!preceding_key.low) + preceding_key.high--; + preceding_key.low--; + } + preceding_key_p = &preceding_key; + } + + m = bch_btree_iter_init(b, &iter, preceding_key_p); Now I can see the preceding key is reported and adjacent keys can be merged. So far my simple fio testing works fine, but I need to go through all locations where KEY() and PROCEDING_KEY() are used and find a proper way to fix. As I guessed this is a bcache bug and triggered by gcc9. I have no clear idea why it works fine before (maybe depends on some compiling option ?), but definitely this code should be fixed. Thanks. -- Coly Li
The above change is only a POC, not a final patch. Now I am thinking of how to fix it in a more elegant way, once done I will post patches to linux-bcache@vger.kernel.org and linux-block@vger.kernel.org. Thanks. Coly Li
This bug is fixed and commit is merged in Linux v5.2 by the following commit, commit 31b90956b124240aa8c63250243ae1a53585c5e2 Author: Coly Li <colyli@suse.de> Date: Mon Jun 10 06:13:34 2019 +0800 bcache: fix stack corruption by PRECEDING_KEY() It seems I am not able to set the bug status to fixed, so I leave message here to indicate this one can be closed as fixed. Thanks. Coly Li
As the reporter I apparently am able to change the status.
Will there be an updated installer image?
I don't know the answer. However, there is a workaround. 1/ Disable caching on your bcache device(s) (cache_mode = none) 2/ Upgrade your system 3/ Update your installation 4/ A soon as your kernel version is at least 5.1.12, re-enable caching (cache_mode = writethrough or writeback) 5/ get rid of any kernel version that can contain the issue (i.e. kernel version <= 5.1.11 compiled with gcc9) Regards,
Not official, but there's a SIG creating updated images: https://jbwillia.wordpress.com/2019/07/03/f30-20190628-updated-isos-released/ Current has 5.1.15-200 kernel - https://dl.fedoraproject.org/pub/alt/live-respins/