Bug 219237
| Summary: | [REGRESSION]: cephfs: file corruption when reading content via in-kernel ceph client | | |
|---|---|---|---|
| Product: | File System | Reporter: | Christian Ebner (c.ebner) |
| Component: | Other | Assignee: | fs_other |
| Status: | NEW --- | | |
| Severity: | high | CC: | c.ebner, dhowells, mail+kernel-bugzilla |
| Priority: | P3 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | 92b6cc5d |
| Attachments: | ftrace with ceph filter, ceph conf, netfs tracing dd, netfs tracing sha256sum, Patch for stable (6.11.1) | | |
Description
Christian Ebner
2024-09-04 14:47:58 UTC
Is local caching through cachefiles being used?

What are the mount parameters?

This turned up an error after 37 ops: `/root/xfstests-dev/ltp/fsx -d -N 10000 -S 40 -P /tmp /xfstest.test/112.0` and then the system got a NULL-pointer oops in __rmqueue_pcplist when I tried running it again.

(In reply to David Howells from comment #1)
> Is local caching through cachefiles being used?
>
> What are the mount parameters?

There is no caching layer as described in [0] active. Output of:
```
$ cat /proc/fs/fscache/{caches,cookies,requests,stats,volumes}
CACHE    REF   VOLS  OBJS  ACCES S NAME
======== ===== ===== ===== ===== = ===============
COOKIE   VOLUME   REF ACT ACC S FL DEF
======== ======== === === === = == ================
REQUEST  OR REF FL ERR  OPS COVERAGE
======== == === == ==== === =========
Netfs  : DR=0 RA=140 RF=0 WB=0 WBZ=0
Netfs  : BW=0 WT=0 DW=0 WP=0
Netfs  : ZR=0 sh=0 sk=0
Netfs  : DL=548 ds=548 df=0 di=0
Netfs  : RD=0 rs=0 rf=0
Netfs  : UL=0 us=0 uf=0
Netfs  : WR=0 ws=0 wf=0
Netfs  : rr=0 sr=0 wsc=0
-- FS-Cache statistics --
Cookies: n=0 v=0 vcol=0 voom=0
Acquire: n=0 ok=0 oom=0
LRU    : n=0 exp=0 rmv=0 drp=0 at=0
Invals : n=0
Updates: n=0 rsz=0 rsn=0
Relinqs: n=0 rtr=0 drop=0
NoSpace: nwr=0 ncr=0 cull=0
IO     : rd=0 wr=0 mis=0
VOLUME   REF   nCOOK ACC FL CACHE           KEY
======== ===== ===== === == =============== ================
```
Also, disabling caching by setting `client_cache_size` to 0 and `client_oc` to false, as found in [1], did not change the corrupted read behavior.

The following fstab entry has been used by a user affected by the corruption:
```
10.10.10.14,10.10.10.13,10.10.10.12:/ /mnt/cephfs/hdd-ceph-fs ceph name={username},secret={secret},fs=hdd-ceph-fs,_netdev 0 0
```

My local reproducer uses a systemd.mount unit to mount the cephfs, with the ceph.conf as attached:
```
# /run/systemd/system/mnt-pve-cephfs.mount
[Unit]
Description=/mnt/pve/cephfs
DefaultDependencies=no
Requires=system.slice
Wants=network-online.target
Before=umount.target remote-fs.target
After=systemd-journald.socket system.slice network.target -.mount remote-fs-pre.target network-online.target
Conflicts=umount.target

[Mount]
Where=/mnt/pve/cephfs
What=172.16.0.2,172.16.0.3,172.16.0.4:/
Type=ceph
Options=name=admin,secretfile=/etc/pve/priv/ceph/cephfs.secret,conf=/etc/pve/ceph.conf,fs=cephfs
```

[0] https://www.kernel.org/doc/html/latest/filesystems/caching/fscache.html
[1] https://docs.ceph.com/en/latest/cephfs/client-config-ref/#client-config-reference

Created attachment 306827 [details]
ceph conf
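
For reference, the systemd unit above corresponds to roughly the following plain mount(8) invocation; this is only a sketch, with the device string and options copied from the unit's What= and Options= lines and the mount point assumed to already exist:
```
# Rough mount(8) equivalent of the systemd mount unit above (sketch only).
# Monitors, credentials and ceph.conf path are the ones from the unit.
mount -t ceph 172.16.0.2,172.16.0.3,172.16.0.4:/ /mnt/pve/cephfs \
    -o name=admin,secretfile=/etc/pve/priv/ceph/cephfs.secret,conf=/etc/pve/ceph.conf,fs=cephfs
```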
Okay, the oops fsx was turning up was in a different set of patches, so it is irrelevant to this report.

Can you capture some netfs tracing? If you can turn on:
```
for i in read sreq rreq failure write write_iter folio; do
    echo 1 >/sys/kernel/debug/tracing/events/netfs/netfs_$i/enable
done
```
this will show what's going on inside netfslib. Note that if you look at it, a lot of the lines have a request ID and subrequest index in the form "R=<request_id>[<subreq_index>]".

Created attachment 306860 [details]
netfs tracing dd
Trace generated while running:
```
sysctl vm.drop_caches=3
dd if=/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso of=/tmp/test.out bs=8M count=1
```
with the default 8M value for `rasize`. This leads to the first 4M containing the correct data, while the latter 4M come back as all zeros.
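
As a side note, here is a minimal sketch of how such a trace could be collected and the zero-filled region confirmed, assuming the netfs_* events were enabled as described above; the output path /tmp/netfs-dd.trace is just an example, and bash is assumed for the process substitution:
```
# Clear the ftrace ring buffer, run the reproducer, then save the trace.
echo > /sys/kernel/debug/tracing/trace
sysctl vm.drop_caches=3
dd if=/mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso of=/tmp/test.out bs=8M count=1
cat /sys/kernel/debug/tracing/trace > /tmp/netfs-dd.trace

# Check whether the second 4M of the 8M read really came back as all zeros.
dd if=/tmp/test.out bs=4M skip=1 count=1 status=none \
    | cmp -s - <(head -c 4194304 /dev/zero) && echo "second 4M is all zeros"
```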
Created attachment 306861 [details]
netfs tracing sha256sum
Trace generated while running:
```
sysctl vm.drop_caches=3
sha256sum /mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso
```
with the default 8M value for `rasize`, leading to a corrupt checksum.
Note: this and the above were performed on Linux v6.11-rc6. If you also need different output, e.g. with different values for `rasize`, please let me know.
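
In case it is useful, here is a sketch of how a different `rasize` could be tested, reusing the mount parameters from the systemd unit earlier in this report; the 4M value below is an arbitrary example, not one that was actually run:
```
# Remount with a smaller readahead window (rasize is in bytes) and re-run
# the checksum reproducer; 4194304 (4M) is an example value only.
umount /mnt/pve/cephfs
mount -t ceph 172.16.0.2,172.16.0.3,172.16.0.4:/ /mnt/pve/cephfs \
    -o name=admin,secretfile=/etc/pve/priv/ceph/cephfs.secret,conf=/etc/pve/ceph.conf,fs=cephfs,rasize=4194304
sysctl vm.drop_caches=3
sha256sum /mnt/pve/cephfs/proxmox-backup-server_3.2-1.iso
```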
Further testing with the reproducer on the current mainline kernel shows that the issue might be fixed. Bisection of the possible fix points to ee4cdf7b ("netfs: Speed up buffered reading"). Could this additional information help narrow down the part that fixes the cephfs issue so that the fix can be backported to current stable?

Created attachment 306948 [details]
Patch for stable (6.11.1)
I am happy to report a possible fix for this issue:
The attached patch fixes the issue for me on current stable as well as on the current Proxmox VE kernel, which is based on the Ubuntu 6.8.12 kernel.
Is this patch complete, and if so, can it be included in stable?
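
For context, a rough sketch of how the fix-bisection mentioned above could be driven with git bisect's custom terms; the endpoint commits below are placeholders rather than the ones actually used:
```
# Bisecting for a fix rather than a regression: kernels that still corrupt
# reads are marked "broken", kernels that read correctly are marked "fixed".
# Endpoints are placeholders; the bisection described above converged on
# ee4cdf7b ("netfs: Speed up buffered reading").
git bisect start --term-old=broken --term-new=fixed
git bisect broken <last-kernel-that-corrupts>    # placeholder
git bisect fixed <first-kernel-that-reads-ok>    # placeholder
# At each step: build and boot the kernel, run the dd/sha256sum reproducer,
# then mark the result with "git bisect broken" or "git bisect fixed".
```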