Created attachment 298509 [details]
kernel config

Reproducer:

server: openafs-1.8.7 (gentoo)
host_a: openafs 1.8.8 (gentoo)
host_b: kafs, kernel version 5.14.0-rc7-00089-g77dd11439b86 (gentoo)

1) host_a

cd /afs/mycell/tmp/
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux
git status
"...nothing to commit, working tree clean"

2) host_b, start kafs and check git status

modprobe kafs rootcell=mycell:192.168.11.10
mount -t afs none /afs -o dyn
cd /afs/mycell/tmp/linux
git status
"Refresh index: ... nothing to commit, working tree clean"

No corruption yet...

3) host_a

cd /afs/mycell/tmp/linux
git status
"nothing to commit, working tree clean"

No corruption yet...

4) host_b

git status
...
	deleted:    virt/kvm/mmu_lock.h
	deleted:    virt/kvm/vfio.c
	deleted:    virt/kvm/vfio.h
	deleted:    virt/lib/Kconfig
	deleted:    virt/lib/Makefile
	deleted:    virt/lib/irqbypass.c

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.clang-format
	.cocciconfig
	.get_maintainer.ignore
	.gitattributes
	.gitignore
	.mailmap
	COPYING
...

Now the git working directory is corrupted.

5) host_a

cd /afs/mycell/tmp/linux
git status
...
	deleted:    virt/kvm/vfio.h
	deleted:    virt/lib/Kconfig
	deleted:    virt/lib/Makefile
	deleted:    virt/lib/irqbypass.c

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.clang-format
	.cocciconfig
	.get_maintainer.
...

If both host_a and host_b are using openafs -> no corruption.
Is this report related to your e-mail to openafs-devel.at.openafs.org? https://lists.openafs.org/pipermail/openafs-devel/2021-August/020727.html
Please raise the debug level of the OpenAFS fileserver to 125 and enable the audit file. Then repeat the failure and provide the FileLog and FileAuditLog output generated during the reproduction.
On Sun, 2021-08-29 at 13:58 +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=214217
>
> Jeffrey Altman (jaltman@auristor.com) changed:
>
>            What    |Removed |Added
> ----------------------------------------------------------------------------
>                  CC|        |dhowells@redhat.com,
>                    |        |jaltman@auristor.com
>
> --- Comment #1 from Jeffrey Altman (jaltman@auristor.com) ---
> Is this report related to your e-mail to openafs-devel.at.openafs.org?
>
> https://lists.openafs.org/pipermail/openafs-devel/2021-August/020727.html

Yes it is.

-Markus
Created attachment 298511 [details]
auditlog

On Sun, 2021-08-29 at 14:18 +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=214217
>
> --- Comment #2 from Jeffrey Altman (jaltman@auristor.com) ---
> Please raise the debug level of the OpenAFS fileserver to 125 and enable the
> audit file. Then repeat the failure and provide the FileLog and FileAuditLog
> output generated during the reproduction.

This time the git repository contains only one file.

host z600 is openafs
host t470 is kafs

masu@z600 foo % git clone /tmp/test
Cloning into 'test'...
done.
masu@z600 foo % cd test
masu@z600 test % git s
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
masu@z600 test % git s
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
masu@z600 test % git s
On branch master
Your branch is up to date with 'origin/master'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	deleted:    foo.txt

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	foo.txt

masu@t470 tmp % cd /afs/station.com/tmp/foo/test
masu@t470 test % git s
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
masu@t470 test % git s
On branch master
Your branch is up to date with 'origin/master'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	deleted:    foo.txt

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	foo.txt

masu@t470 test %
Created attachment 298513 [details]
FileLog
In the openafs-devel thread, the CallPreamble failure log message indicates that the fileserver is failing an RPC issued by the client not because there was anything wrong with the RPC itself, but because the fileserver is incapable of processing multiple RPCs issued in parallel by a client before the fileserver has established the client's identity. Each of these failures results in a VBUSY error, which indicates that the RPC was not processed and should be retried by the client. (OpenAFS retries after a 30-second sleep. OpenAFS 1.8 is more likely to trigger this behavior than OpenAFS 1.6 and earlier.)

Actions of the client should not be able to corrupt a volume (an abstract, fileserver-private data structure). The OpenAFS fileserver triggers a salvage operation under a number of circumstances that are unrelated to volume corruption. When a volume is salvaged more than SALVAGE_COUNT_MAX (16) times since fileserver start (or since the most recent administrator-initiated salvage request), the volume is taken offline until an administrator initiates a manual salvage request or the fileserver is restarted. RPCs that trigger a salvage request fail with a VSALVAGING (113) error if the SALVAGE_COUNT_MAX limit has not been reached, or VSALVAGE (101) if SALVAGE_COUNT_MAX has been exceeded.

If RPCs fail with VSALVAGING or VSALVAGE, the client knows the RPC was processed and failed unexpectedly, but it does not know whether the server state was altered. All of the locally cached data and metadata must be discarded. This can result in data corruption from the perspective of end-user applications.

A quick attempt to reproduce your observed "git status" behavior with one of my repositories failed. Sometimes with git, the repository matters when attempting to reproduce filesystem behavior. Is the repository you are seeing the failures with a public repository that we can clone locally?
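The client-side retry semantics described above can be sketched as follows. This is an illustrative Python model, not OpenAFS source: the numeric abort codes are the standard AFS volume error numbers (VSALVAGE = 101, VOFFLINE = 106, VBUSY = 110, VSALVAGING = 113), and the 30-second delay mirrors the OpenAFS client sleep mentioned above. The function name `classify_abort` is hypothetical.

```python
# AFS volume abort codes (standard values; VBUSY/VSALVAGING per the text above).
VSALVAGE = 101    # salvage limit exceeded; needs administrator attention
VOFFLINE = 106    # volume is offline
VBUSY = 110       # RPC was never processed; always safe to retry
VSALVAGING = 113  # salvage in progress

def classify_abort(code):
    """Return the action a cache manager should take for a volume abort code.

    Hypothetical helper illustrating the semantics described above, not an
    OpenAFS or kafs API.
    """
    if code == VBUSY:
        # The RPC was not processed, so the client can simply retry.
        # OpenAFS retries after a 30-second sleep.
        return ("retry", 30)
    if code in (VSALVAGING, VSALVAGE, VOFFLINE):
        # The RPC was processed and failed; the client cannot tell whether
        # server state changed, so all cached data and metadata for the
        # volume must be discarded.
        return ("discard-cache", None)
    return ("fail", None)
```

The key distinction the sketch encodes is that VBUSY is the only code here that guarantees the server did nothing, which is why it is the only one that permits a blind retry.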
"Comment 6" was written before the arrival of "Comment 4" and "Comment 5". The new FileLog does not include any CallPreamble failures or Salvage requests. The kafs problem is therefore unrelated to the limitations described in "Comment 6".
On Sun, 2021-08-29 at 15:33 +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=214217
>
> --- Comment #7 from Jeffrey Altman (jaltman@auristor.com) ---
> "Comment 6" was written before the arrival of "Comment 4" and "Comment 5".
>
> The new FileLog does not include any CallPreamble failures or Salvage requests.
> The kafs problem is therefore unrelated to the limitations described in
> "Comment 6".

This was the case where "git status" reports that a file has been deleted. The volume didn't go offline.

Next I will try to reproduce the case where kafs causes the volume to go offline.

Last time it was:

1) host_a (openafs client)
cd /afs/some/path
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

2) host_b (kafs)
cd /afs/some/path/linux
git status (and after this command I have to salvage)
On Sun, 2021-08-29 at 19:00 +0300, markus.suvanto@gmail.com wrote:
> Next I will try to reproduce the case where kafs causes
> the volume to go offline.

This is the case where the files are still OK, but the server has to salvage the volume after the kafs client issues the command "git status".

FileLog and auditlog:
https://drive.google.com/drive/folders/1-8UYUU5cFqW_BCVs6hLUf6ms1oDVRAyp?usp=sharing

masu@z600 foo % git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Cloning into 'linux'...
remote: Enumerating objects: 15, done.
remote: Counting objects: 100% (15/15), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 8254650 (delta 8), reused 3 (delta 3), pack-reused 8254635
Receiving objects: 100% (8254650/8254650), 2.23 GiB | 2.00 MiB/s, done.
Resolving deltas: 100% (6750253/6750253), done.
Updating files: 100% (72866/72866), done.
masu@z600 foo % cd linux
masu@z600 linux % git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

masu@t470 ~ % cd /afs/station.com/tmp/foo/linux
masu@t470 linux % git status
error: error reading from .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack: Vastapään I/O-virhe ["Remote I/O error"]
error: error reading from .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack: Vastapään I/O-virhe
fatal: failed to read object 7564ae1682bae84b10e025026dcd080e34dc98ce: Vastapään I/O-virhe

Now the volume is offline and has been salvaged (automatically at this point, I think...). The files are OK:

masu@z600 linux % git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
In the case where the volume goes offline, the lifecycle of the problematic file is as follows:

Sun Aug 29 19:02:19 2021 [21] EVENT AFS_SRX_CrFile CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:59521:3263108 STR tmp_pack_82kHnk FID 2303380852:491776:3263114
Sun Aug 29 19:24:27 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:28 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:29 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:30 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:32 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:33 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:37 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:38 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:39 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:41 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:42 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:43 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:44 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:45 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:47 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:50 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:52 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:54 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:55 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:56 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:57 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:58 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:59 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:00 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:02 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:03 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:07 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:08 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:09 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:10 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:14 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:15 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:16 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:17 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:18 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:19 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:19 2021 [20] EVENT AFS_SRX_Link CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:59521:3263108 STR pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack FID 2303380852:491776:3263114
Sun Aug 29 19:25:19 2021 [20] EVENT AFS_SRX_FchStat CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:20 2021 [20] EVENT AFS_SRX_BIFchSt CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FIDS 2 FID 2303380852:491776:3263114 FID 2303380852:491778:3263115
Sun Aug 29 19:31:39 2021 [10] EVENT AFS_SRX_BIFchSt CODE 0 NAME masu HOST 192.168.11.201 ID 1001 FIDS 3 FID 2303380852:59521:3263108 FID 2303380852:491776:3263114 FID 2303380852:491778:3263115
Sun Aug 29 19:31:39 2021 [10] EVENT AFS_SRX_FchData CODE 0 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:39 2021 [10] EVENT AFS_SRX_FchData CODE 0 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:40 2021 [10] EVENT AFS_SRX_FchData CODE 49733380 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:40 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:40 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:40 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:40 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:40 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:40 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:40 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:40 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:40 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:40 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:43 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:46 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:49 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:52 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:55 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:58 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:32:01 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:32:04 2021 [10] EVENT AFS_SRX_FchData CODE 49733380 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:32:13 2021 [10] EVENT AFS_SRX_FchStat CODE 0 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:32:13 2021 [10] EVENT AFS_SRX_FchData CODE 49733380 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114

The error code 49733380 is UAEIO and 106 is VOFFLINE.

The RPC that resulted in the salvage request was issued at 19:31:39 and is a FetchData, in other words a read-only operation. The FileLog events for the request are:

Sun Aug 29 19:31:39 2021 SRXAFS_FetchData, Fid = 2303380852.491776.3263114, Host 192.168.11.201:7001, Id 1001
Sun Aug 29 19:31:39 2021 CheckRights: len=0, for host=192.168.11.201:7001
Sun Aug 29 19:31:39 2021 FetchData_RXStyle: Pos 18446744071815340032, Len 3154
Sun Aug 29 19:31:39 2021 FetchData_RXStyle: file size 2400758866
Sun Aug 29 19:31:39 2021 Volume 2303380852 now offline, must be salvaged.
Sun Aug 29 19:31:39 2021 VOffline: Volume 2303380852 (tmp) is now offline
Sun Aug 29 19:31:39 2021 Scheduling salvage for volume 2303380852 on part /vicepc over SALVSYNC
Sun Aug 29 19:31:40 2021 SRXAFS_FetchData returns 5

The requested "Pos" is LLONG_MAX or 0x7fffffffff. The OpenAFS fileserver should return zero bytes for an attempt to read 3154 bytes beyond LLONG_MAX. It should not be taking the volume offline for a salvage. The request from kafs is unusual, but the volume going offline is a bug in the OpenAFS fileserver.
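The bounds check the fileserver arguably should perform can be sketched as follows. This is a hypothetical Python model of the intended behaviour (the function name `clamp_fetch` is invented for illustration, not OpenAFS code): a FetchData whose offset lies at or beyond end-of-file should simply return zero bytes rather than trigger a salvage.

```python
def clamp_fetch(pos, length, file_size):
    """Return how many bytes a FetchData reply should carry.

    Illustrative model only: a negative or past-EOF offset yields an
    empty reply; an in-range read is clamped to the bytes that exist.
    """
    if pos < 0 or pos >= file_size:
        return 0                      # nothing to read at or beyond EOF
    return min(length, file_size - pos)
```

With the values from the FileLog above (Pos 18446744071815340032, Len 3154, file size 2400758866), this model returns zero bytes instead of taking the volume offline.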
(In reply to Markus Suvanto from comment #4)
> masu@t470 test % git s
> On branch master
> Your branch is up to date with 'origin/master'.
>
> Changes to be committed:
>   (use "git restore --staged <file>..." to unstage)
> 	deleted:    foo.txt
>
> Untracked files:
>   (use "git add <file>..." to include in what will be committed)
> 	foo.txt

I think this is probably a separate bug. It is a problem with silly-rename in kafs and can be reduced to:

machine 1: touch a
machine 2: touch b; mv -f b a
machine 1: stat a

where machine 1 is kafs. The "stat a" fails with ENOENT.
Fixed in v5.15-rc3
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d9fb678414c048e185eaddadd18d75f5e8832ff3