Bug 214217 - kafs corrupts data
Summary: kafs corrupts data
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: Other
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: fs_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-08-29 11:20 UTC by Markus Suvanto
Modified: 2022-05-20 20:33 UTC
CC List: 5 users

See Also:
Kernel Version: 5.14.0-rc7-00089-g77dd11439b86
Subsystem:
Regression: No
Bisected commit-id:


Attachments
kernel config (218.23 KB, text/plain)
2021-08-29 11:20 UTC, Markus Suvanto
auditlog (48.00 KB, text/plain)
2021-08-29 15:22 UTC, Markus Suvanto
FileLog (151.33 KB, text/plain)
2021-08-29 15:22 UTC, Markus Suvanto

Description Markus Suvanto 2021-08-29 11:20:00 UTC
Created attachment 298509 [details]
kernel config

Reproducer:

server: openafs-1.8.7 (gentoo)
host_a: openafs 1.8.8 (gentoo)
host_b: kafs, kernel version 5.14.0-rc7-00089-g77dd11439b86 (gentoo)

1) host_a
cd /afs/mycell/tmp/
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux 
git status

"...nothing to commit, working tree clean"

2) host_b, start kafs and check git status

modprobe kafs rootcell=mycell:192.168.11.10
mount -t afs none /afs -o dyn

cd /afs/mycell/tmp/linux
git status

"Refresh index: ... nothing to commit, working tree clean"
No corruptions yet...

3) host_a
cd /afs/mycell/tmp/linux
git status
"nothing to commit, working tree clean"
No corruptions yet... 

4) host_b
git status

        ...
	deleted:    virt/kvm/mmu_lock.h
	deleted:    virt/kvm/vfio.c
	deleted:    virt/kvm/vfio.h
	deleted:    virt/lib/Kconfig
	deleted:    virt/lib/Makefile
	deleted:    virt/lib/irqbypass.c

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.clang-format
	.cocciconfig
	.get_maintainer.ignore
	.gitattributes
	.gitignore
	.mailmap
	COPYING
        ...

Now the git working tree is corrupted.

5) host_a
cd /afs/mycell/tmp/linux
git status

        ...
	deleted:    virt/kvm/vfio.h
	deleted:    virt/lib/Kconfig
	deleted:    virt/lib/Makefile
	deleted:    virt/lib/irqbypass.c

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.clang-format
	.cocciconfig
	.get_maintainer.
        ...


If both host_a and host_b use openafs clients, there is no corruption.
Comment 1 Jeffrey Altman 2021-08-29 13:58:06 UTC
Is this report related to your e-mail to openafs-devel.at.openafs.org?

https://lists.openafs.org/pipermail/openafs-devel/2021-August/020727.html
Comment 2 Jeffrey Altman 2021-08-29 14:18:18 UTC
Please raise the debug level of the OpenAFS fileserver to 125 and enable the audit log. Then repeat the failure and provide the FileLog and FileAuditLog output generated during the reproduction.
Comment 3 Markus Suvanto 2021-08-29 14:33:53 UTC
(In reply to Jeffrey Altman from comment #1)

Yes, it is.

-Markus
Comment 4 Markus Suvanto 2021-08-29 15:22:47 UTC
Created attachment 298511 [details]
auditlog

(In reply to Jeffrey Altman from comment #2)

This time the git repository contains only one file.

host z600: openafs client
host t470: kafs client

masu@z600 foo % git clone /tmp/test 
Cloning into 'test'...
done.
masu@z600 foo % cd test 
masu@z600 test % git s
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
masu@z600 test % git s
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
masu@z600 test % git s
On branch master
Your branch is up to date with 'origin/master'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	deleted:    foo.txt

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	foo.txt




masu@t470 tmp % cd /afs/station.com/tmp/foo/test 
masu@t470 test % git s
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
masu@t470 test % git s
On branch master
Your branch is up to date with 'origin/master'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	deleted:    foo.txt

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	foo.txt

masu@t470 test %
Comment 5 Markus Suvanto 2021-08-29 15:22:47 UTC
Created attachment 298513 [details]
FileLog
Comment 6 Jeffrey Altman 2021-08-29 15:27:34 UTC
In the openafs-devel thread, the CallPreamble failure log message indicates that the fileserver is failing an RPC issued by the client, not because there was anything wrong with the RPC itself, but because the fileserver is incapable of processing multiple RPCs issued in parallel by a client before it has established the client's identity.  Each of these failures results in a VBUSY error, which indicates that the RPC was not processed and should be retried by the client.  (OpenAFS will retry after a 30-second sleep.  OpenAFS 1.8 is more likely to trigger this behavior than OpenAFS 1.6 and earlier.)

Actions of the client should not be able to corrupt a volume (an abstract, fileserver-private data structure).  The OpenAFS fileserver triggers a salvage operation under a number of circumstances that are unrelated to volume corruption.  When a volume is salvaged more than SALVAGE_COUNT_MAX (16) times since fileserver start (or since the most recent administrator-initiated salvage request), the volume is taken offline until an administrator initiates a manual salvage or the fileserver is restarted.  RPCs that trigger a salvage request fail with a VSALVAGING (113) error if the SALVAGE_COUNT_MAX limit has not been reached, or VSALVAGE (101) if it has been exceeded.
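
For illustration, the rule just described might look like this in C (a paraphrase of the behavior described above, not actual OpenAFS source; the VBUSY value, 110, is my addition and is not quoted in this comment):

#define VSALVAGE          101 /* salvage limit exceeded; volume stays offline */
#define VBUSY             110 /* RPC not processed; client should retry */
#define VSALVAGING        113 /* salvage scheduled; retry once it completes */
#define SALVAGE_COUNT_MAX  16

/* Abort code returned by an RPC that triggers a salvage request. */
static int salvage_abort_code(int salvages_since_start)
{
	/* Beyond SALVAGE_COUNT_MAX the volume is taken offline until an
	 * administrator intervenes or the fileserver restarts. */
	if (salvages_since_start > SALVAGE_COUNT_MAX)
		return VSALVAGE;
	return VSALVAGING;
}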

If RPCs fail with VSALVAGING or VSALVAGE, the client knows the RPC was processed and failed unexpectedly, but it doesn't know whether the server state was altered.  All of the locally cached data and metadata must therefore be discarded.  This can result in data corruption from the perspective of end-user applications.

A quick attempt to reproduce your observed "git status" behavior with one of my repositories failed.  Sometimes with git, the repository matters when attempting to reproduce filesystem behavior.  Is the repository you are seeing the failures with a public repository that we can clone locally?
Comment 7 Jeffrey Altman 2021-08-29 15:33:50 UTC
"Comment 6" was written before the arrival of "Comment 4" and "Comment 5". 

The new FileLog does not include any CallPreamble failures or Salvage requests.  The kafs problem is therefore unrelated to the limitations described in "Comment 6".
Comment 8 Markus Suvanto 2021-08-29 16:00:28 UTC
(In reply to Jeffrey Altman from comment #7)

This was the case where "git status" reported that a file had been deleted. The volume didn't go offline.

Next I will try to reproduce the case where kafs causes the
volume to go offline.

Last time it was:

1) host_a (openafs client)
cd /afs/some/path
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

2) host_b (kafs)
cd /afs/some/path/linux
git status  (and after this command I have to salvage)
Comment 9 Markus Suvanto 2021-08-29 17:01:38 UTC
(In reply to Markus Suvanto from comment #8)

This is the case where the files are still OK, but the server has to salvage the volume
after the kafs client issues "git status".

FileLog and auditlog:
https://drive.google.com/drive/folders/1-8UYUU5cFqW_BCVs6hLUf6ms1oDVRAyp?usp=sharing


masu@z600 foo % git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Cloning into 'linux'...
remote: Enumerating objects: 15, done.
remote: Counting objects: 100% (15/15), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 8254650 (delta 8), reused 3 (delta 3), pack-reused 8254635
Receiving objects: 100% (8254650/8254650), 2.23 GiB | 2.00 MiB/s, done.
Resolving deltas: 100% (6750253/6750253), done.
Updating files: 100% (72866/72866), done.

masu@z600 foo % cd linux 
masu@z600 linux % git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean


masu@t470 ~ % cd /afs/station.com/tmp/foo/linux
masu@t470 linux % git status
error: error reading from .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack: Remote I/O error
error: error reading from .git/objects/pack/pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack: Remote I/O error
fatal: failed to read object 7564ae1682bae84b10e025026dcd080e34dc98ce: Remote I/O error

Now the volume is offline and has been salvaged (automatically at this point, I think...)
Files are OK.

masu@z600 linux % git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
Comment 10 Jeffrey Altman 2021-08-29 19:10:14 UTC
In the case where the volume goes offline, the lifecycle of the problematic file is as follows:

Sun Aug 29 19:02:19 2021 [21] EVENT AFS_SRX_CrFile CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:59521:3263108 STR tmp_pack_82kHnk FID 2303380852:491776:3263114
Sun Aug 29 19:24:27 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:28 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:29 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:30 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:32 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:33 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:37 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:38 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:39 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:41 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:42 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:43 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:44 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:45 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:47 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:50 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:52 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:54 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:55 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:56 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:57 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:58 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:24:59 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:00 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:02 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:03 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:07 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:08 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:09 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:10 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:14 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:15 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:16 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:17 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:18 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:19 2021 [20] EVENT AFS_SRX_StData CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:19 2021 [20] EVENT AFS_SRX_Link CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:59521:3263108 STR pack-5e813c51d12b6847bbc0fcd97c2bca66da50079c.pack FID 2303380852:491776:3263114
Sun Aug 29 19:25:19 2021 [20] EVENT AFS_SRX_FchStat CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:25:20 2021 [20] EVENT AFS_SRX_BIFchSt CODE 0 NAME masu HOST 192.168.11.93 ID 1001 FIDS 2 FID 2303380852:491776:3263114 FID 2303380852:491778:3263115
Sun Aug 29 19:31:39 2021 [10] EVENT AFS_SRX_BIFchSt CODE 0 NAME masu HOST 192.168.11.201 ID 1001 FIDS 3 FID 2303380852:59521:3263108 FID 2303380852:491776:3263114 FID 2303380852:491778:3263115
Sun Aug 29 19:31:39 2021 [10] EVENT AFS_SRX_FchData CODE 0 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:39 2021 [10] EVENT AFS_SRX_FchData CODE 0 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:40 2021 [10] EVENT AFS_SRX_FchData CODE 49733380 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:40 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:40 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:40 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:40 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:40 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:40 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:40 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:40 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:40 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:40 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:40 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:43 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:46 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:49 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:52 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:55 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:31:58 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:32:01 2021 [10] EVENT AFS_SRX_FchData CODE 106 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:32:04 2021 [10] EVENT AFS_SRX_FchData CODE 49733380 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:32:13 2021 [10] EVENT AFS_SRX_FchStat CODE 0 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114
Sun Aug 29 19:32:13 2021 [10] EVENT AFS_SRX_FchData CODE 49733380 NAME masu HOST 192.168.11.201 ID 1001 FID 2303380852:491776:3263114

The error code 49733380 is UAEIO and 106 is VOFFLINE.

The RPC that results in the salvage request was issued at 19:31:39 and is a FetchData.  In other words, a read-only operation.   The FileLog events for the request are:

Sun Aug 29 19:31:39 2021 SRXAFS_FetchData, Fid = 2303380852.491776.3263114, Host 192.168.11.201:7001, Id 1001
Sun Aug 29 19:31:39 2021 CheckRights: len=0, for host=192.168.11.201:7001
Sun Aug 29 19:31:39 2021 FetchData_RXStyle: Pos 18446744071815340032, Len 3154
Sun Aug 29 19:31:39 2021 FetchData_RXStyle: file size 2400758866
Sun Aug 29 19:31:39 2021 Volume 2303380852 now offline, must be salvaged.
Sun Aug 29 19:31:39 2021 VOffline: Volume 2303380852 (tmp) is now offline
Sun Aug 29 19:31:39 2021 Scheduling salvage for volume 2303380852 on part /vicepc over SALVSYNC
Sun Aug 29 19:31:40 2021 SRXAFS_FetchData returns 5

The requested "Pos" is LLONG_MAX or 0x7fffffffff.   The OpenAFS fileserver should return zero bytes for an attempt to read 3154 bytes beyond LLONG_MAX.   It should not be taking the volume offline for a salvage.   The request from kafs is unusual but the volume going offline is a bug in the OpenAFS fileserver.
Comment 11 David Howells 2021-09-01 16:48:03 UTC
(In reply to Markus Suvanto from comment #4)
> masu@t470 test % git s
> On branch master
> Your branch is up to date with 'origin/master'.
> 
> Changes to be committed:
>   (use "git restore --staged <file>..." to unstage)
>       deleted:    foo.txt
> 
> Untracked files:
>   (use "git add <file>..." to include in what will be committed)
>       foo.txt

I think this is probably a separate bug.

This is a problem with silly-rename in kafs and can be reduced to:

machine 1: touch a
machine 2: touch b; mv -f b a
machine 1: stat a

where machine 1 is kafs.  The "stat a" fails with ENOENT.
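
For reference, the same sequence expressed as syscalls in C (a sketch only; in the real test the two halves run on two different clients against the same AFS directory, machine 1 on kafs and machine 2 on OpenAFS, so running both halves on one machine will not show the bug):

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	struct stat st;

	/* machine 1 (kafs): touch a */
	close(open("a", O_CREAT | O_WRONLY | O_TRUNC, 0644));

	/* machine 2 (openafs): touch b; mv -f b a */
	close(open("b", O_CREAT | O_WRONLY | O_TRUNC, 0644));
	if (rename("b", "a") == -1)
		perror("rename");

	/* machine 1 (kafs): stat a -- the step that failed with ENOENT
	 * when kafs mishandled the silly-rename. */
	if (stat("a", &st) == -1)
		perror("stat a");
	else
		printf("stat a: ok\n");
	return 0;
}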
Comment 12 David Howells 2021-09-21 06:26:49 UTC
Fixed in v5.15-rc3

Note You need to log in before you can comment on or make changes to this bug.