Bug 214927 - re-mount read-write (mount -oremount,rw) of read-only filesystem rejected with EROFS, but block device is not read-only
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: ext3
Hardware: All Linux
Importance: P1 normal
Assignee: fs_ext3@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-11-03 09:49 UTC by Ulrich.Windl
Modified: 2021-11-03 15:13 UTC

See Also:
Kernel Version: 4.12.14-122.91-default (SLES12 SP5)
Subsystem:
Regression: No
Bisected commit-id:



Description Ulrich.Windl 2021-11-03 09:49:27 UTC
I think there's a kernel or filesystem bug related to ext3:
We run a SLES12 SP5 Xen PVM that gets its system disk from a sparse file located on a SLES15 SP2 Xen host using OCFS2 (the host is a node in a pacemaker cluster).

The OCFS2 filesystem became full or almost full, and thus the ext3 filesystem(s) experienced write errors and were remounted read-only.
So far, so good, but:
The errors behavior of ext3 was set to "continue", so I wonder why the filesystem was set to read-only at all.
Next, after the OCFS2 filesystem had been extended, any remount attempt fails with:

mount: /: cannot remount /dev/... read-write, is write-protected.

I ran strace on the mount command, and the mount syscall returns EROFS (Read-only file system).
However, in the VM configuration on the host the disk is marked read-write, and the disk in the VM is flagged read-write, as are the VG, LVs, etc. I also checked /sys/block/*/ro: they are all "0".
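
The read-only checks described above can be collected with a short script. This is only a minimal sketch: the function name `show_ro_flags` and its optional directory argument are illustrative and do not come from the report; only the `/sys/block/*/ro` files do.

```shell
# Print "<device> <ro flag>" for every block device that exposes a ro attribute.
# A flag of 0 means the kernel considers the device writable.
# The optional argument (an assumption, useful for trying this on a test
# tree) replaces the default /sys/block.
show_ro_flags() {
    sysroot="${1:-/sys/block}"
    for f in "$sysroot"/*/ro; do
        [ -r "$f" ] || continue          # skip when the glob matched nothing
        dev=$(basename "$(dirname "$f")")
        printf '%s %s\n' "$dev" "$(cat "$f")"
    done
}
```

Calling `show_ro_flags` without an argument lists all block devices; `blockdev --getro /dev/xvdb` reports the same flag for a single device.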

I also ran fsck (which succeeded), but the error was still the same afterwards.
Interestingly, I noticed that after a *failed* remount attempt, the filesystem (which is mounted read-only) had its error flag set again.
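
The error flag and the configured errors behavior can both be read from the superblock with `tune2fs -l`. The sketch below filters those two lines from its output; the function name `ext_error_fields` is hypothetical, and `$dev` is a placeholder for the affected device, which the report does not spell out.

```shell
# Read `tune2fs -l` output on stdin and keep only the two superblock
# fields relevant here: the state (error flag) and the errors behavior.
ext_error_fields() {
    grep -E '^(Filesystem state|Errors behavior):'
}

# Usage (needs root; $dev is a placeholder for the ext3 device):
#   tune2fs -l "$dev" | ext_error_fields
```

A state that contains "with errors" means the error flag is set. The `errors=` mount option, if given, takes precedence over the superblock default shown here.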

The only conclusion I have is that there is at least one kernel bug regarding the read-only status of the block device.
Comment 1 Ulrich.Windl 2021-11-03 09:58:00 UTC
The last error I saw in syslog was:
kernel: print_req_error: I/O error, dev xvdb, sector 30704496
kernel: Aborting journal on device dm-4-8.
Comment 2 Ulrich.Windl 2021-11-03 10:06:19 UTC
Having a second look, it seems the first error occurred concurrently with fstrim, so fstrim may play a role in this scenario:
kernel: print_req_error: I/O error, dev xvdb, sector 29730024
fstrim[8864]: fstrim: /var: FITRIM ioctl failed: Input/output error
fstrim[8864]: fstrim: /tmp: FITRIM ioctl failed: Input/output error
...
(the final message came about 13 minutes after those above)
Comment 3 Theodore Tso 2021-11-03 15:13:35 UTC
In the case where there is an I/O error while trying to write to the journal, there's nothing that can be done safely other than to force the file system to be read-only.

When there is a file system which has aborted or otherwise has run into errors, you have to unmount the file system before it is safe to remount it read/write.  In fact, the ideal procedure is to umount the file system, run fsck, and then remount the file system.

In the case of the root file system, you can't unmount the file system, so it is acceptable to remount it read/only (if that hasn't been done automatically), run fsck, and then reboot.

The reason for this is that, while the file system is mounted, fsck may fix corruption on disk while some of that corruption (for example, a corrupted refcount) is still present in memory. So it is not safe to take a mounted root file system, modify it using fsck, and then remount it read/write. You have to reboot so that it is freshly mounted on the next boot.
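
The two procedures described in this comment can be written down as command sequences. The following is only a sketch that prints the commands rather than running them; the function name `recovery_steps` and the example mount point and device in the usage note are hypothetical, and plain `fsck` is shown without any options.

```shell
# Print (do not execute) the recovery sequence for a file system that
# has aborted its journal. $1 = mount point, $2 = block device.
recovery_steps() {
    mnt="$1"
    dev="$2"
    if [ "$mnt" = "/" ]; then
        # Root cannot be unmounted: check it while mounted read-only,
        # then reboot so no stale in-memory state survives.
        echo "mount -o remount,ro /"
        echo "fsck $dev"
        echo "reboot"
    else
        # Any other file system: unmount, check, mount again.
        echo "umount $mnt"
        echo "fsck $dev"
        echo "mount $dev $mnt"
    fi
}
```

For example, `recovery_steps /var /dev/mapper/example-var` prints the unmount, check, and mount commands in that order, so they can be reviewed before being run by hand.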
