Bug 76261 - ext4_da_writepages err -30 after remount ro during shutdown
Summary: ext4_da_writepages err -30 after remount ro during shutdown
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: ext4 (show other bugs)
Hardware: ARM Linux
: P1 blocking
Assignee: fs_ext4@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-05-15 05:01 UTC by kun.chen
Modified: 2016-08-03 03:14 UTC (History)
4 users (show)

See Also:
Kernel Version: 3.10.20
Subsystem:
Regression: No
Bisected commit-id:


Attachments
some error log (12.58 KB, application/octet-stream)
2016-08-03 02:34 UTC, yongjun xiao
Details

Description kun.chen 2014-05-15 05:01:50 UTC
Our android 4.4.2 based on linux 3.10.20

when shutdown, it will call sync, echo "u" > /proc/sysrq-trigger. But I find a problem that sometime shutdown will hang and sometimes shutdown need long times.

Check the kmsg log there are ext4 error occur, such as:

[76.209728]-(3)[1:init]SysRq: -(3)[1:init]Emergency Remount R/O
[76.282979] (3)[21:kworker/3:0]Emergency Remount Complete
[77.188953] (0)[84:kworker/u8:1]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 8192 pages, ino 131067; err -30
[77.205402] (0)[84:kworker/u8:1]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 8192 pages, ino 131067; err -30
[78.220353] (3)[3075:kworker/u8:5]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 8192 pages, ino 131067; err -30
[79.246658] (1)[3075:kworker/u8:5]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 8192 pages, ino 131067; err -30
[80.261003] (3)[3075:kworker/u8:5]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 8192 pages, ino 131067; err -30
[80.272068] (3)[3075:kworker/u8:5]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 1024 pages, ino 131067; err -30
[81.300966] (1)[3075:kworker/u8:5]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 8192 pages, ino 131067; err -30
[81.688616] (0)[3075:kworker/u8:5]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 8192 pages, ino 131067; err -30
......
[90.601634] (0)[3075:kworker/u8:5]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 1024 pages, ino 131067; err -30
[91.642286] (0)[3075:kworker/u8:5]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 8192 pages, ino 131067; err -30

err 30 is -EROFS, error due to writeback to read-only filesystem.
As part of remount, I think linux will writeback all drity datas. Should no writeback occur after remount R/O. Not sure why I see these errors.
Comment 1 Theodore Tso 2014-05-15 15:11:11 UTC
Does upstream android really handle a shutdown using "echo u > /proc/sysrq-trigger", or is that something you added locally?   If it's upstream, I need to have some words with the Google Android team....

sysrq-u is documented to be "Emergency Remount R/O".  It's not something that is designed to be used under normal circumstances.  So if upstream android is doing this, it's arguably a bug.

sysrq-u is implemented using do_emergency_remount() in fs/super.c, and it calls do_remount_sb with force set to 1 (the last argument is "force").

			do_remount_sb(sb, MS_RDONLY, NULL, 1);

The "force" option omits the call to sb_prepare_remount_readonly(), which is what forces the writeback to happen.  The reason why it is omitted is that sysrq-u is designed to be used when the system is unstable, and so the writeback for one of the superblocks might block forever, and this would prevent the subsequent superblocks from getting remounted read-only.

So people who use sysrq-u are usually advised to do an sysrq-s, which is an "emergency sync".   (This is avoids doing a data integrity writeback, since again it's designed for emergencies, when part of the system may be partially deadlocked, and so the goal is to save what you can, not to try to save everything.  Perfection, if it causes the sync to stop and never complete, and hence not sync out some buffers that could be saved, is the enemy of the good here.)

The real issue here is that sysrq-u and sysrq-u is designed for emergency war room triage, what Dr. Hawkeye Pierce in M*A*S*H referred to as "meatball surgery".  It is battlefield surgery, and it is *not* what you would expect in a hospital surgery room located thousands of miles from a war zone.  So if android is using this under normal situations, it's an abuse of what these interfaces are intended to be used for.
Comment 2 kun.chen 2014-05-16 01:50:01 UTC
You can refer this:
https://android.googlesource.com/platform/system/core/+/master/libcutils/android_reboot.c
During shutdown in android, it realy will goto emergency_remount() flow. 

And from the ksmg log, it seems ext4 writeback continued almost 15 seconds after emergency_remount() completed. Not sure why still has so many writeback occurs, even it used emergency_remount(), still has so many writeback occurs, and takes so long time, this will lead to shutdown hang or delay
Comment 3 kun.chen 2014-05-16 03:32:40 UTC
And if not used echo u > /proc/sysrq-trigger, what's the better solution?
Comment 4 Theodore Tso 2014-05-16 03:36:05 UTC
You use the equivalent of umount -a (which is what Linux distributions that don't have a pathological fear of GPL'ed code use --- i.e., everyone else except Android).   So this is just the matter of writing code which attempts to unmount all of the file systems, one by one.
Comment 5 kun.chen 2014-05-16 03:47:37 UTC
OK.
And what about that problem, after emergency remount complete, still has so long time writeback, it seems like a issue because delay shutdown time. Maybe need to discard all the dirty pages since already remount ro
Comment 6 Theodore Tso 2014-05-16 03:49:28 UTC
Looking at the Android source code, they already have code to parse and iterate over /proc/mounts.  So what I'd recommend is that this code (a) try to do a normal umount, (b) if (a) fails, try to do a remount read/only, then (c) if (a) (b) fails, call syncfs() to force the file system to write back all of dirty data.

Then after it has iterated over all of the file systems, it can then try to do the emergency remount read/only command.

The other thing I'd suggest is that android do a better job requesting that processes cleanly shutdown, and then after a decent interval, force kill all of the remaining processes before doing the file system umount code path as described above.  If the other processes were cleanly shut down, then there won't be any further disk writes, and it should be possible for the file systems to be cleanly unmounted, assuming that the last remaining process chdir's itself to the root directory, and then remounts the root file system read/only as the final step.

Again, this is what a proper Unix/Linux system would do for a clean shutdown.
Comment 7 Theodore Tso 2014-05-16 03:52:50 UTC
Look, if you're complaining about long shutdown times, you can do a forced shutdown.  You will then lose data.  If you care about not losing data, then you have to wait for all of the data to be written out to flash.  There's no magic pixie dust here.  You can have a clean shutdown that might take a little while, and no data loss, or you can live with the potential risk of data loss if you can't stand waiting even a tiny bit of time for the shutdown to finish up.

The other question is why is some user process writing so much to the file system before the system is shutdown.  Is this really some ill-behaved user application which is burning huge amounts of flash lifespan and battery life by constantly writing to flash?   Or is there some other bug going on which is causing system processes to write huge amounts of logs or other junk to the file system unnecessarily?
Comment 8 kun.chen 2014-05-16 04:27:15 UTC
Sorry, I'm not say clear.

Ext4 was mounted as dealloc, when applications write, they don't know shutdown happened later. So after emergency remount ro, flush worker begin write dirty data. This flow maybe normal. But as system already remounted to read only, whether we want or not, already can't flush dirty data back, so still writeback seems unnecessary.
Comment 9 kun.chen 2014-05-16 05:10:47 UTC
Also refer to this patch:http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/super.c?id=807612db2f9940b9fa6deaef054eb16d51bd3e00

Just fix writeback dirty data after remount complete. So I think maybe the sync flow can be improved on 3.10.20 even if used emergency_remount().
Comment 10 Theodore Tso 2014-05-16 12:11:39 UTC
So see my earlier answer.   If this is a normal shutdown, you must use syncfs() or the normal umount to unmount the file system.   Emergency_remount is designed for emergencies when it may not be possible to do a blocking writeback.   You could change emergency_remount to pass in a force value of 0 instead of 1.  However, this will break non-android use cases where the system administrator really needs the current behaviour in a real emergency.  Worse, even for the android use case, there may be times when there may be some kind of I/O error or other malfunction, where a blocking writeblock could stall the umount process indefinitely.

So the right answer really is to use the algorithm I've outlined in comment #6.
Comment 11 Theodore Tso 2014-05-16 13:44:36 UTC
Hi Ken, commit 807612db2f9940 is not going to help you, since emergency_remount() is calling do_remount_sb with force == 0.

Note that if applications are still writing to the file system at the time of the shutdown, the application files are at risk of getting corrupted.   It doesn't actually really matter if sb_prepare_readonly_remount() causes those writes to return EROFS as opposed to returning success and then having the writes fail.  It might quiet the the error messages in the system log, but it's not going to allow the applications to shutdown cleanly --- because if this patch was going to make a difference, it implies that applications are not getting shut down cleanly before android triggers the emergency_remount.
Comment 12 kun.chen 2014-05-19 03:41:21 UTC
Hi Theodore,

How about these informations in the kmsg:

[76.209728]-(3)[1:init]SysRq: -(3)[1:init]Emergency Remount R/O
[76.282979] (3)[21:kworker/3:0]Emergency Remount Complete
......
[80.272068] (3)[3075:kworker/u8:5]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 1024 pages, ino 131067; err -30
......
[85.407025] (1)[3075:kworker/u8:5]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 1024 pages, ino 131067; err -30
......
[90.601634] (0)[3075:kworker/u8:5]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 1024 pages, ino 131067; err -30

"ext4_da_writepages: jbd2_start: 1024 pages, ino 131067; err -30" was presented every 5 seconds, so means these dirty pages old than 30 seconds? If so, these datas were written before shutdown, but did not be written out after sync() and emergency_remount(), does it normal?
Comment 13 Theodore Tso 2014-05-19 12:46:35 UTC
Well, let's gather some data.  In android_reboot.c can you add the following:

static void get_debug_mem(void)
{
    int fd, cnt = 0;
    fd = open("/proc/sysrq-trigger", O_WRONLY);
    if (fd < 0) {
        return;
    }
    write(fd, "m", 1);
    close(fd);
}

And then in android reboot.c, make the beginning of the function read as follows:

{
	get_debug_info();
	sync;
	get_debug_info();
	sync;
	get_debug_info();
	sync;
	get_debug_info();
	remount_ro();
	...

And then send the output in the kmesg?

That should give us information about what might be going on.  It's
possible that get_nr_dirty_pages() or some other part of the machinery
used by fs/fs-writeback.c's writeback_inodes_sb() is somehow
incorrectly not syncing all of the dirty pages.

The other possibility is that the sync() system call is supposed to
send out to disk at least as many dirty pages as were present at the
time of the sync() system call was issued, but it is ***not*** a data
integrity operation.  Looking at the high level code, the main concern
of that codepath is to make sure that sync() would eventually return,
and not loop forever, even if some other user process was constantly
dirtying pages.  It simply wasn't intended for the use case that
Android is trying to use it for.

From my reading of the VFS writeback code, it's possible that if there
were pages getting dirtied at the time of the sync() system call, that
some of the pages that were dirtied after sync() system call was
issued would be sent out to disk, but some of the older pages that
were dirtied before sync() system call might not make it out to disk.

Is it possible for you to enable ftrace on your kernel?  It might be
useful to see if in fact there is any attempts to be writing to one or
more file systems at the time when android_reboot() gets called, and
the kernel tracing facilities might help you accomplish that.

     	    	    		      	   	 - Ted
Comment 14 Theodore Tso 2014-05-19 13:41:15 UTC
On Mon, May 19, 2014 at 03:41:21AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> [80.272068] (3)[3075:kworker/u8:5]EXT4-fs (mmcblk0p15): ext4_da_writepages:
> jbd2_start: 1024 pages, ino 131067; err -30

Oh, and since we now have a file system an inode number, you can
investigate what this inode might actually be.  Suggested commands:

debugfs /dev/mmcblk0p15
debugfs: ncheck 131067

and

debugfs: stat <131067>

Cheers,

					- Ted
Comment 15 yongjun xiao 2016-08-03 02:10:02 UTC
Hi all,
   I meet the same bug with kernel3.10.65 as the following:
>[26287.698059] c1 EXT4-fs (mmcblk0p24): ext4_da_writepages: jbd2_start: 2048
>pages, ino 59319; err -30
[26287.698089] c1 EXT4-fs (mmcblk0p24): ext4_da_writepages: jbd2_start: 2048 pages, ino 59319; err -30
[26287.698089] c1 EXT4-fs (mmcblk0p24): ext4_da_writepages: jbd2_start: 2048 pages, ino 59319; err -30
[26287.698120] c1 EXT4-fs (mmcblk0p24): ext4_da_writepages: jbd2_start: 2048 pages, ino 59319; err -30
[26287.510345] c1 EXT4-fs (mmcblk0p24): ext4_da_writepages: jbd2_start: 1024 pages, ino 59319; err -30
......

And at last caused kernel crash.
you can see many threads blocked:

crash> ps | grep UN
      1      0   0  dc054000  UN   0.1    1168    508  init
     25      2   3  dc08b300  UN   0.0       0      0  [kworker/3:0]
     88      2   0  dc9e1980  UN   0.0       0      0  [sprd_tb_thread]
     89      2   2  dc9e1dc0  UN   0.0       0      0  [sprd_hotplug]
    194      1   0  dd2c1100  UN   0.2    2208    840  perfprofd
    210      1   0  dd2c3300  UN   0.0     376     84  batterysrv
    300      1   0  dd60f740  UN   0.2    6876   1160  collect_apr
   1465    256   2  d803c000  UN  14.9  820188  78000  system_server
   1478    256   3  d825e640  UN  14.9  820188  78000  android.bg
   1479    256   3  d803d980  UN  14.9  820188  78000  ActivityManager
   1489    256   1  dd293b80  UN  14.9  820188  78000  CpuTracker
   1532    256   0  dd67b300  UN  14.9  820188  78000  InputReader
   1552    256   1  d8031980  UN  14.9  820188  78000  UEventObserver
   1615    256   2  d8046ec0  UN   9.6  598176  50240  ndroid.systemui
   2034    256   2  d9713b80  UN   3.7  529732  19416  .android.dialer
   2145    256   0  d9100880  UN  14.9  820188  78000  watchdog
   2592    256   3  d7a10440  UN  14.9  820188  78000  Thread-164
   2616    256   2  d97b9100  UN  14.9  820188  78000  Error dump: sys
   2638      1   0  d61b8440  UN   0.4  243412   2116  ylog
crash> 
crash> bt 1
PID: 1      TASK: dc054000  CPU: 0   COMMAND: "init"
 #0 [<c05a8364>] (__schedule) from [<c05a6a08>]
 #1 [<c05a6a08>] (schedule_timeout) from [<c05a7a80>]
 #2 [<c05a7a80>] (__down) from [<c003fab4>]
 #3 [<c003fab4>] (down) from [<c0389110>]
 #4 [<c0389110>] (_mmc_blk_suspend) from [<c037e824>]
 #5 [<c037e824>] (mmc_bus_shutdown) from [<c024a8e4>]
 #6 [<c024a8e4>] (device_shutdown) from [<c0030924>]
 #7 [<c0030924>] (kernel_restart) from [<c0030ad8>]
 #8 [<c0030ad8>] (sys_reboot) from [<c000f1c0>]
    pc : [<0005c4c0>]    lr : [<00016071>]    psr: 200e0010
    sp : be8f79b8  ip : be8f79c8  fp : 0007c657
    r10: 0007adb9  r9 : 0007beb8  r8 : 0007c64a
    r7 : 00000058  r6 : 00000000  r5 : 00000000  r4 : 00000000
    r3 : be8f7a07  r2 : a1b2c3d4  r1 : 28121969  r0 : fee1dead
    Flags: nzCv  IRQs on  FIQs on  Mode USER_32  ISA ARM
crash> 
>when reboot,init thread was blocked.

crash> bt 25
PID: 25     TASK: dc08b300  CPU: 3   COMMAND: "kworker/3:0"
 #0 [<c05a8364>] (__schedule) from [<c05a885c>]
 #1 [<c05a885c>] (io_schedule) from [<c00eb68c>]
 #2 [<c00eb68c>] (sleep_on_buffer) from [<c05a6c10>]
 #3 [<c05a6c10>] (__wait_on_bit) from [<c05a6cd4>]
 #4 [<c05a6cd4>] (out_of_line_wait_on_bit) from [<c00eef4c>]
 #5 [<c00eef4c>] (__sync_dirty_buffer) from [<c014d054>]
 #6 [<c014d054>] (ext4_commit_super) from [<c014d1f4>]
 #7 [<c014d1f4>] (ext4_mark_recovery_complete) from [<c014eb24>]
 #8 [<c014eb24>] (ext4_remount) from [<c00c9360>]
 #9 [<c00c9360>] (do_remount_sb) from [<c00c9490>]
#10 [<c00c9490>] (do_emergency_remount) from [<c0035bb0>]
#11 [<c0035bb0>] (process_one_work) from [<c00367fc>]
#12 [<c00367fc>] (worker_thread) from [<c003b2f0>]
#13 [<c003b2f0>] (kthread) from [<c000f258>]
crash>

Is this bug fixed?
Thanks!
Comment 16 yongjun xiao 2016-08-03 02:34:35 UTC
Created attachment 227411 [details]
some error log

There are two logs from different phone.
Comment 17 yongjun xiao 2016-08-03 02:39:10 UTC
(In reply to yongjun xiao from comment #16)
> Created attachment 227411 [details]
> some error log
> 
> There are two logs from different phone.

some logs are lost, so the only error logs as the attachement.
But we dump the crash file memory,this may be help.
Comment 18 kun.chen 2016-08-03 03:14:16 UTC
Finally my customer defer this issue, so I don't have platform to collect the debug info anymore

Note You need to log in before you can comment on or make changes to this bug.