Our android 4.4.2 based on linux 3.10.20 when shutdown, it will call sync, echo "u" > /proc/sysrq-trigger. But I find a problem that sometime shutdown will hang and sometimes shutdown need long times. Check the kmsg log there are ext4 error occur, such as: [76.209728]-(3)[1:init]SysRq: -(3)[1:init]Emergency Remount R/O [76.282979] (3)[21:kworker/3:0]Emergency Remount Complete [77.188953] (0)[84:kworker/u8:1]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 8192 pages, ino 131067; err -30 [77.205402] (0)[84:kworker/u8:1]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 8192 pages, ino 131067; err -30 [78.220353] (3)[3075:kworker/u8:5]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 8192 pages, ino 131067; err -30 [79.246658] (1)[3075:kworker/u8:5]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 8192 pages, ino 131067; err -30 [80.261003] (3)[3075:kworker/u8:5]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 8192 pages, ino 131067; err -30 [80.272068] (3)[3075:kworker/u8:5]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 1024 pages, ino 131067; err -30 [81.300966] (1)[3075:kworker/u8:5]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 8192 pages, ino 131067; err -30 [81.688616] (0)[3075:kworker/u8:5]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 8192 pages, ino 131067; err -30 ...... [90.601634] (0)[3075:kworker/u8:5]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 1024 pages, ino 131067; err -30 [91.642286] (0)[3075:kworker/u8:5]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 8192 pages, ino 131067; err -30 err 30 is -EROFS, error due to writeback to read-only filesystem. As part of remount, I think linux will writeback all drity datas. Should no writeback occur after remount R/O. Not sure why I see these errors.
Does upstream android really handle a shutdown using "echo u > /proc/sysrq-trigger", or is that something you added locally? If it's upstream, I need to have some words with the Google Android team.... sysrq-u is documented to be "Emergency Remount R/O". It's not something that is designed to be used under normal circumstances. So if upstream android is doing this, it's arguably a bug. sysrq-u is implemented using do_emergency_remount() in fs/super.c, and it calls do_remount_sb with force set to 1 (the last argument is "force"). do_remount_sb(sb, MS_RDONLY, NULL, 1); The "force" option omits the call to sb_prepare_remount_readonly(), which is what forces the writeback to happen. The reason why it is omitted is that sysrq-u is designed to be used when the system is unstable, and so the writeback for one of the superblocks might block forever, and this would prevent the subsequent superblocks from getting remounted read-only. So people who use sysrq-u are usually advised to do an sysrq-s, which is an "emergency sync". (This is avoids doing a data integrity writeback, since again it's designed for emergencies, when part of the system may be partially deadlocked, and so the goal is to save what you can, not to try to save everything. Perfection, if it causes the sync to stop and never complete, and hence not sync out some buffers that could be saved, is the enemy of the good here.) The real issue here is that sysrq-u and sysrq-u is designed for emergency war room triage, what Dr. Hawkeye Pierce in M*A*S*H referred to as "meatball surgery". It is battlefield surgery, and it is *not* what you would expect in a hospital surgery room located thousands of miles from a war zone. So if android is using this under normal situations, it's an abuse of what these interfaces are intended to be used for.
You can refer this: https://android.googlesource.com/platform/system/core/+/master/libcutils/android_reboot.c During shutdown in android, it realy will goto emergency_remount() flow. And from the ksmg log, it seems ext4 writeback continued almost 15 seconds after emergency_remount() completed. Not sure why still has so many writeback occurs, even it used emergency_remount(), still has so many writeback occurs, and takes so long time, this will lead to shutdown hang or delay
And if not used echo u > /proc/sysrq-trigger, what's the better solution?
You use the equivalent of umount -a (which is what Linux distributions that don't have a pathological fear of GPL'ed code use --- i.e., everyone else except Android). So this is just the matter of writing code which attempts to unmount all of the file systems, one by one.
OK. And what about that problem, after emergency remount complete, still has so long time writeback, it seems like a issue because delay shutdown time. Maybe need to discard all the dirty pages since already remount ro
Looking at the Android source code, they already have code to parse and iterate over /proc/mounts. So what I'd recommend is that this code (a) try to do a normal umount, (b) if (a) fails, try to do a remount read/only, then (c) if (a) (b) fails, call syncfs() to force the file system to write back all of dirty data. Then after it has iterated over all of the file systems, it can then try to do the emergency remount read/only command. The other thing I'd suggest is that android do a better job requesting that processes cleanly shutdown, and then after a decent interval, force kill all of the remaining processes before doing the file system umount code path as described above. If the other processes were cleanly shut down, then there won't be any further disk writes, and it should be possible for the file systems to be cleanly unmounted, assuming that the last remaining process chdir's itself to the root directory, and then remounts the root file system read/only as the final step. Again, this is what a proper Unix/Linux system would do for a clean shutdown.
Look, if you're complaining about long shutdown times, you can do a forced shutdown. You will then lose data. If you care about not losing data, then you have to wait for all of the data to be written out to flash. There's no magic pixie dust here. You can have a clean shutdown that might take a little while, and no data loss, or you can live with the potential risk of data loss if you can't stand waiting even a tiny bit of time for the shutdown to finish up. The other question is why is some user process writing so much to the file system before the system is shutdown. Is this really some ill-behaved user application which is burning huge amounts of flash lifespan and battery life by constantly writing to flash? Or is there some other bug going on which is causing system processes to write huge amounts of logs or other junk to the file system unnecessarily?
Sorry, I'm not say clear. Ext4 was mounted as dealloc, when applications write, they don't know shutdown happened later. So after emergency remount ro, flush worker begin write dirty data. This flow maybe normal. But as system already remounted to read only, whether we want or not, already can't flush dirty data back, so still writeback seems unnecessary.
Also refer to this patch:http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/super.c?id=807612db2f9940b9fa6deaef054eb16d51bd3e00 Just fix writeback dirty data after remount complete. So I think maybe the sync flow can be improved on 3.10.20 even if used emergency_remount().
So see my earlier answer. If this is a normal shutdown, you must use syncfs() or the normal umount to unmount the file system. Emergency_remount is designed for emergencies when it may not be possible to do a blocking writeback. You could change emergency_remount to pass in a force value of 0 instead of 1. However, this will break non-android use cases where the system administrator really needs the current behaviour in a real emergency. Worse, even for the android use case, there may be times when there may be some kind of I/O error or other malfunction, where a blocking writeblock could stall the umount process indefinitely. So the right answer really is to use the algorithm I've outlined in comment #6.
Hi Ken, commit 807612db2f9940 is not going to help you, since emergency_remount() is calling do_remount_sb with force == 0. Note that if applications are still writing to the file system at the time of the shutdown, the application files are at risk of getting corrupted. It doesn't actually really matter if sb_prepare_readonly_remount() causes those writes to return EROFS as opposed to returning success and then having the writes fail. It might quiet the the error messages in the system log, but it's not going to allow the applications to shutdown cleanly --- because if this patch was going to make a difference, it implies that applications are not getting shut down cleanly before android triggers the emergency_remount.
Hi Theodore, How about these informations in the kmsg: [76.209728]-(3)[1:init]SysRq: -(3)[1:init]Emergency Remount R/O [76.282979] (3)[21:kworker/3:0]Emergency Remount Complete ...... [80.272068] (3)[3075:kworker/u8:5]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 1024 pages, ino 131067; err -30 ...... [85.407025] (1)[3075:kworker/u8:5]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 1024 pages, ino 131067; err -30 ...... [90.601634] (0)[3075:kworker/u8:5]EXT4-fs (mmcblk0p15): ext4_da_writepages: jbd2_start: 1024 pages, ino 131067; err -30 "ext4_da_writepages: jbd2_start: 1024 pages, ino 131067; err -30" was presented every 5 seconds, so means these dirty pages old than 30 seconds? If so, these datas were written before shutdown, but did not be written out after sync() and emergency_remount(), does it normal?
Well, let's gather some data. In android_reboot.c can you add the following: static void get_debug_mem(void) { int fd, cnt = 0; fd = open("/proc/sysrq-trigger", O_WRONLY); if (fd < 0) { return; } write(fd, "m", 1); close(fd); } And then in android reboot.c, make the beginning of the function read as follows: { get_debug_info(); sync; get_debug_info(); sync; get_debug_info(); sync; get_debug_info(); remount_ro(); ... And then send the output in the kmesg? That should give us information about what might be going on. It's possible that get_nr_dirty_pages() or some other part of the machinery used by fs/fs-writeback.c's writeback_inodes_sb() is somehow incorrectly not syncing all of the dirty pages. The other possibility is that the sync() system call is supposed to send out to disk at least as many dirty pages as were present at the time of the sync() system call was issued, but it is ***not*** a data integrity operation. Looking at the high level code, the main concern of that codepath is to make sure that sync() would eventually return, and not loop forever, even if some other user process was constantly dirtying pages. It simply wasn't intended for the use case that Android is trying to use it for. From my reading of the VFS writeback code, it's possible that if there were pages getting dirtied at the time of the sync() system call, that some of the pages that were dirtied after sync() system call was issued would be sent out to disk, but some of the older pages that were dirtied before sync() system call might not make it out to disk. Is it possible for you to enable ftrace on your kernel? It might be useful to see if in fact there is any attempts to be writing to one or more file systems at the time when android_reboot() gets called, and the kernel tracing facilities might help you accomplish that. - Ted
On Mon, May 19, 2014 at 03:41:21AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote: > [80.272068] (3)[3075:kworker/u8:5]EXT4-fs (mmcblk0p15): ext4_da_writepages: > jbd2_start: 1024 pages, ino 131067; err -30 Oh, and since we now have a file system an inode number, you can investigate what this inode might actually be. Suggested commands: debugfs /dev/mmcblk0p15 debugfs: ncheck 131067 and debugfs: stat <131067> Cheers, - Ted
Hi all, I meet the same bug with kernel3.10.65 as the following: >[26287.698059] c1 EXT4-fs (mmcblk0p24): ext4_da_writepages: jbd2_start: 2048 >pages, ino 59319; err -30 [26287.698089] c1 EXT4-fs (mmcblk0p24): ext4_da_writepages: jbd2_start: 2048 pages, ino 59319; err -30 [26287.698089] c1 EXT4-fs (mmcblk0p24): ext4_da_writepages: jbd2_start: 2048 pages, ino 59319; err -30 [26287.698120] c1 EXT4-fs (mmcblk0p24): ext4_da_writepages: jbd2_start: 2048 pages, ino 59319; err -30 [26287.510345] c1 EXT4-fs (mmcblk0p24): ext4_da_writepages: jbd2_start: 1024 pages, ino 59319; err -30 ...... And at last caused kernel crash. you can see many threads blocked: crash> ps | grep UN 1 0 0 dc054000 UN 0.1 1168 508 init 25 2 3 dc08b300 UN 0.0 0 0 [kworker/3:0] 88 2 0 dc9e1980 UN 0.0 0 0 [sprd_tb_thread] 89 2 2 dc9e1dc0 UN 0.0 0 0 [sprd_hotplug] 194 1 0 dd2c1100 UN 0.2 2208 840 perfprofd 210 1 0 dd2c3300 UN 0.0 376 84 batterysrv 300 1 0 dd60f740 UN 0.2 6876 1160 collect_apr 1465 256 2 d803c000 UN 14.9 820188 78000 system_server 1478 256 3 d825e640 UN 14.9 820188 78000 android.bg 1479 256 3 d803d980 UN 14.9 820188 78000 ActivityManager 1489 256 1 dd293b80 UN 14.9 820188 78000 CpuTracker 1532 256 0 dd67b300 UN 14.9 820188 78000 InputReader 1552 256 1 d8031980 UN 14.9 820188 78000 UEventObserver 1615 256 2 d8046ec0 UN 9.6 598176 50240 ndroid.systemui 2034 256 2 d9713b80 UN 3.7 529732 19416 .android.dialer 2145 256 0 d9100880 UN 14.9 820188 78000 watchdog 2592 256 3 d7a10440 UN 14.9 820188 78000 Thread-164 2616 256 2 d97b9100 UN 14.9 820188 78000 Error dump: sys 2638 1 0 d61b8440 UN 0.4 243412 2116 ylog crash> crash> bt 1 PID: 1 TASK: dc054000 CPU: 0 COMMAND: "init" #0 [<c05a8364>] (__schedule) from [<c05a6a08>] #1 [<c05a6a08>] (schedule_timeout) from [<c05a7a80>] #2 [<c05a7a80>] (__down) from [<c003fab4>] #3 [<c003fab4>] (down) from [<c0389110>] #4 [<c0389110>] (_mmc_blk_suspend) from [<c037e824>] #5 [<c037e824>] (mmc_bus_shutdown) from [<c024a8e4>] #6 [<c024a8e4>] (device_shutdown) from [<c0030924>] #7 [<c0030924>] (kernel_restart) from [<c0030ad8>] #8 [<c0030ad8>] (sys_reboot) from [<c000f1c0>] pc : [<0005c4c0>] lr : [<00016071>] psr: 200e0010 sp : be8f79b8 ip : be8f79c8 fp : 0007c657 r10: 0007adb9 r9 : 0007beb8 r8 : 0007c64a r7 : 00000058 r6 : 00000000 r5 : 00000000 r4 : 00000000 r3 : be8f7a07 r2 : a1b2c3d4 r1 : 28121969 r0 : fee1dead Flags: nzCv IRQs on FIQs on Mode USER_32 ISA ARM crash> >when reboot,init thread was blocked. crash> bt 25 PID: 25 TASK: dc08b300 CPU: 3 COMMAND: "kworker/3:0" #0 [<c05a8364>] (__schedule) from [<c05a885c>] #1 [<c05a885c>] (io_schedule) from [<c00eb68c>] #2 [<c00eb68c>] (sleep_on_buffer) from [<c05a6c10>] #3 [<c05a6c10>] (__wait_on_bit) from [<c05a6cd4>] #4 [<c05a6cd4>] (out_of_line_wait_on_bit) from [<c00eef4c>] #5 [<c00eef4c>] (__sync_dirty_buffer) from [<c014d054>] #6 [<c014d054>] (ext4_commit_super) from [<c014d1f4>] #7 [<c014d1f4>] (ext4_mark_recovery_complete) from [<c014eb24>] #8 [<c014eb24>] (ext4_remount) from [<c00c9360>] #9 [<c00c9360>] (do_remount_sb) from [<c00c9490>] #10 [<c00c9490>] (do_emergency_remount) from [<c0035bb0>] #11 [<c0035bb0>] (process_one_work) from [<c00367fc>] #12 [<c00367fc>] (worker_thread) from [<c003b2f0>] #13 [<c003b2f0>] (kthread) from [<c000f258>] crash> Is this bug fixed? Thanks!
Created attachment 227411 [details] some error log There are two logs from different phone.
(In reply to yongjun xiao from comment #16) > Created attachment 227411 [details] > some error log > > There are two logs from different phone. some logs are lost, so the only error logs as the attachement. But we dump the crash file memory,this may be help.
Finally my customer defer this issue, so I don't have platform to collect the debug info anymore