Bug 86471
Summary: | drastic reduction of write performance after the termination of mysqld | ||
---|---|---|---|
Product: | File System | Reporter: | Markus Königshaus (m.koenigshaus) |
Component: | ext4 | Assignee: | fs_ext4 (fs_ext4) |
Status: | RESOLVED MOVED | ||
Severity: | normal | CC: | j-p-t, tytso |
Priority: | P1 | ||
Hardware: | ARM | ||
OS: | Linux | ||
Kernel Version: | 3.14.21 | Subsystem: | |
Regression: | No | Bisected commit-id: |
Description
Markus Königshaus
2014-10-17 13:14:23 UTC
The mballoc_debug is only debugging the block allocator, which I suspect is not the source of the slow down. It might be better to use blktrace so we can see what sort of I/O requests are being sent when mysql is shut down. Does the performance go back to an acceptable level some time after the mysql daemon is shut down? It might be that mysql is simply writing out a lot of things at shutdown, and this is causing the eMMC device to be slowed down due to competing I/O traffic sent to it. Another tool that is useful for monitoring write performance is to use "iostat -x 1". This will show you how much I/O is being done by the whole system, as well as seeing if there is any backlog on the I/O request queues. Your using dd to measure write performance is much like complaining that a single car that is trying to get from point A to point B during rush hour is taking a long time. That does not necessarily mean that the highway is defective! It could be that there is simply a lot of competing traffic (i.e., other cars, aka other write requests from other processes on the system) which is slowing down the traffic on the system. Hi Theodore, even after a very long wait, the performance remain so low. The performance is also low when you copy files. I have used dd, so I create only reproducible write access. I have now tested the ext3 FS, same result. After some debugging I found out, there is a delay in start_this_handle (look at the timing), with echo 5 > /sys/kernel/debug/jbd/jbd-debug I get: >> before mysqlshutdown: ... [ 40.400755] fs/jbd/transaction.c: (journal_stop, 1380): h_ref 2 -> 1 [ 40.400755] [ 40.400765] fs/jbd/transaction.c: (journal_stop, 1384): Handle df03e000 going down [ 40.400765] [ 40.400789] fs/jbd/transaction.c: (journal_start, 290): from journal_start [ 40.400789] [ 40.400801] fs/jbd/transaction.c: (start_this_handle, 110): New handle df03e000 going live. [ 40.400801] [ 40.400813] fs/jbd/transaction.c: (start_this_handle, 229): Handle df03e000 given 14 credits (total 21, free 14286) ... >> after mysqlshutdown: ... [ 61.383886] fs/jbd/transaction.c: (journal_stop, 1380): h_ref 2 -> 1 [ 61.383886] [ 61.383896] fs/jbd/transaction.c: (journal_stop, 1384): Handle df03e000 going down [ 61.383896] [ 61.393528] fs/jbd/transaction.c: (start_this_handle, 110): New handle df03e000 going live. [ 61.393528] [ 61.393541] fs/jbd/transaction.c: (start_this_handle, 229): Handle df03e000 given 14 credits (total 21, free 14271) ... I try to isolate the problem in more detail next week, I would welcome tips and help for my try. Regards, Markus It looks like a problem in the writeback - mechanism, not ext4 related, sorry: cd /sys/kernel/debug/tracing/ echo function > current_tracer echo writeback* > set_ftrace_filter >> before mysqlshutdown # tracer: function # # entries-in-buffer/entries-written: 3/3 #P:1 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | dd-3666 [000] ...1 1338.484467: writeback_in_progress <-balance_dirty_pages_ratelimited kworker/u2:2-1001 [000] ...2 1338.557118: writeback_sb_inodes <-__writeback_inodes_wb kworker/u2:2-1001 [000] ...2 1338.557160: writeback_sb_inodes <-__writeback_inodes_wb >> after mysqlshutdown: # tracer: function # # entries-in-buffer/entries-written: 384/384 #P:1 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | dd-3733 [000] ...1 1403.061232: writeback_in_progress <-balance_dirty_pages_ratelimited kworker/u2:2-1001 [000] ...2 1403.061277: writeback_sb_inodes <-__writeback_inodes_wb dd-3733 [000] ...1 1403.071042: writeback_in_progress <-balance_dirty_pages_ratelimited dd-3733 [000] ...1 1403.071328: writeback_in_progress <-balance_dirty_pages_ratelimited kworker/u2:2-1001 [000] ...2 1403.071350: writeback_sb_inodes <-__writeback_inodes_wb dd-3733 [000] ...1 1403.077546: writeback_in_progress <-balance_dirty_pages_ratelimited [ ... repeat 368 times ... ] dd-3733 [000] ...1 1404.327043: writeback_in_progress <-balance_dirty_pages_ratelimited dd-3733 [000] ...1 1404.327306: writeback_in_progress <-balance_dirty_pages_ratelimited kworker/u2:2-1001 [000] ...2 1404.327336: writeback_sb_inodes <-__writeback_inodes_wb dd-3733 [000] ...1 1404.337125: writeback_in_progress <-balance_dirty_pages_ratelimited someone an idea, where to report this problem? Regards, Markus Are you sure this is actually a problem? Could it just be that when mysqld shuts down, it's issuing a huge number of buffered writes which is just taking a long, long time to get written out to your eMMC device (which is typically very, *very* slow to handle writes). I had suggested using blktrace to see if I/O is going on. You could also try looking using iostat or the dstat program. If it's just that I/O is still happening, then things may be working normally --- just that eMMC might be a bad match with a high performance database workload. Hello Theodore, I really wait a long time and I have the kernel dropped all Caches with /proc/sys/vm/drop_caches, so I do not thing, there are left buffered writes from mysqld. I can reproduce that behavior, before mysql shutdown I get 14 MB/s, after I get 400 kB/s with dd. I will try to compile blktrace for our system later, not enough time for that now. Can blktrace show me how often the writeback - mechanism writes the cache back to e.mmc - card? Markus
I think I have found the error. The kernel calls apparently for the same page dec_zone_page_state on (page, NR_FILE_DIRTY) twice, which means that the dirty page counter underflows (is getting negative):
cat /proc/meminfo
MemTotal: 511492 kB
MemFree: 406972 kB
MemAvailable: 450536 kB
Buffers: 1840 kB
Cached: 52140 kB
SwapCached: 0 kB
Active: 54348 kB
Inactive: 34456 kB
Active(anon): 34904 kB
Inactive(anon): 5440 kB
Active(file): 19444 kB
Inactive(file): 29016 kB
Unevictable: 0 kB
Mlocked: 0 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 511492 kB
LowFree: 406972 kB
SwapTotal: 0 kB
SwapFree: 0 kB
>> Dirty: 4294967148 kB
Writeback: 0 kB
AnonPages: 34844 kB
Mapped: 13992 kB
Shmem: 5520 kB
Slab: 9196 kB
SReclaimable: 4976 kB
SUnreclaim: 4220 kB
KernelStack: 696 kB
PageTables: 1024 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 255744 kB
Committed_AS: 365452 kB
VmallocTotal: 499712 kB
VmallocUsed: 3512 kB
VmallocChunk: 495964 kB
The writeback - mechanism thinks that too much pages are dirty and disables the writeback - cache.
Who can I send this bug?
Markus
Addition: I have identified the code that generates the underflow: cancel_dirty_page in mm/truncate.c. To reproduce the error, you can perform the following change: [...] if (mapping && mapping_cap_account_dirty (mapping)) { ++WARN_ON ((int) global_page_state(NR_FILE_DIRTY) <0); dec_zone_page_state (page, NR_FILE_DIRTY); [...] Test result: [ 21.427121] ------------[ cut here ]------------ [ 21.431769] WARNING: CPU: 0 PID: 309 at mm/truncate.c:78 cancel_dirty_page+0x144/0x1cc() [ 21.439899] Modules linked in: wutbox_cp sata_mv [ 21.444572] CPU: 0 PID: 309 Comm: kworker/0:1 Not tainted 3.14.21-WuT #77 [ 21.451395] Workqueue: events free_ioctx [ 21.455362] [<c0016a64>] (unwind_backtrace) from [<c0012f88>] (show_stack+0x20/0x24) [ 21.463164] [<c0012f88>] (show_stack) from [<c03f8bec>] (dump_stack+0x24/0x28) [ 21.470441] [<c03f8bec>] (dump_stack) from [<c0023ae4>] (warn_slowpath_common+0x84/0x9c) [ 21.478577] [<c0023ae4>] (warn_slowpath_common) from [<c0023bb8>] (warn_slowpath_null+0x2c/0x34) [ 21.487402] [<c0023bb8>] (warn_slowpath_null) from [<c00c05f0>] (cancel_dirty_page+0x144/0x1cc) [ 21.496125] [<c00c05f0>] (cancel_dirty_page) from [<c00c073c>] (truncate_inode_page+0x8c/0x158) [ 21.504874] [<c00c073c>] (truncate_inode_page) from [<c00c09c4>] (truncate_inode_pages_range+0x11c/0x53c) [ 21.514493] [<c00c09c4>] (truncate_inode_pages_range) from [<c00c0e9c>] (truncate_pagecache+0x88/0xac) [ 21.523855] [<c00c0e9c>] (truncate_pagecache) from [<c00c0f1c>] (truncate_setsize+0x5c/0x74) [ 21.532332] [<c00c0f1c>] (truncate_setsize) from [<c013b2d8>] (put_aio_ring_file.isra.14+0x34/0x90) [ 21.541428] [<c013b2d8>] (put_aio_ring_file.isra.14) from [<c013b354>] (aio_free_ring+0x20/0xcc) [ 21.550262] [<c013b354>] (aio_free_ring) from [<c013b424>] (free_ioctx+0x24/0x44) [ 21.557791] [<c013b424>] (free_ioctx) from [<c003d8d8>] (process_one_work+0x134/0x47c) [ 21.565732] [<c003d8d8>] (process_one_work) from [<c003e988>] (worker_thread+0x130/0x414) [ 21.575225] [<c003e988>] (worker_thread) from [<c00448ac>] (kthread+0xd4/0xec) [ 21.590695] [<c00448ac>] (kthread) from [<c000ec18>] (ret_from_fork+0x14/0x20) [ 21.600721] ---[ end trace a9d8eecdd373e997 ]--- The error only occurs when the Kernel Asynchronous I/O is used. Without AIO I get no error. Markus Moved to Memory Management. Hi, I am exeriencing exactly the same bug with selfbuilt kernel 3.17.2 on an arm machine. I fear it's killing my SSD. What does it mean the bug is "solved"? Where is the patch, which version of the kernel source contains the solution? Found it: Was moved to here, correct? https://bugzilla.kernel.org/show_bug.cgi?id=86831 Yes, correct. Am 13.11.2014 um 15:33 schrieb bugzilla-daemon@bugzilla.kernel.org: > https://bugzilla.kernel.org/show_bug.cgi?id=86471 > > --- Comment #10 from JPT <j-p-t@gmx.net> --- > Found it: Was moved to here, correct? > https://bugzilla.kernel.org/show_bug.cgi?id=86831 > -- Unsere Aussagen koennen Irrtuemer und Missverstaendnisse enthalten. Bitte pruefen Sie die Aussagen fuer Ihren Fall, bevor Sie Entscheidungen auf Grundlage dieser Aussagen treffen. Wiesemann & Theis GmbH, Porschestr. 12, D-42279 Wuppertal Geschaeftsfuehrer: Dipl.-Ing. Ruediger Theis Registergericht: Amtsgericht Wuppertal, HRB 6377 Tel. +49-202/2680-0, Fax +49-202/2680-265, http://www.wut.de |