Bug 151801 - syslog spammed with delaying requested-resync messages
Summary: syslog spammed with delaying requested-resync messages
Status: RESOLVED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: MD (show other bugs)
Hardware: All Linux
: P1 enhancement
Assignee: io_md
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-08-09 13:15 UTC by Sven Köhler
Modified: 2016-10-19 03:52 UTC (History)
2 users (show)

See Also:
Kernel Version: 4.4.10
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Sven Köhler 2016-08-09 13:15:37 UTC
Monthly, I check my RAID drives for consistency. The check is performed over days with a speed of 3MB/sec. During this period, I have tons of messages repeated over and over again in the kernel log. Messages such as

"delaying requested-resync of md8 until md7 has finished (they share one or more physical units)"

are repeated over and over again for the same device pair (md8 and md7 in the example above). There doesn't seem to be any point in doing so. The information on which resync is blocked by which other resync could also be provided in /proc/mdstat. It would also suffice to issue each message only once for each device pair (md8 is blocked by md6 => 1 message, md8 blocked by md7 => another message, etc.).
Comment 1 Neil Brown 2016-10-06 03:16:32 UTC
Can you be more specific about "tons"??  Do the messages appear every second? every minute? every hour ??

What do you use you initiate the check? The /usr/share/mdadm/checkarray script that Debian provides, or mdcheck that comes with upstream mdadm or something else?

I agree that there is no need for so many messages and something should be done, but I need to be sure that I understand exactly what is happening before suggesting a solution.
Comment 2 Walter Haidinger 2016-10-07 06:00:12 UTC
Well, I'm not the bug creater, but I can confirm the "annoyance", hence added myself on cc.

I'm triggering the checks using a custom script that writes to md/sync_action as documented in md(4). The logs are apparently triggered due to a) overlapping physical disks and b) parallel requests to the check, i.e. simply
 for md in /sys/block/md*/md/sync_action ; do echo check > $md ; done

Usually there're a are short bursts of logs (a couple of logs each second followed by delays). The total number obviously depends on the check duration but just today from 00:00 to 06:00, I did count 11016 logs entries. So on average about one every other second. This quite floods dmesg output...

To give you an idea some sample logs from the current run, note the timestamps:
[Fri Oct  7 05:58:46 2016] md: delaying data-check of md4 until md3 has finished (they share one or more physical units)
[Fri Oct  7 05:58:46 2016] md: delaying data-check of md4 until md3...
[Fri Oct  7 05:58:48 2016] md: delaying data-check of md4 until md3...
[Fri Oct  7 05:58:49 2016] md: delaying data-check of md4 until md3...
[Fri Oct  7 05:58:52 2016] md: delaying data-check of md4 until md3...
[Fri Oct  7 05:58:53 2016] md: delaying data-check of md4 until md3...
[Fri Oct  7 05:58:53 2016] md: delaying data-check of md4 until md3...
[Fri Oct  7 05:58:53 2016] md: delaying data-check of md4 until md3...
[Fri Oct  7 05:58:55 2016] md: delaying data-check of md4 until md3...
[Fri Oct  7 05:58:55 2016] md: delaying data-check of md4 until md3...
[Fri Oct  7 05:58:58 2016] md: delaying data-check of md4 until md3...
[Fri Oct  7 05:58:58 2016] md: delaying data-check of md4 until md3...
[Fri Oct  7 05:59:03 2016] md: delaying data-check of md4 until md3...
[Fri Oct  7 05:59:03 2016] md: delaying data-check of md4 until md3...
[Fri Oct  7 05:59:03 2016] md: delaying data-check of md4 until md3...
Comment 3 Sven Köhler 2016-10-07 11:21:39 UTC
I also have a custom script that writes to the sync_action file in sysfs. I does so for all md devices at once. Then the kernel performs the resync of the md devices in an order chosen by the kernel.

The resync speed has been limited to a few megabytes per second, and thus the resync actually runs over a couple of days. I see the same messages as Walter. Walter describes the issue pretty well.

Here's some messages from my log:
Oct  2 03:10:25 xen-neu kernel: md: delaying requested-resync of md4 until md3 has finished (they share one or more physical units)
Oct  2 03:10:25 xen-neu kernel: md: delaying requested-resync of md5 until md3 has finished (they share one or more physical units)
Oct  2 03:10:25 xen-neu kernel: md: delaying requested-resync of md6 until md3 has finished (they share one or more physical units)
Oct  2 03:10:25 xen-neu kernel: md: delaying requested-resync of md7 until md3 has finished (they share one or more physical units)
Oct  2 03:10:25 xen-neu kernel: md: delaying requested-resync of md8 until md3 has finished (they share one or more physical units)
Oct  2 03:10:25 xen-neu kernel: md: delaying requested-resync of md6 until md3 has finished (they share one or more physical units)
Oct  2 03:10:25 xen-neu kernel: md: delaying requested-resync of md5 until md3 has finished (they share one or more physical units)
Oct  2 03:10:25 xen-neu kernel: md: delaying requested-resync of md8 until md3 has finished (they share one or more physical units)
Oct  2 03:10:25 xen-neu kernel: md: delaying requested-resync of md7 until md3 has finished (they share one or more physical units)
Oct  2 03:10:25 xen-neu kernel: md: delaying requested-resync of md4 until md3 has finished (they share one or more physical units)
Oct  2 03:10:25 xen-neu kernel: md: delaying requested-resync of md7 until md3 has finished (they share one or more physical units)
Oct  2 03:10:25 xen-neu kernel: md: delaying requested-resync of md4 until md3 has finished (they share one or more physical units)
Oct  2 03:10:25 xen-neu kernel: md: delaying requested-resync of md5 until md3 has finished (they share one or more physical units)
Oct  2 03:10:25 xen-neu kernel: md: delaying requested-resync of md8 until md3 has finished (they share one or more physical units)
Oct  2 03:10:25 xen-neu kernel: md: delaying requested-resync of md6 until md3 has finished (they share one or more physical units)
Oct  2 03:10:26 xen-neu kernel: md: delaying requested-resync of md6 until md3 has finished (they share one or more physical units)
Oct  2 03:10:26 xen-neu kernel: md: delaying requested-resync of md5 until md3 has finished (they share one or more physical units)
Oct  2 03:10:26 xen-neu kernel: md: delaying requested-resync of md8 until md3 has finished (they share one or more physical units)
Oct  2 03:10:26 xen-neu kernel: md: delaying requested-resync of md7 until md3 has finished (they share one or more physical units)
Oct  2 03:10:26 xen-neu kernel: md: delaying requested-resync of md4 until md3 has finished (they share one or more physical units)
Oct  2 03:10:27 xen-neu kernel: md: delaying requested-resync of md4 until md3 has finished (they share one or more physical units)
Oct  2 03:10:27 xen-neu kernel: md: delaying requested-resync of md5 until md3 has finished (they share one or more physical units)
Oct  2 03:10:27 xen-neu kernel: md: delaying requested-resync of md8 until md3 has finished (they share one or more physical units)
Oct  2 03:10:27 xen-neu kernel: md: delaying requested-resync of md7 until md3 has finished (they share one or more physical units)
Oct  2 03:10:27 xen-neu kernel: md: delaying requested-resync of md6 until md3 has finished (they share one or more physical units)
Oct  2 03:10:27 xen-neu kernel: md: delaying requested-resync of md6 until md3 has finished (they share one or more physical units)
Oct  2 03:10:27 xen-neu kernel: md: delaying requested-resync of md7 until md3 has finished (they share one or more physical units)
Oct  2 03:10:27 xen-neu kernel: md: delaying requested-resync of md8 until md3 has finished (they share one or more physical units)
Oct  2 03:10:27 xen-neu kernel: md: delaying requested-resync of md5 until md3 has finished (they share one or more physical units)
Oct  2 03:10:27 xen-neu kernel: md: delaying requested-resync of md4 until md3 has finished (they share one or more physical units)


As a programmer I'd suggest the following solution:
For each md device X remember which other md device Y delays the resync action. Write the message only if Y changes.
Comment 4 Neil Brown 2016-10-19 03:52:22 UTC
Hmmm... turns out that was fixed back in August, with a reference to this bug, but no-one bothered to update the bug.

Commit: c622ca543bff ("md: don't print the same repeated messages about delayed sync operation")

Note You need to log in before you can comment on or make changes to this bug.