Bug 68061

Summary: raid checking sometimes locks up in 3.13, but not 3.12
Product: IO/Storage Reporter: Bruno Wolff III (bruno)
Component: MDAssignee: io_md
Status: RESOLVED CODE_FIX    
Severity: normal CC: neilb
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.13 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: dmesg output
/proc/mdstat for 3.13.0-0.rc7.git0.1.fc21.i686+PAE
/proc/mdstat from another machine
dmesg output for work machine (3.13.0-0.rc7.git0.2.fc21.x86_64)
Stack output from md*_raid1 and md*_resync together

Description Bruno Wolff III 2014-01-02 15:09:51 UTC
I run nightly raid checking on some raid 1 mirrors (md raid) and when running 3.13 kernels it sometimes locks up. I am not sure of the exact trigger.
Here is some sample output:
Personalities : [raid1] 
md14 : active raid1 sda4[1] sdb4[2]
      217148624 blocks super 1.2 [2/2] [UU]
      [=>...................]  check =  9.9% (21538176/217148624) finish=109.0min speed=29906K/sec
      
md15 : active raid1 sdd1[0] sdc1[1]
      35564360 blocks super 1.2 [2/2] [UU]
      [>....................]  check =  0.0% (12352/35564360) finish=42063.5min speed=14K/sec
      
md13 : active raid1 sda3[1] sdb3[2]
      83884984 blocks super 1.2 [2/2] [UU]
      	resync=DELAYED
      
md12 : active raid1 sdb2[2] sda2[1]
      10484664 blocks super 1.2 [2/2] [UU]
      	resync=DELAYED
      
md11 : active raid1 sda1[1] sdb1[2]
      1048564 blocks super 1.0 [2/2] [UU]
      
unused devices: <none>

You can see that one of the two checks has locked up, but the other is continuing.
Comment 1 Bruno Wolff III 2014-01-02 18:03:15 UTC
I checked 3.13.0-0.rc6.git0.2.fc21.i686+PAE and I am still seeing the issue.
Comment 2 Neil Brown 2014-01-06 01:44:43 UTC
My guess is that this is a problem with the drive controller rather than with md/raid.

Can you attach complete kernel logs please?
What is the most recent kernel on which this has worked?
Comment 3 Bruno Wolff III 2014-01-06 02:09:20 UTC
The most recent kernel that hasn't exhibited the problem is kernel-PAE-3.12.6-300.fc20.i686.

I am testing kernel-PAE-3.13.0-0.rc7.git0.1.fc21.i686 right now. If that fails I'll get the dmesg output from this boot.
Comment 4 Bruno Wolff III 2014-01-06 02:13:41 UTC
I want to note that not every device check results in the problem. Often some succeed before one fails, but not always on the same device. When it gets stuck, it seems to happen pretty near the start of the process for a device. On the machine I have been seeing them, I haven't noticed the scsi disks having the problem. So far it has always been the pata disks. But I haven't had the problem happen so many times, that I'd say it very likely can't happen on the scsi disks. But that still seems in line with a possible driver problem.
Comment 5 Bruno Wolff III 2014-01-06 05:40:35 UTC
I just just managed to get two pairs of disks to show the problem at the same time with rc7. sdc and sdd are scsi disks and this is the first time I saw them have a raid check lock up. In this case they locked up first. A check was running on md14 when they locked up, that completed. Then the md13 check locked up.
As far as I can tell the raid arrays are working correctly except for the locked up checks.
Comment 6 Bruno Wolff III 2014-01-06 05:43:21 UTC
Created attachment 121071 [details]
dmesg output
Comment 7 Bruno Wolff III 2014-01-06 05:44:54 UTC
Created attachment 121081 [details]
/proc/mdstat for 3.13.0-0.rc7.git0.1.fc21.i686+PAE
Comment 8 Bruno Wolff III 2014-01-07 16:28:17 UTC
Created attachment 121201 [details]
/proc/mdstat from another machine

Now that I had a chance to go into work, I am also able to replicate the problem on my work machine.
Comment 9 Bruno Wolff III 2014-01-07 16:30:28 UTC
Created attachment 121211 [details]
dmesg output for work machine (3.13.0-0.rc7.git0.2.fc21.x86_64)
Comment 10 Bruno Wolff III 2014-01-09 20:27:28 UTC
I tested this on a vanilla 3.13-rc1 and reproduced the problem.
I'm going to start a bisect, but it will probably go slowly.
Comment 11 Bruno Wolff III 2014-01-11 22:31:22 UTC
I just hit a build that didn't boot for other reasons and that is going to slow things down more, but to narrow things down a bit I'm going to note the bisect log.
git bisect log
git bisect start
# bad: [6ce4eac1f600b34f2f7f58f9cd8f0503d79e42ae] Linux 3.13-rc1
git bisect bad 6ce4eac1f600b34f2f7f58f9cd8f0503d79e42ae
# good: [5e01dc7b26d9f24f39abace5da98ccbd6a5ceb52] Linux 3.12
git bisect good 5e01dc7b26d9f24f39abace5da98ccbd6a5ceb52
# good: [5cbb3d216e2041700231bcfc383ee5f8b7fc8b74] Merge branch 'akpm' (patches from Andrew Morton)
git bisect good 5cbb3d216e2041700231bcfc383ee5f8b7fc8b74
# good: [e1f56c89b040134add93f686931cc266541d239a] mm: convert mm->nr_ptes to atomic_long_t
git bisect good e1f56c89b040134add93f686931cc266541d239a
Comment 12 Neil Brown 2014-01-12 23:03:05 UTC
Thanks for persisting with this. It does look like is an md problem after all.
My guess would be commit 79ef3a8aa1cb1523cc231c9a90a278333c21f761

I've had another look at the code but cannot see the problem yet.

Next time you get a hang could you please collect a kernel stack trace for any process stuck in a 'D' wait, as well as the the mdXX_raid1 and mdXX_resync processes.
You can get these with
  cat /proc/$PID/stack
providing CONFIG_STACKTRACE is set.
Comment 13 Bruno Wolff III 2014-01-13 03:42:24 UTC
I'll make sure CONFIG_STACKTRACE is set in my config files. I am basing them off a Fedora config file, but don't know if that is set or not.
I'll get 79ef3a8aa1cb1523cc231c9a90a278333c21f761 tested tomorrow. I should be able to try that commit and at least one parent before I leave work. My home machine is having some other problems with the test commits. It looks Nouveau related and that shouldn't affect my work machine.
Comment 14 Bruno Wolff III 2014-01-13 04:14:38 UTC
I managed to lock up my system while I was trying to get the stack dumps the first time.
It looks like the problem with not be able to boot test kernels is related to some Fedora change (probably dracut), not with landing near a bad commit when doing the tests. That might give me problems testing tomorrow at work. But I should be able to figure out likely candidates for messing up the initramfs generation. And if it just Nouveau and not any old video driver, I'll be OK as well.
I hope to have the stack traces shortly (using a kernel/initramfs from a few days ago).
Comment 15 Bruno Wolff III 2014-01-13 06:40:41 UTC
After I hung the machine and rebooted tonight I am having trouble getting things to hang again to get the stack. I'll try again at work tomorrow as I am too tired to stay up any longer tonight.
Comment 16 Bruno Wolff III 2014-01-13 12:02:10 UTC
This is the stack trace from an md_resync that was stuck:
[root@wolff bruno]# cat /proc/27900/stack
[<f7f0a316>] sync_request+0x446/0xb50 [raid1]
[<c086b4f3>] md_do_sync+0xa13/0xfe0
[<c0867fa8>] md_thread+0xe8/0x100
[<c0472fd1>] kthread+0xa1/0xc0
[<c09caeb7>] ret_from_kernel_thread+0x1b/0x28
[<ffffffff>] 0xffffffff
ps uaxww|grep ' D '
root     27897  2.0  0.0      0     0 ?        D    05:55   0:02 [md13_resync]
root     27900  0.0  0.0      0     0 ?        D    05:55   0:00 [md15_resync]
root     27981  0.0  0.0   4968   844 pts/0    S+   05:57   0:00 grep --color=auto  D 
This is the stack trace from an md_resync that was in a D state, but progressing:
[<f7f0a316>] sync_request+0x446/0xb50 [raid1]
[<c086b4f3>] md_do_sync+0xa13/0xfe0
[<c0867fa8>] md_thread+0xe8/0x100
[<c0472fd1>] kthread+0xa1/0xc0
[<c09caeb7>] ret_from_kernel_thread+0x1b/0x28
[<ffffffff>] 0xffffffff
Comment 17 Bruno Wolff III 2014-01-13 16:00:29 UTC
I verified that 79ef3a8aa1cb1523cc231c9a90a278333c21f761 exhibits the problem.
I have another stack dump from an x86_64 machine (the above comment is for i686).

root      2216  0.0  0.0      0     0 ?        D    08:22   0:00 [md14_resync]
cat /proc/2216/stack
[<ffffffffa021f101>] sync_request+0x211/0xaa0 [raid1]
[<ffffffff814f2195>] md_do_sync+0x955/0xd60
[<ffffffff814eed38>] md_thread+0x118/0x130
[<ffffffff8108dda0>] kthread+0xc0/0xd0
[<ffffffff8167ae3c>] ret_from_fork+0x7c/0xb0
[<ffffffffffffffff>] 0xffffffffffffffff
Comment 18 Neil Brown 2014-01-13 23:45:33 UTC
Thanks.
I was really hoping to see the stack trace of md14_raid1 too.  I suspect there is almost nothing to it, but I'd like to know for certain.
Comment 19 Neil Brown 2014-01-14 00:23:16 UTC
I have a theory.
This code looks wrong, and could conceivably cause the symptom you are seeing.

If you can test it and let me know I'd really appreciate it.

I'll probably submit the patch anyway because the code definitely looks wrong.

(We move the 'sector = conf->start_next_window' assignment into the previous if block (where *_window_requests is incremented) and make it unconditional in there)

Thanks.

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 1e5a540995e9..a49cfcc7a343 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -924,9 +924,8 @@ static sector_t wait_barrier(struct r1conf *conf, struct bio *bio)
 				conf->next_window_requests++;
 			else
 				conf->current_window_requests++;
-		}
-		if (bio->bi_sector >= conf->start_next_window)
 			sector = conf->start_next_window;
+		}
 	}
 
 	conf->nr_pending++;
Comment 20 Bruno Wolff III 2014-01-14 05:51:19 UTC
Created attachment 121941 [details]
Stack output from md*_raid1 and md*_resync together
Comment 21 Bruno Wolff III 2014-01-14 05:52:12 UTC
I'll try out the patch tomorrow.
Comment 22 Neil Brown 2014-01-14 06:12:54 UTC
Thanks for the extra stack trace.  It confirms what I suspected, which is often quite helpful!
Comment 23 Bruno Wolff III 2014-01-15 05:53:24 UTC
I have been having problems setting up testing. I have one machine failing to boot (from an unrelated problem) new kernels and on the other there is a problem building modules. I have some things to try, but trying things takes me a long time and I didn't get anything I could test this fix on today.
Comment 24 Bruno Wolff III 2014-01-16 03:24:17 UTC
I have have started testing this fix using the 3.13.0-0.rc8.git2.1.fc21 kernel on i686 and x86_64. So far no problems, but I'll feel better waiting until tomorrow to close this ticket.
Comment 25 Bruno Wolff III 2014-01-16 17:33:11 UTC
I am pretty confident that I would have seen the problem appear by now if it wasn't fixed, so i am going to close this bug.
Thank you for fixing things.
Comment 26 Neil Brown 2014-01-16 21:41:04 UTC
And thank you for persisting with testing.
The fix is already in Linus' tree and will flow to relevant -stable releases.