Bug 196383

Summary: Regression in md/raid1 with write behind
Product: IO/Storage
Reporter: Markus (m4rkusxxl)
Component: MD
Assignee: io_md
Status: RESOLVED CODE_FIX
Severity: normal
CC: shli
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.12.2
Subsystem:
Regression: Yes
Bisected commit-id: 841c1316c7da6199a7df473893c141943991a756
Attachments: debug patch
patch for the bug
patch for the bug

Description Markus 2017-07-15 21:51:04 UTC
I updated from 4.11.9 to 4.12.0 (the newest version being 4.12.2).

At boot time the system stops with the following oops (I elided parts, shown as "..."; if anything more is needed I can provide it later):
invalid opcode: 0000 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 5 PID: 849 Comm: md11_raid1 Not tainted 4.12.2 #3
Hardware name: ...
task: ... task.stack: ...
RIP: ...:ata_qc_issue+.../...
Call Trace:
? ata_scsi_verify_xlat+...
ata_scsi_translate+...
ata_scsi_queuecmd+...
scsi_dispatch_cmd+...
scsi_request_fn+...
? cfq_rb_first+...
__blk_run_queue+...
cfq_insert_request+...
blk_flush_plug_list+...
blk_finish_plug+...
raid1d+...
? schedule_timeout+...
md_thread+...
? md_thread+...
? wake_atomic_t_function+...
kthread+...
? find_pers+...
? kthread_create_on_node+...
ret_from_fork+...
Code: ...
RIP: ata_qc_issue+... RSP: ...
---[ end trace ... ]---
note: md11_raid1[849] exited with preempt_count 1
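
As an aside, a raw oops like the one above can be symbolized with the kernel tree's own scripts/decode_stacktrace.sh. A minimal sketch, assuming the trace was saved to a hypothetical oops.txt and the command is run from the configured build tree that produced this kernel:

# ./scripts/decode_stacktrace.sh vmlinux "$(pwd)" < oops.txt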

Doing a bisect results in 841c1316c7da6199a7df473893c141943991a756 being the first bad commit:

Author: Ming Lei <tom.leiming@gmail.com>
Date:   Fri Mar 17 00:12:31 2017 +0800

    md: raid1: improve write behind
    
    This patch improve handling of write behind in the following ways:
    
    - introduce behind master bio to hold all write behind pages
    - fast clone bios from behind master bio
    - avoid to change bvec table directly
    - use bio_copy_data() and make code more clean
    
    Suggested-by: Shaohua Li <shli@fb.com>
    Signed-off-by: Ming Lei <tom.leiming@gmail.com>
    Signed-off-by: Shaohua Li <shli@fb.com>


md11 is actually my root filesystem: a raid1 made from HDDs and one SSD, with write behind enabled.
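
For reference, an array like this can be set up with mdadm. A minimal sketch, with hypothetical device names (sda5 the SSD, sdb5/sdd5 the write-mostly HDD partitions, sdc5 the spare); write behind requires an internal write-intent bitmap:

# mdadm --create /dev/md11 --level=1 --raid-devices=3 --bitmap=internal --write-behind=256 /dev/sda5 --write-mostly /dev/sdb5 /dev/sdd5
# mdadm /dev/md11 --add /dev/sdc5

With this layout reads are served from the SSD, while writes to the write-mostly HDDs may lag behind by up to 256 outstanding requests.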
Comment 1 Shaohua Li 2017-07-17 18:46:37 UTC
Created attachment 257573
debug patch

Can you please check if the patch works?
Comment 2 Shaohua Li 2017-07-17 22:00:46 UTC
Created attachment 257575
patch for the bug

Alright, this is the patch I'll push upstream. Could you please check?
Comment 3 Markus 2017-07-19 14:55:31 UTC
I just tested your "patch for the bug".
Unfortunately it still fails.

Have you tried it? My raid is a raid1 over three HDDs (one is a spare) and one SSD, with write behind enabled.
Comment 4 Shaohua Li 2017-07-19 16:21:11 UTC
I did test it in my virtual machine and it fixes one bug, though maybe not exactly the one you saw. Do you have a kernel log with the patch applied?
Comment 5 Markus 2017-07-20 15:48:33 UTC
Yes. The error is the same. (I haven't compared the numbers, though.) I applied it on top of 4.12.2.

Is there a way to save that error? Taking a photo of the screen feels wrong, and comparing by hand is error prone.
Comment 6 Shaohua Li 2017-07-20 17:07:19 UTC
I have tried to emulate your setup in a virtual machine: three AHCI hard disks and one fast NVMe. Without the patch I did see a crash, and the patch fixes it, though the crash I saw was in a different place.

I think a photo is the only way if you don't have a serial console. Can you give more details of your raid setup (with mdadm -Q --detail /dev/md0)? Are there any I/O errors on the disks?
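
If no serial console is available, netconsole can stream the oops over UDP to another machine instead. A minimal sketch, assuming netconsole is built into the kernel, the NIC is up before the crash, and 192.168.1.2 is a hypothetical log host (defaults: source port 6665, target port 6666, device eth0). Append to the kernel command line:

    netconsole=@/,@192.168.1.2/

and capture on the log host (flag spelling varies between netcat flavors):

# nc -u -l -p 6666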
Comment 7 Markus 2017-07-20 17:41:25 UTC
I haven't seen any I/O errors.

# mdadm -Q --detail /dev/md11
/dev/md11:
        Version : 0.90
  Creation Time : xxx
     Raid Level : raid1
     Array Size : 27262912 (26.00 GiB 27.92 GB)
  Used Dev Size : 27262912 (26.00 GiB 27.92 GB)
   Raid Devices : 3
  Total Devices : 4
Preferred Minor : 11
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Thu Jul 20 19:33:38 2017
          State : clean 
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

           UUID : xxx
         Events : 0.1555

    Number   Major   Minor   RaidDevice State
       0       8       21        0      active sync writemostly   /dev/sdb5
       1       8       53        1      active sync writemostly   /dev/sdd5
       2       8        5        2      active sync   /dev/sda5

       3       8       37        -      spare   /dev/sdc5
# for i in /dev/sd{a,b,d}5 ; do mdadm -X $i ; done
        Filename : /dev/sda5
           Magic : 6d746962
         Version : 4
            UUID : xxx
          Events : 1555
  Events Cleared : 1555
           State : OK
       Chunksize : 64 MB
          Daemon : 5s flush period
      Write Mode : Allow write behind, max 256
       Sync Size : 27262912 (26.00 GiB 27.92 GB)
          Bitmap : 416 bits (chunks), 0 dirty (0.0%)
        Filename : /dev/sdb5
           Magic : 6d746962
         Version : 4
            UUID : xxx
          Events : 1555
  Events Cleared : 1555
           State : OK
       Chunksize : 64 MB
          Daemon : 5s flush period
      Write Mode : Allow write behind, max 256
       Sync Size : 27262912 (26.00 GiB 27.92 GB)
          Bitmap : 416 bits (chunks), 0 dirty (0.0%)
        Filename : /dev/sdd5
           Magic : 6d746962
         Version : 4
            UUID : xxx
          Events : 1555
  Events Cleared : 1555
           State : OK
       Chunksize : 64 MB
          Daemon : 5s flush period
      Write Mode : Allow write behind, max 256
       Sync Size : 27262912 (26.00 GiB 27.92 GB)
          Bitmap : 416 bits (chunks), 0 dirty (0.0%)
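
For completeness, the "Allow write behind, max 256" mode shown above is a property of the bitmap. A sketch of changing it on an existing array, assuming the internal bitmap can safely be dropped and re-created while the array is clean:

# mdadm --grow /dev/md11 --bitmap=none
# mdadm --grow /dev/md11 --bitmap=internal --write-behind=256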
Comment 8 Shaohua Li 2017-07-20 19:53:22 UTC
Created attachment 257623
patch for the bug

Alright, I got it: it's the TRIM request. This one should fix all the issues.
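
A quick way to reproduce on demand is to force discards through the array; a sketch, assuming the filesystem on md11 supports TRIM:

# fstrim -v /

A discard bio carries no data pages, so presumably the write-behind cloning from the behind master bio has to special-case it rather than copy payload that isn't there.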
Comment 9 Markus 2017-07-21 18:03:32 UTC
This patch worked! The system boots without an error.
Thanks!
Comment 10 Markus 2017-08-01 16:40:37 UTC
Will you commit that fix?
Comment 11 Shaohua Li 2017-08-01 17:00:09 UTC
Yep, it's already upstream; please close this bug.