Updated from 4.11.9 to 4.12.0 (newest was 4.12.2). At boot time the system stops with (I skipped the ... if anything is needed I can hand it later.): invalid opcode: 0000 [#1] PREEMPT SMP Modules linked in: ... CPU: 5 PID: 849 Comm: md11_raid1 Not tainted 4.12.2 #3 Hardware name: ... task: ... task.stack: ... RIP: ...:ata_qc_issue+.../... Call Trace: ? ata_scsi_verify_xlat+... ata_scsi_translate+... ata_scsi_queuecmd+... scsi_dispatch_cmd+... scsi_request_fn+... ? cfq_rb_first+... __blk_run_queue+... cfg_insert_request+... blk_flush_plug_list+... blk_finish_plug+... raid1d+... ? schedule_timeout+... md_thread+... ? md_tzhread+... ? wake_atomic_t_function+... kthread+... ? find_pers+... ? kthread_create_on_node+... ret_from_fork+... Code: ... RIP: ata_qc_issue+... RSP: ... ---[ end trace ... ]--- note: md11_raid1[849] exited with preempt_count 1 Doing a bisect results in 841c1316c7da6199a7df473893c141943991a756 being the first bad commit: Author: Ming Lei <tom.leiming@gmail.com> Date: Fri Mar 17 00:12:31 2017 +0800 md: raid1: improve write behind This patch improve handling of write behind in the following ways: - introduce behind master bio to hold all write behind pages - fast clone bios from behind master bio - avoid to change bvec table directly - use bio_copy_data() and make code more clean Suggested-by: Shaohua Li <shli@fb.com> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Shaohua Li <shli@fb.com> Actually md11 is my root and a raid1 made from hdds and one ssd. Including write behind.
Created attachment 257573 [details] debug patch can you please check if the patch works
Created attachment 257575 [details] patch for the bug alright, this is the patch I'll push to upstream, could you please check?
I just tested your "patch for the bug". Unfortunatelly still fails. Have you tried it? My raid is a raid1 over three hdds (one is spare) and one ssd. With write behind enabled.
I did have a test in my virtual machine and it fixes one bug, maybe not exactly the one you saw. do you have kernel log with the patch applied?
Yes. The error is the same. (Havent compared the numbers though.) I applied it on top of 4.12.2. Is there a way to save that error? Making a photo of the screen feels wrong. And compare by hand is error prone.
I have tried to emulate the setup you have in a virtual machine. 3 ahci harddisks and one fast nvme. Without the patch, I did see a crash and the patch fix it, though the crash I saw is in difference place. I think photo is the only way if you don't have a serial console. Can you give more details of your raid setup? (with mdadm -Q --detail /dev/md0). Is there any io error in the disks?
I havent seen any io errors. # mdadm -Q --detail /dev/md11 /dev/md11: Version : 0.90 Creation Time : xxx Raid Level : raid1 Array Size : 27262912 (26.00 GiB 27.92 GB) Used Dev Size : 27262912 (26.00 GiB 27.92 GB) Raid Devices : 3 Total Devices : 4 Preferred Minor : 11 Persistence : Superblock is persistent Intent Bitmap : Internal Update Time : Thu Jul 20 19:33:38 2017 State : clean Active Devices : 3 Working Devices : 4 Failed Devices : 0 Spare Devices : 1 UUID : xxx Events : 0.1555 Number Major Minor RaidDevice State 0 8 21 0 active sync writemostly /dev/sdb5 1 8 53 1 active sync writemostly /dev/sdd5 2 8 5 2 active sync /dev/sda5 3 8 37 - spare /dev/sdc5 # for i in /dev/sd{a,b,d}5 ; do mdadm -X $i ; done Filename : /dev/sda5 Magic : 6d746962 Version : 4 UUID : xxx Events : 1555 Events Cleared : 1555 State : OK Chunksize : 64 MB Daemon : 5s flush period Write Mode : Allow write behind, max 256 Sync Size : 27262912 (26.00 GiB 27.92 GB) Bitmap : 416 bits (chunks), 0 dirty (0.0%) Filename : /dev/sdb5 Magic : 6d746962 Version : 4 UUID : xxx Events : 1555 Events Cleared : 1555 State : OK Chunksize : 64 MB Daemon : 5s flush period Write Mode : Allow write behind, max 256 Sync Size : 27262912 (26.00 GiB 27.92 GB) Bitmap : 416 bits (chunks), 0 dirty (0.0%) Filename : /dev/sdd5 Magic : 6d746962 Version : 4 UUID : xxx Events : 1555 Events Cleared : 1555 State : OK Chunksize : 64 MB Daemon : 5s flush period Write Mode : Allow write behind, max 256 Sync Size : 27262912 (26.00 GiB 27.92 GB) Bitmap : 416 bits (chunks), 0 dirty (0.0%)
Created attachment 257623 [details] patch for the bug alright, I got it. it's the trim request. This one should fix all the issues.
This patch worked! The system boots without an error. Thanks!
Will you commit that fix?
yep, already in upstream, please close this bug