Bug 198861
| Summary: | Regression causes kernel OOPS and hang in SCSI error report | | |
|---|---|---|---|
| Product: | IO/Storage | Reporter: | ncopa |
| Component: | SCSI | Assignee: | linux-scsi (linux-scsi) |
| Status: | RESOLVED CODE_FIX | | |
| Severity: | normal | CC: | buschmann, bvanassche, dennis-busch, fan4326, flyser42, jskier, kernel, kionmaru, mabo, nielson.peter, sgh |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 4.14.20 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Description
ncopa
2018-02-21 15:43:42 UTC
These two patches fix the issue:
https://patchwork.kernel.org/patch/10181165/
https://patchwork.kernel.org/patch/10233617/

I am getting a very similar warning on 4.15.7 when dealing with an old, unhappy disk. TL;DR: the above patches applied cleanly and fixed this on 4.15.7 for me as well.

Mar 3 20:42:00 redsun kernel: [ 750.861060] ata4.00: exception Emask 0x11 SAct 0x80000 SErr 0x680100 action 0x6 frozen
Mar 3 20:42:00 redsun kernel: [ 750.861063] ata4.00: irq_stat 0x48000008, interface fatal error
Mar 3 20:42:00 redsun kernel: [ 750.861066] ata4: SError: { UnrecovData 10B8B BadCRC Handshk }
Mar 3 20:42:00 redsun kernel: [ 750.861069] ata4.00: failed command: READ FPDMA QUEUED
Mar 3 20:42:00 redsun kernel: [ 750.861078] ata4.00: cmd 60/00:98:c0:c5:c3/01:00:02:00:00/40 tag 19 ncq dma 131072 in
Mar 3 20:42:00 redsun kernel: [ 750.861078] res 40/00:98:c0:c5:c3/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
Mar 3 20:42:00 redsun kernel: [ 750.861080] ata4.00: status: { DRDY }
Mar 3 20:42:39 redsun kernel: [ 789.396122] WARNING: CPU: 3 PID: 1926 at /usr/src/linux-stable/kernel/rcu/tree.c:2792 rcu_process_callbacks+0x482/0x4a0
Mar 3 20:42:39 redsun kernel: [ 789.396124] Modules linked in: cmac cifs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables ipv6 cfg80211 rfkill 8021q garp stp llc fuse amdkfd amd_iommu_v2 amdgpu snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core chash ttm drm_kms_helper snd_hwdep i2c_dev uas drm igb snd_pcm snd_timer agpgart wmi_bmof snd mxm_wmi kvm_amd usb_storage fb_sys_fops syscopyarea sysfillrect kvm soundcore ptp sysimgblt pps_core dca i2c_algo_bit alx mdio ccp crct10dif_pclmul crc32_pclmul evdev crc32c_intel shpchp i2c_piix4 i2c_core
Mar 3 20:42:39 redsun kernel: [ 789.396173] efi_pstore efivars ghash_clmulni_intel k10temp hwmon button wmi acpi_cpufreq loop hid_generic usbhid hid uhci_hcd ohci_pci ohci_hcd ehci_hcd xhci_pci xhci_hcd
Mar 3 20:42:39 redsun kernel: [ 789.396188] CPU: 3 PID: 1926 Comm: Web Content Tainted: G W 4.15.7 #21
Mar 3 20:42:39 redsun kernel: [ 789.396189] Hardware name: Gigabyte Technology Co., Ltd. AX370-Gaming 5/AX370-Gaming 5, BIOS F21 02/08/2018
Mar 3 20:42:39 redsun kernel: [ 789.396193] RIP: 0010:rcu_process_callbacks+0x482/0x4a0
Mar 3 20:42:39 redsun kernel: [ 789.396194] RSP: 0000:ffff94857ecc3f10 EFLAGS: 00010002
Mar 3 20:42:39 redsun kernel: [ 789.396197] RAX: ffffffffffffd800 RBX: ffff94857ece1800 RCX: 0000000000004601
Mar 3 20:42:39 redsun kernel: [ 789.396198] RDX: 0000000000000001 RSI: ffff94857ecc3f18 RDI: ffff94857ece1838
Mar 3 20:42:39 redsun kernel: [ 789.396199] RBP: ffffffffa983cb40 R08: 0000000000024f30 R09: ffffffffa80fbdcb
Mar 3 20:42:39 redsun kernel: [ 789.396200] R10: fffff3a0c85f5e00 R11: 0000000000000326 R12: ffff94857ece1838
Mar 3 20:42:39 redsun kernel: [ 789.396201] R13: 0000000000000246 R14: 7fffffffffffffff R15: fffffffffffffffc
Mar 3 20:42:39 redsun kernel: [ 789.396204] FS: 00007f0492a279c0(0000) GS:ffff94857ecc0000(0000) knlGS:0000000000000000
Mar 3 20:42:39 redsun kernel: [ 789.396205] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 3 20:42:39 redsun kernel: [ 789.396207] CR2: 00007f04656dc000 CR3: 000000026ceaa000 CR4: 00000000003406e0
Mar 3 20:42:39 redsun kernel: [ 789.396208] Call Trace:
Mar 3 20:42:39 redsun kernel: [ 789.396211] <IRQ>
Mar 3 20:42:39 redsun kernel: [ 789.396216] __do_softirq+0xe0/0x2dd
Mar 3 20:42:39 redsun kernel: [ 789.396220] irq_exit+0xae/0xb0
Mar 3 20:42:39 redsun kernel: [ 789.396223] smp_apic_timer_interrupt+0x76/0x130
Mar 3 20:42:39 redsun kernel: [ 789.396225] apic_timer_interrupt+0x7d/0x90
Mar 3 20:42:39 redsun kernel: [ 789.396227] </IRQ>
Mar 3 20:42:39 redsun kernel: [ 789.396229] RIP: 0033:0x7f0490e38707
Mar 3 20:42:39 redsun kernel: [ 789.396230] RSP: 002b:00007ffe7bb45690 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff11
Mar 3 20:42:39 redsun kernel: [ 789.396232] RAX: 00007f045c4af910 RBX: 00007f047fc08000 RCX: ffffffffff78f144
Mar 3 20:42:39 redsun kernel: [ 789.396233] RDX: 00007f0490e386f0 RSI: 00007f045c4af940 RDI: 00007f047fc08000
Mar 3 20:42:39 redsun kernel: [ 789.396234] RBP: 00007ffe7bb46a40 R08: 00007f045f77f800 R09: 0000000000000000
Mar 3 20:42:39 redsun kernel: [ 789.396235] R10: 000000000000000a R11: 00007f045fc113ca R12: 00007f045c4af970
Mar 3 20:42:39 redsun kernel: [ 789.396237] R13: 00007f045c4b0078 R14: 00007ffe7bb456f0 R15: 0000000000000000
Mar 3 20:42:39 redsun kernel: [ 789.396238] Code: 8b 1d e3 66 86 01 48 85 db 74 1b 48 8b 03 48 8b 7b 08 48 83 c3 18 48 89 ee e8 6b 6f f0 00 48 8b 03 48 85 c0 75 e8 e9 b6 fb ff ff <0f> 0b e9 df fd ff ff 0f 0b e9 d6 fc ff ff 4c 89 ee 4c 89 e7 e8
Mar 3 20:42:39 redsun kernel: [ 789.396279] ---[ end trace 9f6cbfdd9555c3ff ]---

With the two patches linked above applied, the link can now be reset and the disk can continue to be used:

Mar 4 01:19:21 redsun kernel: [ 500.894429] ata4.00: exception Emask 0x11 SAct 0x40000001 SErr 0x680100 action 0x6 frozen
Mar 4 01:19:21 redsun kernel: [ 500.894432] ata4.00: irq_stat 0x48000008, interface fatal error
Mar 4 01:19:21 redsun kernel: [ 500.894435] ata4: SError: { UnrecovData 10B8B BadCRC Handshk }
Mar 4 01:19:21 redsun kernel: [ 500.894438] ata4.00: failed command: READ FPDMA QUEUED
Mar 4 01:19:21 redsun kernel: [ 500.894446] ata4.00: cmd 60/00:00:28:f8:7a/01:00:05:00:00/40 tag 0 ncq dma 131072 in
Mar 4 01:19:21 redsun kernel: [ 500.894446] res 40/00:00:28:f8:7a/00:00:05:00:00/40 Emask 0x10 (ATA bus error)
Mar 4 01:19:21 redsun kernel: [ 500.894448] ata4.00: status: { DRDY }
Mar 4 01:19:21 redsun kernel: [ 500.894451] ata4.00: failed command: READ FPDMA QUEUED
Mar 4 01:19:21 redsun kernel: [ 500.894459] ata4.00: cmd 60/00:f0:28:f7:7a/01:00:05:00:00/40 tag 30 ncq dma 131072 in
Mar 4 01:19:21 redsun kernel: [ 500.894459] res 40/00:00:28:f8:7a/00:00:05:00:00/40 Emask 0x10 (ATA bus error)
Mar 4 01:19:21 redsun kernel: [ 500.894461] ata4.00: status: { DRDY }
Mar 4 01:22:04 redsun kernel: [ 664.145825] ata4.00: exception Emask 0x11 SAct 0x400000 SErr 0x680100 action 0x6 frozen
Mar 4 01:22:04 redsun kernel: [ 664.145828] ata4.00: irq_stat 0x48000008, interface fatal error
Mar 4 01:22:04 redsun kernel: [ 664.145831] ata4: SError: { UnrecovData 10B8B BadCRC Handshk }
Mar 4 01:22:04 redsun kernel: [ 664.145834] ata4.00: failed command: READ FPDMA QUEUED
Mar 4 01:22:04 redsun kernel: [ 664.145842] ata4.00: cmd 60/30:b0:e8:26:4e/00:00:03:00:00/40 tag 22 ncq dma 24576 in
Mar 4 01:22:04 redsun kernel: [ 664.145842] res 40/00:b0:e8:26:4e/00:00:03:00:00/40 Emask 0x10 (ATA bus error)
Mar 4 01:22:04 redsun kernel: [ 664.145844] ata4.00: status: { DRDY }
Mar 4 01:22:25 redsun kernel: [ 685.225653] ata4: limiting SATA link speed to 3.0 Gbps
Mar 4 01:22:25 redsun kernel: [ 685.225658] ata4.00: exception Emask 0x11 SAct 0x1800 SErr 0x600100 action 0x6 frozen
Mar 4 01:22:25 redsun kernel: [ 685.225660] ata4.00: irq_stat 0x48000008, interface fatal error
Mar 4 01:22:25 redsun kernel: [ 685.225663] ata4: SError: { UnrecovData BadCRC Handshk }
Mar 4 01:22:25 redsun kernel: [ 685.225666] ata4.00: failed command: READ FPDMA QUEUED
Mar 4 01:22:25 redsun kernel: [ 685.225674] ata4.00: cmd 60/00:58:90:96:d6/01:00:08:00:00/40 tag 11 ncq dma 131072 in
Mar 4 01:22:25 redsun kernel: [ 685.225674] res 40/00:58:90:96:d6/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
Mar 4 01:22:25 redsun kernel: [ 685.225676] ata4.00: status: { DRDY }
Mar 4 01:22:25 redsun kernel: [ 685.225678] ata4.00: failed command: READ FPDMA QUEUED
Mar 4 01:22:25 redsun kernel: [ 685.225686] ata4.00: cmd 60/00:60:90:97:d6/01:00:08:00:00/40 tag 12 ncq dma 131072 in
Mar 4 01:22:25 redsun kernel: [ 685.225686] res 40/00:58:90:96:d6/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
Mar 4 01:22:25 redsun kernel: [ 685.225688] ata4.00: status: { DRDY }

This has been fixed by the following commit, which is in James' tree but not yet in Linus' tree or in any of the stable trees: 3be8828fc507 ("scsi: core: Avoid that ATA error handling can trigger a kernel hang or oops"). See also https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git/commit/?h=fixes&id=3be8828fc507cdafe7040a3dcf361a2bcd8e305b.

A pull request that includes that commit has been sent to Linus. See also https://lkml.org/lkml/2018/3/6/767.

*** Bug 198923 has been marked as a duplicate of this bug. ***

Commit 3be8828fc507 is now upstream but is not yet in any of the stable trees.

Kernels 4.15.10 and 4.14.27 include the patch "scsi: core: Avoid that ATA error handling can trigger a kernel hang or oops".

Many thanks for the updates. I have now upgraded to 4.15.10, and the issue I had before (as reported in the closed duplicate) appears to be fixed by that patch.

The 4.14.27 kernel fixes the issue I had on my system. Thank you!
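The libata error reports quoted in this thread pack several fields into hex. This is a minimal Python sketch that decodes two of them as a reading aid: the SError value, and the taskfile in the `cmd` lines of the READ FPDMA QUEUED (NCQ) failures. It is an illustration, not a kernel or libata tool. It rests on several assumptions:

- The SError bit positions match the `SERR_*` constants in the kernel's `include/linux/ata.h`.
- The twelve `cmd` fields are printed as command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device.
- NCQ commands carry the sector count in the FEATURE registers and the queue tag in bits 7:3 of the COUNT register.
- Sectors are 512 bytes.

The helper names `decode_serror` and `decode_ncq_cmd` are hypothetical.

```python
import re

# SError flag names in the order libata prints them. The bit positions are
# an assumption mirroring the SERR_* constants in include/linux/ata.h.
SERROR_BITS = [
    (0, "RecovData"),
    (1, "RecovComm"),
    (8, "UnrecovData"),
    (9, "Persist"),
    (10, "Proto"),
    (11, "HostInt"),
    (16, "PHYRdyChg"),
    (17, "PHYInt"),
    (18, "CommWake"),
    (19, "10B8B"),
    (20, "Dispar"),
    (21, "BadCRC"),
    (22, "Handshk"),
    (23, "LinkSeq"),
    (24, "TrStaTrns"),
    (25, "UnrecFIS"),
    (26, "DevExch"),
]

# Twelve two-digit hex fields separated by '/' or ':' following "cmd ".
CMD_RE = re.compile(r"\bcmd ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")

SECTOR_SIZE = 512  # assumed logical sector size in bytes


def decode_serror(serr):
    """Return the names of the flags set in an SError value such as 0x680100."""
    return [name for bit, name in SERROR_BITS if serr & (1 << bit)]


def decode_ncq_cmd(line):
    """Decode the 'cmd' taskfile of an NCQ (FPDMA QUEUED) failure line.

    Hypothetical helper; assumes the field order
    command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device.
    """
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("no 'cmd' taskfile found in line")
    (command, feature, nsect, lbal, lbam, lbah,
     hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah,
     _device) = (int(f, 16) for f in re.split(r"[/:]", m.group(1)))
    # NCQ: sector count lives in FEATURE (low byte) / HOB FEATURE (high byte);
    # the queue tag occupies bits 7:3 of the COUNT (nsect) register.
    sectors = (hob_feature << 8) | feature
    # 48-bit LBA assembled from the low and HOB (high-order) LBA registers.
    lba = (lbal | (lbam << 8) | (lbah << 16)
           | (hob_lbal << 24) | (hob_lbam << 32) | (hob_lbah << 40))
    return {
        "command": command,
        "tag": nsect >> 3,
        "sectors": sectors,
        "bytes": sectors * SECTOR_SIZE,
        "lba": lba,
    }


if __name__ == "__main__":
    print(decode_serror(0x680100))  # ['UnrecovData', '10B8B', 'BadCRC', 'Handshk']
    info = decode_ncq_cmd(
        "ata4.00: cmd 60/00:98:c0:c5:c3/01:00:02:00:00/40 tag 19 ncq dma 131072 in"
    )
    print(info["tag"], info["bytes"], hex(info["lba"]))  # 19 131072 0x2c3c5c0
```

Applied to the log lines above, the sketch agrees with what libata printed:

- `SErr 0x680100` decodes to `UnrecovData 10B8B BadCRC Handshk`.
- `SErr 0x600100`, logged after the link was limited to 3.0 Gbps, decodes to `UnrecovData BadCRC Handshk`.
- `cmd 60/30:b0:...` yields tag 22 and 48 sectors, matching `tag 22 ncq dma 24576 in`.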