Bug 196315 - kernel BUG at nvme/host/pci.c
Summary: kernel BUG at nvme/host/pci.c
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Other
Hardware: x86-64 Linux
Importance: P1 high
Assignee: drivers_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-07-10 15:28 UTC by Andreas Pflug
Modified: 2017-08-15 12:28 UTC
CC List: 2 users

See Also:
Kernel Version: 4.9.30
Subsystem:
Regression: No
Bisected commit-id:


Attachments
netconsole kernel log (16.67 KB, text/plain)
2017-07-10 15:28 UTC, Andreas Pflug

Description Andreas Pflug 2017-07-10 15:28:22 UTC
Created attachment 257445
netconsole kernel log

I'm running a patched (see below) Debian 4.9.30 kernel with Xen 4.8.1 on Debian 9. Shortly after I start one specific virtual machine, the kernel emits

  kernel BUG at /usr/src/kernel/linux-4.9.30/drivers/nvme/host/pci.c:495!

via netconsole to my logging host and becomes unstable until a hard reset.

Hardware is dual E5-2620v4 on Supermicro 10DRI-T with two SAMSUNG
MZQLW960HMJP-00003 NVMe disks (mdadm RAID-1) backing the VHDs (OS on a separate SSD).

The bug was reported to Debian as https://bugs.debian.org/866511. Following Ben Hutchings' advice, I patched the standard kernel with 0001-swiotlb-ensure-that-page-sized-mappings-are-page-ali.patch, since its description sounded promising, but the bug remains.

The log is attached, cut after 460 lines: the last trace on CPU15 is
repeated over and over, eventually leading to "Fixing recursive fault
but reboot is needed!"

Regards,
Andreas
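
Because the attached log repeats the same trace many times, it can help to locate the first "kernel BUG at" line before reading further. The following is only a minimal sketch for triaging such a log: the function names are hypothetical, and the pattern is assumed from the single BUG line quoted above, not from the full attachment.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Pattern assumed from the one line quoted in this report:
#   kernel BUG at /usr/src/kernel/linux-4.9.30/drivers/nvme/host/pci.c:495!
BUG_RE = re.compile(r"kernel BUG at (?P<path>\S+):(?P<line>\d+)!")

Hit = Tuple[int, str, int]  # (index in log, source path, source line number)


def find_kernel_bugs(log_lines: Iterable[str]) -> List[Hit]:
    """Return every 'kernel BUG at' occurrence in the order it appears."""
    hits: List[Hit] = []
    for idx, text in enumerate(log_lines):
        match = BUG_RE.search(text)
        if match:
            hits.append((idx, match.group("path"), int(match.group("line"))))
    return hits


def first_kernel_bug(log_lines: Iterable[str]) -> Optional[Hit]:
    """Return the first BUG occurrence, or None if the log contains none."""
    hits = find_kernel_bugs(log_lines)
    return hits[0] if hits else None
```

Calling first_kernel_bug(open(path).read().splitlines()) on a saved netconsole log (path being wherever the log was stored) gives the position of the initial failure, so the repeated follow-up traces can be skipped.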
Comment 1 Ben Hutchings 2017-07-10 17:38:38 UTC
This report should be reassigned or closed (as I don't think nvme bugs are tracked on Bugzilla).
Comment 2 David Woodhouse 2017-07-11 14:33:35 UTC
No idea why you filed this bug against raw NOR/NAND flash devices, but I'll take it anyway...
Comment 3 Andreas Pflug 2017-07-11 14:51:05 UTC
Because this appeared to be the least inappropriate category to me...

After Ben's hint, I posted to linux-nvme@lists.infradead.org and tested with a 4.12.0 kernel, with the same result.
Comment 4 David Woodhouse 2017-08-15 12:28:56 UTC
http://xenbits.xen.org/xsa/advisory-229.html
