Bug 213077

Summary: Kernel 5.10.37 immediately panics at boot w/ Intel Core i7-4910MQ Haswell or Core i3-5010U Broadwell w/ custom .config CONFIG_INTEL_IOMMU_DEFAULT_ON=y, same config worked with 5.10.36, due to commit 416fa531c816 = a8ce9ebbecdfda3322bbcece6b3b25888217f8e3
Product: Platform Specific/Hardware Reporter: Joerg M. Sigle (joerg.sigle)
Component: x86-64Assignee: platform_x86_64 (platform_x86_64)
Status: RESOLVED CODE_FIX    
Severity: normal CC: bp, hagar-dunor, it, joro, sashal, steffen
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 5.10.37 Subsystem:
Regression: No Bisected commit-id:

Description Joerg M. Sigle 2021-05-15 09:35:51 UTC
Hello.

After upgrade from 5.10.36 to 5.10.37, with otherwise identical configuration,
there's a kernel panic extremely early after booting. This is specific to one
laptop, a Lenovo W540 Haswell with Intel Core i7-4910MQ CPU and two SSDs.

Kernel 5.10.37 (with slightly different .configs)
boots on 4 other machines with different CPUs.

I'm currently still trying to figure out what causes this.
The laptop has worked with numerous previous kernels,
including 5.10.34 and 5.10.36 with the same .config

This is a self-compiled kernel with a configuration used for years.

The only difference in config-5.10.37-i7 is: CONFIG_AD9467 is not set.

vmlinuz, System.map, initrd.img have very similar sizes;
The content of initrd.img seems identical (only Intel Microcode).
And /lib/modules/5.10.37-i7 has same structure/filenames
as /lib/modules/5.10.34-i7 or /lib/modules/5.10.36-i7

(N.B.: -i7 is a custom prefix added by my configuration.)

Output at from start of boot till crash:

Loading Linux 5.10.37-i7 ...
Loading intiial ramdisk ...
Wrong EFI loader signature. (was there before)
early console in extract_kernel
input_data: 0x0...
input_len: 0x0...
output: 0x0...
output_len: 0x0...
kernel_total_size: 0x0...
needed_size: 0x0...
trampoline_32bit: 0x0...
Physical KASLR using RDRAND RDTSC...
Virtual KASLR using RDRAND RDTSC...
Decompressing Linux...

Decompressing Linux... Parsing ELF... Performing relocations... done.
Booting the kernel.
[    0.395084] ACPI Error: Needed type ... (was there before)
[    0.395155] ACPI Error: AE_AML_OPERAND_TYPE ... (was there before)
[    0.395223] ACPI Error: Aborting method ... (was there before)

(Clear screen occurs here)

[    2.537239]  ? rest_init+0xb5/0xb5
[    2.537239]  kernel_init+0x5/0xfb
[    2.537239]  ret_from_fork+0xZZ/0x30
[    2.537239] Modules linked in:
[    2.537239] CR2: 0000000000000008
[    2.537239] ---[ end trace d82521e25a2a6c7 ]--
[    2.537239] RIP: 0010:__domain_mapping+0xa7/0x3a0
[    2.537239] Code: 85 13 02 00 00 4c 89 d3 45 31 c9 31 ed 48 c7 04 24 00 00 00 00 48 c1 e3 0c 4c 09 f3 4d 85 c0 0f 84 1c 01 00 00 4d 85 c9 75 61 <41> 8b 54 24 08 49 8b 1c 24 41 8b 44 24 0c 89 d1 48 83 e3 fc 48 2b
[    2.537239] RSP: 0000:ffffbb3300057d78 EFLAGS: 00010246
[    2.537239] RAX: 0000000000000000 RBX: 0000000000001003 RCX: 000000000000002d
[    2.537239] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff93bce3638000
[    2.537239] RBP: 0000000000000000 R08: 000000000000009d R09: 0000000000000000
[    2.537239] R10: 0000000000000001 R11: 80b163e3bc93ffff R12: 0000000000000000
[    2.537239] R13: 0000000000000001 R14: 0000000000000003 R15: ffff93bce3638000
[    2.537239] FS:  0000000000000000(0000) GS:ffff93c226700000(0000) knlGS: 0000000000000000
[    2.537239] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.537239] CR2: 0000000000000008 CR3: 00000002c740a001 CR4: 00000000001706e0
[    2.537239] Kernel panic - not syncing: Attempting to kill init! exitcode=0x00000009
[    2.537239] --[ end Kernel panic - not syncing: Attempted to kill init! exit code=0x00000009 ]--
_
Comment 1 Joerg M. Sigle 2021-05-15 10:13:54 UTC
Update:

The problem persists when I use: make localmodconfig
(while running 5.10.36 with my previously used custom configuration).

The problem goes away (i.e. successful boot) when I use: make defconfig


So the update from 5.10.36 to 5.10.37 introduced a problem that becomes
apparent on this machine *if* the same custom configuration options are
used that have worked with numerous kernels before.

Sadly, a diff between my custom 5.10.36 and the default 5.10.37 configuration
shows several thousand lines :-); so discovering the module causing the trouble
will not be easy without further knowledge.

If you're interested, I can try to isolate it further.
If you want to suggest a few patches that might possibly cause the problem, I can start with .36 and add them individually to check the result.
Otherwise I'll wait for 5.10.38 and check that with my custom config again.

I've changed the severity from "blocking" to "normal"
after I found that 5.10.37 *can* boot with the default configuration.

Kind regards, js
Comment 2 Borislav Petkov 2021-05-15 10:24:52 UTC
Your RIP points to __domain_mapping and that is drivers/iommu/intel/iommu.c.

There are 4 patches which went into 5.10.37 touching that area:

$ git log --oneline v5.10.36..v5.10.37 drivers/iommu/intel/iommu.c
e759105d459b iommu/vt-d: Invalidate PASID cache when root/context entry changed
c848416cc05a iommu/vt-d: Remove WO permissions on second-level paging entries
416fa531c816 iommu/vt-d: Preset Access/Dirty bits for IOVA over FL
eb0530d71c78 iommu/vt-d: Report right snoop capability when using FL for IOVA

so you can try reverting them, I think top-to-bottom order should do a clean revert, ontop of 5.10.37 and see if it boots then.

HTH.
Comment 3 Joerg M. Sigle 2021-05-15 17:52:47 UTC
Boris, thanks a lot for your quick and helpful response.

I found the boot panic occurs if the following kernel config option is ENabled:

Device Drivers
  IOMMU Hardware Support
    Enable Intel DMA Remapping Devices by default

CONFIG_INTEL_IOMMU_DEFAULT_ON:
Selecting this option will enable a DMAR device at boot time if
one is found. If this option is not selected, DMAR support can
be enabled by passing intel_iommu=on to the kernel.

This option triggered the boot panic independently
from other IOMMU or APGART related options which I also tried.


With 5.10.37 and this option DISabled, the running system shows:

$ dmesg | grep -i DMAR
[    0.001211] ACPI: DMAR 0x000000007CED3000 0000B0 (v01 LENOVO TP-GN    00002410 PTEC 00000002)
[    0.001231] ACPI: Reserving DMAR table memory at [mem 0x7ced3000-0x7ced30af]
[    0.304147] DMAR: Host address width 39
[    0.304148] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[    0.304164] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap c0000020660462 ecap f0101a
[    0.304165] DMAR: DRHD base: 0x000000fed91000 flags: 0x1
[    0.304169] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap d2008020660462 ecap f010da
[    0.304170] DMAR: RMRR base: 0x0000007c4df000 end: 0x0000007c4f5fff
[    0.304171] DMAR: RMRR base: 0x0000007d800000 end: 0x0000007f9fffff

$ dmesg | grep -i gart
[    0.323369] Linux agpgart interface v0.103

$ dmesg | grep -i iommu
[    0.171577] iommu: Default domain type: Translated 

--

Does this information already suffice for you to pinpoint the problem
into one of the patches you named, or should I still check what happens
with each of them reverted individually? 

And if I should:
Are these patches independent from each other, or do I risk
additional instability when removing them individually?
And do you think the same panic might occur in a running system,
when DMA Remapping might be enabled later on for whatever reason?
I certainly want to help in locating the cause of the problem,
but would like to avoid ruining a filesystem while testing :-)

With the above option *DIS*abled, and after a successful boot,
I've at least tried glxgear and vlc video playing without problems -
blindly guessing that iommu / agpgart stuff *might* be used by that
(only intel graphics so far; nvidia is (sub-)optimus, needs more work...).

Thanks & Regards, Joerg
Comment 4 Borislav Petkov 2021-05-15 19:18:32 UTC
Yeah, this option practically force-enables the iommu which triggers the splat.

In the meantime, bug#213087 points to 

416fa531c816 iommu/vt-d: Preset Access/Dirty bits for IOVA over FL

being problematic so you could try reverting only this one and see if it fixes your issue with the same kernel .config.

Thx.
Comment 5 Joerg M. Sigle 2021-05-15 21:20:40 UTC
Thanks & OK. I'll check that and report results.

Meanwhile, observed the same problem on a Broadwell Intel Core i3-5010U,

The problem also went away with disabling the same option in the kernel config.

So it's NOT specific to the notebook initially mentioned.

Regards, Joerg
Comment 6 Joerg M. Sigle 2021-05-15 21:32:01 UTC
Somebody else found the same problem and debugged it towards the commit you mentioned (416fa531c816 or a8ce9ebbecdfda3322bbcece6b3b25888217f8e3):

https://lkml.org/lkml/2021/5/15/164

Regards Joerg
Comment 7 Joerg M. Sigle 2021-05-15 23:47:39 UTC
This patch:

416fa531c816 iommu/vt-d: Preset Access/Dirty bits for IOVA over FL
commit 416fa531c8160151090206a51b829b9218b804d9
Upstream commit a8ce9ebbecdfda3322bbcece6b3b25888217f8e3

https://github.com/arter97/x86-kernel/commit/416fa531c8160151090206a51b829b9218b804d9

while doing other things, changed the conditional:

if (!sg) { ...
        sg_res = nr_pages;
        pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | attr;
}

to an unconditional:

pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | attr;

Reinserting the check for !sg fixed the immediate panic on boot for me.
Reverting the remainder of the same patch had not helped before.

Here's a possible patch for 5.10.37:

-------------------------------------------------------------------------
--- a/drivers/iommu/intel/iommu.c       2021-05-14 09:50:46.000000000 +0200
+++ b/drivers/iommu/intel/iommu.c       2021-05-16 01:02:17.816810690 +0200
@@ -2373,7 +2373,10 @@
                }
        }
 
-       pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | attr;
+        if (!sg) {
+                sg_res = nr_pages;
+                pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | attr;
+       }
 
        while (nr_pages > 0) {
                uint64_t tmp;
-------------------------------------------------------------------------

Could you please check this patch submission and pass it to upstream?

I have, however, NOT tried to understand what the code really does.
So please ask the suppliers of patch 416fa531c816 whether their
removal of the condition was intentional or a mere lapsus. Thanks!

Thanks and kind regards, Joerg
Comment 8 hagar-dunor 2021-05-16 12:06:00 UTC
Just wanted to point out that I followed Borislav's advice on reverting 

$ git log --oneline v5.10.36..v5.10.37 drivers/iommu/intel/iommu.c
e759105d459b iommu/vt-d: Invalidate PASID cache when root/context entry changed
c848416cc05a iommu/vt-d: Remove WO permissions on second-level paging entries
416fa531c816 iommu/vt-d: Preset Access/Dirty bits for IOVA over FL
eb0530d71c78 iommu/vt-d: Report right snoop capability when using FL for IOVA

following top to bottom, so in bug 213087 when I say I reverted 416fa531c816 I also reverted c848416cc05a and e759105d459b.
Comment 9 Joerg M. Sigle 2021-05-16 12:59:52 UTC
For me it was sufficient to apply *only* the little patch I supplied above.
That works fine on two systems, one with Intel GPU, one with Intel+nVidia GPU.

It's probably a correction to 416fa531c816, with no need for reversion.

I've also sent an e-mail to the original suppliers of 416fa531c816 at Intel.
Hope they can check it and comment after the weekend.
Comment 10 hagar-dunor 2021-05-16 13:16:20 UTC
Hi Joerg, as you said yourself you have no idea what the code does. Me neither. So taking out a part of 416fa531c816 which fixes a problem on the surface could leave other parts inconsistent. I'm not saying your patch is not the actual fix, maybe it is, but maybe not. The most logical way forward is to bring all these facts to Lu Baolu, author of 416fa531c816, and let him make sense out of it.
Comment 11 Joerg M. Sigle 2021-05-16 17:42:44 UTC
Hi Hagar. Thanks for your feedback.

> taking out a part of 416fa531c816 which fixes a problem on the surface
> could leave other parts inconsistent

Actually, I'm leaving all changes in place but one - and that one
reliably causes a kernel panic - and its context looks as if
it could have been introduced unintentionally.

> The most logical way forward is to bring all these facts to Lu Baolu,

Sure. That's why I already invited both Raj and Baolu from Intel (see above please), and also asked the bugzilla/kernel people for their reviews.


> you have no idea what the code does

I didn't look very far. But still, this wasn't a blind shot:

First: The 4 patches include fixes as well as attempts to simplify things.
I'd like to keep at least the fixes. But sadly, the patches didn't appear
to be completely independently reversible.

Now, looking at 416fa531c816: There was a little rearrangement of code lines, brackets etc. In that process, the conditional check "if (!sg) {...}" may have been removed unintentionally or all-too-easily. Both reasons occur quite often.

Then: Exactly the absence of this condition produces a kernel panic. Always.
So the status quo is *certainly not* correct. Restoring the condition removes the kernel panic - on two generations of CPUs. Always. Assuming that the original author of the code had a reason for putting the condition in place, and knowing that conditions are often used to prevent illegal accesses etc., that looks promising :-)

On possible side effects: Making "pteval =..." conditional again only restores the exact situation from 5.10.36 with regard to this line. That would *not* be worse than a complete reversal of the patch(es). This holds true, unless the surrounding rearrangement should introduce less obvious changes. And here,
the colleagues at Intel should certainly know and catch anything bad.
-

Anyway. More observations, a recommendation and a question:

The fix also works on a Core i5-4210Y Haswell with 8GB RAM
(the previously mentioned ones had 8GB and 24GB, so all exceed 32 Bit).

Interestingly, the panic has NOT occured with a stock 5.10.37 kernel on two older CPUs: Core 2 Quad Q6600 w/ 4GB and Dual Xeon X5355 w/ 20GB RAM, even though some iommu is mentioned in their dmesg logs.

These older machines can boot with the 5.10.37 stock AND my fixed version.

This is a clear reminder - most importantly - that this error does NOT become obvious on every machine. Whoever works in this area of the kernel must test their changes in a suitable = susceptible environment. This may sound trivial. But the fact that this panic could make it into the kernel at all, is living proof that not everyone does observe this simple rule.

Here, I would certainly like to know more about what the code does, and where the hardware generations differ. Sadly, I don't have time to dive into that right now. If the original authors might want to briefly explain, however, I'd appreciate that very much :-)

Thanks & regards! Joerg
Comment 12 Yuriy Dmitriev 2021-05-16 19:44:26 UTC
Folks, please see this. Hardware Lenovo.
https://bugzilla.kernel.org/show_bug.cgi?id=213095

Seems like this bug. Custom working config for years)

With best wishes - YD.
Comment 13 Steffen Nurpmeso 2021-05-16 22:07:58 UTC
Fyi.  For me (5.10.35->) 5.10.37 does not boot at all, it hangs after EFISTUB line.  No modules, custom config.  Kernel command line includes "iommu=on intel_iommu=on".  This is a Lenovo IdeaPad 530S i5 8th, 8GB.
Comment 14 hagar-dunor 2021-05-16 22:27:13 UTC
Joerg no worries, it's good you took the time to look deeper. Cheers.
Comment 15 Joerg M. Sigle 2021-05-16 22:33:58 UTC
Hi Steffen. Thanks for your comment.

You might check whether this is the same bug like this:

1. Try to boot with following entries on kernel command line:
intel_iommu=off

(Didn't try that myself, but this might work according to
 the snippet from make menuconfig help shown further above.)

Or:

1. make menuconfig
2. go to "Device Drivers" / "IOMMU Hardware Support" / set "Enable Intel DMA Remapping Devices by default" to "No" - Exit and save
3. make -j17 && make modules_install 
4. copy the newly built kernel etc. to your boot area; mkinitrd etc. if needed
5. reboot

If either of these two approaches works, it's probably the same cause.

Then, you can try to change the few lines in drivers/iommu/intel/iommu.c
as indicated in the suggested small patch further above. You can do this with the patch utility, or change the indicated lines manually. Or I can send you my version of the complete patched file.

Afterwards you can change the kernel parameter back to intel_iommu=on 
or switch the "Enable Intel DMA Remapping Devices by default" setting back to "Yes"  via make menuconfig, recompile, reinstall the new kernel etc., and should still be able to boot *without* kernel panic.

For me, the problem was not Lenovo (i7) specific. My second affected machine is a Zotac (i5); the third one (IIRC) may even have an Intel Board (i3). Common denominator among these: Haswell or Broadwell CPUs.

If you try these steps, please do report your results.
Thank you & good luck & regards, Joerg
Comment 16 Joerg M. Sigle 2021-05-16 22:42:54 UTC
@Hagar :-) :-)
Comment 17 Steffen Nurpmeso 2021-05-16 22:48:23 UTC
Hello Jörg.  Thanks for the info and your dedication.
Well.. i can confirm that removing iommu= stuff from kernel command line makes it boot.  I am now running 5.10.37.
(I would rather not do anything else, i usually boot once a week, we are not the youngest, and it would only be -j4 here anyway ;)
Hope you get it done, and looking forward for 5.10.38!
Thank you, Ciao!
Comment 18 Joerg M. Sigle 2021-05-17 06:52:17 UTC
Lu Baolu has responded positively this morning - thank you!
I've made a patch submission with proper formatting etc.
Kind regards, Joerg
Comment 19 Yuriy Dmitriev 2021-05-17 10:04:40 UTC
I confirm. Patch working. Write this reply from 5.10.37

Thanks alot, folks!



(In reply to Joerg M. Sigle from comment #7)
> This patch:
> 
> 416fa531c816 iommu/vt-d: Preset Access/Dirty bits for IOVA over FL
> commit 416fa531c8160151090206a51b829b9218b804d9
> Upstream commit a8ce9ebbecdfda3322bbcece6b3b25888217f8e3
> 
> https://github.com/arter97/x86-kernel/commit/
> 416fa531c8160151090206a51b829b9218b804d9
> 
> while doing other things, changed the conditional:
> 
> if (!sg) { ...
>         sg_res = nr_pages;
>         pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | attr;
> }
> 
> to an unconditional:
> 
> pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | attr;
> 
> Reinserting the check for !sg fixed the immediate panic on boot for me.
> Reverting the remainder of the same patch had not helped before.
> 
> Here's a possible patch for 5.10.37:
> 
> -------------------------------------------------------------------------
> --- a/drivers/iommu/intel/iommu.c       2021-05-14 09:50:46.000000000 +0200
> +++ b/drivers/iommu/intel/iommu.c       2021-05-16 01:02:17.816810690 +0200
> @@ -2373,7 +2373,10 @@
>                 }
>         }
>  
> -       pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | attr;
> +        if (!sg) {
> +                sg_res = nr_pages;
> +                pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | attr;
> +       }
>  
>         while (nr_pages > 0) {
>                 uint64_t tmp;
> -------------------------------------------------------------------------
> 
> Could you please check this patch submission and pass it to upstream?
> 
> I have, however, NOT tried to understand what the code really does.
> So please ask the suppliers of patch 416fa531c816 whether their
> removal of the condition was intentional or a mere lapsus. Thanks!
> 
> Thanks and kind regards, Joerg
Comment 20 Borislav Petkov 2022-01-06 18:42:17 UTC
Looks fixed. Closing.