Bug 219495 - [TPM2] tpm_tis driver crashs during the boot time.
Summary: [TPM2] tpm_tis driver crashs during the boot time.
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Platform_x86 (show other bugs)
Hardware: All Linux
: P3 normal
Assignee: drivers_platform_x86@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-11-13 02:42 UTC by andy.liang
Modified: 2024-12-24 01:43 UTC (History)
6 users (show)

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Attachments
binary_bios_measurements file (465.13 KB, text/plain)
2024-11-13 02:42 UTC, andy.liang
Details
ACPI TPM2 dump (734.38 KB, image/png)
2024-11-13 07:05 UTC, andy.liang
Details
Kernel.log (526.68 KB, text/plain)
2024-11-13 07:06 UTC, andy.liang
Details
raw binary_bios_measurements file (68.75 KB, application/octet-stream)
2024-11-14 04:48 UTC, andy.liang
Details
V6.12 dmesg (187.50 KB, text/plain)
2024-11-20 09:15 UTC, andy.liang
Details
dmesg with kernel-default-6.4.0-150700.1.1.g45e5524.x86_64.rpm (227.28 KB, text/plain)
2024-12-12 11:38 UTC, andy.liang
Details
screenshot with kernel-default-6.4.0-150700.1.1.g45e5524.x86_64.rpm (663.42 KB, image/png)
2024-12-12 11:39 UTC, andy.liang
Details
ACPI TPM2 DUMP with fix (606.05 KB, image/png)
2024-12-13 06:22 UTC, andy.liang
Details
tpm: Don't copy ACPI event log (4.13 KB, application/mbox)
2024-12-20 19:14 UTC, jarkko
Details
tpm: Map the ACPI provided event log [compiled tested only] (16.13 KB, application/mbox)
2024-12-20 22:48 UTC, jarkko
Details
tpm: Map the ACPI provided event log [in-progress] (15.88 KB, application/mbox)
2024-12-21 00:21 UTC, jarkko
Details
[PATCH] tpm: Map the ACPI provided event log (13.89 KB, application/mbox)
2024-12-21 11:35 UTC, jarkko
Details
[PATCH v2] tpm: Map the ACPI provided event log (13.78 KB, application/mbox)
2024-12-21 12:46 UTC, jarkko
Details
[PATCH v3] tpm: Map the ACPI provided event log (13.34 KB, patch)
2024-12-22 15:27 UTC, jarkko
Details | Diff
No Patch dmesg with Kernel v6.12.6 (254.14 KB, text/plain)
2024-12-23 08:38 UTC, andy.liang
Details
v2-patch-binary_bios_measurements_kernel-6.12.6 (52.96 KB, image/x-panasonic-raw)
2024-12-23 08:42 UTC, andy.liang
Details
v2-patch-kernel-6.12.6-lp155.1.1.g509a912.log (252.35 KB, text/plain)
2024-12-23 08:43 UTC, andy.liang
Details
v3-patch-binary_bios_measurements_kernel-6.12.6.raw (52.23 KB, image/x-panasonic-raw)
2024-12-23 08:43 UTC, andy.liang
Details
v3-patch-kernel-6.12.6-lp155.2.1.gfb072de.log (249.57 KB, text/plain)
2024-12-23 08:43 UTC, andy.liang
Details

Description andy.liang 2024-11-13 02:42:37 UTC
Created attachment 307212 [details]
binary_bios_measurements file

When the TPM2 ACPI table sets the Minimum Log Length size more then 4MB, a call trace occurs during the boot time. The reason is the kmalloc limits the maximum size to 4MB for x86 arch.

======
https://elixir.bootlin.com/linux/v6.8/source/arch/x86/include/asm/page_types.h#L10
#define PAGE_SHIFT 12
#define KMALLOC_SHIFT_MAX (MAX_PAGE_ORDER + PAGE_SHIFT)
 
https://elixir.bootlin.com/linux/v6.8/source/include/linux/mmzone.h#L30
#define MAX_PAGE_ORDER 10
 
https://elixir.bootlin.com/linux/v6.8/source/include/linux/slab.h#L309
#define KMALLOC_MAX_SIZE (1UL << KMALLOC_SHIFT_MAX)
The max size  = (1UL <<  MAX_PAGE_ORDER + PAGE_SHIFT) = ( 1UL << (10 + 12)) = 2^22 =4,194,304 (4MB)
======

https://elixir.bootlin.com/linux/v6.8/source/drivers/char/tpm/eventlog/acpi.c#L87
[   11.795176] kernel:  __kmalloc+0x2b1/0x330
[   11.795179] kernel:  tpm_read_log_acpi+0x95/0x1e0

Problems:
[   11.795129] kernel: Call Trace:
[   11.795131] kernel:  <TASK>
[   11.795133] kernel:  ? show_trace_log_lvl+0x1d6/0x2ea
[   11.795139] kernel:  ? show_trace_log_lvl+0x1d6/0x2ea
[   11.795141] kernel:  ? alloc_page_interleave+0x19/0x90
[   11.795145] kernel:  ? show_regs.part.0+0x23/0x29
[   11.795146] kernel:  ? show_regs.cold+0x8/0xd
[   11.795148] kernel:  ? __alloc_pages+0x2b0/0x330
[   11.795150] kernel:  ? __warn+0x8c/0x100
[   11.795154] kernel:  ? __alloc_pages+0x2b0/0x330
[   11.795156] kernel:  ? report_bug+0xa4/0xd0
[   11.795160] kernel:  ? handle_bug+0x39/0x90
[   11.795162] kernel:  ? exc_invalid_op+0x19/0x70
[   11.795164] kernel:  ? asm_exc_invalid_op+0x1b/0x20
[   11.795167] kernel:  ? __alloc_pages+0x2b0/0x330
[   11.795169] kernel:  alloc_page_interleave+0x19/0x90
[   11.795171] kernel:  alloc_pages+0x131/0x1e0
[   11.795173] kernel:  kmalloc_order+0x2f/0xd0
[   11.795175] kernel:  kmalloc_order_trace+0x1d/0x90
[   11.795176] kernel:  __kmalloc+0x2b1/0x330
[   11.795179] kernel:  tpm_read_log_acpi+0x95/0x1e0
[   11.795183] kernel:  tpm_bios_log_setup+0x54/0x270
[   11.795185] kernel:  tpm_chip_register.part.0+0x32/0x1f0
[   11.795186] kernel:  tpm_chip_register+0x61/0xa0
[   11.795188] kernel:  tpm_tis_core_init.cold+0x206/0x448
[   11.795191] kernel:  tpm_tis_init.part.0+0xb4/0x130
[   11.795193] kernel:  tpm_tis_plat_probe+0xd4/0x100

[   11.795195] kernel:  platform_probe+0x46/0xd0
[   11.795201] kernel:  really_probe+0x21f/0x420
[   11.795203] kernel:  __driver_probe_device+0xe8/0x140
[   11.795205] kernel:  driver_probe_device+0x23/0xc0
[   11.795207] kernel:  __driver_attach+0xf7/0x1f0
[   11.795209] kernel:  ? __device_attach_driver+0x140/0x140
[   11.795211] kernel:  bus_for_each_dev+0x7c/0xd0
[   11.795213] kernel:  driver_attach+0x1e/0x30
[   11.795215] kernel:  bus_add_driver+0x148/0x220
[   11.795217] kernel:  driver_register+0x95/0x100
[   11.795219] kernel:  ? tpm_init+0x68/0xfa
Comment 1 jarkko 2024-11-13 05:50:28 UTC
(In reply to andy.liang from comment #0)
> Created attachment 307212 [details]
> binary_bios_measurements file
> 
> When the TPM2 ACPI table sets the Minimum Log Length size more then 4MB, a
> call trace occurs during the boot time. The reason is the kmalloc limits the
> maximum size to 4MB for x86 arch.

I'm a bit lost why we need that attachment anyway. I don't have any use for it but maybe someone else asked for it. Can you remind me on that?

Also please attach complete klog (dmesg output) not some randomly picked snippets of it.

I'd also like to checkout acpidump contents.
Comment 2 Takashi Iwai 2024-11-13 06:50:05 UTC
(In reply to jarkko from comment #1)
> (In reply to andy.liang from comment #0)
> > Created attachment 307212 [details]
> > binary_bios_measurements file
> > 
> > When the TPM2 ACPI table sets the Minimum Log Length size more then 4MB, a
> > call trace occurs during the boot time. The reason is the kmalloc limits
> the
> > maximum size to 4MB for x86 arch.
> 
> I'm a bit lost why we need that attachment anyway. I don't have any use for
> it but maybe someone else asked for it. Can you remind me on that?

https://lore.kernel.org/262eff2e-90c5-45db-81d7-30b1cc217ef4@linux.ibm.com
Comment 3 andy.liang 2024-11-13 07:05:53 UTC
Created attachment 307213 [details]
ACPI TPM2 dump
Comment 4 andy.liang 2024-11-13 07:06:17 UTC
Created attachment 307214 [details]
Kernel.log
Comment 5 jarkko 2024-11-13 12:43:09 UTC
(In reply to Takashi Iwai from comment #2)
> (In reply to jarkko from comment #1)
> > (In reply to andy.liang from comment #0)
> > > Created attachment 307212 [details]
> > > binary_bios_measurements file
> > > 
> > > When the TPM2 ACPI table sets the Minimum Log Length size more then 4MB,
> a
> > > call trace occurs during the boot time. The reason is the kmalloc limits
> > the
> > > maximum size to 4MB for x86 arch.
> > 
> > I'm a bit lost why we need that attachment anyway. I don't have any use for
> > it but maybe someone else asked for it. Can you remind me on that?
> 
> https://lore.kernel.org/262eff2e-90c5-45db-81d7-30b1cc217ef4@linux.ibm.com

Thanks for the attachments I'll check them up later.

So, I'd expected binary_bios_measurements not to exist as the allocation files but overlooked the fact that it is still useful to check what kind of data it has :-)
So you got this via applying your workaround? It is useful because it looks like somewhat legit data and not corrupted.

You parsed it with tpm2_eventlog? Did it complain any errors? Can you attach it as binary complete unparsed?

And dmesg is with or without the workaround? Just checking (have not opened it yet).

Also apparently the attachment is parsed. What is the size of that file?
Comment 6 jarkko 2024-11-13 12:47:36 UTC
I'll enumerate the questions because there were many in the previous comment. Ignore the questions in that, and answer these instead (just to help you out):

1. I assume you got the binary bios measurements after applying your fix. Is this true?
2. I assume you parsed the measurements with tpm2_eventlog. Can you attach the unparsed binary instead?
3. I assume dmesg was generated without workaround. Is this true or not? I have not opened it yet (because lack of time).

I think these three questions should be enough for now. Don't waste your time on previous questions, I can deduce answers from these...
Comment 7 Stefan Berger 2024-11-13 13:05:01 UTC
The attached parsed contents (comment 1 attachment) of /sys/kernel/security/tpm0/binary_bios_measurements are 476kb. I cannot say how much this file would be in its binary version but I would think it would only be like 10% of it, so like 45kb since the parsed contents also display whole certificate chains. I don't see why 45kb should be an issue.

Can you attach the plain binary file?

> When the TPM2 ACPI table sets the Minimum Log Length size more then 4MB, a
> call trace occurs during the boot time. The reason is the kmalloc limits the
> maximum size to 4MB for x86 arch.

4MB would be excessive and I think would hint at a buggy BIOS. We should never allocate that much memory for the BIOS log.
Comment 8 jarkko 2024-11-13 18:07:55 UTC
(In reply to Stefan Berger from comment #7)
> The attached parsed contents (comment 1 attachment) of
> /sys/kernel/security/tpm0/binary_bios_measurements are 476kb. I cannot say
> how much this file would be in its binary version but I would think it would
> only be like 10% of it, so like 45kb since the parsed contents also display
> whole certificate chains. I don't see why 45kb should be an issue.
> 
> Can you attach the plain binary file?
> 
> > When the TPM2 ACPI table sets the Minimum Log Length size more then 4MB, a
> > call trace occurs during the boot time. The reason is the kmalloc limits
> the
> > maximum size to 4MB for x86 arch.
> 
> 4MB would be excessive and I think would hint at a buggy BIOS. We should
> never allocate that much memory for the BIOS log.

Yeah, possibly! I'll look at the data collected so far with time as soon as I get answer to my question-triplet :-) Just don't want to play assumptions, that's all.

So here the interesting part is still that apparently it could be still parsed (see my question #2).

Thanks for taking a note on this bug!
Comment 9 andy.liang 2024-11-14 04:48:03 UTC
(In reply to jarkko from comment #6)
> I'll enumerate the questions because there were many in the previous
> comment. Ignore the questions in that, and answer these instead (just to
> help you out):
> 
> 1. I assume you got the binary bios measurements after applying your fix. Is
> this true?
Yes, the binary is from the fixed machine.

Here is the output without patch
[root@localhost Documents]# dmesg | grep tpm
[    9.911537] tpm_tis STM0925:00: 2.0 TPM (device-id 0x3, rev-id 0)
[    9.944535]  ? tpm_read_log_acpi+0x93/0x1e0
[    9.944578]  tpm_read_log_acpi+0x93/0x1e0
[    9.944586]  tpm_bios_log_setup+0x48/0x1e0
[    9.944593]  tpm_chip_register+0x38/0x210
[    9.944603]  tpm_tis_core_init.cold+0x2a7/0x45d
[    9.944613]  tpm_tis_plat_probe+0xd8/0x100
[   20.044163] intel_uncore_frequency_tpmi intel_vsec.tpmi-uncore.2: Uncore: Ignore: Unsupported minor version:2
[   20.044241] intel_uncore_frequency_tpmi intel_vsec.tpmi-uncore.2: Uncore: Ignore: Unsupported minor version:2
[   20.044279] intel_uncore_frequency_tpmi intel_vsec.tpmi-uncore.2: Uncore: Ignore: Unsupported minor version:2
[root@localhost Documents]# tsseventextend -if ./binary_bios_measurements -sim -v
Unable to open input file './binary_bios_measurements'
[root@localhost Documents]# 

> 2. I assume you parsed the measurements with tpm2_eventlog. Can you attach
> the unparsed binary instead?
The binary is from the command "tsseventextend -if ./^Cnary_bios_measurements -sim -v"
Please check the attached file.

> 3. I assume dmesg was generated without workaround. Is this true or not? I
> have not opened it yet (because lack of time).

Yes, the dmesg is from the machine with the bug. 

> I think these three questions should be enough for now. Don't waste your
> time on previous questions, I can deduce answers from these...
Comment 10 andy.liang 2024-11-14 04:48:48 UTC
Created attachment 307217 [details]
raw binary_bios_measurements file
Comment 11 Stefan Berger 2024-11-18 14:33:14 UTC
Binary log is 69k. Presumably this is the biggest log we saw so far since it's the first to fail. Cap acceptable log sizes at 96kb?
Comment 12 jarkko 2024-11-18 18:57:35 UTC
The bug report is not acceptable:

Oct 19 06:25:18 ubuntu kernel: [    0.000000] Linux version 5.15.0-119-generic (buildd@lcy02-amd64-075) (gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #129-Ubuntu SMP Fri Aug 2 19:25:20 UTC 2024 (Ubuntu 5.15.0-119.129-generic 5.15.160)
Oct 19 06:25:18 ubuntu kernel: [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.15.0-119-generic root=/dev/mapper/ubuntu--vg--1-ubuntu--lv ro

Not going to investigate a bug in over three years old distribution kernel. You should contact Canonical for this.
Comment 13 Takashi Iwai 2024-11-20 07:38:35 UTC
It'd be better to choose different words...

Andy, please give the kernel log from the recent upstream kernel (at best 6.12).
If you don't know how to get the latest kernel, ask your distribution for help.
Comment 14 andy.liang 2024-11-20 09:15:37 UTC
Created attachment 307254 [details]
V6.12 dmesg
Comment 15 Takashi Iwai 2024-12-02 13:48:12 UTC
Jarkko, any progress on this?
As Stefan suggested, just capping the max size to 96kB would work?
Comment 16 jarkko 2024-12-03 06:22:29 UTC
I've missed this comment: https://bugzilla.kernel.org/show_bug.cgi?id=219495#c14.

Looking at dmesg next week.
Comment 17 jarkko 2024-12-03 06:23:39 UTC
Obviously before that you can send RFC patch to linux-integrity if you think you have a fix for the possible issue (consider this comment as neutral: neither encouraging nor discouraging for action).
Comment 18 jarkko 2024-12-09 12:56:54 UTC
(In reply to Takashi Iwai from comment #15)
> Jarkko, any progress on this?
> As Stefan suggested, just capping the max size to 96kB would work?

So, you should:

1. Create a patch.
2. Add relevant snippet of the dmesg to the commit message.
3. Explain why the size needs to be upgraded.
Comment 19 Takashi Iwai 2024-12-09 14:10:30 UTC
(In reply to jarkko from comment #18)
> (In reply to Takashi Iwai from comment #15)
> > Jarkko, any progress on this?
> > As Stefan suggested, just capping the max size to 96kB would work?
> 
> So, you should:
> 
> 1. Create a patch.
> 2. Add relevant snippet of the dmesg to the commit message.
> 3. Explain why the size needs to be upgraded.


Please don't shoot the messenger.  Neither of the original report and I understand / manage the relevant code better than you subsystem maintainers.

If you can give a patch for testing, I can build a test kernel and ask the original reporter, of course.  Thanks!
Comment 20 Takashi Iwai 2024-12-11 09:07:09 UTC
I saw that Stefan submitted the fix patch to the upstream:
  https://lore.kernel.org/20241210222608.598424-1-stefanb@linux.ibm.com

Andy, I'm building a few test kernels for SUSE systems on OBS:
home:tiwai:bsc1232421-sle15-sp6 repo for SLE15-SP6,
home:tiwai:bsc1232421-sle15-sp7 repo for SLE15-SP7,
home:tiwai:bsc1232421-stable-backport for 6.12.x kernel on SLE15-SPx.

Please test it later once after the build finishes and report back.
Comment 21 andy.liang 2024-12-12 11:38:31 UTC
Created attachment 307351 [details]
dmesg with kernel-default-6.4.0-150700.1.1.g45e5524.x86_64.rpm
Comment 22 andy.liang 2024-12-12 11:39:02 UTC
Created attachment 307352 [details]
screenshot with kernel-default-6.4.0-150700.1.1.g45e5524.x86_64.rpm
Comment 23 Stefan Berger 2024-12-12 13:22:48 UTC
I suppose the binary and ascii dumps of the log are still as before.
Comment 24 andy.liang 2024-12-13 06:22:58 UTC
Created attachment 307354 [details]
ACPI TPM2 DUMP with fix
Comment 25 andy.liang 2024-12-13 06:31:31 UTC
(In reply to Stefan Berger from comment #23)
> I suppose the binary and ascii dumps of the log are still as before.

The TPM2 DUMP still shows the TPM event log size as 8MB. Thank you.
Comment 26 andy.liang 2024-12-13 06:31:31 UTC
(In reply to Stefan Berger from comment #23)
> I suppose the binary and ascii dumps of the log are still as before.

The TPM2 DUMP still shows the TPM event log size as 8MB. Thank you.
Comment 27 jarkko 2024-12-14 03:28:23 UTC
(In reply to andy.liang from comment #26)
> (In reply to Stefan Berger from comment #23)
> > I suppose the binary and ascii dumps of the log are still as before.
> 
> The TPM2 DUMP still shows the TPM event log size as 8MB. Thank you.

And? We don't and should not change TPM2 table.
Comment 28 andy.liang 2024-12-20 01:42:57 UTC
(In reply to jarkko from comment #27)
> (In reply to andy.liang from comment #26)
> > (In reply to Stefan Berger from comment #23)
> > > I suppose the binary and ascii dumps of the log are still as before.
> > 
> > The TPM2 DUMP still shows the TPM event log size as 8MB. Thank you.
> 
> And? We don't and should not change TPM2 table.

(In reply to Stefan Berger from comment #23)
> I suppose the binary and ascii dumps of the log are still as before.

The binary_bios_measurements file only shows when the tpm_tis driver is loaded. Without the patch, no binary can be generated.
Comment 29 andy.liang 2024-12-20 01:45:52 UTC
(In reply to jarkko from comment #27)
> (In reply to andy.liang from comment #26)
> > (In reply to Stefan Berger from comment #23)
> > > I suppose the binary and ascii dumps of the log are still as before.
> > 
> > The TPM2 DUMP still shows the TPM event log size as 8MB. Thank you.
> 
> And? We don't and should not change TPM2 table.

The screenshot only proves the patch is working with the TPM eventlog is still 8 MB. I understand the patch does not change the TPM2 table.
Comment 30 andy.liang 2024-12-20 04:09:49 UTC
The kernel v6.12 dmesg is also submitted on Nov 20 and this version should be newer enough. I am using the HPE DL560 Gen11 as the verification platform. If you want more information from me, please give me what you need to resolve your concerns. Thanks.

The ACPI spec doesn't address the TPM event log size limitations at all so any other BIOS requests more than 4MB should all hit this issue can cause the kernel tained. In order to save time for future fix requests, we should move this forward.
Comment 31 jarkko 2024-12-20 14:06:15 UTC
OK, I somehow managed to miss it as it had different naming convention sorry.

Anywhow, I see this:

[   10.693310][    T1] tpm_tis STM0925:00: 2.0 TPM (device-id 0x3, rev-id 0)
[   10.848132][    T1] ------------[ cut here ]------------
[   10.853559][    T1] WARNING: CPU: 59 PID: 1 at mm/page_alloc.c:4727 __alloc_pages_noprof+0x2ca/0x330
[   10.862827][    T1] Modules linked in:
[   10.866671][    T1] CPU: 59 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-lp155.2.g52785e2-default #1 openSUSE Tumbleweed (unreleased) 588cd98293a7c9eba9013378d807364c088c9375
[   10.882741][    T1] Hardware name: HPE ProLiant DL320 Gen12/ProLiant DL320 Gen12, BIOS 1.20 10/28/2024
[   10.892170][    T1] RIP: 0010:__alloc_pages_noprof+0x2ca/0x330
[   10.898103][    T1] Code: 24 08 e9 4a fe ff ff e8 34 36 fa ff e9 88 fe ff ff 83 fe 0a 0f 86 b3 fd ff ff 80 3d 01 e7 ce 01 00 75 09 c6 05 f8 e6 ce 01 01 <0f> 0b 45 31 ff e9 e5 fe ff ff f7 c2 00 00 08 00 75 42 89 d9 80 e1
[   10.917750][    T1] RSP: 0000:ffffb7cf40077980 EFLAGS: 00010246
[   10.923777][    T1] RAX: 0000000000000000 RBX: 0000000000040cc0 RCX: 0000000000000000
[   10.931727][    T1] RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000040cc0
[   10.939678][    T1] RBP: 000000000000000c R08: ffffffffbb6fdc67 R09: 0000000000000000
[   10.947626][    T1] R10: ffffb7cf40077ac8 R11: 0000000000000000 R12: 0000000000000000
[   10.955560][    T1] R13: 00000000ffffffff R14: 0000000000000cc0 R15: ffff9a5c051cc000
[   10.963507][    T1] FS:  0000000000000000(0000) GS:ffff9a6348780000(0000) knlGS:0000000000000000
[   10.972405][    T1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   10.978944][    T1] CR2: 0000000000000000 CR3: 0000000184638001 CR4: 0000000000f70ef0
[   10.986891][    T1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   10.994837][    T1] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[   11.002770][    T1] PKRU: 55555554
[   11.006256][    T1] Call Trace:
[   11.009479][    T1]  <TASK>
[   11.012352][    T1]  ? __alloc_pages_noprof+0x2ca/0x330


RSI=0x0c, which maps to the parameter 'order' in __alloc_pages_noprof(), which allocates 
4096*2**0xc = 16777216 = 16 MiB of memory. This means that memory consumed by the log is in the range 8 MiB < N <= 16 MiB.

It's a huge buffer to bind. Are you sure that there is no BIOS bug and what is the exact target hardware, BIOS type and version etc. Have you reported to the vendor? This definitely is not normal.

And I don't understand where Stefan got his size figures, which are completely different...
Comment 32 Matthew Garrett 2024-12-20 14:35:29 UTC
The spec doesn't define any maximum size for the buffer, so while this is ludicrous it's not inherently a bug in the firmware - what the firmware is communicating to us is that it's allocated at least that much space, and if we were still logging events in the firmware environment we could do so until we hit that limit. I believe that Stefan's figures are for the calculated size of the log rather than for the table's buffer size. An alternative approach to allocating the LAML would be to map the log area, use the code we already have for calculating how long the log *actually* is, and then only allocating that, which would be more computationally intensive but would probably save RAM? 

Alternatively, check whether the LAML is above some arbitrary size, throw a BIOS_BUG, and just copy that much. This would work fine in this scenario (where the log is much less than the LAML value)
Comment 33 jarkko 2024-12-20 17:50:12 UTC
(In reply to Matthew Garrett from comment #32)
> calculated size of the log rather than for the table's buffer size. An
> alternative approach to allocating the LAML would be to map the log area,
> use the code we already have for calculating how long the log *actually* is,
> and then only allocating that, which would be more computationally intensive
> but would probably save RAM? 

So what you're saying matches what I'm suggesting for the most part doesn't it? [1]

[1] https://lore.kernel.org/linux-integrity/D6FS8ALS4HSV.2CSS6SGE8ND09@kernel.org/
Comment 34 jarkko 2024-12-20 18:30:34 UTC
I'm working on a learner fix than my suggestion. A patch should be eventually out...
Comment 35 jarkko 2024-12-20 18:46:57 UTC
Copy-pasting what I wrote to the header of tpm_read_log_acpi() should clear up a bit what I'm aiming for:

/*
 * Find out the physical address range for the event log from TCPA and TPM2
 * ACPI tables. If successful, sets TPM_CHIP_FLAG_EVENTLOG_ACPI. Also,
 * chip->bios_event_log and chip->bios_event_log_end will end up containing
 * physical address range, rather than virtual. This range must be mapped
 * with acpi_os_map_iomem().
 */

int tpm_read_log_acpi(struct tpm_chip *chip)
{
	u8 event[sizeof(struct tcg_pcr_event) + sizeof(struct tcg_efi_specid_event_head)];
	struct tcg_pcr_event *event_header = (struct tcg_pcr_event *)event;
	struct tcg_efi_specid_event_head *efispecid =

Then just tweak the code paths in eventlog/tpm2.c and that should fix this up. I read a fixed-size buffer just to open code tpm_is_tpm2_log() to the function and that way also the pre-existing code is still needed, just without the allocation part.
Comment 36 jarkko 2024-12-20 19:14:52 UTC
Created attachment 307378 [details]
tpm: Don't copy ACPI event log

OK, so this is about 50% done what I had in mind but kind of gives enough hints of the concept. Comments welcome. Now all it needs is just tpm2.c tweaks, which should be somewhat easy to implement.
Comment 37 jarkko 2024-12-20 19:16:10 UTC
Errata: memcpy_fromio(event_buf, virt, len) should be memcpy_fromio(event-buf, virt, sizeof(event_buf))
Comment 38 jarkko 2024-12-20 19:17:09 UTC
Errata 2: missing:

chip->bios_event_log = start;
chip->bios_event_log_end = start + len;
Comment 39 jarkko 2024-12-20 20:16:22 UTC
Since ACPI's own code uses normal memory accesses to the memory mapped by acpi_os_map_io_mem(), we can too:

https://elixir.bootlin.com/linux/v6.11.5/source/drivers/acpi/apei/einj-core.c#L222
Comment 40 jarkko 2024-12-20 20:33:49 UTC
Errata 3: Better to keep semantics of TPM 1 path for ACPI as it were as we don't want to mod tpm1.c.
Comment 41 jarkko 2024-12-20 20:34:39 UTC
These ideas pretty much will solve this equation I'd should have patch out eventually...
Comment 42 jarkko 2024-12-20 22:48:16 UTC
Created attachment 307380 [details]
tpm: Map the ACPI provided event log [compiled tested only]

Only compile-tested version (test at your own risk).
Comment 43 jarkko 2024-12-21 00:21:09 UTC
Created attachment 307381 [details]
tpm: Map the ACPI provided event log [in-progress]
Comment 44 jarkko 2024-12-21 11:35:05 UTC
Created attachment 307383 [details]
[PATCH] tpm: Map the ACPI provided event log

https://lore.kernel.org/linux-integrity/20241221113318.562138-1-jarkko@kernel.org/T/#u
Comment 45 jarkko 2024-12-21 12:46:21 UTC
Created attachment 307384 [details]
[PATCH v2] tpm: Map the ACPI provided event log

https://lore.kernel.org/linux-integrity/20241221124447.774553-1-jarkko@kernel.org/T/#u
Comment 46 jarkko 2024-12-21 14:11:45 UTC
I tried my patch with swtpm with both TPM 1.2 and TPM 2.0 emulations and also emulated both TIS and CRB chips. I don't have large regions but the test should give pretty good coverage, given that now memory mapping is always used for ACPI log instead of allocating a buffer.
Comment 47 jarkko 2024-12-21 14:12:14 UTC
That said, obviously requires testing with the actual workload.
Comment 48 Takashi Iwai 2024-12-22 08:34:44 UTC
(In reply to jarkko from comment #47)
> That said, obviously requires testing with the actual workload.

Andy, FWIW, I'm building test kernels based on the latest 6.12.x with Jarkko's v2 patch for (open)SUSE kernels; they'll be available later at:

    https://download.opensuse.org/repositories/home:/tiwai:/bsc1232421-v2/backport/

and

    https://download.opensuse.org/repositories/home:/tiwai:/bsc1232421-v2/standard/

The former is to be installed on SLE15-SP6 system while the latter is for openSUSE Tumbleweed system.  If that helps, please give it a try and report back.  Thanks!
Comment 49 jarkko 2024-12-22 15:27:06 UTC
Created attachment 307385 [details]
[PATCH v3] tpm: Map the ACPI provided event log
Comment 50 jarkko 2024-12-22 22:41:31 UTC
Please put TPM2 table as file instead of image.
Comment 51 jarkko 2024-12-22 22:55:44 UTC
(In reply to jarkko from comment #31)
> OK, I somehow managed to miss it as it had different naming convention sorry.
> 
> Anywhow, I see this:
> 
> [   10.693310][    T1] tpm_tis STM0925:00: 2.0 TPM (device-id 0x3, rev-id 0)
> [   10.848132][    T1] ------------[ cut here ]------------
> [   10.853559][    T1] WARNING: CPU: 59 PID: 1 at mm/page_alloc.c:4727
> __alloc_pages_noprof+0x2ca/0x330
> [   10.862827][    T1] Modules linked in:
> [   10.866671][    T1] CPU: 59 UID: 0 PID: 1 Comm: swapper/0 Not tainted
> 6.12.0-lp155.2.g52785e2-default #1 openSUSE Tumbleweed (unreleased)
> 588cd98293a7c9eba9013378d807364c088c9375
> [   10.882741][    T1] Hardware name: HPE ProLiant DL320 Gen12/ProLiant
> DL320 Gen12, BIOS 1.20 10/28/2024
> [   10.892170][    T1] RIP: 0010:__alloc_pages_noprof+0x2ca/0x330
> [   10.898103][    T1] Code: 24 08 e9 4a fe ff ff e8 34 36 fa ff e9 88 fe ff
> ff 83 fe 0a 0f 86 b3 fd ff ff 80 3d 01 e7 ce 01 00 75 09 c6 05 f8 e6 ce 01
> 01 <0f> 0b 45 31 ff e9 e5 fe ff ff f7 c2 00 00 08 00 75 42 89 d9 80 e1
> [   10.917750][    T1] RSP: 0000:ffffb7cf40077980 EFLAGS: 00010246
> [   10.923777][    T1] RAX: 0000000000000000 RBX: 0000000000040cc0 RCX:
> 0000000000000000
> [   10.931727][    T1] RDX: 0000000000000000 RSI: 000000000000000c RDI:
> 0000000000040cc0
> [   10.939678][    T1] RBP: 000000000000000c R08: ffffffffbb6fdc67 R09:
> 0000000000000000
> [   10.947626][    T1] R10: ffffb7cf40077ac8 R11: 0000000000000000 R12:
> 0000000000000000
> [   10.955560][    T1] R13: 00000000ffffffff R14: 0000000000000cc0 R15:
> ffff9a5c051cc000
> [   10.963507][    T1] FS:  0000000000000000(0000) GS:ffff9a6348780000(0000)
> knlGS:0000000000000000
> [   10.972405][    T1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   10.978944][    T1] CR2: 0000000000000000 CR3: 0000000184638001 CR4:
> 0000000000f70ef0
> [   10.986891][    T1] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [   10.994837][    T1] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7:
> 0000000000000400
> [   11.002770][    T1] PKRU: 55555554
> [   11.006256][    T1] Call Trace:
> [   11.009479][    T1]  <TASK>
> [   11.012352][    T1]  ? __alloc_pages_noprof+0x2ca/0x330
> 
> 
> RSI=0x0c, which maps to the parameter 'order' in __alloc_pages_noprof(),
> which allocates 
> 4096*2**0xc = 16777216 = 16 MiB of memory. This means that memory consumed
> by the log is in the range 8 MiB < N <= 16 MiB.

I don't understand why ESI=0x0c while TPM2 table has size 0x800000, which maps to the order 0x0b as 4096 * 2**0x0b = 0x800000.

It looks like as if TPM2 table and this transcript are either from different machines or different BIOS versions. I'm not sure how to otherwise explain that difference.

For 0x0c, the last line in that TPM2 dump screenshot should start: 00000001 not 00008000.
Comment 52 Takashi Iwai 2024-12-23 07:31:43 UTC
OK, the test kernels for (open)SUSE with v3 patch are being built in OBS home:tiwai:bsc1232421-v3 repo instead.  Give it a try later.
Comment 53 andy.liang 2024-12-23 08:38:38 UTC
Created attachment 307386 [details]
No Patch dmesg with Kernel v6.12.6
Comment 54 andy.liang 2024-12-23 08:40:51 UTC
(In reply to jarkko from comment #51)
> (In reply to jarkko from comment #31)
> > OK, I somehow managed to miss it as it had different naming convention
> sorry.
> > 
> > Anywhow, I see this:
> > 
> > [   10.693310][    T1] tpm_tis STM0925:00: 2.0 TPM (device-id 0x3, rev-id
> 0)
> > [   10.848132][    T1] ------------[ cut here ]------------
> > [   10.853559][    T1] WARNING: CPU: 59 PID: 1 at mm/page_alloc.c:4727
> > __alloc_pages_noprof+0x2ca/0x330
> > [   10.862827][    T1] Modules linked in:
> > [   10.866671][    T1] CPU: 59 UID: 0 PID: 1 Comm: swapper/0 Not tainted
> > 6.12.0-lp155.2.g52785e2-default #1 openSUSE Tumbleweed (unreleased)
> > 588cd98293a7c9eba9013378d807364c088c9375
> > [   10.882741][    T1] Hardware name: HPE ProLiant DL320 Gen12/ProLiant
> > DL320 Gen12, BIOS 1.20 10/28/2024
> > [   10.892170][    T1] RIP: 0010:__alloc_pages_noprof+0x2ca/0x330
> > [   10.898103][    T1] Code: 24 08 e9 4a fe ff ff e8 34 36 fa ff e9 88 fe
> ff
> > ff 83 fe 0a 0f 86 b3 fd ff ff 80 3d 01 e7 ce 01 00 75 09 c6 05 f8 e6 ce 01
> > 01 <0f> 0b 45 31 ff e9 e5 fe ff ff f7 c2 00 00 08 00 75 42 89 d9 80 e1
> > [   10.917750][    T1] RSP: 0000:ffffb7cf40077980 EFLAGS: 00010246
> > [   10.923777][    T1] RAX: 0000000000000000 RBX: 0000000000040cc0 RCX:
> > 0000000000000000
> > [   10.931727][    T1] RDX: 0000000000000000 RSI: 000000000000000c RDI:
> > 0000000000040cc0
> > [   10.939678][    T1] RBP: 000000000000000c R08: ffffffffbb6fdc67 R09:
> > 0000000000000000
> > [   10.947626][    T1] R10: ffffb7cf40077ac8 R11: 0000000000000000 R12:
> > 0000000000000000
> > [   10.955560][    T1] R13: 00000000ffffffff R14: 0000000000000cc0 R15:
> > ffff9a5c051cc000
> > [   10.963507][    T1] FS:  0000000000000000(0000)
> GS:ffff9a6348780000(0000)
> > knlGS:0000000000000000
> > [   10.972405][    T1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   10.978944][    T1] CR2: 0000000000000000 CR3: 0000000184638001 CR4:
> > 0000000000f70ef0
> > [   10.986891][    T1] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > [   10.994837][    T1] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7:
> > 0000000000000400
> > [   11.002770][    T1] PKRU: 55555554
> > [   11.006256][    T1] Call Trace:
> > [   11.009479][    T1]  <TASK>
> > [   11.012352][    T1]  ? __alloc_pages_noprof+0x2ca/0x330
> > 
> > 
> > RSI=0x0c, which maps to the parameter 'order' in __alloc_pages_noprof(),
> > which allocates 
> > 4096*2**0xc = 16777216 = 16 MiB of memory. This means that memory consumed
> > by the log is in the range 8 MiB < N <= 16 MiB.
> 
> I don't understand why ESI=0x0c while TPM2 table has size 0x800000, which
> maps to the order 0x0b as 4096 * 2**0x0b = 0x800000.
> 
> It looks like as if TPM2 table and this transcript are either from different
> machines or different BIOS versions. I'm not sure how to otherwise explain
> that difference.
> 
> For 0x0c, the last line in that TPM2 dump screenshot should start: 00000001
> not 00008000.

Sorry about that. The machine I had in November is different from the one in December. I uploaded a new dmesg with Kernel 6.12.6. Thank you. 
https://bugzilla.kernel.org/attachment.cgi?id=307386
Comment 55 andy.liang 2024-12-23 08:42:35 UTC
Created attachment 307387 [details]
v2-patch-binary_bios_measurements_kernel-6.12.6
Comment 56 andy.liang 2024-12-23 08:43:03 UTC
Created attachment 307388 [details]
v2-patch-kernel-6.12.6-lp155.1.1.g509a912.log
Comment 57 andy.liang 2024-12-23 08:43:26 UTC
Created attachment 307389 [details]
v3-patch-binary_bios_measurements_kernel-6.12.6.raw
Comment 58 andy.liang 2024-12-23 08:43:50 UTC
Created attachment 307390 [details]
v3-patch-kernel-6.12.6-lp155.2.1.gfb072de.log
Comment 59 andy.liang 2024-12-23 08:44:50 UTC
The binary dump and kernel log for v2 and v3 patches are uploaded. Thank you.
Comment 60 jarkko 2024-12-23 11:05:03 UTC
(In reply to andy.liang from comment #54)
> (In reply to jarkko from comment #51)
> > (In reply to jarkko from comment #31)
> > > OK, I somehow managed to miss it as it had different naming convention
> > sorry.
> > > 
> > > Anywhow, I see this:
> > > 
> > > [   10.693310][    T1] tpm_tis STM0925:00: 2.0 TPM (device-id 0x3, rev-id
> > 0)
> > > [   10.848132][    T1] ------------[ cut here ]------------
> > > [   10.853559][    T1] WARNING: CPU: 59 PID: 1 at mm/page_alloc.c:4727
> > > __alloc_pages_noprof+0x2ca/0x330
> > > [   10.862827][    T1] Modules linked in:
> > > [   10.866671][    T1] CPU: 59 UID: 0 PID: 1 Comm: swapper/0 Not tainted
> > > 6.12.0-lp155.2.g52785e2-default #1 openSUSE Tumbleweed (unreleased)
> > > 588cd98293a7c9eba9013378d807364c088c9375
> > > [   10.882741][    T1] Hardware name: HPE ProLiant DL320 Gen12/ProLiant
> > > DL320 Gen12, BIOS 1.20 10/28/2024
> > > [   10.892170][    T1] RIP: 0010:__alloc_pages_noprof+0x2ca/0x330
> > > [   10.898103][    T1] Code: 24 08 e9 4a fe ff ff e8 34 36 fa ff e9 88 fe
> > ff
> > > ff 83 fe 0a 0f 86 b3 fd ff ff 80 3d 01 e7 ce 01 00 75 09 c6 05 f8 e6 ce
> 01
> > > 01 <0f> 0b 45 31 ff e9 e5 fe ff ff f7 c2 00 00 08 00 75 42 89 d9 80 e1
> > > [   10.917750][    T1] RSP: 0000:ffffb7cf40077980 EFLAGS: 00010246
> > > [   10.923777][    T1] RAX: 0000000000000000 RBX: 0000000000040cc0 RCX:
> > > 0000000000000000
> > > [   10.931727][    T1] RDX: 0000000000000000 RSI: 000000000000000c RDI:
> > > 0000000000040cc0
> > > [   10.939678][    T1] RBP: 000000000000000c R08: ffffffffbb6fdc67 R09:
> > > 0000000000000000
> > > [   10.947626][    T1] R10: ffffb7cf40077ac8 R11: 0000000000000000 R12:
> > > 0000000000000000
> > > [   10.955560][    T1] R13: 00000000ffffffff R14: 0000000000000cc0 R15:
> > > ffff9a5c051cc000
> > > [   10.963507][    T1] FS:  0000000000000000(0000)
> > GS:ffff9a6348780000(0000)
> > > knlGS:0000000000000000
> > > [   10.972405][    T1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [   10.978944][    T1] CR2: 0000000000000000 CR3: 0000000184638001 CR4:
> > > 0000000000f70ef0
> > > [   10.986891][    T1] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > 0000000000000000
> > > [   10.994837][    T1] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7:
> > > 0000000000000400
> > > [   11.002770][    T1] PKRU: 55555554
> > > [   11.006256][    T1] Call Trace:
> > > [   11.009479][    T1]  <TASK>
> > > [   11.012352][    T1]  ? __alloc_pages_noprof+0x2ca/0x330
> > > 
> > > 
> > > RSI=0x0c, which maps to the parameter 'order' in __alloc_pages_noprof(),
> > > which allocates 
> > > 4096*2**0xc = 16777216 = 16 MiB of memory. This means that memory
> consumed
> > > by the log is in the range 8 MiB < N <= 16 MiB.
> > 
> > I don't understand why ESI=0x0c while TPM2 table has size 0x800000, which
> > maps to the order 0x0b as 4096 * 2**0x0b = 0x800000.
> > 
> > It looks like as if TPM2 table and this transcript are either from
> different
> > machines or different BIOS versions. I'm not sure how to otherwise explain
> > that difference.
> > 
> > For 0x0c, the last line in that TPM2 dump screenshot should start: 00000001
> > not 00008000.
> 
> Sorry about that. The machine I had in November is different from the one in
> December. I uploaded a new dmesg with Kernel 6.12.6. Thank you. 
> https://bugzilla.kernel.org/attachment.cgi?id=307386

OK thanks for responding! I just needed confirmation that I'm not interpreting the log incorrectly :-)
Comment 61 jarkko 2024-12-23 11:07:48 UTC
(In reply to andy.liang from comment #59)
> The binary dump and kernel log for v2 and v3 patches are uploaded. Thank you.

Awesome I'll check them later on thanks.
Comment 62 jarkko 2024-12-23 18:32:22 UTC
(In reply to jarkko from comment #61)
> (In reply to andy.liang from comment #59)
> > The binary dump and kernel log for v2 and v3 patches are uploaded. Thank
> you.
> 
> Awesome I'll check them later on thanks.

So, v3 log looks great.

Have you tried to copy /sys/kernel/security/tpm0/binary_measurements?
Comment 63 jarkko 2024-12-23 18:34:03 UTC
(In reply to jarkko from comment #62)
> (In reply to jarkko from comment #61)
> > (In reply to andy.liang from comment #59)
> > > The binary dump and kernel log for v2 and v3 patches are uploaded. Thank
> > you.
> > 
> > Awesome I'll check them later on thanks.
> 
> So, v3 log looks great.
> 
> Have you tried to copy /sys/kernel/security/tpm0/binary_measurements?

Oops! Sorry it was there. It looks great too.

Mind if I add Tested-by from you to the patch?
Comment 64 jarkko 2024-12-23 18:34:38 UTC
Both v2 and v3 look totally legit.
Comment 65 andy.liang 2024-12-24 01:43:07 UTC
(In reply to jarkko from comment #63)
> (In reply to jarkko from comment #62)
> > (In reply to jarkko from comment #61)
> > > (In reply to andy.liang from comment #59)
> > > > The binary dump and kernel log for v2 and v3 patches are uploaded.
> Thank
> > > you.
> > > 
> > > Awesome I'll check them later on thanks.
> > 
> > So, v3 log looks great.
> > 
> > Have you tried to copy /sys/kernel/security/tpm0/binary_measurements?
> 
> Oops! Sorry it was there. It looks great too.
> 
> Mind if I add Tested-by from you to the patch?

No problems at all. Thank you for your help.

Note You need to log in before you can comment on or make changes to this bug.