Bug 98181
| Summary: | tpm_crb stacktrace: ioremap - invalid physical address | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | mick.saunders+kernel |
| Component: | Other | Assignee: | drivers_other |
| Status: | RESOLVED PATCH_ALREADY_AVAILABLE | | |
| Severity: | normal | CC: | jarkko.sakkinen, kernel, kyle, michael, oscar, yuv.adm |
| Priority: | P1 | | |
| Hardware: | x86-64 | | |
| OS: | Linux | | |
| Kernel Version: | 4.0.1-1-ARCH | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | /sys/firmware/acpi/tables/TPM2 | | |
Description
mick.saunders+kernel
2015-05-13 00:07:21 UTC
Looking into this. Could you provide a dump of the TPM2 ACPI table, /sys/firmware/acpi/tables/TPM2 (i.e. cp it and attach it to the bug)?

No worries. I should be able to provide this tomorrow.

Created attachment 178871
/sys/firmware/acpi/tables/TPM2
I think the BIOS gives a corrupted table:

$ xxd -c8 -p TPM2 | sed -e 's/.\{2\}/& /g'
54 50 4d 32 34 00 00 00
03 c2 00 00 00 00 00 00
54 70 6d 32 54 61 62 6c
01 00 00 00 41 4d 49 20
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
06 00 00 00

Looking at the end of the dump, there is a field called "the start method". Without going into details, the number six looks like a bogus value for this field. The 8 bytes before it specify the physical address of the control area (the memory-mapped interface for sending commands to the TPM). It is set to zero, which is obviously a bogus value. This is a bug in your AMI BIOS. I would suggest updating the BIOS, and if that does not work you should report this issue to AMI.

I talked about the bug with Matt Fleming and his position was that there should be a workaround for this BIOS bug. I tend to agree because it is fairly easy to recognize (the physical address of the control area is zero). I've already planned how to work around this and will be submitting a fix as soon as possible.

Sorry to bother you again with this, but could you run acpidump -o acpitables.txt (a full dump of all ACPI tables) and email the result to me? Thanks and sorry for the trouble (I should have asked for a full dump the first time).

I have e-mailed you with the output. Thanks again for looking into this.

I am also seeing this on an Intel NUC5i5MYHE that also has an AMI BIOS. Would it be helpful for me to send any debugging info from my system, or do you already have all you need?

(In reply to Michael Marley from comment #9)
> I am also seeing this on an Intel NUC5i5MYHE that also has an AMI BIOS.
> Would it be helpful for me to send any debugging info from my system, or do
> you already have all you need?

Can you email it to me? I'll check that it's the same issue, i.e. that the physical address is zero.

As a short-term quick fix, the driver should return an error code when the address is zero (so that it doesn't try to ioremap zero) and fail to initialize more gracefully...
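For readers who want to check the field positions discussed above, here is a minimal sketch that unpacks the posted dump. It assumes the usual layout of a 36-byte ACPI table header followed by 4 bytes of flags, the 8-byte control area address and the 4-byte start method. The names `parse_tpm2` and `load_dump` are made up for this illustration and are not kernel code.

```python
import struct

# Bytes of the TPM2 table, copied from the xxd dump in this report.
TPM2_HEX = """
54 50 4d 32 34 00 00 00
03 c2 00 00 00 00 00 00
54 70 6d 32 54 61 62 6c
01 00 00 00 41 4d 49 20
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
06 00 00 00
"""

# Assumed layout (little-endian, no padding): the standard 36-byte ACPI
# table header, then 4 bytes of flags, the 8-byte control area address
# and the 4-byte start method.
TPM2_FORMAT = "<4sIBB6s8sI4sIIQI"


def load_dump(hex_text):
    """Turn whitespace-separated hex bytes into a bytes object."""
    return bytes.fromhex("".join(hex_text.split()))


def parse_tpm2(raw):
    """Unpack the TPM2 table fields that matter for this bug (hypothetical helper)."""
    fields = struct.unpack_from(TPM2_FORMAT, raw)
    return {
        "signature": fields[0].decode("ascii"),
        "length": fields[1],
        "creator_id": fields[7].decode("ascii"),
        "control_area_pa": fields[10],
        "start_method": fields[11],
    }


table = parse_tpm2(load_dump(TPM2_HEX))
for name, value in table.items():
    print(f"{name}: {value!r}")
if table["control_area_pa"] == 0:
    print("control area address is zero; nothing sensible to ioremap")
```

The script only restates what the dump contains: the last four bytes decode to a start method of 6 and the eight bytes before them to a control area address of zero.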
I sent you an email. Thanks! I also tried reporting this issue to Intel hardware support, but they told me to bugger off because I am not running Microsoft® Windows®. Jarkko, maybe since you work at Intel you would know the right people to contact?

Your AMI BIOS has the same bug (I checked the dumps that you sent). The issue should eventually be fixed in the BIOS. Only AMI can help you with that.

I submitted a patch that fails gracefully when the contents are corrupted: https://lkml.org/lkml/2015/6/24/386

Does that mean that the idea you had for a workaround previously won't work?

I've discussed the approach but am not yet sure whether a workaround is the right thing to do. On the other hand, the patch that I sent is an obvious bug fix because it prevents corrupted data from being used inside the kernel. In the short term that is the sane thing to do. I'm not sure yet what the long-term solution is, or whether the kernel should report the error and leave it up to the BIOS vendor to fix the real bug (instead of masking it).

/Jarkko

I definitely see that viewpoint, and I am going to see what I can do about contacting AMI today. I expect, however, that they will just refer me to Intel hardware support, who have already indicated that they have no interest in fixing my problem because I am running Linux.

My attempts to contact AMI and Intel about this problem have failed. I received no response from AMI, and Intel either tells me to use Windows® or does not respond.

I'm still waiting for feedback on my first patch. That's why this hasn't progressed.

Same issue also on Intel NUC5i5MYHE.

The TPM 2.0 spec seems to suggest that the start method=6 value is not entirely bogus; see Table 8 in https://www.trustedcomputinggroup.org/files/static_page_files/648D7D46-1A4B-B294-D088037B8F73DAAF/TCG_ACPIGeneralSpecification_1-10_0-37-Published.pdf . 6 means "Reserved for the Memory mapped I/O Interface (TIS 1.2+Cancel)."
Perhaps someone at AMI didn't quite understand the word reserved, but using 1.2 MMIO at 0xFED40000 seems to work. In particular, for the Intel NUC5i5MYHE on Linux 4.0, booting with tpm_crb.backlist=1 tpm_tis.force=1 results in a working TPM.

Thanks for the feedback, and sorry for the late reply. I was on vacation for four weeks.

Jethro: It looks like the NUC5i5MYHE has a discrete TPM2 chip. The reason why the TIS driver does not autodetect the chip is obvious: the device ID for that particular chip is missing from the device ID table inside the TIS driver. That is why it works when you use the force parameter.

The CRB driver does not have a module parameter called backlist. Where does that come from?

Do you also get initialization error from CRB driver?

> The CRB driver does not have a module parameter called backlist. Where does
> that come from?

Sorry, that was a complete typo on my end. Of course I meant:
modprobe.blacklist=tpm_crb tpm_tis.force=1

> Do you also get initialization error from CRB driver?

I don't quite remember and am unable to test it now. I can get back to you later with a full dmesg output.

(In reply to Jethro Beekman from comment #22)
> > The CRB driver does not have a module parameter called backlist. Where does
> > that come from?
>
> Sorry that was a complete typo on my end. Of course I meant:
> modprobe.blacklist=tpm_crb tpm_tis.force=1

I think what happens here is as follows (you should verify this from the BIOS settings):
- The dTPM 2.0 chip is enabled (that's why tpm_tis initializes when you use 'force') and PTT is disabled.
- For some reason the BIOS still exposes the ACPI object for PTT. So it's a bug in the ACPI tables.

The reason why you have to use 'force' is that there is no device ID in the tpm_tis driver for your chip.

This patch makes the CRB driver fail cleanly in this situation: https://lkml.org/lkml/2015/6/24/386

It's already pulled into mainline and stable kernels.
The second patch, to make the dTPM 2.0 chip initialize correctly, would be to add the device ID to the tpm_tis driver. Could you run acpidump on the machine and send the file to my email? I'll then be able to fix the tpm_tis driver.

> > Do you also get initialization error from CRB driver?
>
> I don't quite remember and am unable to test it now. I can get back to you
> later with a full dmesg output.

I don't think I need it at this point. I already have a fairly good picture of what is happening here. The ACPI dump should be enough.

After the fixes for the CRB driver and tpm_tis, CRB will report a one-line error on boot but will fail cleanly, and tpm_tis should initialize itself automatically without the need for the 'force' parameter.

/Jarkko

This is the part that I really cannot comprehend:

Method (_HID, 0, NotSerialized)  // _HID: Hardware ID
{
    If (TCMF)
    {
        Return (0x01013469)
    }
    Else
    {
        If ((TTDP == Zero))
        {
            Return (0x310CD041)
        }
        Else
        {
            Return ("MSFT0101")
        }
    }
}

Usually HIDs are readable strings. With my limited ACPI knowledge the device object for the TPM looks weird, but that might also be because I'm not really an ACPI expert! Anyway, I think that the BIOS has a bug in that it ends up returning the wrong HID for the dTPM. We can deduce this from the fact that tpm_crb tries to initialize automatically. Again, I'm assuming here that a HID can map to one device and one device only.

The workaround that fixes the issue? It's a two-fold solution:
- In the CRB driver, check the start method. If it's 6, fail with error code 0.
- In the FIFO driver, add MSFT0101 to the list of HIDs. If the start method is not 6, fail with error code 0.

I think in the FIFO case it's safe to use 0xFED40000 as the address if the start method is 6. I'll have to see what the code reviewers think about this once I send my bug fix to LKML.

I got an answer from one TCG guy. Yes: regardless of whether the device is a FIFO or CRB based TPM 2.0 device, it always identifies itself as MSFT0101. So the fix that I proposed is absolutely the right thing to do.
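Two points from the comments above can be sketched in a few lines of Python. First, the numeric _HID return values are compressed EISA IDs, which decode to readable strings. Second, the two-fold proposal amounts to a choice based on the HID and the start method. The function names `decode_eisa_id` and `pick_driver` are hypothetical, and the selection logic is only a model of the proposal as described in this thread, not the code that was merged.

```python
# Values taken from this thread.
START_METHOD_TIS_CANCEL = 6        # start method found in the TPM2 table
TIS_DEFAULT_BASE = 0xFED40000      # MMIO address reported to work with tpm_tis


def decode_eisa_id(value):
    """Decode a 32-bit compressed EISA ID as returned numerically by _HID.

    The ID is stored as four bytes: two bytes holding three 5-bit letters
    (big-endian, 1 = 'A'), then two bytes of product number. AML integers
    are little-endian, so the integer is converted back to bytes first.
    """
    raw = value.to_bytes(4, "little")
    word = (raw[0] << 8) | raw[1]
    letters = "".join(
        chr(((word >> shift) & 0x1F) + 0x40) for shift in (10, 5, 0)
    )
    return f"{letters}{raw[2]:02X}{raw[3]:02X}"


def pick_driver(hid, start_method):
    """Model of the proposed two-fold check (hypothetical, not kernel code).

    Returns (driver_name, mmio_base) or None if neither driver should bind.
    """
    if hid != "MSFT0101":
        return None
    if start_method == START_METHOD_TIS_CANCEL:
        # FIFO/TIS case: the CRB driver declines, tpm_tis uses the fixed base.
        return ("tpm_tis", TIS_DEFAULT_BASE)
    # Any other start method: tpm_tis declines, the CRB driver handles it.
    return ("tpm_crb", None)


for numeric_hid in (0x01013469, 0x310CD041):
    print(f"{numeric_hid:#010x} -> {decode_eisa_id(numeric_hid)}")
print(pick_driver("MSFT0101", 6))
```

With this decoding, 0x310CD041 reads as PNP0C31 and 0x01013469 as ZIT0101, so only the last branch of the method returns its ID as a literal string. What the TCMF and TTDP flags mean on this firmware is not established in the thread.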
When I try the tpm_tis.force=1 kernel argument on my NUC5i5MYHE, the TPM works but tpm_tis allocates every interrupt from 1 to 15, which clobbers (at least) the serial port driver and makes the serial port not work. I tried tpm_tis.force=1,interrupts=0 but that makes no difference.

I'm still trying to acquire the hardware to test this. That's why things are lagging. Sorry.

FYI: https://lkml.org/lkml/2015/8/29/102
Testing this would help. No guarantees that it will work though, because I haven't tested it myself (I probably will next week if I receive the NUC5i5MYHE).

The patch doesn't work for me. tpm_tis.force still works. There is no useful debugging information in dmesg.

I'm still waiting for the NUC to be delivered. Not much I can do until I get it.

About the patch that I did before: I only checked that it compiles. It's not production quality, so regressions are expected :) I made it available only because getting the environment has taken so long.

Almost finalized fixes for the dTPM (two patches): https://github.com/jsakkine/linux-tpm2/tree/tis-acpi-fix
Still testing and adjusting, since the code changes are quite big, so that I don't break anything.

I can confirm that this works correctly for me, thanks!

This can be closed once v4.4-rc1 becomes available. Fixes will land in v4.4. This *was not* a BIOS bug but a bug in the detection of the TPM device in tpm_tis.

Mark as RESOLVED? I don't seem to have authority to do that.

Still seeing this bug on 4.4.1 (Arch Linux)

$ uname -a
Linux hostname 4.4.1-2-ARCH #1 SMP PREEMPT Wed Feb 3 13:12:33 UTC 2016 x86_64 GNU/Linux
$ dmesg
[ 15.732458] ------------[ cut here ]------------
[ 15.732463] WARNING: CPU: 3 PID: 389 at arch/x86/mm/ioremap.c:198 __ioremap_caller+0x234/0x390()
[ 15.732464] Info: mapping multiple BARs. Your kernel is fine.
[ 15.732465] Modules linked in:
[ 15.732467] tpm_crb(+) industrialio fjes acpi_pad pcc_cpufreq(-) acpi_cpufreq(-) tpm_tis tpm evdev processor mac_hid sch_fq_codel ip_tables x_tables sha256_ssse3 sha256_generic hmac drbg ansi_cprng algif_skcipher af_alg hid_generic hid_logitech_hidpp hid_logitech_dj usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd nvme ahci sdhci_pci libahci sdhci led_class xhci_pci libata xhci_hcd mmc_core scsi_mod usbcore usb_common i915 video button intel_gtt i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm aes_x86_64 dm_crypt dm_mod ext4 crc16 mbcache jbd2
[ 15.734562] CPU: 3 PID: 389 Comm: systemd-udevd Tainted: G U 4.4.1-2-ARCH #1
[ 15.734564] Hardware name: /NUC6i5SYB, BIOS SYSKLi35.86A.0028.2015.1112.1822 11/12/2015
[ 15.734566] 0000000000000000 0000000078e33dfa ffff88045ca279a0 ffffffff812c7f39
[ 15.734569] ffff88045ca279e8 ffff88045ca279d8 ffffffff810765b2 00000000fed40040
[ 15.734572] ffffc90001940040 0000000000001000 0000000000000040 ffff88045beab280
[ 15.734574] Call Trace:
[ 15.734580] [<ffffffff812c7f39>] dump_stack+0x4b/0x72
[ 15.734584] [<ffffffff810765b2>] warn_slowpath_common+0x82/0xc0
[ 15.734587] [<ffffffff8107664c>] warn_slowpath_fmt+0x5c/0x80
[ 15.734589] [<ffffffff8107cc27>] ? iomem_map_sanity_check+0x97/0xd0
[ 15.734593] [<ffffffff81064a84>] __ioremap_caller+0x234/0x390
[ 15.734595] [<ffffffff81064bf7>] ioremap_nocache+0x17/0x20
[ 15.734610] [<ffffffff812e0a82>] devm_ioremap_nocache+0x42/0x80
[ 15.734615] [<ffffffffa051c238>] crb_acpi_add+0x108/0x2d0 [tpm_crb]
[ 15.734618] [<ffffffff8134b6b4>] acpi_device_probe+0x4f/0xf5
[ 15.734621] [<ffffffff813eb652>] driver_probe_device+0x222/0x4a0
[ 15.734623] [<ffffffff813eb954>] __driver_attach+0x84/0x90
[ 15.734625] [<ffffffff813eb8d0>] ? driver_probe_device+0x4a0/0x4a0
[ 15.734628] [<ffffffff813e928c>] bus_for_each_dev+0x6c/0xc0
[ 15.734631] [<ffffffff813eae0e>] driver_attach+0x1e/0x20
[ 15.734633] [<ffffffff813ea95b>] bus_add_driver+0x1eb/0x280
[ 15.734635] [<ffffffffa0521000>] ? 0xffffffffa0521000
[ 15.734637] [<ffffffff813ec260>] driver_register+0x60/0xe0
[ 15.734639] [<ffffffff8134b583>] acpi_bus_register_driver+0x3b/0x43
[ 15.734643] [<ffffffffa0521010>] crb_acpi_driver_init+0x10/0x1000 [tpm_crb]
[ 15.734645] [<ffffffff81002123>] do_one_initcall+0xb3/0x200
[ 15.734649] [<ffffffff811619d7>] do_init_module+0x5f/0x1e8
[ 15.734652] [<ffffffff810fbf7f>] load_module+0x219f/0x27e0
[ 15.734654] [<ffffffff810f8e70>] ? symbol_put_addr+0x50/0x50
[ 15.734658] [<ffffffff810fc70e>] SyS_init_module+0x14e/0x190
[ 15.734662] [<ffffffff81591b2e>] entry_SYSCALL_64_fastpath+0x12/0x71
[ 15.734664] ---[ end trace ca977c5b5ff297f5 ]---

(In reply to Yuval Adam from comment #38)
> Still seeing this bug on 4.4.1 (Arch Linux)

You are seeing a different regression (not this one): https://bugzilla.kernel.org/show_bug.cgi?id=111511

Thanks for the pointer Jarkko!

399235dc6e954