Bug 213667
Summary: | e1000e leads to lockup while accessing its nvm | ||
---|---|---|---|
Product: | Drivers | Reporter: | AceLan Kao (acelan) |
Component: | Network | Assignee: | drivers_network (drivers_network) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | amir.avivi, avi.shalev, dima.ruinskiy, gianluca.pindinelli, k.pagel, marcinropa, martin.hamant, rex.tsai, sasha.neftin, vitaly.lifshits |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 5.10 & 5.13 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | dmesg.log, v1-0001-e1000e-Do-not-take-care-about-NVM-checksum, v1-0001-e1000e-Do-not-take-care-about-NVM-checksumBACK, attachment-32179-0.html |
Why is the NVM checksum bad? (With the original NVM and a good checksum, the driver won't try to recover the checksum.)

I looked at the attached log; the first crash reported comes from i915 (graphics). I would suggest working with your vendor to check your system.

Created attachment 297863 [details]
v1-0001-e1000e-Do-not-take-care-about-NVM-checksum
Created attachment 297865 [details]
v1-0001-e1000e-Do-not-take-care-about-NVM-checksumBACK
Hi,

Intel worked with Dell and their partners to confirm that the issue is related to an incorrect checksum in the GbE NVM. Dell and their partners are updating their process to write the correct checksum to the GbE NVM after they update anything in it. In parallel, Intel will propose removing checksum correction from the e1000e driver because of a design change in GbE on recent platforms. I have uploaded two patches and will ask Dell and their partners to verify them.

Rex

Confirmed the patches fix the issue.

Thanks for the confirmation. I will provide the commit id for tracking. Please work with your vendor to calculate the checksum properly.

commit 7b0827770dee8c5c08f97ea65568b834e60438f6

On the Dell Precision 7560 the e1000e driver doesn't load due to an error:

e1000e 0000:00:1f.6: The NVM Checksum Is Not Valid

I tested this on Gentoo with kernel 5.10.76 and also tried Ubuntu 20.04.3 LTS and 21.10 (kernel 5.13).

It seems that with the patch the NVM is no longer updated, but the checksum is still verified; if it is not correct, the driver does not load and the network card does not work.

+ if (hw->mac.type < e1000_pch_cnp) {
+     data |= valid_csum_mask;
+     ret_val = e1000_write_nvm(hw, word, 1, &data);
+     if (ret_val)
+         return ret_val;
+     ret_val = e1000e_update_nvm_checksum(hw);
+     if (ret_val)
+         return ret_val;
+ }

I added some prints to the driver, and it seems that the expected checksum is 0xbaba but the calculated value is 0xbcba.

I do not know if it helps, but the e1000_probe function selects

const struct e1000_info *ei = e1000_info_tbl[ent->driver_data];

with ent->driver_data = 14 => board_pch_tgp.

I checked the EEPROM and the selected element is:

#define E1000_DEV_ID_PCH_TGP_I219_LM14 0x15F9

I know I can remove the checksum verification code, but I am curious whether there is any other option. Otherwise, I will have to do this every time my distribution updates the kernel.

Many thanks
Marcin

Can anyone advise me at least whether this error is important? What does it really mean? Can this problem be solved somehow, or is it better to return the hardware?

-Marcin

Created attachment 299645 [details]
attachment-32179-0.html
Out of office. Expected delayed response
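On the question above of what "The NVM Checksum Is Not Valid" means: as far as I understand the driver's generic NVM code, it adds up the first 64 16-bit words of the NVM image (word offsets 0x00 through 0x3F) and expects the 16-bit result to be 0xBABA. The sketch below illustrates that arithmetic in plain user-space C. `NVM_CHECKSUM_REG` and `NVM_SUM` follow the driver's naming, but the helper functions (`nvm_sum_words`, `nvm_checksum_is_valid`, `nvm_expected_checksum_word`) are hypothetical names introduced here for illustration and are not part of the driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Constants named after the e1000e driver's definitions. */
#define NVM_CHECKSUM_REG 0x003F /* word offset that stores the checksum */
#define NVM_SUM          0xBABA /* expected 16-bit sum of words 0x00..0x3F */

/* Hypothetical helper: add up `count` words with 16-bit wraparound. */
static uint16_t nvm_sum_words(const uint16_t *nvm, size_t count)
{
    uint16_t sum = 0;

    for (size_t i = 0; i < count; i++)
        sum = (uint16_t)(sum + nvm[i]);
    return sum;
}

/* Hypothetical helper: 1 if words 0x00..0x3F add up to NVM_SUM, else 0. */
static int nvm_checksum_is_valid(const uint16_t *nvm)
{
    return nvm_sum_words(nvm, NVM_CHECKSUM_REG + 1) == NVM_SUM;
}

/* Hypothetical helper: value word 0x3F must hold so the image validates. */
static uint16_t nvm_expected_checksum_word(const uint16_t *nvm)
{
    return (uint16_t)(NVM_SUM - nvm_sum_words(nvm, NVM_CHECKSUM_REG));
}
```

Under this reading, the reported values 0xbaba (expected) and 0xbcba (calculated) differ by 0x0200, so the stored image simply does not add up to the expected constant. The sketch says nothing about why the image is off; per the earlier comment, that is a matter of how the vendor writes the GbE NVM.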
I have the identical problem on my Precision 7560 with Ubuntu 20.04 OEM, kernel 5.13.0-1019-oem:

+ e1000e 0000:00:1f.6: The NVM Checksum Is Not Valid
+ e1000e: probe of 0000:00:1f.6 failed with error -5

Are there any plans to fix this at the kernel level, or is it necessary to request a hardware replacement (even though the hardware works correctly despite the wrong checksum)?
Unfortunately, I cannot recompile the module at each update, nor is it possible on this model to overwrite the checksum ("Unable to write default configuration to EEPROM").

(In reply to pindi from comment #13)
> I have the identical problem on my Precision 7560 with Ubuntu 20.04 OEM,
> kernel 5.13.0-1019-oem:
>
> + e1000e 0000:00:1f.6: The NVM Checksum Is Not Valid
> + e1000e: probe of 0000:00:1f.6 failed with error -5
>
> Are there any plans to fix this at the kernel level, or is it necessary to
> request a hardware replacement (even though the hardware works correctly
> despite the wrong checksum)?
> Unfortunately, I cannot recompile the module at each update, nor is it
> possible on this model to overwrite the checksum ("Unable to write default
> configuration to EEPROM").

I don't think a hardware replacement is needed. You should contact your PC vendor and update the NVM (part of the BIOS).
On new hardware there is no longer an option to update the NVM with software tools (drivers, etc.).

I have a brand new Dell Latitude E5420 with the latest BIOS (1.14.1) and got the same NVM error message as in the previous comment.
Running Ubuntu 20.04.3 LTS with kernel 5.11.0-43-generic #47~20.04.2-Ubuntu SMP.
I don't see any download for an NVM update. What should I do?
(In reply to Sasha Neftin from comment #14)
> I don't think a hardware replacement is needed. You should contact your PC
> vendor and update the NVM (part of the BIOS).
> On new hardware there is no longer an option to update the NVM with software
> tools (drivers, etc.).

The only solution to the NIC problem was to replace the motherboard, as Dell does not allow overwriting the ROM.
No other solution solved the problem (except rewriting the e1000 module and recompiling the kernel, which is not applicable in my case).
Regards.

On my side, it seems to work with Intel's upstream version 3.8.7 from https://sourceforge.net/projects/e1000/. I installed it with https://github.com/koljah-de/e1000e-dkms-debian, that is, as a DKMS module. The question remains what to do to make the Ubuntu driver work natively...

I also find it strange that the bootutil64e utility now complains, while the NIC seems to work:

---
Connection to QV driver failed - please reinstall it!

Intel(R) Ethernet Flash Firmware Utility
BootUtil version 1.37.34.3
Copyright (C) 2003-2021 Intel Corporation

ERROR: The adapter (location 0:31.6) cannot be initialized due to inaccessible device memory. Update the device driver and reboot the system before running this utility again. Consult the utility documentation for more information.

Type BootUtil -? for help

Port Network Address Location Series  WOL Flash Firmware                Version
==== =============== ======== ======= === ============================= =======
  1  (Cannot initialize adapter)

and then I hit https://bugzilla.kernel.org/show_bug.cgi?id=213651 with a network speed issue...

This issue has been fixed since v5.14 by the commit below:

commit 4051f68318ca9f3d3becef3b54e70ad2c146df97
Author: Sasha Neftin <sasha.neftin@intel.com>
Date: Sun Jul 18 07:10:31 2021 +0300

    e1000e: Do not take care about recovery NVM checksum

    On new platforms, the NVM is read-only. Attempting to update the NVM
    is causing a lockup to occur. Do not attempt to write to the NVM
    on platforms where it's not supported.
    Emit an error message when the NVM checksum is invalid.

    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213667
    Fixes: fb776f5d57ee ("e1000e: Add support for Tiger Lake")
    Suggested-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
    Suggested-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
    Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
    Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

I'm running `5.15.0-41-generic` on Ubuntu 20.04 LTS and still have the issue; the interface is not working (this is a Dell 5420):

[ 0.944599] e1000e: Intel(R) PRO/1000 Network Driver
[ 0.944601] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[ 0.945221] e1000e 0000:00:1f.6: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[ 1.166829] e1000e 0000:00:1f.6: The NVM Checksum Is Not Valid
[ 1.231542] e1000e: probe of 0000:00:1f.6 failed with error -5

Hi all,

A short question: which part of the problem was fixed by the patch discussed here? Correcting the NVM checksum seems to be no longer done in the current version, to avoid the lockup. But what about testing the checksum? Will the driver load whether or not the checksum is correct?
I still have the problem that the e1000e driver is not loaded because of the checksum error, on current Ubuntu 22.04 LTS with the 5.17.0-1024-oem kernel, on a Dell Precision 7760 with the current BIOS version dated 2022-11-23.

Rex Tsai stated that he had contact with Dell and their partners on 2021-07-14. That's quite a long time ago, so I'm wondering on which side the problem lies. Is this Dell's responsibility? Or do I have to file a bug in the Ubuntu bug tracker?

Thanks,
Karsten |
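The question of which part was fixed can be illustrated with a simplified model of the flow, based only on the patched snippet quoted earlier in this thread, the commit message, and the log lines reported here. This is a rough sketch, not the driver's code: the enum, the struct and its fields (including `nvm_writable` and `write_attempts`), and both function names are hypothetical, and the real driver decides whether to repair based on a "checksum valid" bit in the NVM, which this model folds into a single `checksum_ok` flag.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified MAC generations; order mirrors "older < newer". */
enum mac_type { MAC_PCH_SPT, MAC_PCH_CNP, MAC_PCH_TGP };

#define ERR_IO 5 /* the logs above show the probe failing with error -5 */

/* Hypothetical model of one NIC; none of these fields exist in the driver. */
struct nic_model {
    enum mac_type mac;
    bool checksum_ok;   /* does the stored image pass the checksum test? */
    bool nvm_writable;  /* assumption: can software rewrite the NVM? */
    int write_attempts; /* counts repair attempts, for illustration only */
};

/* Repair is attempted only on MAC types older than CNP, as in the patch. */
static int validate_nvm_checksum(struct nic_model *nic)
{
    if (!nic->checksum_ok && nic->mac < MAC_PCH_CNP) {
        nic->write_attempts++;
        if (nic->nvm_writable)
            nic->checksum_ok = true;
    }
    /* The checksum itself is still verified on every platform. */
    return nic->checksum_ok ? 0 : -ERR_IO;
}

/* A failed validation aborts the probe, so no network interface appears. */
static int probe(struct nic_model *nic)
{
    int ret = validate_nvm_checksum(nic);

    if (ret)
        fprintf(stderr, "The NVM Checksum Is Not Valid\n");
    return ret;
}
```

In this model, a Tiger Lake class device with a bad checksum is never written to (which is the part that avoids the lockup), but the probe still fails with -5, matching the reports above. A device with a correct checksum probes normally on any platform.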
Created attachment 297775 [details]
dmesg.log

On a Dell Precision 7760, it takes around 2 minutes to boot into the desktop. It looks like e1000e locks up the system while accessing its NVM. Blacklisting e1000e and loading it after booting into the desktop does not reproduce the issue.

[ 28.145341] u-Precision-7760 kernel: watchdog: BUG: soft lockup - CPU#6 stuck for 23s! [systemd-udevd:235]
[ 28.145342] u-Precision-7760 kernel: Modules linked in: hid_sensor_custom hid_sensor_hub intel_ishtp_loader intel_ishtp_hid nvidia_drm(POE) nvidia_modeset(POE) hid_generic nvidia(POE) i915(+) nvme nvme_core rtsx_pci_sdmmc i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec crc32_pclmul rc_core intel_lpss_pci psmouse e1000e(+) intel_lpss rtsx_pci xhci_pci idma64 i2c_hid vmd virt_dma drm thunderbolt i2c_i801 intel_ish_ipc i2c_smbus intel_ishtp wmi xhci_pci_renesas fjes(-) hid video pinctrl_tigerlake
[ 28.145357] u-Precision-7760 kernel: CPU: 6 PID: 235 Comm: systemd-udevd Tainted: P W OE 5.10.0-1035-oem #36
[ 28.145357] u-Precision-7760 kernel: Hardware name: Dell Inc. Precision 7760/, BIOS 1.0.0 05/07/2021
[ 28.145364] u-Precision-7760 kernel: RIP: 0010:e1000_flash_cycle_ich8lan.constprop.0+0x5c/0x90 [e1000e]
[ 28.145365] u-Precision-7760 kernel: Code: 00 00 0b 76 42 c1 e0 10 89 42 04 bb 81 96 98 00 eb 0f bf c7 10 00 00 e8 b2 1a b5 f0 83 eb 01 74 10 49 8b 44 24 10 66 8b 40 04 <41> 89 c5 a8 01 74 e1 41 83 e5 03 31 c0 5b 41 5c 41 80 fd 01 41 5d
[ 28.145366] u-Precision-7760 kernel: RSP: 0018:ffff9fdf80513920 EFLAGS: 00000202
[ 28.145367] u-Precision-7760 kernel: RAX: ffff9fdf809a4028 RBX: 0000000000349c5e RCX: 0000000000000006
[ 28.145367] u-Precision-7760 kernel: RDX: 0000000000000c86 RSI: 0000000000000006 RDI: 0000000000000c5a
[ 28.145368] u-Precision-7760 kernel: RBP: ffff9fdf80513938 R08: 0000001c2f5d11d1 R09: 0000000000000006
[ 28.145368] u-Precision-7760 kernel: R10: ffff922857a21030 R11: 0000000000000000 R12: ffff922857a20f38
[ 28.145369] u-Precision-7760 kernel: R13: 00000000809a4028 R14: 0000000000000000 R15: ffff922857a20f38
[ 28.145369] u-Precision-7760 kernel: FS: 00007f098f44c880(0000) GS:ffff922fafd80000(0000) knlGS:0000000000000000
[ 28.145370] u-Precision-7760 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 28.145370] u-Precision-7760 kernel: CR2: 0000563528b67010 CR3: 000000011607a006 CR4: 0000000000770ee0
[ 28.145370] u-Precision-7760 kernel: PKRU: 55555554
[ 28.145371] u-Precision-7760 kernel: Call Trace:
[ 28.145375] u-Precision-7760 kernel: e1000_erase_flash_bank_ich8lan+0xa0/0x1a0 [e1000e]
[ 28.145380] u-Precision-7760 kernel: e1000_update_nvm_checksum_spt+0x1f5/0x340 [e1000e]
[ 28.145383] u-Precision-7760 kernel: e1000_validate_nvm_checksum_ich8lan+0xa1/0xd0 [e1000e]
[ 28.145390] u-Precision-7760 kernel: e1000_probe+0x65f/0xc90 [e1000e]
[ 28.145399] u-Precision-7760 kernel: local_pci_probe+0x48/0x80
[ 28.145400] u-Precision-7760 kernel: pci_device_probe+0x10f/0x1c0
[ 28.145402] u-Precision-7760 kernel: really_probe+0xfb/0x420
[ 28.145403] u-Precision-7760 kernel: driver_probe_device+0xe9/0x160
[ 28.145404] u-Precision-7760 kernel: device_driver_attach+0x5d/0x70
[ 28.145405] u-Precision-7760 kernel: __driver_attach+0x8f/0x150
[ 28.145406] u-Precision-7760 kernel: ? device_driver_attach+0x70/0x70
[ 28.145407] u-Precision-7760 kernel: bus_for_each_dev+0x7e/0xc0
[ 28.145408] u-Precision-7760 kernel: driver_attach+0x1e/0x20
[ 28.145408] u-Precision-7760 kernel: bus_add_driver+0x152/0x1f0
[ 28.145409] u-Precision-7760 kernel: driver_register+0x74/0xd0
[ 28.145410] u-Precision-7760 kernel: ? 0xffffffffc04ee000
[ 28.145411] u-Precision-7760 kernel: __pci_register_driver+0x54/0x60
[ 28.145415] u-Precision-7760 kernel: e1000_init_module+0x3b/0x1000 [e1000e]
[ 28.145416] u-Precision-7760 kernel: do_one_initcall+0x48/0x1d0
[ 28.145418] u-Precision-7760 kernel: ? _cond_resched+0x19/0x30
[ 28.145420] u-Precision-7760 kernel: ? kmem_cache_alloc_trace+0x37a/0x430
[ 28.145421] u-Precision-7760 kernel: ? do_init_module+0x28/0x250
[ 28.145422] u-Precision-7760 kernel: do_init_module+0x62/0x250
[ 28.145423] u-Precision-7760 kernel: load_module+0x11ac/0x1370
[ 28.145425] u-Precision-7760 kernel: ? security_kernel_post_read_file+0x5c/0x70
[ 28.145425] u-Precision-7760 kernel: ? security_kernel_post_read_file+0x5c/0x70
[ 28.145427] u-Precision-7760 kernel: __do_sys_finit_module+0xc2/0x120
[ 28.145427] u-Precision-7760 kernel: ? __do_sys_finit_module+0xc2/0x120
[ 28.145428] u-Precision-7760 kernel: __x64_sys_finit_module+0x1a/0x20
[ 28.145430] u-Precision-7760 kernel: do_syscall_64+0x38/0x90
[ 28.145431] u-Precision-7760 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 28.145431] u-Precision-7760 kernel: RIP: 0033:0x7f098f9ce89d
[ 28.145432] u-Precision-7760 kernel: Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 f5 0c 00 f7 d8 64 89 01 48
[ 28.145433] u-Precision-7760 kernel: RSP: 002b:00007ffc40ddeaa8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 28.145433] u-Precision-7760 kernel: RAX: ffffffffffffffda RBX: 0000563528b826a0 RCX: 00007f098f9ce89d
[ 28.145434] u-Precision-7760 kernel: RDX: 0000000000000000 RSI: 00007f098f8abded RDI: 0000000000000005
[ 28.145434] u-Precision-7760 kernel: RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
[ 28.145435] u-Precision-7760 kernel: R10: 0000000000000005 R11: 0000000000000246 R12: 00007f098f8abded
[ 28.145435] u-Precision-7760 kernel: R13: 0000000000000000 R14: 0000563528b8aa40 R15: 0000563528b826a0