Bug 217734

Summary: Bad firmware loading reporting situation
Product: Other
Reporter: Artem S. Tashkinov (aros)
Component: Modules
Assignee: other_modules
Status: NEW
Severity: high
CC: gyakovlev, mpagano, robbat2, sam, torvalds
Priority: P3
Hardware: All
OS: Linux
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:

Description Artem S. Tashkinov 2023-07-30 13:27:05 UTC
I really dislike how firmware loading is handled by kernel [modules]:

* Many modules don't report which firmware files they load.
* Often nothing is reported about the firmware files themselves, such as version, date or file size. In a perfect world it would be nice to at least see the CRC32 checksum of the firmware file.
* Often the bus and device address are not clearly indicated.
* All the required firmware files must be probed up front (file_exists()), and any missing ones reported, _before_ attempting to load any of them. For instance, missing firmware files for the amdgpu driver may lead to system instability or outright crashes.
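
To make the size and CRC32 items above concrete, here is a minimal user-space sketch in Python. It is not kernel code and not an existing tool: the function name `describe_firmware` and the default prefix are hypothetical, and the line format only imitates the style of output requested in this report.

```python
import zlib
from pathlib import Path


def describe_firmware(path, prefix="module_name: PCI-e 0000:01:00.0"):
    """Return a log-style line with the size and CRC32 of one firmware file.

    The prefix is a placeholder; a real driver would supply its own name
    and device address.
    """
    data = Path(path).read_bytes()
    crc = zlib.crc32(data) & 0xFFFFFFFF  # keep the value as unsigned 32-bit
    return (f"{prefix} Loading firmware file '{path}', "
            f"size {len(data)} bytes, crc32 {crc:08x}")
```

Calling `describe_firmware("/lib/firmware/<file>")` on an uncompressed file gives a line that can be compared across machines or firmware package updates.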

This needs to change.

What I expect to see in `dmesg`:

module_name: PCI-e 0000:01:00.0 Loading firmware file 'directory/fw1.bin', version 1.2.3, size 12345 bytes
module_name: PCI-e 0000:01:00.0 Loading firmware file 'directory/fw2.bin', version <unversioned>, size 23456 bytes
module_name: PCI-e 0000:01:00.0 Loading firmware file 'directory/fw3.bin', version 1.0.0, size 4444 bytes

In case fw2.bin or fw3.bin is missing, the module must report that info _before_ attempting to load any found/existing firmware files. 

module_name: PCI-e 0000:01:00.0 Warning: missing firmware file 'directory/fw3.bin'
module_name: PCI-e 0000:01:00.0 Warning: missing firmware file 'directory/fw2.bin'
module_name: PCI-e 0000:01:00.0 Loading firmware file 'directory/fw1.bin', version 1.2.3, size 12345 bytes
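
The ordering requested above (all warnings first, then the loads) can be sketched in user space as follows. This is only an illustration of the idea, assuming the list of required files is known in advance; `preflight_report` and its parameters are hypothetical names, and the version field is left out because the sketch has no generic way to read one.

```python
from pathlib import Path


def preflight_report(required, firmware_dir,
                     prefix="module_name: PCI-e 0000:01:00.0"):
    """Return log lines: every missing file first, then the files to load."""
    root = Path(firmware_dir)
    # Probe everything before "loading" anything.
    missing = [name for name in required if not (root / name).is_file()]
    present = [name for name in required if (root / name).is_file()]

    lines = [f"{prefix} Warning: missing firmware file '{name}'"
             for name in missing]
    for name in present:
        size = (root / name).stat().st_size
        lines.append(f"{prefix} Loading firmware file '{name}', "
                     f"size {size} bytes")
    return lines
```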

As it currently stands, firmware files are loaded one at a time, so if you don't want to keep hundreds of megabytes of useless files in /lib/firmware, several reboots may be required to find all the missing firmware files, which is far from optimal.
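
As a partial user-space workaround, the firmware names a module declares can be listed with `modinfo -F firmware <module>` and checked against the firmware directory in one pass. The sketch below is an assumption-laden illustration, not a complete solution: the helper names are hypothetical, the `.xz`/`.zst` suffixes assume a kernel built with compressed-firmware support, and the list is incomplete for drivers that do not declare every file or that build names at runtime.

```python
import subprocess
from pathlib import Path

# Suffixes checked in addition to the plain name (assumption: the running
# kernel supports compressed firmware).
COMPRESSED_SUFFIXES = (".xz", ".zst")


def parse_firmware_list(modinfo_output):
    """Split the output of `modinfo -F firmware <module>` into file names."""
    return [line.strip() for line in modinfo_output.splitlines()
            if line.strip()]


def missing_firmware(names, firmware_dir="/lib/firmware"):
    """Return the names that exist neither plain nor compressed."""
    root = Path(firmware_dir)
    missing = []
    for name in names:
        candidates = [root / name]
        candidates += [root / (name + suffix) for suffix in COMPRESSED_SUFFIXES]
        if not any(c.is_file() for c in candidates):
            missing.append(name)
    return missing


def declared_firmware(module):
    """Ask modinfo which firmware files a module declares."""
    result = subprocess.run(["modinfo", "-F", "firmware", module],
                            capture_output=True, text=True, check=True)
    return parse_firmware_list(result.stdout)
```

For example, `missing_firmware(declared_firmware("amdgpu"))` would list the declared files that are absent, without a reboot.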


Case in point:

cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: Cirrus Logic CS35L41 (35a40), Revision: B2
cs35l41-hda i2c-CSC3551:00-cs35l41-hda.1: Reset line busy, assuming shared reset
cs35l41-hda i2c-CSC3551:00-cs35l41-hda.1: Cirrus Logic CS35L41 (35a40), Revision: B2
cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: xz decompression failed (xz_ret=6)
cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: Falling back to default firmware.

No information whatsoever about which firmware is being requested. No information about the bus or hardware device, or maybe there is, but it's hard to read. There's no module named "cs35l41-hda".


Another example:

mt7921e 0000:01:00.0: enabling device (0000 -> 0002)
mt7921e 0000:01:00.0: ASIC revision: 79220010
mt7921e 0000:01:00.0: HW/SW Version: 0x8a108a10, Build Time: 20230627143702a
mt7921e 0000:01:00.0: WM Firmware Version: ____000000, Build Time: 20230627143946

No information whatsoever about the firmware file(s) being loaded. Actually, three files are being loaded by this device (Wi-Fi + Bluetooth).
Comment 1 Artem S. Tashkinov 2023-08-08 18:16:39 UTC
Linus,

Is it possible to merge this Gentoo patch (without #ifdef, i.e. unconditionally) to alleviate this situation at least partially, now that seemingly no one is interested in improving it?

https://gitweb.gentoo.org/proj/linux-patches.git/tree/3000_Support-printing-firmware-info.patch?h=6.5

I don't think a printk() call is dangerous or could affect anything.
Comment 2 Sam James 2023-08-08 18:26:21 UTC
Some other references:
* I dug out the original Gentoo bug: https://bugs.gentoo.org/732852
* LKML discussion for an older version of the patch: https://lkml.org/lkml/2019/11/17/215
* https://lore.kernel.org/all/16a44663-808c-2eb4-ea6e-66f51a66f7cf@gmx.com/ by OP
Comment 3 Robin H. Johnson 2023-08-09 03:58:20 UTC
As the author of the 2019 version, I approve of this request.

The dyndbg="func _request_firmware +p" variant enabled by my patch ensures only users who want the extra messages get them.

I still feel that a persistent sysfs audit record of what was loaded, or what the kernel attempted to load, is worthwhile for systems where it matters, but it must be opt-in, because it's noisy.
Comment 4 Robin H. Johnson 2023-08-26 17:08:10 UTC
OP: I see that FW_LOADER_DEBUG was added in v6.4 kernels, can you check if it fits all of your needs?
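
One quick way to see whether a given kernel was built with this option is to search its config text. A minimal sketch, assuming the usual Kconfig `CONFIG_` prefix for FW_LOADER_DEBUG; the function name is hypothetical, and where the config text lives (for example a `/boot/config-<release>` file) depends on the distribution.

```python
import re


def fw_loader_debug_enabled(config_text):
    """Return True if the config text sets CONFIG_FW_LOADER_DEBUG=y."""
    pattern = r"^CONFIG_FW_LOADER_DEBUG=y$"
    return re.search(pattern, config_text, re.MULTILINE) is not None
```

A line such as `# CONFIG_FW_LOADER_DEBUG is not set` does not match, so the function returns False for kernels built without the option.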
Comment 5 Artem S. Tashkinov 2023-08-26 17:41:43 UTC
(In reply to Robin H. Johnson from comment #4)
> OP: I see that FW_LOADER_DEBUG was added in v6.4 kernels, can you check if
> it fits all of your needs?

From the sound of it, it requires enabling some sort of debugging in the kernel.

That doesn't seem right. Firmware loading is not something which only developers/kernel debuggers should see.

On my desktop with 6.4.10 I only get this:

Bluetooth: hci0: Found device firmware: intel/ibt-18-16-1.sfi
Bluetooth: hci0: Firmware Version: 108-45.22
Bluetooth: hci0: Firmware already loaded

This comes from the driver itself, not from the kernel, i.e. whatever is there doesn't print the necessary information.