Bug 217606

Summary: firewire S3 resume failure (TSB43AB23 on AMD 770)
Product: Drivers
Reporter: mmyangfl
Component: IEEE1394
Assignee: drivers_ieee1394
Status: NEW
Severity: normal
CC: bagasdotme
Priority: P3
Hardware: All
OS: Linux
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:
Attachments: console output, lspci -vvv, console output 6.4

Description mmyangfl 2023-06-28 23:49:55 UTC
Kernel: Debian 6.3.7-1

The system cannot resume from S3 suspend. After some investigation, I managed to capture the relevant logs from the console:

# echo devices > /sys/power/pm_test
# echo platform > /sys/power/disk
# echo mem > /sys/power/state
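For repeated testing, the three writes above can also be issued from a small C helper. This is a minimal sketch, not part of the original report: `write_sysfs()` is a hypothetical name, it mimics `echo` by appending a newline, and it assumes root privileges and a kernel built with `CONFIG_PM_DEBUG` (which provides `/sys/power/pm_test`).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write VALUE followed by a newline to PATH, i.e. the
 * same bytes `echo VALUE > PATH` would write. Returns 0 on success, -1 on
 * error. Sysfs may only report a rejected value when the buffer is flushed,
 * so the result of fclose() is checked as well.
 */
static int write_sysfs(const char *path, const char *value)
{
	assert(path != NULL && value != NULL);

	FILE *f = fopen(path, "w");
	if (f == NULL)
		return -1;

	size_t len = strlen(value);
	int failed = fwrite(value, 1, len, f) != len || fputc('\n', f) == EOF;

	/* fclose() flushes the buffer; a sysfs store() error surfaces here. */
	if (fclose(f) != 0)
		failed = 1;

	return failed ? -1 : 0;
}
```

Usage mirrors the commands above: `write_sysfs("/sys/power/pm_test", "devices")`, then `write_sysfs("/sys/power/disk", "platform")`, then `write_sysfs("/sys/power/state", "mem")`, the last of which starts the suspend test.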

...
[  220.933541] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  220.940514] #PF: supervisor read access in kernel mode
[  220.940516] #PF: error_code(0x0000) - not-present page
[  220.940517] PGD 0 P4D 0 
[  220.940520] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  220.940523] CPU: 1 PID: 1453 Comm: kworker/u16:19 Not tainted 6.3.0-1-amd64 #1  Debian 6.3.7-1
[  220.940526] Hardware name: Gigabyte Technology Co., Ltd. GA-MA770-US3/GA-MA770-US3, BIOS FJ 05/14/2010
[  220.940528] Workqueue: events_unbound async_run_entry_fn
[  220.940535] RIP: 0010:ohci_enable+0x2c7/0x5c0 [firewire_ohci]
[  220.940543] Code: e8 ee a6 a4 d4 48 89 83 70 08 00 00 48 89 c7 48 85 c0 0f 84 00 03 00 00 4c 89 e2 48 89 ee e8 70 ca ff ff 48 8b 83 70 08 00 00 <8b> 10 89 93 80 08 00 00 31 d2 c7 00 00 00 00 00 48 8b 83 a8 05 00
[  220.940545] RSP: 0018:ffffbeed42ef3da0 EFLAGS: 00010246
[  220.940547] RAX: 0000000000000000 RBX: ffff9b0244172000 RCX: ffffbeed402fb000
[  220.940548] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff9b02441725d8
[  220.940550] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  220.940550] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[  220.940551] R13: ffff9b02441725d8 R14: 0000000000000001 R15: ffff9b0254c41808
[  220.940553] FS:  0000000000000000(0000) GS:ffff9b02f9c40000(0000) knlGS:0000000000000000
[  220.940554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  220.940555] CR2: 0000000000000000 CR3: 0000000115628000 CR4: 00000000000006e0
[  220.940557] Call Trace:
[  220.940560]  <TASK>
[  220.940563]  ? __die+0x23/0x70
[  220.940566]  ? page_fault_oops+0x17d/0x4c0
[  220.940571]  ? exc_page_fault+0x74/0x170
[  220.940574]  ? asm_exc_page_fault+0x26/0x30
[  220.940578]  ? ohci_enable+0x2c7/0x5c0 [firewire_ohci]
[  220.940584]  pci_resume+0x7e/0x160 [firewire_ohci]
[  220.940589]  ? __pfx_pci_pm_resume+0x10/0x10
[  220.940593]  dpm_run_callback+0x89/0x1e0
[  220.940597]  device_resume+0x88/0x190
[  220.940600]  async_resume+0x1e/0x60
[  220.940602]  async_run_entry_fn+0x31/0x130
[  220.940606]  process_one_work+0x1c5/0x3c0
[  220.940609]  worker_thread+0x51/0x390
[  220.940611]  ? __pfx_worker_thread+0x10/0x10
[  220.940612]  kthread+0xea/0x120
[  220.940615]  ? __pfx_kthread+0x10/0x10
[  220.940617]  ret_from_fork+0x29/0x50
[  220.940623]  </TASK>
...

Unloading the driver with `rmmod firewire_ohci` works around this issue.

Full console output attached.
Comment 1 mmyangfl 2023-06-28 23:51:06 UTC
Created attachment 304501
console output
Comment 2 mmyangfl 2023-06-28 23:51:21 UTC
Created attachment 304502
lspci -vvv
Comment 3 Bagas Sanjaya 2023-06-29 00:41:46 UTC
What Debian version are you running? Can you check the v6.1.y stable series and the latest mainline?
Comment 4 mmyangfl 2023-06-29 00:54:34 UTC
Debian testing. I upgraded from linux-image-6.1.0-9-amd64 to linux-image-6.3.0-1-amd64, and both kernels suffer from the same issue.
Comment 5 mmyangfl 2023-06-29 01:49:45 UTC
The same issue occurs after upgrading to 6.4~rc7 and to BIOS version FKb.

...
[   57.202706] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   57.202715] #PF: supervisor read access in kernel mode
[   57.202718] #PF: error_code(0x0000) - not-present page
[   57.202721] PGD 0 P4D 0 
[   57.202725] Oops: 0000 [#1] PREEMPT SMP NOPTI
[   57.202729] CPU: 0 PID: 1457 Comm: kworker/u16:13 Not tainted 6.4.0-0-amd64 #1  Debian 6.4~rc7-1~exp1
[   57.202735] Hardware name: Gigabyte Technology Co., Ltd. GA-MA770-US3/GA-MA770-US3, BIOS FKb 01/06/2011
[   57.202738] Workqueue: events_unbound async_run_entry_fn
[   57.202748] RIP: 0010:ohci_enable+0x2c7/0x5c0 [firewire_ohci]
[   57.202762] Code: e8 fe 1a 73 f9 48 89 83 70 08 00 00 48 89 c7 48 85 c0 0f 84 00 03 00 00 4c 89 e2 48 89 ee e8 70 ca ff ff 48 8b 83 70 08 00 00 <8b> 10 89 93 80 08 00 00 31 d2 c7 00 00 00 00 00 48 8b 83 a8 05 00
[   57.202765] RSP: 0018:ffffb85d40a87da0 EFLAGS: 00010246
[   57.202768] RAX: 0000000000000000 RBX: ffff8f070400a000 RCX: ffffb85d402ca000
[   57.202771] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff8f070400a5d8
[   57.202773] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[   57.202774] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[   57.202776] R13: ffff8f070400a5d8 R14: 0000000000000001 R15: ffff8f0706866568
[   57.202778] FS:  0000000000000000(0000) GS:ffff8f07b9c00000(0000) knlGS:0000000000000000
[   57.202781] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   57.202783] CR2: 0000000000000000 CR3: 0000000135b20000 CR4: 00000000000006f0
[   57.202785] Call Trace:
[   57.202789]  <TASK>
[   57.202793]  ? __die+0x23/0x70
[   57.202798]  ? page_fault_oops+0x17d/0x4c0
[   57.202804]  ? exc_page_fault+0x77/0x170
[   57.202808]  ? asm_exc_page_fault+0x26/0x30
[   57.202815]  ? ohci_enable+0x2c7/0x5c0 [firewire_ohci]
[   57.202825]  pci_resume+0x7e/0x160 [firewire_ohci]
[   57.202833]  ? __pfx_pci_pm_resume+0x10/0x10
[   57.202839]  dpm_run_callback+0x89/0x1e0
[   57.202845]  device_resume+0x88/0x190
[   57.202849]  async_resume+0x1e/0x60
[   57.202852]  async_run_entry_fn+0x31/0x130
[   57.202856]  process_one_work+0x1c5/0x3c0
[   57.202862]  worker_thread+0x51/0x390
[   57.202866]  ? __pfx_worker_thread+0x10/0x10
[   57.202869]  kthread+0xf4/0x130
[   57.202874]  ? __pfx_kthread+0x10/0x10
[   57.202879]  ret_from_fork+0x29/0x50
[   57.202887]  </TASK>
...
Comment 6 mmyangfl 2023-06-29 01:50:09 UTC
Created attachment 304504
console output 6.4
Comment 7 mmyangfl 2023-06-30 16:38:39 UTC
After a simple experiment, I found that [1], which is reached from irq_handler, was never executed, so [2] immediately dereferences a NULL pointer after resume.

A quick fix in ohci_enable() such as
	...
		copy_config_rom(ohci->next_config_rom, config_rom, length);
	} else if (!ohci->config_rom || !ohci->config_rom_bus) {
		/* Resume path: no config ROM was ever installed, so bail out. */
		ohci_err(ohci, "failed to resume ohci card\n");
		return -EIO;
	} else {
	...
does prevent the system from hanging after resume.
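To make the failure mode concrete, here is a simplified userspace model of the config ROM hand-off described above. It is an illustration under assumptions, not driver code: `struct fake_ohci`, `fake_ohci_enable()`, `fake_bus_reset_work()` and `FAKE_ROM_WORDS` are hypothetical stand-ins, the field names only mirror those in the quick fix, and `calloc()` replaces the driver's DMA allocation. On resume the ROM argument is NULL and `config_rom` is reused, but `config_rom` is only installed by the bus-reset path; if that never runs, the guard returns `-EIO` instead of reading through a NULL pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FAKE_ROM_WORDS 256 /* hypothetical ROM size in 32-bit words */

/* Hypothetical, heavily simplified stand-in for the driver state. */
struct fake_ohci {
	uint32_t *config_rom;      /* ROM in use, installed by a bus reset */
	uint64_t config_rom_bus;   /* fake "bus address" of config_rom */
	uint32_t *next_config_rom; /* ROM to install at the next bus reset */
	uint64_t next_config_rom_bus;
	uint32_t next_header;
};

/* Models enable: config_rom != NULL on first start, NULL on resume. */
static int fake_ohci_enable(struct fake_ohci *ohci,
			    const uint32_t *config_rom, size_t length)
{
	assert(ohci != NULL);
	if (config_rom) {
		if (length > FAKE_ROM_WORDS)
			return -EINVAL;
		ohci->next_config_rom = calloc(FAKE_ROM_WORDS, sizeof(uint32_t));
		if (ohci->next_config_rom == NULL)
			return -ENOMEM;
		memcpy(ohci->next_config_rom, config_rom,
		       length * sizeof(uint32_t));
		ohci->next_config_rom_bus = (uintptr_t)ohci->next_config_rom;
	} else if (!ohci->config_rom || !ohci->config_rom_bus) {
		/* The quick fix: no ROM was ever installed, so fail cleanly. */
		return -EIO;
	} else {
		/* Resume: reuse the ROM installed by the last bus reset. */
		ohci->next_config_rom = ohci->config_rom;
		ohci->next_config_rom_bus = ohci->config_rom_bus;
	}

	/* Without the guard, a NULL next_config_rom is dereferenced here. */
	ohci->next_header = ohci->next_config_rom[0];
	ohci->next_config_rom[0] = 0;
	return 0;
}

/* Models the bus-reset path: the pending ROM becomes the current one. */
static void fake_bus_reset_work(struct fake_ohci *ohci)
{
	assert(ohci != NULL);
	if (ohci->next_config_rom == NULL)
		return;
	if (ohci->next_config_rom != ohci->config_rom)
		free(ohci->config_rom); /* free(NULL) is a no-op */
	ohci->config_rom = ohci->next_config_rom;
	ohci->config_rom_bus = ohci->next_config_rom_bus;
	ohci->next_config_rom = NULL;
	/* In this model, the saved header word is written back. */
	ohci->config_rom[0] = ohci->next_header;
}
```

In the model, a resume-style call `fake_ohci_enable(&ohci, NULL, 0)` on a zeroed struct returns `-EIO`. After one enable with a ROM followed by one `fake_bus_reset_work()`, the same resume call succeeds and reuses the installed ROM.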

All I can figure out is that irq_handler always gets an IRQ mask of 0, so no further action is taken. Since I don't have any FireWire devices, I can't really dig into why the IRQ doesn't work, so I don't know whether this workaround is appropriate.

[1] https://elixir.bootlin.com/linux/v6.4/source/drivers/firewire/ohci.c#L2022
[2] https://elixir.bootlin.com/linux/v6.4/source/drivers/firewire/ohci.c#L2393