Bug 217200

Summary: after upgrading to Kernel 6.2 WWAN Intel XMM7560 LTE module is not working anymore
Product: Drivers Reporter: Martin (mwolf)
Component: network-wireless-intel Assignee: Default virtual assignee for network-wireless-intel (drivers_network-wireless-intel)
Status: NEW ---    
Severity: normal CC: davem, loic.poulain, m.chetan.kumar, mkchetan, shaneparslow808
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 6.2 Subsystem:
Regression: No Bisected commit-id:
Attachments: ModemManager.log with --debug
Proposed patch
new proposal

Description Martin 2023-03-15 08:26:13 UTC
After upgrading to kernel 6.2.x, this device no longer works:
01:00.0 Wireless controller [0d40]: Intel Corporation XMM7560 LTE Advanced Pro Modem (rev 01)

I am getting errors like:

[   44.973374] iosm 0000:01:00.0: ch[1]: confused phase 2
[   45.973650] iosm 0000:01:00.0: ch[1]: confused phase 2
[   46.972517] iosm 0000:01:00.0: ch[1]: confused phase 2
[   47.973038] iosm 0000:01:00.0: ch[1]: confused phase 2
[   48.973154] iosm 0000:01:00.0: ch[1]: confused phase 3
...
[  174.984861] iosm 0000:01:00.0: PORT open refused, phase A-CD_READY
[  174.985767] iosm 0000:01:00.0: ch[6]: confused phase 3
[  184.996879] iosm 0000:01:00.0: PORT open refused, phase A-CD_READY
[  344.482600] iosm 0000:01:00.0: msg timeout
[  344.986684] iosm 0000:01:00.0: msg timeout
...
[  287.032750] iosm 0000:01:00.0: ch[6]:invalid channel state 2,expected 1
[  288.032786] iosm 0000:01:00.0: ch[6]:invalid channel state 2,expected 1
[  298.042818] iosm 0000:01:00.0: ch[6]:invalid channel state 2,expected 1
[  337.034256] iosm 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0x0 flags=0x0000]
[  337.536467] iosm 0000:01:00.0: msg timeout
[  338.040709] iosm 0000:01:00.0: msg timeout

With kernel 6.1.18 it works flawlessly.

The problem still occurs with the latest development release, 6.3-rc2.

I filed a bug report with more information on the Red Hat Bugzilla:

https://bugzilla.redhat.com/show_bug.cgi?id=2175487

Here is a bit more about my system:

System:
  Host: HP845G9 Kernel: 6.2.6-200.fc37.x86_64 arch: x86_64 bits: 64
    Desktop: GNOME v: 43.3 Distro: Fedora release 37 (Thirty Seven)
Machine:
  Type: Laptop System: HP product: HP EliteBook 845 14 inch G9 Notebook PC
    v: N/A serial: <superuser required>
  Mobo: HP model: 8990 v: KBC Version 09.49.00 serial: <superuser required>
    UEFI: HP v: U82 Ver. 01.04.01 date: 01/12/2023
CPU:
  Info: 8-core model: AMD Ryzen 7 6800U with Radeon Graphics bits: 64
    type: MT MCP cache: L2: 4 MiB
  Speed (MHz): avg: 872 min/max: 400/4768 cores: 1: 400 2: 1186 3: 1155
    4: 1186 5: 1217 6: 400 7: 1676 8: 400 9: 400 10: 400 11: 400 12: 400
    13: 1353 14: 400 15: 400 16: 2588
Comment 1 Martin 2023-03-15 22:02:46 UTC
After bisecting, I found the breaking commit:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v6.2&id=d08b0f8f46e45a274fc8c9a5bc92cb9da70d9887

d08b0f8f46e45a274fc8c9a5bc92cb9da70d9887 is the first bad commit
commit d08b0f8f46e45a274fc8c9a5bc92cb9da70d9887
Author: Shane Parslow <shaneparslow808@gmail.com>
Date:   Sat Oct 29 02:03:56 2022 -0700

    net: wwan: iosm: add rpc interface for xmm modems
    
    Add a new iosm wwan port that connects to the modem rpc interface. This
    interface provides a configuration channel, and in the case of the 7360, is
    the only way to configure the modem (as it does not support mbim).
    
    The new interface is compatible with existing software, such as
    open_xdatachannel.py from the xmm7360-pci project [1].
    
    [1] https://github.com/xmm7360/xmm7360-pci
    
    Signed-off-by: Shane Parslow <shaneparslow808@gmail.com>
    Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

 drivers/net/wwan/iosm/iosm_ipc_chnl_cfg.c | 2 +-
 drivers/net/wwan/wwan_core.c              | 4 ++++
 include/linux/wwan.h                      | 2 ++
 3 files changed, 7 insertions(+), 1 deletion(-)

This commit targets a different XMM modem (7360) than the one I use (7560).
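For readers unfamiliar with bisection: on a linear history it amounts to a binary search for the first bad commit. The following is a minimal Python sketch under that assumption. `first_bad` and `is_bad` are hypothetical names; in practice `git bisect` drives the search, and "testing" a commit means building and booting that kernel.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad (a linear history).
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at mid or earlier
        else:
            lo = mid  # the first bad commit is after mid
    return commits[hi]
```

Each step halves the remaining range, so the number of kernels to build grows only logarithmically with the number of commits between the good and bad versions.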
Comment 2 Shane Parslow 2023-03-21 04:24:24 UTC
Hi Martin, I was the author of that patch. The patch adds a character device interface to allow communication with one of the modem's configuration channels. This is desirable on the 7360, and on the 7560 it *should* simply provide a supplementary configuration channel, since it supports MBIM as the main configuration method. 

All this is to say that I suspect some stray data is being sent on the newly exposed interface. These modems are notoriously easy to crash if you send anything outside of what they expect.

I think there are two options here:
1. Go through the lengthy process of determining what's writing to the new interface, and fix that.
2. Simply remove the configuration channel from the 7560 and keep it in the 7360.

Given that "we do not break userspace", 1 isn't an option, so I'll get started on 2.

Now, if you'll entertain a hunch of mine, could you run ModemManager in debug mode and send the logs? I suspect ModemManager is doing its probes on my precious, fragile configuration port.
Comment 3 Martin 2023-03-21 08:01:40 UTC
Created attachment 303993 [details]
ModemManager.log with --debug

I started ModemManager in debug mode and waited a while till these errors:
[  156.301246] iosm 0000:01:00.0: ch[6]: confused phase 3
[  157.300964] iosm 0000:01:00.0: ch[6]: confused phase 3
[  158.300937] iosm 0000:01:00.0: ch[6]: confused phase 3
[  159.301174] iosm 0000:01:00.0: ch[6]: confused phase 3
[  160.301498] iosm 0000:01:00.0: ch[6]: confused phase 3
[  161.301571] iosm 0000:01:00.0: ch[6]: confused phase 3
[  162.301680] iosm 0000:01:00.0: ch[6]: confused phase 3
[  163.301348] iosm 0000:01:00.0: ch[6]: confused phase 3
[  164.301607] iosm 0000:01:00.0: ch[6]: confused phase 3
[  165.301788] iosm 0000:01:00.0: ch[6]: confused phase 3
[  166.302122] iosm 0000:01:00.0: ch[6]: confused phase 3
[  167.300896] iosm 0000:01:00.0: ch[6]: confused phase 3
[  168.301037] iosm 0000:01:00.0: ch[6]: confused phase 3
[  169.301181] iosm 0000:01:00.0: ch[6]: confused phase 3
[  170.301245] iosm 0000:01:00.0: ch[6]: confused phase 3
[  180.306234] iosm 0000:01:00.0: PORT open refused, phase A-CD_READY
[  180.307514] iosm 0000:01:00.0: ch[6]: confused phase 3
[  190.318390] iosm 0000:01:00.0: PORT open refused, phase A-CD_READY

occurred in journal / dmesg
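To compare runs like the one above, it can help to count how often each iosm message appears. The following is a rough Python sketch, assuming the dmesg line format shown in this report; `summarize_iosm` and `LINE_RE` are hypothetical names, not part of any existing tool.

```python
import re
from collections import Counter

# Matches lines such as:
# "[  156.301246] iosm 0000:01:00.0: ch[6]: confused phase 3"
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+iosm\s+(?P<dev>\S+):\s+(?P<msg>.*)$"
)


def summarize_iosm(lines):
    """Count distinct iosm messages in an iterable of dmesg lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # lines from other drivers are ignored
            counts[match.group("msg")] += 1
    return counts
```

Feeding it the output of `dmesg` line by line and printing `counts.most_common()` shows which channel and phase dominate the log.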
Comment 4 Shane Parslow 2023-03-21 09:56:48 UTC
Yep, looks like ModemManager is probing the new port with junk data, causing a modem crash. I'll draft a new patch that disables the new port on the 7560. In the meantime, if you need to use the modem on this Linux version, you should be able to stop ModemManager from probing the port by deleting /dev/xmmNxmmrpc0 before ModemManager starts.

Thanks for the data, and sorry for the bug!
Comment 5 Shane Parslow 2023-03-21 09:58:25 UTC
Sorry, /dev/wwanNxmmrpc0 not /dev/xmmNxmmrpc0.
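The workaround described above could be scripted roughly as follows. This is an untested sketch, not a recommended fix: the glob pattern `wwan*xmmrpc*` is an assumption derived from the /dev/wwanNxmmrpc0 name, `remove_xmmrpc_nodes` is a hypothetical helper, deleting nodes under /dev requires root, and the node will presumably reappear when the driver is reloaded or the machine reboots.

```python
import glob
import os


def remove_xmmrpc_nodes(dev_dir="/dev", dry_run=True):
    """Find wwanNxmmrpc0-style nodes under dev_dir and optionally remove them.

    Returns the sorted list of matching paths. With dry_run=True (the
    default) nothing is deleted, so the matches can be reviewed first.
    """
    # Assumed naming pattern; adjust if the nodes are named differently.
    paths = sorted(glob.glob(os.path.join(dev_dir, "wwan*xmmrpc*")))
    if not dry_run:
        for path in paths:
            os.remove(path)  # unlinks the device node itself
    return paths
```

It would have to run before ModemManager starts, for example from a unit ordered before the ModemManager service; calling `remove_xmmrpc_nodes()` with the defaults only lists what would be removed.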
Comment 6 Martin 2023-03-21 10:26:39 UTC
Thank you, but for now I will stay with Kernel 6.1.x
Comment 7 Shane Parslow 2023-03-24 07:30:11 UTC
Created attachment 304014 [details]
Proposed patch

I attached the patch that I'm going to send in. If you have a chance, it would help if you tried it out. Thanks!
Comment 8 Martin 2023-03-24 08:23:11 UTC
Comment on attachment 304014 [details]
Proposed patch

I compiled Kernel 6.2.8 with it and your patch does its job. The errors are gone. Ty!
Comment 9 M Chetan Kumar 2023-03-24 08:24:35 UTC
I am reviewing the patch. Will update ASAP.
Regards,
Chetan
Comment 10 M Chetan Kumar 2023-03-24 09:01:43 UTC
Created attachment 304016 [details]
new proposal

Basically it ignores the xmmrpc port for 7560 platform.
Comment 11 M Chetan Kumar 2023-03-24 09:02:36 UTC
Hi,
Could you please check the above proposal? It basically ignores the xmmrpc port for the 7560 with a few lines of change.
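The idea behind the proposal, as described here, is to skip the xmmrpc port when the modem is a 7560 while leaving it in place for the 7360. The attachment itself is a kernel patch that is not reproduced in this thread, so the following Python model only illustrates the filtering logic. All names in it (`PortConfig`, `ports_to_register`, the modem strings) are hypothetical and not taken from the driver source.

```python
from dataclasses import dataclass

# Hypothetical identifiers for illustration only.
MODEM_7360 = "xmm7360"
MODEM_7560 = "xmm7560"


@dataclass(frozen=True)
class PortConfig:
    name: str  # e.g. "mbim", "at", "xmmrpc"


def ports_to_register(modem, configured):
    """Return the ports to expose, skipping xmmrpc on the 7560 only."""
    return [
        port
        for port in configured
        if not (modem == MODEM_7560 and port.name == "xmmrpc")
    ]
```

With a configured list of `mbim`, `at` and `xmmrpc`, the 7560 would end up with `mbim` and `at` only, while the 7360 keeps all three, which matches the behavior both testers describe.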
Comment 12 Martin 2023-03-24 09:25:15 UTC
Comment on attachment 304016 [details]
new proposal

this patch works as well.
Comment 13 Shane Parslow 2023-03-24 21:59:43 UTC
(In reply to M Chetan Kumar from comment #10)
> Created attachment 304016 [details]
> new proposal
> 
> Basically it ignores the xmmrpc port for 7560 platform.

I'm assuming you want to go with this one? It's much simpler than mine.
Comment 14 M Chetan Kumar 2023-03-27 05:10:48 UTC
Yes. If you are fine with it then we can go for the submission.
Comment 15 Shane Parslow 2023-03-27 07:09:23 UTC
Go for it.
Comment 16 Martin 2023-03-30 23:50:58 UTC
I am sorry to say, I think there is a problem with that patch.

I built kernel 6.2.8 with that patch, and when I suspend my notebook and wake it up again, I get this message in dmesg:

[   39.116586] amd_pmc AMDI0007:00: Last suspend didn't reach deepest state

If I boot the Fedora distro kernel 6.2.8 (without that patch), I do not see this message.

Can you please help me debug it?
Comment 17 Martin 2023-03-30 23:53:45 UTC
I noticed this issue when my notebook was unplugged from power and the battery was nearly drained by standby alone.
Comment 18 M Chetan Kumar 2023-04-03 04:41:31 UTC
I don't think the patch is related to this suspend issue.
We may need to check why the wake is happening immediately.
Comment 19 Martin 2023-04-03 05:16:29 UTC
I think you are right.
I was testing with different kernels and it only happens with my self-compiled ones.
I am running Kernel 6.2.9 from Fedora (with your patch) and the error is gone.
Sorry for the trouble.