Qualcomm-based modem devices with a PCIe interface, using the MHI driver on kernel 5.13.0, lose their data connection. Details below:

Symptoms:
1) Cannot connect to the Internet
2) /dev/wwan0p1QCDM, /dev/wwan0p2MBIM, and /dev/wwan0p3AT do not respond

Reproduction steps:
1) Connect to the Internet via cellular
2) Run a speed test (https://www.speedtest.net/) or download a 1GB file (ipv4.download.thinkbroadband.com/1GB.zip)

Recovery method:
Can only be recovered by rebooting the host

Other notes:
This cannot be reproduced on Windows 10/11.
Created attachment 300187 [details]
host dmesg and related logs

The default debug mechanism does not report any error message. We tried adding debug messages to each related function, but still found nothing. This is likely a common issue.
Can you please enable the MHI debug logs and share the dmesg?

diff --git a/drivers/bus/mhi/Makefile b/drivers/bus/mhi/Makefile
index 0a2d778d6fb4..ae4063866f73 100644
--- a/drivers/bus/mhi/Makefile
+++ b/drivers/bus/mhi/Makefile
@@ -1,3 +1,5 @@
+subdir-ccflags-y := -DDEBUG
+
 # core layer
 obj-y += core/

Also, did you see any failure log in the firmware console?
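(Editor's note: if rebuilding the module with -DDEBUG is inconvenient, a possible runtime alternative is the kernel's dynamic debug facility. This is a sketch under the assumption that the running kernel was built with CONFIG_DYNAMIC_DEBUG=y and that debugfs is mounted at the usual path; run as root.)

```shell
# Enable all dev_dbg()/pr_debug() call sites in the MHI bus driver at runtime.
echo 'file drivers/bus/mhi/* +p' > /sys/kernel/debug/dynamic_debug/control

# Then follow the kernel log while reproducing the issue:
dmesg -w
```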
Created attachment 300210 [details]
host dmesg with debug mask enabled

Hi Mani, I just reproduced it with the new Makefile settings. As you can see, no warning or error can be found. On the firmware side, Qualcomm said the direct cause is that the IPA gets stuck, but they said it is the host (driver) that makes it stuck.
I can see one relevant warning in dmesg:

[   47.768439] mhi_net mhi0_IP_HW0_MBIM mhi_mbim0: Fragmented packets received, fix MTU?

This means that the IPA has split the packet into multiple ones due to a low MTU/MRU on the host. Can you try adding a default MRU in "pci_generic.c" and see if that fixes the issue?

diff --git a/drivers/bus/mhi/pci_generic.c b/drivers/bus/mhi/pci_generic.c
index 29607f7bc8da..c7257154dfd2 100644
--- a/drivers/bus/mhi/pci_generic.c
+++ b/drivers/bus/mhi/pci_generic.c
@@ -367,6 +367,7 @@ static const struct mhi_pci_dev_info mhi_foxconn_sdx55_info = {
 	.bar_num = MHI_PCI_DEFAULT_BAR_NUM,
 	.dma_data_width = 32,
 	.sideband_wake = false,
+	.mru_default = 32768,
 };

 static const struct mhi_channel_config mhi_mv31_channels[] = {
Hi Mani, I applied your patch on version 5.15.12 and reproduced this issue again. We also added some debug prints in mhi_net_dl_callback(), and we can see that mhi_res->transaction_status is always -EOVERFLOW on version 5.13 (with your patch). Just as the code comment says: "Since this is not optimal, print a warning (once)".
Correction to comment 5:
"is always -EOVERFLOW on version 5.13 (with your patch)"
should read
"is always -EOVERFLOW on version 5.13 (without your patch)".
> should read "is always -EOVERFLOW on version 5.13 (without your patch)".

Okay, so you didn't see the overflow error with my patch on 5.13, but the connection still drops?
Hi Mani, may I know why we should increase the MRU size to 32K? Is there any reference other than the commit https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/bus/mhi/pci_generic.c?id=5c2c85315948c42c6c0258cf9bad596acaa79043 ?
The patches https://lore.kernel.org/lkml/20220115023430.4659-1-slark_xiao@163.com/ and https://lore.kernel.org/lkml/20220115103912.3775-1-slark_xiao@163.com/ together fixed this issue.
The issue is gone on our side locally, but it is still reproducible in Europe.
Created attachment 300284 [details]
Seems to show only the same warning as before

Test log from the Europe side. With the above patches, it is much easier to reproduce there (about 20 s) than on our side (more than 13 hours).
(In reply to slark_xiao from comment #11)
> Created attachment 300284 [details]
> Seems only have that same warning with previous
>
> Test log in Europe side. It's more easier (about 20s) to reproduce than our
> side (more than 13 hours) with above patches.

Correction: side (more than 13 hours, and the result still passes) with the above patches.
(In reply to slark_xiao from comment #12)
> side (more than 13 hours and result still pass) with above patches.

Does this mean the issue is fixed with the mru_default patch?
Hi Mani, in my opinion the mru_default patch lowers the reproducibility on our side. For example, without any changes we can reproduce the issue in no more than 3 hours (the average is about 1 hour on a simulated network). After applying the MRU patch, we could not reproduce it in over 13 hours (we thought we had fixed it, but actually had not). So I think the MRU patch is still helpful.
(In reply to slark_xiao from comment #10)
> Issue is gone in our local side. But still reproduced in Europe.

Good news! After checking, the previous patches also work in Europe. My summary in comment 10 can be ignored, because that failure was caused by a driver file mismatch. Today we logged in remotely for further checking and found that mistake. So the conclusion is that the patches work! Sorry again for my previous wrong summary!