Bug 195289 - OOPS: Null pointer dereference in p54_usb
Summary: OOPS: Null pointer dereference in p54_usb
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless (show other bugs)
Hardware: ARM Linux
: P1 high
Assignee: drivers_network-wireless@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-04-08 02:05 UTC by Solomon Peachy
Modified: 2017-04-21 16:19 UTC (History)
1 user (show)

See Also:
Kernel Version: 4.9.17
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Photo of oops, part 1 (2.72 MB, image/jpeg)
2017-04-08 02:05 UTC, Solomon Peachy
Details
Photo of oops, part 2 (2.72 MB, image/jpeg)
2017-04-08 02:06 UTC, Solomon Peachy
Details
dmesg log immediately prior to plugging in device (21.53 KB, text/plain)
2017-04-08 02:25 UTC, Solomon Peachy
Details

Description Solomon Peachy 2017-04-08 02:05:09 UTC
Created attachment 255781 [details]
Photo of oops, part 1

When plugging in an isl3887 (p54_usb) device into an RPi2, it triggers an near-immediate null pointer dereference.

I apologize for attaching screenshots rather than a proper textual trace, but I don't currently have the means to get a serial console on this system.

This system was running stably for some time, but I did see this oops  occasionally.  Unfortunately now I see it always, and instantly.  I suspect it has something to do with my local RF environment.

It appears to be calling dev_kfree_skb_any() from one of the functions referenced by p54_rx().  There are only a few candidates, but this one stands out, if only because the comment doesn't match the code:

	/*
	 * Frames in P54_QUEUE_FWSCAN and P54_QUEUE_BEACON are
	 * generated by the driver. Therefore tx_status is bogus
	 * and we don't want to confuse the mac80211 stack.
	 */
	if (unlikely(entry_data->hw_queue < P54_QUEUE_FWSCAN)) {
		dev_kfree_skb_any(entry);
		return ;
	}

I think that should be '<= F54_QUEUE_FWSCAN'.  That said, I don't see any other reference to the FWSCAN queue.
Comment 1 Solomon Peachy 2017-04-08 02:06:35 UTC
Created attachment 255783 [details]
Photo of oops, part 2

I had a hard time getting a clean photo as the oops scrolled by.  This was the best of the set.
Comment 2 Solomon Peachy 2017-04-08 02:25:26 UTC
Created attachment 255785 [details]
dmesg log immediately prior to plugging in device

This is the 4.9.17-1.rpi.fc25.armv7hl kernel as supplied by the Fedberry Fedora folks.  It's essentially the Fedora kernel with the upstream stable RPi patches applied.

I am using the latest '2.13.25.0' firmware for the isl3887 device.

The oops is triggered by multiple adapters, and it does not matter which USB port is used.

The power supply is good.  Up until tonight, the system had an uptime of >30 days running two isl3887 devices as a network hotspot with nary a hiccup.  Now I can't get it to last two seconds after plugging in an isl3887.
Comment 3 Christian Lamparter 2017-04-10 22:06:30 UTC
Hello,

Myungho Jung alerted me that there's something wrong with your p54 device.
Looking at the trace, I think the panic can be easily fixed with sth like:

---
diff --git a/net/core/dev.c b/net/core/dev.c
index 7869ae3837ca..44f7d5a1c67c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2450,6 +2450,9 @@ void __dev_kfree_skb_irq(struct sk_buff *skb, enum skb_free_reason reason)
 {
 	unsigned long flags;
 
+	if (!skb)
+		return;
+
 	if (likely(atomic_read(&skb->users) == 1)) {
 		smp_rmb();
 		atomic_set(&skb->users, 0);
---

However, I think the error you are getting is caused by a communication problem between the driver and the firmware. I suspect that either the dwc2 is having issues or the SMC Hub/Ethernet Combo chip. 

I don't know if the device will operate properly. You see: the driver has to
manage the device's address space. So, if the address field gets corrupted the
data might get written all over the firmware data/code and it will crash sooner
or later. Sadly, there's not much the p54usb driver can do. It has no automatic restart code that could deal with this type of issue and restart a hung device.

Regards,
Christian
Comment 4 Solomon Peachy 2017-04-20 03:44:43 UTC
I expected this sort of response -- In a former life, I wrote drivers for the isl3887 and I recall the firmware wasn't terribly forgiving when it came to USB host hiccups.

(I'm actually trying to hunt down some unrelated RPi USB wonkiness with certain printers.  So I'm no stranger to the RPi's USB deficiencies... Bleh..)

That said, even if the isl3887 device gets wedged, IMO it shouldn't trigger an OOPS while doing so -- I don't know if the above patch would be accepted by the core netdev maintainers, but sanity-checking the skb pointers before calling dev_kfree_skb_any() does seem prudent, as does logging the condition.

Unfortunately, the p54 devices are the most consistently reliable I've found when trying to run two adapters in an RPi (one as a STA, one as an AP) -- plus the ones I'm using have excellent onboard antennae.
Comment 5 Christian Lamparter 2017-04-21 16:19:05 UTC
Ah, Hello cw1200 :). Myungho Jung did made a patch to linux-net: https://patchwork.ozlabs.org/patch/753045/

This should at least fix the panic. As for the USB issue: Synopsis does
support the DWC2 module on linux-usb. I had some help from John Youn <johnyoun@synopsys.com> with getting the dwc2 in the APM821xx SoCs fixed.
If you want to test their latest code, they have it all in a github repository:
https://github.com/synopsys-usb/linux 


Off-topic:

From what I know the OrangePI Zero WIFI is using that Intersil/Conexant/STEriccson/NXP?... IP as well. So if you ever want to relive some of your past, I think the oss-community would be glad to have some help with it.

Note You need to log in before you can comment on or make changes to this bug.