Bug 15606
Summary: | register_netdevice: Kobject changed too ealier | ||
---|---|---|---|
Product: | Networking | Reporter: | linuxzsc |
Component: | Other | Assignee: | Arnaldo Carvalho de Melo (acme) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | alan, johannes, kvalo, linville, martin.xu, mcgrof |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.31 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
ath6kl: Use register_netdevice() and dev_alloc_name()
udev-ifname-race.c |
Description
linuxzsc
2010-03-22 09:31:54 UTC
I believe the driver you used was a cfg80211 driver, I am starting to think connman picked up a udev event for the kobject created for the device that gets used internally by cfg80211 through wiphy_register(). So for example, at least mac80211 later adds an actual networking device and registers it via register_netdevice(), and this will also create a udev event, this is the udev event that connman should look out for. Are we certain that the udev event that connman uses to process a networking device is the one from register_netdevice() and not from wiphy_register()? For ath6kl we have gotten issues reported with similar issues and at first I thought the race was we were calling register_netdev() after wiphy_register() but this turns out to be OK, even though inverting this can actually help, but its the wrong fix. The next step I took was to drop using register_netdev() and instead contend on the rtnl_lock() and use dev_alloc_name() myself and later use register_netdevice(). This should expose the proper desired name for the networking device for ath6kl. If this doesn't fix it I start wondering more about the udev events that connman is starting to process and if its really for a networking device. Created attachment 51952 [details]
ath6kl: Use register_netdevice() and dev_alloc_name()
This is the stuff I was talking about for ath6kl. This uses register_netdevice() and dev_alloc_name(), can you try this instead?
Oh that patch will not apply, I have a boat load of other updates on top of staging-2.6.git on the staging-next branch for which that will apply.. hmm, so I can get you all my other patches.. but that's a lot.. I think you can read the patch though and get the idea of what was intended though. anyone? I can reproduce this easily with flimflam on x86 box with one cpu and ath6kl. I just added this extra delay: --- a/net/core/dev.c +++ b/net/core/dev.c @@ -5428,6 +5428,9 @@ int register_netdevice(struct net_device *dev) ret = netdev_register_kobject(dev); if (ret) goto err_uninit; + + msleep(10000); + dev->reg_state = NETREG_REGISTERED; netdev_update_features(dev); And here's a part of flimflam log when the race happens. Notice how SIOCGIFNAME fails. Apr 29 00:21:35 roska flimflamd[2598]: src/rfkill.c:rfkill_event() idx 0 type 1 op 2 soft 0 hard 0 Apr 29 00:21:35 roska kernel: [ 125.530250] ar6000_init() Got WMI @ 0xf4bb3d40. Apr 29 00:21:35 roska kernel: [ 125.541146] Target Ready: credits: 11 credit size: 1664 Apr 29 00:21:35 roska kernel: [ 125.541151] HIF-SCATTER Enabled: max scatter req : 4 entries: 16 Apr 29 00:21:35 roska kernel: [ 125.541174] AR6K: HIF layer supports scatter requests (max scatter items:16: maxlen:32768) Apr 29 00:21:35 roska kernel: [ 125.541177] AR6K: max recv: 32768 max send: 12288 Apr 29 00:21:35 roska kernel: [ 125.549883] ar6000_init() WMI is ready Apr 29 00:21:35 roska kernel: [ 125.550168] wmi_control_rx() : Unknown id 0x101e Apr 29 00:21:35 roska kernel: [ 125.550673] AR6000 Reg Code = 0x40000060 Apr 29 00:21:35 roska flimflamd[2598]: === add === Apr 29 00:21:35 roska flimflamd[2598]: DEVPATH = /devices/pci0000:00/0000:00:1e.0/0000:03:02.1/mmc_host/mmc0/mmc0:0001/mmc0:0001:1/net/wlan0 Apr 29 00:21:35 roska flimflamd[2598]: INTERFACE = wlan0 Apr 29 00:21:35 roska flimflamd[2598]: IFINDEX = 4 Apr 29 00:21:35 roska flimflamd[2598]: DEVPATH = /devices/pci0000:00/0000:00:1e.0/0000:03:02.1/mmc_host/mmc0/mmc0:0001/mmc0:0001:1 Apr 29 00:21:35 roska flimflamd[2598]: src/udev.c:add_net_device() Apr 29 00:21:35 roska flimflamd[2598]: connman_inet_ifname: SIOCGIFNAME(index 4): No such device Apr 29 00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message() buf 0xbfefda3c len 1004 Apr 29 00:21:45 roska flimflamd[2598]: src/rtnl.c:rtnl_message() NEWLINK len 1004 type 16 flags 0x0000 seq 0 Apr 29 00:21:45 roska flimflamd[2598]: src/ipconfig.c:__connman_ipconfig_newlink() index 4 type 1 flags 0x1002 ifname wlan0 Apr 29 00:21:45 roska flimflamd[2598]: wlan0 {create} index 4 type 1 <ETHER> Jouni Malinen suggested another way to fix the issue: for some reason SIOCGIFNAME ioctl call doesn't take rtnl. Adding this fixes the race as well. Now the question is which way is the best way to do it? Here's a patch: --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4917,8 +4917,12 @@ int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg) rtnl_unlock(); return ret; } - if (cmd == SIOCGIFNAME) - return dev_ifname(net, (struct ifreq __user *)arg); + if (cmd == SIOCGIFNAME) { + rtnl_lock(); + ret = dev_ifname(net, (struct ifreq __user *)arg); + rtnl_unlock(); + return ret; + } if (copy_from_user(&ifr, arg, sizeof(struct ifreq))) return -EFAULT; Created attachment 56402 [details]
udev-ifname-race.c
I can now reproduce with a stock wireless-testing kernel 2.6.39-rc5-wl (rc5 plus wireless patches) on my x86 laptop, without flimflam or any other special software. Here's how to do it:
1. Run four burnP6 instances (I have four virtual cpus)
2. ./udev-ifname-race (attached)
3. while true; do modprobe e1000e; sleep 0.5; modprobe -r e1000e; echo -n .; done
Within few minutes udev-ifname-race will fail on my laptop (lenovo x201, i5, 64 bit).
Links to few discussions: http://marc.info/?l=linux-netdev&m=130409861507476&w=2 http://marc.info/?l=linux-netdev&m=130403226800848&w=2 |