Bug 81211

Summary: Regression: "unregister_netdevice: waiting for <virtual network interface> to become free"
Product: Networking    Reporter: Adam Avramov (fallenblood)
Component: IPV4    Assignee: Stephen Hemminger (stephen)
Status: NEEDINFO
Severity: normal CC: 0312birdzhang, alan, andre, anthonybest, bastienphilbert, costamagnagianfranco, david, de.techno, dpiquet, fxposter, glen.84, gopinath.taget, jdelvare, justin, kevin, maitoor, michael+kernel.org, mitsuto, pegerto, philporada, rodrigo.vaz, schnitzelbub, szg00000, victor.bitman
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.14.3-rt4 Subsystem:
Regression: No Bisected commit-id:

Description Adam Avramov 2014-07-27 10:30:06 UTC
Hey guys, sorry I don't have the time to put together a proper bug report, but the following GitHub thread has plenty of info about it.

https://github.com/docker/docker/issues/5618#issuecomment-50189996
Comment 1 Alan 2014-07-29 15:04:36 UTC
This does not appear to be a bug, the kernel is merely waiting for all the users to drop the device. Something has a reference to it so it remains.
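For reference, the message comes from the unregister wait path in net/core/dev.c. A simplified sketch of that loop, paraphrased from memory of the 3.x-era sources (not quoted verbatim):

/* Unregistration blocks here until every holder calls dev_put();
 * the warning is re-printed every 10 seconds while it waits. */
static void netdev_wait_allrefs(struct net_device *dev)
{
	unsigned long rebroadcast_time, warning_time;
	int refcnt;

	rebroadcast_time = warning_time = jiffies;
	refcnt = netdev_refcnt_read(dev);

	while (refcnt != 0) {
		if (time_after(jiffies, rebroadcast_time + 1 * HZ)) {
			/* nudge remaining holders to drop their references */
			call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
			rebroadcast_time = jiffies;
		}

		msleep(250);
		refcnt = netdev_refcnt_read(dev);

		if (time_after(jiffies, warning_time + 10 * HZ)) {
			pr_emerg("unregister_netdevice: waiting for %s to become free. Usage count = %d\n",
				 dev->name, refcnt);
			warning_time = jiffies;
		}
	}
}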
Comment 2 Adam Avramov 2014-08-01 10:12:49 UTC
(In reply to Alan from comment #1)
> This does not appear to be a bug, the kernel is merely waiting for all the
> users to drop the device. Something has a reference to it so it remains.

This occurs with both Docker and VirtualBox. Both applications rely on a virtual network interface, and it happens once some amount of data (estimated at less than a few hundred megabytes) has been transferred over that interface. It never happens with kernel 3.13.0-29-generic, but happens all the time with 3.14.3-rt4, so the kernel is clearly the decisive factor here; how is it not a regression?
Comment 3 Alan 2014-08-01 12:27:54 UTC
VirtualBox is an out-of-tree module, so we don't care 8) - but it's useful data

First things to clear up given you've not filed much information yet

What is the reproducer?
Does the message continue forever?
Does it only occur on 3.14-rt, not 3.14 standard?
If it only happens on 3.14.3-rt, does it happen on older -rt kernels?

It's normal for this to occur if you drop a device that is in use by something, but it should very rapidly get cleaned up and go away. It's quite possible the 3.13/3.14 timings make a difference and it's just chance that it worked before. Equally, if it's getting stuck forever, that implies a race or a refcount leak of some kind.
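To illustrate what a refcount leak means here - hypothetical code with invented names, not a real kernel path:

struct example_ctx {
	struct net_device *dev;
};

/* Every dev_hold() must be balanced by exactly one dev_put().  One
 * missed put on any path keeps the usage count above zero forever,
 * and unregister_netdevice() then waits indefinitely. */
static int example_attach(struct net_device *dev, struct example_ctx *ctx)
{
	dev_hold(dev);			/* take a reference for ctx */
	ctx->dev = dev;

	if (example_register(ctx) < 0) {	/* hypothetical failure */
		ctx->dev = NULL;
		/* BUG: missing dev_put(dev) here -> permanent leak */
		return -EINVAL;
	}

	return 0;
}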
Comment 4 Adam Avramov 2014-08-01 14:27:55 UTC
To reproduce the issue, install Docker (I've tried both v0.11.1 and v1.0.0), spin up a container, and download a few hundred megabytes in it. On attempted system shutdown you will see the message, and it apparently continues forever. You can see other people encountering this in the GitHub thread linked above. I haven't tried it with either 3.14 standard or other -rt kernels yet, but I'll try as soon as I get the time.
Comment 5 Rodrigo Sampaio Vaz 2015-02-03 19:52:17 UTC
FWIW, we have an instance with kernel 3.16.0-29-generic #39-Ubuntu hitting this bug. We don't use Docker, but we run LXC. We also don't have a reliable reproducer; the only way to notice that this kernel is also affected was to boot it in production and wait for the bug to happen (usually when containers are stopping and starting).
Comment 6 Pavel Forkert 2015-03-23 14:48:12 UTC
I've got the same problem with the latest Docker on both Debian 7 and 8 with 3.16(.7) kernels: https://github.com/docker/docker/issues/9605#issuecomment-85025729. This happens once every couple of days, and the only way to continue running Docker containers on this server is to force a reboot.
Comment 7 Justin Dray 2015-10-14 14:24:55 UTC
I'm also seeing this problem, on Debian 7 hosts with the 3.16 kernel and Debian 8 with 4.2.0; it happens every 1-2 days for us (across a few dozen machines). We haven't found a way to reliably reproduce the bug yet, but we are unable to shut the system down cleanly; we have to force power off to be able to continue running containers.
Comment 8 Marcel Mitsuto 2016-02-08 12:26:51 UTC
I'm also seeing this behaviour with Docker 1.9.1 and kernel 4.4.1 from the kernel-ml package on CentOS 7. It's intermittent and not reliably reproducible.
Comment 9 gopinath.taget 2016-03-04 07:15:39 UTC
Guys,

I am seeing this issue on CoreOS 970.1.0 with Docker 1.10.1. This issue has been festering for more than 1.5 years. What can I do to help fix it? What would be the direction of investigation? Any pointers would help.

Thanks
Gopinath
Comment 10 Hannes Römer 2016-04-21 16:19:33 UTC
I can confirm that network IO will trigger this issue.
On a Debian 8.4 host running Docker 1.11.0 containers on top of kernel 4.4.6, I too face this problem.
My repro is to build an image (based on the default Debian image debian:jessie) that first dist-upgrades system packages and then installs some more packages.
I haven't experienced this running a similar setup on VirtualBox/Vagrant so far, though.
Any hint on how to tackle this would be highly appreciated.


Cheers,
Hannes
Comment 11 Phil Porada 2016-07-13 15:08:18 UTC
I'm running CentOS Linux release 7.2.1511 (Core) with the Linux 4.6.3-1.el7.elrepo.x86_64 kernel; my Docker version is 1.11.2, on Amazon EC2. This does NOT happen to me in VirtualBox with the same kernel/Docker version.

There's quite a bit of discussion on https://github.com/docker/docker/issues/5618
Comment 12 [account disabled by administrator] 2016-07-15 01:35:12 UTC
Can you git bisect it, since you know which kernel version it seemed to stop working on, and report back with the offending commit ID once the bisection is complete?
Comment 13 pegerto 2016-08-11 10:56:30 UTC
Hi,

According to the Docker issue, it is raised by creating and destroying multiple containers; from my understanding it is a raise condition, as this is not replicable on a single-CPU VM.

I reproduced the issue with CoreOS 1068.8.0 over VirtualBox; I can take a kernel dump from the hypervisor if it helps.

Regards.
Comment 14 [account disabled by administrator] 2016-08-12 01:50:27 UTC
Please do. And I think you mean a race condition, not a raise condition.
Comment 15 Kevin 2017-01-12 03:03:07 UTC
Hi,

I reproduced the issue using a kernel with this patch applied:

(it logs each refcnt increase and decrease for a netdev, together with the calling function's name)

https://lkml.org/lkml/2008/7/20/5
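Roughly, the patch redefines dev_hold()/dev_put() so that every refcount change is logged together with the calling function. A paraphrase of the idea (not the literal patch; it predates the per-cpu refcount, so plain atomics are shown):

/* Log every netdevice refcount change with the caller's name,
 * producing lines like "lo: dev_hold 3 register_netdevice".
 * dev_hold logs the post-increment value, dev_put the pre-decrement
 * value, matching the counts in the trace below. */
#define dev_hold(dev)							\
	do {								\
		atomic_inc(&(dev)->refcnt);				\
		printk(KERN_DEBUG "%s: dev_hold %d %s\n", (dev)->name,	\
		       atomic_read(&(dev)->refcnt), __func__);		\
	} while (0)

#define dev_put(dev)							\
	do {								\
		printk(KERN_DEBUG "%s: dev_put %d %s\n", (dev)->name,	\
		       atomic_read(&(dev)->refcnt), __func__);		\
		atomic_dec(&(dev)->refcnt);				\
	} while (0)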

It shows the following logs (for the loopback device inside the container namespace) when the issue occurred:

Jan 11 16:17:43 <hostname> kernel: [ 2196.943640] lo: dev_hold 1 rx_queue_add_kobject
Jan 11 16:17:43 <hostname> kernel: [ 2196.943652] lo: dev_hold 2 netdev_queue_add_kobject
Jan 11 16:17:43 <hostname> kernel: [ 2196.943654] lo: dev_hold 3 register_netdevice
Jan 11 16:17:43 <hostname> kernel: [ 2196.943658] lo: dev_hold 4 neigh_parms_alloc
Jan 11 16:17:43 <hostname> kernel: [ 2196.943660] lo: dev_hold 5 inetdev_init

(I patched Docker to rename lo inside the container to lodebug for ease of tracking)
Jan 11 16:17:43 <hostname> kernel: [ 2196.981741] lodebug: renamed from lo

Jan 11 16:17:43 <hostname> kernel: [ 2197.001771] lodebug: dev_hold 6 qdisc_alloc
Jan 11 16:17:43 <hostname> kernel: [ 2197.001815] lodebug: dev_hold 7 dev_get_by_index
Jan 11 16:17:43 <hostname> kernel: [ 2197.001824] lodebug: dev_hold 8 dev_get_by_index
Jan 11 16:17:43 <hostname> kernel: [ 2197.001831] lodebug: dev_hold 9 fib_check_nh
Jan 11 16:17:43 <hostname> kernel: [ 2197.001836] lodebug: dev_hold 10 fib_check_nh
Jan 11 16:17:43 <hostname> kernel: [ 2197.001843] lodebug: dev_hold 11 dev_get_by_index
Jan 11 16:17:43 <hostname> kernel: [ 2197.001849] lodebug: dev_hold 12 dev_get_by_index
Jan 11 16:17:43 <hostname> kernel: [ 2197.001852] lodebug: dev_hold 13 fib_check_nh
Jan 11 16:17:43 <hostname> kernel: [ 2197.001856] lodebug: dev_hold 14 fib_check_nh
Jan 11 16:17:43 <hostname> kernel: [ 2197.025451] lodebug: dev_put 14 free_fib_info_rcu
Jan 11 16:17:43 <hostname> kernel: [ 2197.025473] lodebug: dev_put 13 free_fib_info_rcu
Jan 11 16:17:43 <hostname> kernel: [ 2197.025475] lodebug: dev_put 12 free_fib_info_rcu
Jan 11 16:17:43 <hostname> kernel: [ 2197.025477] lodebug: dev_put 11 free_fib_info_rcu
Jan 11 16:17:43 <hostname> kernel: [ 2197.025480] lodebug: dev_put 10 free_fib_info_rcu
Jan 11 16:17:43 <hostname> kernel: [ 2197.025482] lodebug: dev_put 9 free_fib_info_rcu
Jan 11 16:17:43 <hostname> kernel: [ 2197.414900] lodebug: dev_hold 9 dst_init
Jan 11 16:17:43 <hostname> kernel: [ 2197.418634] lodebug: dev_hold 10 dst_init
Jan 11 16:17:43 <hostname> kernel: [ 2197.418638] lodebug: dev_hold 11 dst_init
Jan 11 16:17:43 <hostname> kernel: [ 2197.445496] lodebug: dev_put 11 dst_destroy
Jan 11 16:17:44 <hostname> kernel: [ 2197.614240] lodebug: dev_hold 11 dst_init

(after the __neigh_create function call, the reference count for the loopback device is incorrectly increased to 13)

Jan 11 16:20:41 <hostname> kernel: [ 2374.898917] lodebug: dev_hold 13 __neigh_create
Jan 11 16:23:42 <hostname> kernel: [ 2555.900160] lodebug: dev_hold 14 dst_init
Jan 11 16:24:10 <hostname> kernel: [ 2583.573193] lodebug: dev_hold 15 dst_init
Jan 11 16:26:09 <hostname> kernel: [ 2702.992174] lodebug: dev_hold 16 dst_init
Jan 11 16:27:15 <hostname> kernel: [ 2769.188044] lodebug: dev_hold 17 dst_init
Jan 11 16:31:57 <hostname> kernel: [ 3050.821593] lodebug: dev_hold 18 dst_init
Jan 11 16:37:41 <hostname> kernel: [ 3394.962714] lodebug: dev_hold 19 dst_init
Jan 11 16:46:27 <hostname> kernel: [ 3921.376163] lodebug: dev_hold 20 dst_init
Jan 11 16:46:30 <hostname> kernel: [ 3923.759813] lodebug: dev_hold 21 dst_init

(container and network namespace tearing down)

Jan 11 16:47:01 <hostname> kernel: [ 3954.802588] lodebug: dev_hold 22 dst_ifdown
Jan 11 16:47:01 <hostname> kernel: [ 3954.802596] lodebug: dev_hold 23 dst_ifdown
Jan 11 16:47:01 <hostname> kernel: [ 3954.802599] lodebug: dev_hold 24 dst_ifdown
Jan 11 16:47:01 <hostname> kernel: [ 3954.802602] lodebug: dev_hold 25 dst_ifdown
Jan 11 16:47:01 <hostname> kernel: [ 3954.802605] lodebug: dev_hold 26 dst_ifdown
Jan 11 16:47:01 <hostname> kernel: [ 3954.802609] lodebug: dev_hold 27 dst_ifdown
Jan 11 16:47:01 <hostname> kernel: [ 3955.482563] lodebug: dev_put 27 dst_destroy
Jan 11 16:47:01 <hostname> kernel: [ 3955.482566] lodebug: dev_put 26 dst_destroy
Jan 11 16:47:01 <hostname> kernel: [ 3955.482568] lodebug: dev_put 25 dst_destroy
Jan 11 16:47:01 <hostname> kernel: [ 3955.482570] lodebug: dev_put 24 dst_destroy
Jan 11 16:47:01 <hostname> kernel: [ 3955.482572] lodebug: dev_put 23 dst_destroy
Jan 11 16:47:01 <hostname> kernel: [ 3955.482574] lodebug: dev_put 22 dst_destroy


Jan 11 16:49:20 <hostname> kernel: [ 4093.651505] lodebug: dev_put 21 neigh_destroy
Jan 11 16:49:20 <hostname> kernel: [ 4093.653397] lodebug: dev_put 20 qdisc_destroy
Jan 11 16:49:20 <hostname> kernel: [ 4093.653432] lodebug: dev_put 19 neigh_parms_release
Jan 11 16:49:20 <hostname> kernel: [ 4093.653463] lodebug: dev_put 18 rx_queue_release
Jan 11 16:49:20 <hostname> kernel: [ 4093.653474] lodebug: dev_put 17 netdev_queue_release
Jan 11 16:49:20 <hostname> kernel: [ 4093.653520] lodebug: dev_put 16 rollback_registered_many
Jan 11 16:49:20 <hostname> kernel: [ 4093.663424] lodebug: dev_put 15 free_fib_info_rcu
Jan 11 16:49:20 <hostname> kernel: [ 4093.667401] lodebug: dev_put 14 free_fib_info_rcu
Jan 11 16:49:20 <hostname> kernel: [ 4093.667430] lodebug: dev_put 13 dst_destroy
Jan 11 16:49:20 <hostname> kernel: [ 4093.667432] lodebug: dev_put 12 dst_destroy
Jan 11 16:49:20 <hostname> kernel: [ 4093.667434] lodebug: dev_put 11 dst_destroy
Jan 11 16:49:20 <hostname> kernel: [ 4093.667436] lodebug: dev_put 10 dst_destroy
Jan 11 16:49:20 <hostname> kernel: [ 4093.667438] lodebug: dev_put 9 dst_destroy
Jan 11 16:49:20 <hostname> kernel: [ 4093.667441] lodebug: dev_put 8 dst_destroy
Jan 11 16:49:20 <hostname> kernel: [ 4093.667446] lodebug: dev_put 7 dst_destroy
Jan 11 16:49:20 <hostname> kernel: [ 4093.667453] lodebug: dev_put 6 dst_destroy
Jan 11 16:49:20 <hostname> kernel: [ 4093.667459] lodebug: dev_put 5 dst_destroy
Jan 11 16:49:20 <hostname> kernel: [ 4093.667461] lodebug: dev_put 4 dst_destroy
Jan 11 16:49:20 <hostname> kernel: [ 4093.667463] lodebug: dev_put 3 dst_destroy
Jan 11 16:49:20 <hostname> kernel: [ 4093.667478] lodebug: dev_put 2 in_dev_finish_destroy


Jan 11 16:49:20 <hostname> kernel: [ 4093.667493] lodebug: dev_hold 2 dst_ifdown
Jan 11 16:49:20 <hostname> kernel: [ 4093.667495] lodebug: dev_put 2 dst_ifdown
Jan 11 16:49:21 <hostname> kernel: [ 4094.691406] lodebug: dev_hold 2 dst_ifdown
Jan 11 16:49:21 <hostname> kernel: [ 4094.691409] lodebug: dev_put 2 dst_ifdown
Jan 11 16:49:22 <hostname> kernel: [ 4095.715386] lodebug: dev_hold 2 dst_ifdown
Jan 11 16:49:22 <hostname> kernel: [ 4095.715390] lodebug: dev_put 2 dst_ifdown
Jan 11 16:49:23 <hostname> kernel: [ 4096.739367] lodebug: dev_hold 2 dst_ifdown
Jan 11 16:49:23 <hostname> kernel: [ 4096.739370] lodebug: dev_put 2 dst_ifdown
Jan 11 16:49:24 <hostname> kernel: [ 4097.763338] lodebug: dev_hold 2 dst_ifdown
Jan 11 16:49:24 <hostname> kernel: [ 4097.763342] lodebug: dev_put 2 dst_ifdown
Jan 11 16:49:25 <hostname> kernel: [ 4098.787312] lodebug: dev_hold 2 dst_ifdown
Jan 11 16:49:25 <hostname> kernel: [ 4098.787315] lodebug: dev_put 2 dst_ifdown
Jan 11 16:49:26 <hostname> kernel: [ 4099.811293] lodebug: dev_hold 2 dst_ifdown
Jan 11 16:49:26 <hostname> kernel: [ 4099.811297] lodebug: dev_put 2 dst_ifdown
Jan 11 16:49:27 <hostname> kernel: [ 4100.835270] lodebug: dev_hold 2 dst_ifdown
Jan 11 16:49:27 <hostname> kernel: [ 4100.835273] lodebug: dev_put 2 dst_ifdown
Jan 11 16:49:28 <hostname> kernel: [ 4101.859249] lodebug: dev_hold 2 dst_ifdown
Jan 11 16:49:28 <hostname> kernel: [ 4101.859252] lodebug: dev_put 2 dst_ifdown
Jan 11 16:49:29 <hostname> kernel: [ 4102.883228] lodebug: dev_hold 2 dst_ifdown
Jan 11 16:49:29 <hostname> kernel: [ 4102.883231] lodebug: dev_put 2 dst_ifdown
Jan 11 16:49:30 <hostname> kernel: [ 4103.907196] unregister_netdevice: waiting for lodebug to become free. Usage count = 1
Jan 11 16:49:30 <hostname> kernel: [ 4103.916753] lodebug: dev_hold 2 dst_ifdown
Jan 11 16:49:30 <hostname> kernel: [ 4103.916755] lodebug: dev_put 2 dst_ifdown
Jan 11 16:49:31 <hostname> kernel: [ 4104.939171] lodebug: dev_hold 2 dst_ifdown
Jan 11 16:49:31 <hostname> kernel: [ 4104.939174] lodebug: dev_put 2 dst_ifdown
Jan 11 16:49:32 <hostname> kernel: [ 4105.963147] lodebug: dev_hold 2 dst_ifdown
Jan 11 16:49:32 <hostname> kernel: [ 4105.963150] lodebug: dev_put 2 dst_ifdown
Jan 11 16:49:33 <hostname> kernel: [ 4106.987135] lodebug: dev_hold 2 dst_ifdown
Jan 11 16:49:33 <hostname> kernel: [ 4106.987138] lodebug: dev_put 2 dst_ifdown
Jan 11 16:49:34 <hostname> kernel: [ 4108.011109] lodebug: dev_hold 2 dst_ifdown
Jan 11 16:49:34 <hostname> kernel: [ 4108.011112] lodebug: dev_put 2 dst_ifdown
Jan 11 16:49:35 <hostname> kernel: [ 4109.035084] lodebug: dev_hold 2 dst_ifdown
Jan 11 16:49:35 <hostname> kernel: [ 4109.035087] lodebug: dev_put 2 dst_ifdown
Jan 11 16:49:36 <hostname> kernel: [ 4110.059057] lodebug: dev_hold 2 dst_ifdown
Jan 11 16:49:36 <hostname> kernel: [ 4110.059060] lodebug: dev_put 2 dst_ifdown
Jan 11 16:49:37 <hostname> kernel: [ 4111.095053] lodebug: dev_hold 2 dst_ifdown
Jan 11 16:49:37 <hostname> kernel: [ 4111.095056] lodebug: dev_put 2 dst_ifdown
Jan 11 16:49:38 <hostname> kernel: [ 4112.119017] lodebug: dev_hold 2 dst_ifdown
Jan 11 16:49:38 <hostname> kernel: [ 4112.119020] lodebug: dev_put 2 dst_ifdown
Jan 11 16:49:39 <hostname> kernel: [ 4113.142987] lodebug: dev_hold 2 dst_ifdown
Jan 11 16:49:39 <hostname> kernel: [ 4113.142990] lodebug: dev_put 2 dst_ifdown
Jan 11 16:49:40 <hostname> kernel: [ 4114.166957] unregister_netdevice: waiting for lodebug to become free. Usage count = 1
Jan 11 16:49:40 <hostname> kernel: [ 4114.176525] lodebug: dev_hold 2 dst_ifdown
Jan 11 16:49:40 <hostname> kernel: [ 4114.176527] lodebug: dev_put 2 dst_ifdown
Jan 11 16:49:41 <hostname> kernel: [ 4115.198945] lodebug: dev_hold 2 dst_ifdown
Jan 11 16:49:41 <hostname> kernel: [ 4115.198949] lodebug: dev_put 2 dst_ifdown
Jan 11 16:49:42 <hostname> kernel: [ 4116.222927] lodebug: dev_hold 2 dst_ifdown
Jan 11 16:49:42 <hostname> kernel: [ 4116.222931] lodebug: dev_put 2 dst_ifdown
Jan 11 16:49:43 <hostname> kernel: [ 4117.246894] lodebug: dev_hold 2 dst_ifdown
Jan 11 16:49:43 <hostname> kernel: [ 4117.246897] lodebug: dev_put 2 dst_ifdown
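(The once-per-second dst_ifdown hold/put pairs above are the unregister path re-walking a dst entry that never goes away. Roughly, paraphrasing dst_ifdown() from net/core/dst.c of that era - a sketch, not verbatim source:)

/* On unregister, a dst entry still referencing the dying device is
 * re-pointed at the namespace loopback device, with a balancing
 * hold/put.  Here the dying device is itself the namespace loopback,
 * so the hold and the put land on the same device - the paired
 * "dev_hold 2 dst_ifdown" / "dev_put 2 dst_ifdown" lines above.  The
 * leaked dst entry itself is never released, so the usage count stays
 * at 1 and the unregister loop retries forever. */
static inline void dst_ifdown(struct dst_entry *dst, struct net_device *dev,
			      int unregister)
{
	if (dst->dev != dev)
		return;

	if (!unregister) {
		dst->input = dst_discard;	/* device down: drop traffic */
	} else {
		dst->dev = dev_net(dst->dev)->loopback_dev;
		dev_hold(dst->dev);		/* the dev_hold seen above */
		dev_put(dev);			/* the dev_put seen above */
	}
}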


Thanks,
Kevin
Comment 16 Stephen Hemminger 2018-01-05 20:56:24 UTC
(In reply to pegerto from comment #13)
> Hi,
> 
> According to the Docker issue, it is raised by creating and destroying
> multiple containers; from my understanding it is a raise condition, as this
> is not replicable on a single-CPU VM.
> 
> I reproduced the issue with CoreOS 1068.8.0 over VirtualBox; I can take a
> kernel dump from the hypervisor if it helps.
> 
> Regards.

Remember that this is the upstream kernel bugzilla.
If any out-of-tree modules like VirtualBox are present, then the bug is suspect and will be marked as INVALID.
Comment 17 dE 2018-03-26 07:58:26 UTC
Happens with the 4.9 Debian kernel too.
Comment 18 mai 2018-12-06 06:38:26 UTC
The system is Ubuntu 16.04, and it is still not fixed:
$ uname -a
Linux hostname-test 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ docker version
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:24:56 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:23:21 2018
  OS/Arch:          linux/amd64
  Experimental:     false
$ sudo dmesg |tail -20
[268985.837007] unregister_netdevice: waiting for veth34afdb4 to become free. Usage count = 1
[268995.856958] unregister_netdevice: waiting for veth34afdb4 to become free. Usage count = 1
[269005.868864] unregister_netdevice: waiting for veth34afdb4 to become free. Usage count = 1
[269015.876471] unregister_netdevice: waiting for veth34afdb4 to become free. Usage count = 1
[269026.128173] unregister_netdevice: waiting for veth34afdb4 to become free. Usage count = 1
[269036.143902] unregister_netdevice: waiting for veth34afdb4 to become free. Usage count = 1
[269046.387621] unregister_netdevice: waiting for veth34afdb4 to become free. Usage count = 1
[269056.411322] unregister_netdevice: waiting for veth34afdb4 to become free. Usage count = 1
[269066.431065] unregister_netdevice: waiting for veth34afdb4 to become free. Usage count = 1
[269076.438783] unregister_netdevice: waiting for veth34afdb4 to become free. Usage count = 1
[269086.446489] unregister_netdevice: waiting for veth34afdb4 to become free. Usage count = 1
[269096.490222] unregister_netdevice: waiting for veth34afdb4 to become free. Usage count = 1
[269106.505820] unregister_netdevice: waiting for veth34afdb4 to become free. Usage count = 1
[269116.545378] unregister_netdevice: waiting for veth34afdb4 to become free. Usage count = 1
[269126.564839] unregister_netdevice: waiting for veth34afdb4 to become free. Usage count = 1
[269136.568241] unregister_netdevice: waiting for veth34afdb4 to become free. Usage count = 1
[269146.571673] unregister_netdevice: waiting for veth34afdb4 to become free. Usage count = 1
[269156.827151] unregister_netdevice: waiting for veth34afdb4 to become free. Usage count = 1
[269167.090594] unregister_netdevice: waiting for veth34afdb4 to become free. Usage count = 1
[269177.330099] unregister_netdevice: waiting for veth34afdb4 to become free. Usage count = 1
Comment 19 Jean Delvare 2018-12-14 15:57:17 UTC
(In reply to Kevin from comment #15)
> Jan 11 16:17:44 <hostname> kernel: [ 2197.614240] lodebug: dev_hold 11
> dst_init
> 
> (after the __neigh_create function call, the reference count for the
> loopback device is incorrectly increased to 13)
> 
> Jan 11 16:20:41 <hostname> kernel: [ 2374.898917] lodebug: dev_hold 13
> __neigh_create

This is hard to believe or explain. The only way to increase the counter is to call dev_hold(), and whenever dev_hold() is called, a message saying so would be logged. The only explanation I can think of would be if the memory area holding the per-cpu counters had been accidentally overwritten (say, by an overrun of an array in nearby memory).
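For reference, the counter in question is per-cpu, and the reported value is the sum over all CPUs; paraphrasing netdev_refcnt_read() from net/core/dev.c:

/* The netdevice refcount is kept in per-cpu slots; "Usage count" is
 * their sum.  Corruption of any one slot (e.g. an overrun in adjacent
 * per-cpu memory) would skew the sum without any dev_hold() log line. */
int netdev_refcnt_read(const struct net_device *dev)
{
	int i, refcnt = 0;

	for_each_possible_cpu(i)
		refcnt += *per_cpu_ptr(dev->pcpu_refcnt, i);

	return refcnt;
}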

But a more likely explanation is that there is a missing line in the log, either due to a copy-and-paste error, or because somehow the message never made it to the log buffer (are there still contexts in which printk doesn't work?).

Question: do you see the same apparent +2 counter increase whenever you hit the "waiting for <dev> to become free" error, or was it a one-time thing?