Bug 43132 - system hangs right after interface receives ip from dhcp, message "task kworker/1:3:622 blocked for more than 120 seconds"
Status: VERIFIED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P1 high
Assignee: drivers_network@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-04-20 09:08 UTC by Igor
Modified: 2012-08-12 09:34 UTC

See Also:
Kernel Version: 3.3, 3.4.0-rc3
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
config file, lspci, dmesg, lspci, version and more debug info. (59.01 KB, text/plain)
2012-04-20 09:08 UTC, Igor
lspci, dmesg and kernel config (48.19 KB, application/x-zip-compressed)
2012-04-29 15:25 UTC, Floris Mouwen
lspci -vvv after the problem with mtu 9014 and kernel 3.4.0 (19.24 KB, application/octet-stream)
2012-05-03 21:40 UTC, Floris Mouwen
lspci -vvv from working system, kernel 3.0.0-17 (14.62 KB, text/plain)
2012-05-04 06:49 UTC, Igor
the lspci -vvv content *after* issue occur, kernel 3.4.0-030400rc3 (14.68 KB, text/plain)
2012-05-04 07:05 UTC, Igor
e1000_main.c.patch (753 bytes, patch)
2012-05-04 20:30 UTC, Tushar
dmesg (50.04 KB, application/octet-stream)
2012-05-05 07:19 UTC, Floris Mouwen

Description Igor 2012-04-20 09:08:48 UTC
Created attachment 72997
config file, lspci, dmesg, lspci, version and more debug info.

The system hangs when an interface that uses the e1000 driver is configured to receive its IP address from DHCP.

Message:
[  242.556028] INFO: task kworker/1:3:622 blocked for more than 120 seconds.
[  242.562825] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.570661] kworker/1:3     D ffffffff8180cb40     0   622      2 0x00000000
[  242.570675]  ffff8801a172dad0 0000000000000046 ffff8801a172dfd8 00000000000137c0
[  242.570695]  ffff8801a172c010 00000000000137c0 00000000000137c0 00000000000137c0
[  242.570716]  ffff8801a172dfd8 00000000000137c0 ffff8801b69116e0 ffff8801a16344a0
[  242.570738] Call Trace:
[  242.570755]  [<ffffffff81669d39>] schedule+0x29/0x70
[  242.570765]  [<ffffffff81667f8d>] schedule_timeout+0x1fd/0x2e0
[  242.570777]  [<ffffffff8108d5ba>] ? update_curr+0x14a/0x1e0
[  242.570788]  [<ffffffff81669b8b>] wait_for_common+0xdb/0x180
[  242.570799]  [<ffffffff8108ecb8>] ? idle_balance+0xf8/0x150
[  242.570809]  [<ffffffff81086d90>] ? try_to_wake_up+0x2d0/0x2d0
[  242.570819]  [<ffffffff8166aacf>] ? _raw_spin_lock_irqsave+0x2f/0x40
[  242.570829]  [<ffffffff81669d0d>] wait_for_completion+0x1d/0x20
[  242.570839]  [<ffffffff8106fd51>] wait_on_work+0x1a1/0x1b0
[  242.570849]  [<ffffffff8106e0d0>] ? do_work_for_cpu+0x30/0x30
[  242.570858]  [<ffffffff8106fe7d>] __cancel_work_timer+0x4d/0x170
[  242.570869]  [<ffffffff810e1321>] ? synchronize_irq+0x51/0xf0
[  242.570878]  [<ffffffff8106ffd0>] cancel_work_sync+0x10/0x20
[  242.570915]  [<ffffffffa004fff5>] e1000_down_and_stop+0x25/0x50 [e1000]
[  242.570933]  [<ffffffffa005554f>] e1000_down+0x14f/0x230 [e1000]
[  242.570954]  [<ffffffffa0055b50>] ? e1000_change_mtu+0x1c0/0x1c0 [e1000]
[  242.570972]  [<ffffffffa0055bcd>] e1000_reset_task+0x7d/0xa0 [e1000]
[  242.570983]  [<ffffffff8106ecdb>] process_one_work+0x12b/0x470
[  242.570993]  [<ffffffff81071846>] worker_thread+0x176/0x420
[  242.571002]  [<ffffffff810716d0>] ? manage_workers+0x120/0x120
[  242.571011]  [<ffffffff8107639e>] kthread+0x9e/0xb0
[  242.571023]  [<ffffffff81674464>] kernel_thread_helper+0x4/0x10
[  242.571033]  [<ffffffff81076300>] ? kthread_freezable_should_stop+0x70/0x70
[  242.571043]  [<ffffffff81674460>] ? gs_change+0x13/0x13
[  362.568027] INFO: task kworker/1:3:622 blocked for more than 120 seconds.

It looks like there is a deadlock in the e1000 driver. The lockup happens when eth1, which uses the e1000 driver, is configured to receive its IP address dynamically from a DHCP server. No hang occurs when the interface works with a static IP. The same hardware is stable with the 3.0.0 kernel.
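
For anyone triaging similar logs, here is a minimal Python sketch that pulls the hung-task reports and the e1000 frames out of dmesg text. The function names (`hung_tasks`, `module_frames`) are hypothetical, and the regular expressions are modelled only on the lines quoted in this report, so other kernels may format these lines differently.

```python
import re

# Modelled on the "INFO: task ... blocked" lines quoted in this report.
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

# Call-trace frame: "[<addr>] [? ]symbol+0xoff/0xsize [module]"
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] (?P<unsure>\? )?(?P<sym>[\w.]+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<mod>\w+)\])?"
)


def hung_tasks(dmesg_text):
    """Return (task name, pid, seconds) for every hung-task report."""
    return [
        (m["comm"], int(m["pid"]), int(m["secs"]))
        for m in HUNG_RE.finditer(dmesg_text)
    ]


def module_frames(dmesg_text, module):
    """Return symbols of frames in `module`, innermost first.

    Frames marked with '?' are skipped, since the kernel flags them
    as unreliable.
    """
    return [
        m["sym"]
        for m in FRAME_RE.finditer(dmesg_text)
        if m["mod"] == module and not m["unsure"]
    ]


# A few lines copied from the trace in this report.
SAMPLE = """\
[  242.556028] INFO: task kworker/1:3:622 blocked for more than 120 seconds.
[  242.570878]  [<ffffffff8106ffd0>] cancel_work_sync+0x10/0x20
[  242.570915]  [<ffffffffa004fff5>] e1000_down_and_stop+0x25/0x50 [e1000]
[  242.570933]  [<ffffffffa005554f>] e1000_down+0x14f/0x230 [e1000]
[  242.570954]  [<ffffffffa0055b50>] ? e1000_change_mtu+0x1c0/0x1c0 [e1000]
[  242.570972]  [<ffffffffa0055bcd>] e1000_reset_task+0x7d/0xa0 [e1000]
[  242.570983]  [<ffffffff8106ecdb>] process_one_work+0x12b/0x470
"""

if __name__ == "__main__":
    print(hung_tasks(SAMPLE))
    print(module_frames(SAMPLE, "e1000"))
```

Read from the bottom up, the e1000 frames show the reset work item (`e1000_reset_task`) calling `e1000_down`, then `e1000_down_and_stop`, which ends up waiting in `cancel_work_sync`.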

The same bug is reported as Debian Bug#665693:
http://lists.debian.org/debian-kernel/2012/03/msg00811.html

Relevant discussion in LKML:
https://lkml.org/lkml/2011/11/17/434

It looks like the patch in the vanilla (mainline) kernel did NOT solve the problem:
https://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=3a3847e007aae732d64d8fd1374126393e9879a3;hp=1032c736e81cdf490ae62f86da7efe67c3c3e61d

I have tested this on Ubuntu's unmodified mainline kernels v3.3 and 3.4.0-rc3:
https://wiki.ubuntu.com/KernelMainlineBuilds

The same problem is found in Ubuntu's kernel 3.2.0.

The last kernel that works for me is 3.0.0.
Comment 1 Igor 2012-04-20 09:19:53 UTC
I could not reproduce the same error on an old machine with a single-socket 32-bit P4 CPU.
Comment 2 Floris Mouwen 2012-04-29 15:24:37 UTC
I can confirm this problem. My Linux box has two network interfaces; eth1 uses the e1000 driver. On Ubuntu 11.10 with kernel 3.0.0 there are no problems. After the upgrade to Ubuntu 12.04 the e1000 device stops working and I get the same task kworker messages on the console. I tried the latest 3.4.0-rc4 kernel without Ubuntu patches and the problem is still there.
I have dhclient running on eth0 (RTL8168e/8111e) for my ISP cable connection. Network device eth1 (e1000) carries my private IPv4 range and an IPv6 range, with DHCP, DHCPv6 and radvd running. Once the 3.2 or 3.4-rc4 kernel has finished booting, the e1000 network device stops working. Sometimes during boot I see a message that the e1000 device has been reset.

I have also attached lspci output, dmesg logs and my kernel config.
Comment 3 Floris Mouwen 2012-04-29 15:25:46 UTC
Created attachment 73121
lspci, dmesg and kernel config
Comment 4 Tushar 2012-04-30 21:06:34 UTC
FYI, I have started looking into the details.
Comment 5 Tushar 2012-04-30 22:48:29 UTC
Has anybody reproduced this with the 3.3.4 kernel?
Comment 6 Floris Mouwen 2012-05-01 21:42:19 UTC
Today I compiled the 3.3.4 kernel without Ubuntu patches and the e1000 driver is still not working. Here is the dmesg:

[  241.068051] INFO: task kworker/3:1:34 blocked for more than 120 seconds.
[  241.068056] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  241.068061] kworker/3:1     D 0000000000000003     0    34      2 0x00000000
[  241.068071]  ffff8801395f9b00 0000000000000046 ffff880134c69cd8 0000000000000000
[  241.068080]  ffff880139b944d0 ffff8801395f9fd8 ffff8801395f9fd8 ffff8801395f9fd8
[  241.068088]  ffff880139b90000 ffff880139b944d0 0000000000000002 7fffffffffffffff
[  241.068095] Call Trace:
[  241.068110]  [<ffffffff8164ea6f>] schedule+0x3f/0x60
[  241.068118]  [<ffffffff8164d0a5>] schedule_timeout+0x2a5/0x320
[  241.068128]  [<ffffffff81088b93>] ? dequeue_entity+0x123/0x300
[  241.068136]  [<ffffffff8164e8af>] wait_for_common+0xdf/0x180
[  241.068143]  [<ffffffff81081340>] ? try_to_wake_up+0x2c0/0x2c0
[  241.068150]  [<ffffffff8164ea2d>] wait_for_completion+0x1d/0x20
[  241.068158]  [<ffffffff8106c0d1>] wait_on_work+0x191/0x1a0
[  241.068164]  [<ffffffff8106a280>] ? do_work_for_cpu+0x30/0x30
[  241.068171]  [<ffffffff8106d63e>] __cancel_work_timer+0x8e/0x150
[  241.068178]  [<ffffffff8106d730>] cancel_work_sync+0x10/0x20
[  241.068215]  [<ffffffffa0111645>] e1000_down_and_stop+0x25/0x50 [e1000]
[  241.068230]  [<ffffffffa011531f>] e1000_down+0x14f/0x200 [e1000]
[  241.068244]  [<ffffffffa01181c0>] ? e1000_change_mtu+0x1c0/0x1c0 [e1000]
[  241.068258]  [<ffffffffa011822e>] e1000_reset_task+0x6e/0x90 [e1000]
[  241.068266]  [<ffffffff8106ceea>] process_one_work+0x11a/0x480
[  241.068273]  [<ffffffff8106dc84>] worker_thread+0x164/0x370
[  241.068280]  [<ffffffff8106db20>] ? manage_workers.isra.28+0x230/0x230
[  241.068286]  [<ffffffff81072463>] kthread+0x93/0xa0
[  241.068293]  [<ffffffff81659024>] kernel_thread_helper+0x4/0x10
[  241.068300]  [<ffffffff810723d0>] ? kthread_freezable_should_stop+0x70/0x70
[  241.068307]  [<ffffffff81659020>] ? gs_change+0x13/0x13
[  264.804132] device eth1 entered promiscuous mode
[  302.415376] device eth1 left promiscuous mode
[  361.068045] INFO: task kworker/3:1:34 blocked for more than 120 seconds.
[  361.070780] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  361.073628] kworker/3:1     D 0000000000000003     0    34      2 0x00000000
[  361.073646]  ffff8801395f9b00 0000000000000046 ffff880134c69cd8 0000000000000000
[  361.073676]  ffff880139b944d0 ffff8801395f9fd8 ffff8801395f9fd8 ffff8801395f9fd8
[  361.073705]  ffff880139b90000 ffff880139b944d0 0000000000000002 7fffffffffffffff
[  361.073735] Call Trace:
[  361.073756]  [<ffffffff8164ea6f>] schedule+0x3f/0x60
[  361.073775]  [<ffffffff8164d0a5>] schedule_timeout+0x2a5/0x320
[  361.073796]  [<ffffffff81088b93>] ? dequeue_entity+0x123/0x300
[  361.073816]  [<ffffffff8164e8af>] wait_for_common+0xdf/0x180
[  361.073837]  [<ffffffff81081340>] ? try_to_wake_up+0x2c0/0x2c0
[  361.073856]  [<ffffffff8164ea2d>] wait_for_completion+0x1d/0x20
[  361.073876]  [<ffffffff8106c0d1>] wait_on_work+0x191/0x1a0
[  361.073894]  [<ffffffff8106a280>] ? do_work_for_cpu+0x30/0x30
[  361.073913]  [<ffffffff8106d63e>] __cancel_work_timer+0x8e/0x150
[  361.073933]  [<ffffffff8106d730>] cancel_work_sync+0x10/0x20
[  361.073978]  [<ffffffffa0111645>] e1000_down_and_stop+0x25/0x50 [e1000]
[  361.074006]  [<ffffffffa011531f>] e1000_down+0x14f/0x200 [e1000]
[  361.074034]  [<ffffffffa01181c0>] ? e1000_change_mtu+0x1c0/0x1c0 [e1000]
[  361.074062]  [<ffffffffa011822e>] e1000_reset_task+0x6e/0x90 [e1000]
[  361.074083]  [<ffffffff8106ceea>] process_one_work+0x11a/0x480
[  361.074103]  [<ffffffff8106dc84>] worker_thread+0x164/0x370
[  361.074122]  [<ffffffff8106db20>] ? manage_workers.isra.28+0x230/0x230
[  361.074142]  [<ffffffff81072463>] kthread+0x93/0xa0
[  361.074160]  [<ffffffff81659024>] kernel_thread_helper+0x4/0x10
[  361.074180]  [<ffffffff810723d0>] ? kthread_freezable_should_stop+0x70/0x70
[  361.074200]  [<ffffffff81659020>] ? gs_change+0x13/0x13
Comment 7 Tushar 2012-05-02 00:31:18 UTC

(In reply to comment #6)
> Today i compiled the 3.3.4 kernel without ubuntu patches and the e1000 driver
> is still not working. Here is the dmesg:
> [...]

I have 3.3.4 installed; however, I don't see this issue on my system.
Does the driver log any error message in dmesg right before the call trace with kworker is printed?
Comment 8 Tushar 2012-05-02 01:33:23 UTC
I don't see much in the attached dmesg log, but I do see that the MTU gets changed:
"eth1 changing MTU from 1500 to 9014"

Is there anything in your system's network configuration that might help trigger this issue and that you could share?
Comment 9 Floris Mouwen 2012-05-02 06:09:14 UTC
This Linux box is used for iSCSI, smbd, IPv4 NAT and as an IPv6 router (with a tunnel to an IPv6 broker). eth0 uses dhclient to get a dynamic IP address from my internet cable modem (public address). eth1 is my internal network. For IPv4 clients on eth1 I use dhcpd. For IPv6, radvd and dhcpd are running on eth1. For the IPv6 tunnel I use aiccu to a Dutch tunnel broker.

network interfaces
eth0: RTL8168e/8111e at 0xffffc90000650000, 1c:6f:65:5d:06:82, XID 0c200000 IRQ 42
eth1: (PCI:33MHz:32-bit) 00:1b:21:8d:b9:b1
eth1: Intel(R) PRO/1000 Network Connection

ifconfig:
eth0      Link encap:Ethernet  HWaddr 1c:6f:65:5d:06:82
          inet addr:94.209.xxx.xxx  Bcast:255.255.255.255  Mask:255.255.248.0
          UP BROADCAST RUNNING MULTICAST  MTU:576  Metric:1
          RX packets:167095 errors:0 dropped:0 overruns:0 frame:0
          TX packets:96044 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:150744445 (150.7 MB)  TX bytes:8718928 (8.7 MB)
          Interrupt:42

eth2      Link encap:Ethernet  HWaddr 9c:eb:e8:04:7c:31
          inet addr:192.168.0.1  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::9eeb:e8ff:fe04:7c31/64 Scope:Link
          inet6 addr: 2001:xxxx:xxxx::xxxx/64 Scope:Global
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:162151 errors:0 dropped:0 overruns:0 frame:0
          TX packets:314136 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:14495563 (14.4 MB)  TX bytes:384336984 (384.3 MB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1207 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1207 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:101145 (101.1 KB)  TX bytes:101145 (101.1 KB)

sixxs     Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          inet6 addr: fe80::4b8:2ff:2fa:2/64 Scope:Link
          inet6 addr: 2001:7b8:xxxx:xxxx::xxxx/64 Scope:Global
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1280  Metric:1
          RX packets:4000 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2649 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:3393260 (3.3 MB)  TX bytes:791482 (791.4 KB)

route -n
0.0.0.0         94.209.xxxx.1     0.0.0.0         UG    100    0        0 eth0
94.209.xxxx.0     0.0.0.0         255.255.248.0   U     0      0        0 eth0
192.168.0.0     0.0.0.0         255.255.255.0   U     0      0        0 eth1

route -6 -n
2001:7b8:xxxx:xxxx::/64          ::                         U    256 0     1 sixxs
2001:7b8:xxxx::/64             ::                         U    256 0     0 eth1
fe80::/64                      ::                         U    256 0     0 eth1
fe80::/64                      ::                         U    256 0     0 sixxs
::/0                           2001:7b8:2ff:2fa::1        UG   1024 0     0 sixxs
::/0                           ::                         !n   -1  1  4476 lo
::1/128                        ::                         Un   0   1    43 lo
2001:7b8:xxxx:xxxx::/128         ::                         Un   0   1     0 lo
2001:7b8:xxxx:xxxx::2/128        ::                         Un   0   1  2042 lo
2001:7b8:xxxx::/128            ::                         Un   0   1     0 lo
2001:7b8:xxxx::xxxx/128           ::                         Un   0   1   432 lo
fe80::/128                     ::                         Un   0   1     0 lo
fe80::/128                     ::                         Un   0   1     0 lo
fe80::4b8:2ff:2fa:2/128        ::                         Un   0   1     0 lo
fe80::9eeb:e8ff:fe04:7c31/128  ::                         Un   0   1    61 lo
ff00::/8                       ::                         U    256 0     0 eth1
ff00::/8                       ::                         U    256 0     0 sixxs
::/0                           ::                         !n   -1  1  4476 lo
Comment 10 Floris Mouwen 2012-05-02 06:11:02 UTC
The eth2 in my ifconfig output is a temporary network card right now. Normally this network is on the Intel e1000 network card, eth1.
Comment 11 Igor 2012-05-02 06:23:33 UTC
The deadlock happens for me only when I use a DHCP client to get an IPv4 address from a DHCP server (the same is reported in Debian Bug#665693).

Ubuntu (and Debian, I think) uses dhclient3 for dynamic IP configuration, which is part of the isc-dhcp-client package.

dpkg -s isc-dhcp-client
Package: isc-dhcp-client
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Source: isc-dhcp
Version: 4.1.ESV-R4-0ubuntu5
Provides: dhcp3-client

Command line that is running:
dhclient3 -e IF_METRIC=100 -pf /var/run/dhclient.eth1.pid -lf /var/lib/dhcp/dhclient.eth1.leases -1 eth1

Note that the same user-space utilities and configuration work properly with the 3.0.0-17 kernel.
Comment 12 Tushar 2012-05-03 19:46:11 UTC
Can somebody upload the lspci -vvv output *after* the issue occurs?
Comment 13 Floris Mouwen 2012-05-03 20:56:50 UTC
I noticed (and it was not in my ifconfig output) that with the default MTU of 1500 there is no problem. With MTU 9014 I get these problems. MTU 9014 always worked with kernel 3.0 and earlier. The lspci -vvv output after the issue (with MTU 9014) will be here in a moment.
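
To check quickly whether a given dmesg shows the interface being switched to jumbo frames, a small Python sketch along these lines may help. It assumes the log contains lines of the form quoted in comment 8 ("eth1 changing MTU from 1500 to 9014"); the helper names `mtu_changes` and `is_jumbo` are hypothetical.

```python
import re

STANDARD_MTU = 1500  # the default MTU that works fine in this thread

# Modelled on the line quoted in comment 8:
#   "eth1 changing MTU from 1500 to 9014"
MTU_RE = re.compile(
    r"(?P<iface>\w+):? changing MTU from (?P<old>\d+) to (?P<new>\d+)"
)


def mtu_changes(dmesg_text):
    """Return (interface, old MTU, new MTU) for every MTU change logged."""
    return [
        (m["iface"], int(m["old"]), int(m["new"]))
        for m in MTU_RE.finditer(dmesg_text)
    ]


def is_jumbo(mtu):
    """True if the MTU is larger than the standard Ethernet default."""
    return mtu > STANDARD_MTU


if __name__ == "__main__":
    for iface, old, new in mtu_changes("eth1 changing MTU from 1500 to 9014"):
        print(iface, old, new, "jumbo" if is_jumbo(new) else "standard")
```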
Comment 14 Floris Mouwen 2012-05-03 21:40:52 UTC
Created attachment 73174
lspci -vvv after the problem with mtu 9014 and kernel 3.4.0
Comment 15 Tushar 2012-05-04 00:11:56 UTC
(In reply to comment #13)
> I noticed (and it was not in my ifconfig output) that with the default 1500 mtu
> there is no problem. With mtu 9014 i get these problems. mtu 9014 always worked
> with kernel 3.0 and before. lspci -vvv output after the issue (with 9014 mtu)
> will be here in a moment.

This is good to know, and it is what I suspected. I am going to try to reproduce it again with MTU 9014.
Comment 16 Tushar 2012-05-04 00:29:21 UTC
No repro on 3.3.4 with MTU=9014.
I will burn Ubuntu 12.04 Desktop 64-bit tomorrow and try to reproduce it.
Comment 17 Igor 2012-05-04 06:49:18 UTC
Created attachment 73178
lspci -vvv from working system, kernel 3.0.0-17
Comment 18 Igor 2012-05-04 07:05:11 UTC
Created attachment 73179
the lspci -vvv content *after* issue occur, kernel 3.4.0-030400rc3
Comment 19 Igor 2012-05-04 07:06:48 UTC
Pay attention to the message about adapter reset in dmesg:

e1000 0000:04:02.1: eth1: Reset adapter

It may be related to the problem.
Comment 20 Tushar 2012-05-04 07:18:37 UTC
(In reply to comment #19)
> Pay attention to the message about adapter reset in dmesg:
> 
> e1000 0000:04:02.1: eth1: Reset adapter
> 
> Maybe related to the problem.

Alright, this makes more sense now. It looks like the reset does not complete successfully. I think it would make more sense to find out why the reset occurs. Do you have more info in the dmesg log about the reset? Are there any Tx hang messages in the dmesg log?
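
As a rough way to answer both questions from a saved dmesg, the following Python sketch looks for "Reset adapter" lines and for anything that looks like a Tx hang report. The reset pattern follows the e1000 line quoted in this thread; the exact wording of a Tx hang message is not shown here, so the second pattern is a loose, case-insensitive guess. Both function names are hypothetical.

```python
import re

# Modelled on the line quoted in this thread, e.g.
#   "[    9.348211] e1000 0000:03:00.0: eth1: Reset adapter"
RESET_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] e1000 (?P<pci>[0-9a-f:.]+): "
    r"(?P<iface>\w+): Reset adapter"
)

# Assumption: a Tx hang report mentions "tx" and "hang" on the same line.
TX_HANG_RE = re.compile(r"tx\b.*\bhang", re.IGNORECASE)


def reset_events(dmesg_text):
    """Return (timestamp, PCI address, interface) for each adapter reset."""
    return [
        (float(m["ts"]), m["pci"], m["iface"])
        for m in RESET_RE.finditer(dmesg_text)
    ]


def has_tx_hang(dmesg_text):
    """True if any line looks like a Tx hang report (loose match)."""
    return TX_HANG_RE.search(dmesg_text) is not None


if __name__ == "__main__":
    log = "[    9.348211] e1000 0000:03:00.0: eth1: Reset adapter\n"
    print(reset_events(log))
    print(has_tx_hang(log))
```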
Comment 21 Tushar 2012-05-04 07:58:00 UTC
(In reply to comment #20)
> (In reply to comment #19)
> > Pay attention to the message about adapter reset in dmesg:
> > 
> > e1000 0000:04:02.1: eth1: Reset adapter
> > 
> > Maybe related to the problem.
> 
> Alright this makes more sense now. Looks like reset does not complete
> successfully. I think it would make more sense to find out why reset occurs.
> Do you have more info in dmesg log about reset? Is there Tx hang messages in
> dmesg log?

Igor,
I think I see what's going on. I am not in front of my Linux box right now. I will send you a patch to test tomorrow morning.
Comment 22 Igor 2012-05-04 13:02:26 UTC
No, there is no "TX hang" message in dmesg.

BTW, all messages are attached.
Comment 23 Tushar 2012-05-04 20:30:27 UTC
Created attachment 73189
e1000_main.c.patch

Test patch for the issue.
Comment 24 Tushar 2012-05-04 20:32:05 UTC
I have attached a patch - e1000_main.c.patch - for testing.
Please try this patch.
Comment 25 Floris Mouwen 2012-05-05 07:19:02 UTC
Created attachment 73190
dmesg

I have compiled kernel 3.4.0-rc5 with e1000_main.c.patch and it looks like jumbo frames are working again. When I enable jumbo frames the network device keeps working and I don't get any task kworker messages.
In my dmesg you will still see a "[    9.348211] e1000 0000:03:00.0: eth1: Reset adapter" message, but the patch prevents the deadlock.

I will also try to compile this on the kernel 3.2 source with Ubuntu patches. My iscsitarget does not compile against the 3.4.0-rc5 headers. Within a couple of hours I can give you that result as well.
Comment 26 Igor 2012-05-05 09:04:57 UTC
I can confirm that the patch works. No more blocked kworker task. The interface receives its IP from DHCP and works as expected and is stable.

I tested with the 3.4.0-rc3 kernel.

Tushar, thanks a lot!
Comment 27 Floris Mouwen 2012-05-05 11:31:18 UTC
This patch also works on the 3.2.14 kernel.

And a big thanks from me as well, Tushar!
Comment 28 Florian Mickler 2012-08-12 09:30:40 UTC
A patch referencing this bug report has been merged in Linux v3.4:

commit 8ce6909f77ba1b7bcdea65cc2388fd1742b6d669
Author: Tushar Dave <tushar.n.dave@intel.com>
Date:   Thu May 17 01:04:50 2012 +0000

    e1000: Prevent reset task killing itself.
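
The call traces earlier in this report show the reset work item reaching cancel_work_sync through e1000_down_and_stop, which fits the commit title: a work item that synchronously cancels itself waits for its own completion. The Python sketch below is a userspace toy model of that pattern, not driver code. The `Work` and `Adapter` classes, the timeout, and the `resetting` guard are all assumptions for illustration; the guard is not taken from the attached patch, whose contents are not shown in this thread.

```python
import threading


class Work:
    """Toy stand-in for a kernel work item (not the real workqueue API)."""

    def __init__(self, fn):
        self.fn = fn
        self.done = threading.Event()
        self.done.set()  # an idle work item counts as finished

    def run(self):
        self.done.clear()
        try:
            self.fn(self)
        finally:
            self.done.set()


def cancel_sync(work, timeout=0.2):
    """Wait until `work` has finished; return False if the wait timed out.

    The timeout exists only so the demo terminates. A synchronous cancel
    without one would block forever when called from the work item itself.
    """
    return work.done.wait(timeout)


class Adapter:
    def __init__(self, guard):
        self.guard = guard        # hypothetical "am I resetting?" check
        self.resetting = False
        self.log = []
        self.reset_work = Work(self.reset_task)

    def down_and_stop(self):
        if self.guard and self.resetting:
            self.log.append("skip cancel (called from reset task)")
            return
        ok = cancel_sync(self.reset_work)
        self.log.append("cancel ok" if ok else "deadlock: waiting for itself")

    def reset_task(self, _work):
        self.resetting = True
        try:
            self.down_and_stop()
        finally:
            self.resetting = False


def run_reset(guard):
    """Run the reset work item on a worker thread and return its log."""
    adapter = Adapter(guard)
    worker = threading.Thread(target=adapter.reset_work.run)
    worker.start()
    worker.join()
    return adapter.log


if __name__ == "__main__":
    print(run_reset(guard=False))
    print(run_reset(guard=True))
```

Without the guard, the reset task times out waiting for itself, which corresponds to the blocked kworker in the hung-task messages. With the guard, the self-cancel is skipped and the task completes.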
Comment 29 Florian Mickler 2012-08-12 09:34:36 UTC
A patch referencing this bug report has been merged in Linux v3.4:

commit 39c2028531332cab1325637c2100f3189fa1be72
Merge: 5c7dd71 8ce6909
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Thu May 17 16:30:26 2012 -0700
