Created attachment 102321 [details] Screen photo with kernel panic message

I type "ssh use@thehost -L 3389:127.0.0.1:3389" and the kernel suddenly panics.

The networking setup is rather tricky:
1. thehost is reached through the ipip6 tunnel;
2. the ipip6 tunnel is protected by IPsec;
3. the IPv6 connection is forwarded through veth to another network namespace, where it enters a TUN-based tunnelling program.

OCR-ed text from the photo (uncorrected, so it contains recognition errors):
<upper lines trimmed and can't scroll up due to the panic>
[621700.153844] [<c0508eb3>] ? nf_iterate+Ox3e/Ox6b
[621700.153981] W868a29d>1 ? xfrm6_prepare_output+0x45/0x45 [ipu6]
[621700.154118] [<c05081'37>] ? nf_hook_slow+0x57/0xec
[621700.154260] [a868a330>] xfrm6_output+0x93/Oxf4 [ipu6]
[621700.154414] W868a475>1 xfrm6_output+0x24/0x5f [ipu6]
[621700.157013] W868a29d>1 ? xfrm6_prepare_output+0x45/0x45 Eipu6]
[621700.157013] W8663d17>1 ip6_loca1_out+0x20/0x23 [ipu6]
[621700.157013] [adl3e0a4>] ip6_tn1_xmit2+0x37e/Ox4le [ip6_tunne1]
[621700.157013] Eadl3e471>1 ip6_tn1_xmit+Oxe8/0x29a [ip6_tunnel]
[621700.157013] [<c04eb93f>] deu_hard_start_xmit+Ox28d/Ox426
[621700.157013] [<c04ebf03>1 deuqueue_xmit+Ox268/0x3lf
[621700.157013] [<c0508eb3>] ? nf_iterate+Ox3e/Ox6b
[621700.157013] [<c04f16bc>] neigh_direct_output+Oxf/Ox11
[621700.1570131 [<c0510eef>] ip_finish_output+Ox2f2/0x37c
[621700.157013] [<c0508f37>] ? nf_hook_slow+Ox57/0xec
[621700.157013] [<c0510bfd>1 ? ip_fragment+Ox7a1/0x7a1
[621700.157013] [<c0511edl>] ip_output+Ox78/Oxbb
[621700.157013] [<c0510bfd>] ? ip_fragment+Ox7a1/0x7a1
[621700.1570131 [<c05117dd>1 ip_local_out+Ox20/0x23
[621700.157013] [<c0511a91>] ip_gueue_xmit+Ox2b1/0x30b
[621700.157013] [<c0523644>1 tcp_transmit_skb+0x67f/Ox6d9
[621700.157013] [<c0523e5c>1 tcp write_xmit+Ox732/0x8lf
[621700.157013] [<c0523f81>] tcp_push_one+0x38/0x3a
[621700.157013] [<c0519270] tcp_sendmsg+0x854/0xab1
[621700.157013] [<c0535e6b>] inet_sendmsg+0x54/0x7e
[621700.157013] [<c04d9864>] sock_aio_write+Oxb9/0xdO
[621700.157013] [<cOldd7e9>1 do_sync_write+0x84/0xcl
[621700.157013] [<cOldde08>] ufs_write+Ox9f/Ox144
[621700.157013] [CcOlde09b>1 sys_write+Ox41/0x6c
[621700.157013] [<c058cb6f>1 syscall_call+0x7/Oxb
[621700.157013] Code: de e8 el c2 e5 c7 58 5d 5b 5e 5f 5d c3 55 89 e5 57 56 53 8d 64 24 fie 3e ed 74 26 00 89 45 f0 8b 80 414 01 0 0 00 89 55 e8 89 4d ec <f6> 40 62 01 Of 84 c8 00 00 00 31 c9 c7 04 24 ff ff ff ff ba 20
[621700.157013] EIP: W8684c71>1 ipub_local error+Oxle/Oxf9 Eipu6] 33 :ESP 0068:d89dda60
[621700.157013] CR2: 0000000000000062
[621700.283679] Kernel panic - not syncing: Fatal exception in interrupt
[621700.287422] drm_kms_helper: panic occurred, switching back to text console
Reproduced again. The backtrace differs a bit (for example, between tcp_transmit_skb and tcp_write_xmit I also see "fuse_lookup_name+0x189/0x9b [fuse]").
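Frames printed with a leading "?" in a kernel call trace are addresses that were found on the stack but could not be confirmed by the unwinder, so they can differ from one run to the next without the real call chain changing. A minimal sketch for comparing two captured traces on their confirmed frames only; the helper name reliable_frames is hypothetical, and the regular expression assumes the i386 "[<addr>] func+0xoff/0xlen" layout seen in these logs:

```python
import re

# One frame of an i386 oops call trace, for example:
#   [<c0523f81>] tcp_push_one+0x38/0x3a
#   [<c0508eb3>] ? nf_iterate+0x3e/0x6b
# Group 1 captures the optional "? " marker, group 2 the function name.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(trace_text):
    """Return function names of frames that are not marked with '?'.

    Note: "IP:" / "EIP:" lines use the same layout and would also match,
    so pass in only the call trace portion of the log.
    """
    return [m.group(2) for m in FRAME_RE.finditer(trace_text) if not m.group(1)]


sample = (
    "[<c0523644>] tcp_transmit_skb+0x67f/0x6d9 "
    "[<c0592bb3>] ? common_interrupt+0x33/0x38 "
    "[<c0523e5c>] tcp_write_xmit+0x732/0x81f"
)
frames = reliable_frames(sample)
```

If two panics give the same list from reliable_frames, the extra "?" entries are most likely leftovers on the stack rather than a different code path.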
Note: "ping thehost" works normally, and so does "nc thehost 22". But "ssh thehost" fails (even before asking for the password, etc.). Adding the "nosmp" kernel option does not fix the problem. Shall I try reproducing with netconsole?
Reproduced in a simplified environment (without my tunneling program, without network namespaces, without veth): ipip6 tunnel -> IPsec (configured manually using setkey) -> Miredo -> usb0 network to Android.
Captured more accurate backtrace using netconsole:
[ 2741.588826] BUG: unable to handle kernel NULL pointer dereference at 00000062
[ 2741.589125] IP: [<f85b5c71>] ipv6_local_error+0x1e/0xf9 [ipv6]
[ 2741.589383] *pde = 00000000
[ 2741.589496] Oops: 0000 [#1] SMP
[ 2741.589629] Modules linked in: ip6_tunnel tunnel6 ah6 esp6 xfrm6_mode_transport netconsole rndis_host cdc_ether usbnet xt_REDIRECT xt_state veth devsysrq(O) arptable_filter arp_tables xt_TCPMSS xt_tcpudp ipt_MASQUERADE nf_conntrack_ipv4 nf_nat xt_conntrack xt_multiport ip6table_filter ipt_REJECT ipt_ULOG tifm_sd frandom(O) cfbfillrect drm_kms_helper twofish_generic twofish_common serpent_sse2_i586 gf128mul ablk_helper blowfish_common xcbc xfrm_algo md_mod firewire_core cordic snd_hda_codec_realtek sg snd_mixer_oss snd_seq_dummy uvcvideo snd_seq_midi_event snd_seq videobuf2_memops ssb crc16 ehci_pci ehci_hcd pci_hotplug [last unloaded: netconsole]
[ 2741.595074] Pid: 14084, comm: ssh Tainted: G O 3.8.3 #12 ASUSTeK Computer INC. 1015PEM/1015PE
[ 2741.595400] EIP: 0060:[<f85b5c71>] EFLAGS: 00210286 CPU: 2
[ 2741.595629] EIP is at ipv6_local_error+0x1e/0xf9 [ipv6]
[ 2741.595800] EAX: 00000000 EBX: f50d3f00 ECX: f1c73a88 EDX: 0000005a
[ 2741.596000] ESI: f16958dc EDI: f1c73aac EBP: f1c73a7c ESP: f1c73a60
[ 2741.596201] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 2741.596376] CR0: 80050033 CR2: 00000062 CR3: 32c17000 CR4: 000007d0
[ 2741.596576] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 2741.596775] DR6: ffff0ff0 DR7: 00000400
[ 2741.596778] Process ssh (pid: 14084, ti=f1c72000 task=f1f10000 task.ti=f1c72000)
[ 2741.596778] Stack:
[ 2741.596778] f28bc5d0 0000005a f1c73a88 f50d3f00 f50d3f00 f16958dc f1c73aac f1c73ad4
[ 2741.596778] f85bb22b f1c73ab4 70040120
[ 2741.596778] 0050d67b 00000004 00000000
[ 2741.596778] [<f85bb22b>] xfrm6_local_error+0x4a/0x64 [ipv6]
[ 2741.596778] [<c0508eb3>] ? nf_iterate+0x3e/0x6b
[ 2741.596778] [<f85bb29d>] ? xfrm6_prepare_output+0x45/0x45 [ipv6]
[ 2741.596778] [<c0508f37>] ? nf_hook_slow+0x57/0xec
[ 2741.596778] [<f85bb330>] __xfrm6_output+0x93/0xf4 [ipv6]
[ 2741.596778] [<f85bb475>] xfrm6_output+0x24/0x5f [ipv6]
[ 2741.596778] [<f85bb29d>] ? xfrm6_prepare_output+0x45/0x45 [ipv6]
[ 2741.596778] [<f8594d17>] ip6_local_out+0x20/0x23 [ipv6]
[ 2741.596778] [<fbb230a4>] ip6_tnl_xmit2+0x37e/0x41e [ip6_tunnel]
[ 2741.596778] [<fbb23471>] ip6_tnl_xmit+0xe8/0x29a [ip6_tunnel]
[ 2741.596778] [<c04eb93f>] dev_hard_start_xmit+0x28d/0x426
[ 2741.596778] [<c04ebf03>] dev_queue_xmit+0x268/0x31f
[ 2741.596778] [<c0508eb3>] ? nf_iterate+0x3e/0x6b
[ 2741.596778] [<c04f16bc>] neigh_direct_output+0xf/0x11
[ 2741.596778] [<c0510eef>] ip_finish_output+0x2f2/0x37c
[ 2741.596778] [<c0508f37>] ? nf_hook_slow+0x57/0xec
[ 2741.596778] [<c0510bfd>] ? ip_fragment+0x7a1/0x7a1
[ 2741.596778] [<c0511ed1>] ip_output+0x78/0xbb
[ 2741.596778] [<c0510bfd>] ? ip_fragment+0x7a1/0x7a1
[ 2741.596778] [<c05117dd>] ip_local_out+0x20/0x23
[ 2741.596778] [<c0511a91>] ip_queue_xmit+0x2b1/0x30b
[ 2741.596778] [<c0523644>] tcp_transmit_skb+0x67f/0x6d9
[ 2741.596778] [<c0523e5c>] tcp_write_xmit+0x732/0x81f
[ 2741.596778] [<c0523f81>] tcp_push_one+0x38/0x3a
[ 2741.596778] [<c0519270>] tcp_sendmsg+0x854/0xab1
[ 2741.596778] [<c0535e6b>] inet_sendmsg+0x54/0x7e
[ 2741.596778] [<c04d9864>] sock_aio_write+0xb9/0xd0
[ 2741.596778] [<c01dd7e9>] do_sync_write+0x84/0xc4
[ 2741.596778] [<c01dde08>] vfs_write+0x9f/0x144
[ 2741.596778] [<c01de09b>] sys_write+0x41/0x6c
[ 2741.596778] [<c058cb6f>] syscall_call+0x7/0xb
[ 2741.596778] Code: d8 e8 e1 b2 f2 c7 58 5a 5b 5e 5f 5d c3 55 89 e5 57 56 24 00 a4 00 89 55 e8 89 4d ec <f6> 40 62 01 0f 84 c8 00 00 00 31 c9 c7 ff
[ 2741.596778] EIP: [<f85b5c71>] ipv6_local_error+0x1e/0xf9 [ipv6] SS:ESP 0068:f1c73a60
[ 2741.596778] CR2: 0000000000000062
[ 2741.800783] ---[ end trace 0e559811e06204e1 ]---
[ 2741.800818] Kernel panic - not syncing: Fatal exception in interrupt
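The fault address can be cross-checked against the "Code:" line, where the byte in angle brackets marks the instruction at EIP. The sketch below decodes only that one instruction form (f6 /0 ib with an 8-bit displacement, i.e. testb $imm8, disp8(%reg)); it is not a general disassembler, and the helper names are hypothetical. The byte string is an excerpt of the Code line above and the EAX value is taken from the register dump.

```python
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def faulting_insn_bytes(code_line):
    """Return the bytes starting at the <xx> marker of an oops 'Code:' line."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [int(tok.strip("<>"), 16) for tok in tokens[start:]]


def decode_testb_disp8(insn):
    """Decode only 'f6 /0 ib' with mod=01: testb $imm8, disp8(%reg)."""
    opcode, modrm, disp, imm = insn[:4]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    # rm == 4 would mean a SIB byte follows, which this sketch does not handle.
    if opcode != 0xF6 or mod != 1 or reg != 0 or rm == 4:
        raise ValueError("not the single instruction form this sketch handles")
    if disp >= 0x80:  # disp8 is a signed byte
        disp -= 0x100
    return REGS32[rm], disp, imm


# Excerpt of the Code: line from the netconsole capture.
code = "89 55 e8 89 4d ec <f6> 40 62 01 0f 84 c8 00 00 00"
base, disp, imm = decode_testb_disp8(faulting_insn_bytes(code))

eax = 0x00000000            # "EAX: 00000000" in the register dump
fault_address = eax + disp  # address the instruction tried to read
```

Under these assumptions the instruction is testb $0x1, 0x62(%eax), and with EAX equal to zero the access lands at 0x62, which agrees with CR2 and with the "NULL pointer dereference at 00000062" message. Which pointer in ipv6_local_error was NULL cannot be read off the dump alone; that needs the source or a disassembly of the module.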
One more important thing: the underlying interface also needs to be configured using RAs (router advertisements). I created a script to reproduce the bug automatically: https://gist.github.com/vi/5640512
Without IPsec it does not panic, but still works poorly (the SSH connection fails to go beyond the initial version exchange). While I was writing the script there was also a kernel panic triggered by exiting a shell (which caused the network namespace containing that tunnel to disappear).
Created attachment 102711 [details] Kernel+initramfs for Qemu to reproduce the bug

Created an initramfs image based on files from my system that shows the panic in action.
Updated backtrace (using Qemu, not "Not tainted"):
[ 0.000000] tsc: Fast TSC calibration failed
[ 2.357897] Failed to access perfctr msr (MSR c1 is 0)
[ 54.600710] BUG: unable to handle kernel NULL pointer dereference at 00000062
[ 54.600710] IP: [<c90f0c71>] ipv6_local_error+0x1e/0xf9 [ipv6]
[ 54.600710] *pde = 00000000
[ 54.600710] Oops: 0000 [#1] SMP
[ 54.600710] Modules linked in: ip6_tunnel esp6 ah6 af_key xfrm6_tunnel tunnel6 xfrm6_mode_ro xfrm6_mode_transport ipv6 xfrm_algo veth crypto_null netconsole
[ 54.600710] Pid: 1396, comm: ssh Not tainted 3.8.3 #12 Bochs Bochs
[ 54.600710] EIP: 0060:[<c90f0c71>] EFLAGS: 00200286 CPU: 0
[ 54.600710] EIP is at ipv6_local_error+0x1e/0xf9 [ipv6]
[ 54.600710] EAX: 00000000 EBX: c7940520 ECX: c7949a88 EDX: 0000005a
[ 54.600710] ESI: c795f0dc EDI: c7949aac EBP: c7949a7c ESP: c7949a60
[ 54.600710] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 54.600710] CR0: 80050033 CR2: 00000062 CR3: 0797b000 CR4: 00000690
[ 54.600710] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 54.600710] DR6: 00000000 DR7: 00000000
[ 54.600710] Process ssh (pid: 1396, ti=c7948000 task=c7954b00 task.ti=c7948000)
[ 54.600710] Stack:
[ 54.600710] fffffffa 0000005a c7949a88 c7940520 c7940520 c795f0dc c7949aac c7949ad4
[ 54.600710] c90f622b 000004c6 00000000 00000000 c7949a98 c0133bd4 c7949ac4 0000fc00
[ 54.600710] 00000000 00000000 01000000 c7910000 c7949ae0 c7949acc c53308e0 00000528
[ 54.600710] Call Trace:
[ 54.600710] [<c90f622b>] xfrm6_local_error+0x4a/0x64 [ipv6]
[ 54.600710] [<c0133bd4>] ? irq_exit+0x94/0x96
[ 54.600710] [<c90f6330>] __xfrm6_output+0x93/0xf4 [ipv6]
[ 54.600710] [<c90f6475>] xfrm6_output+0x24/0x5f [ipv6]
[ 54.600710] [<c90f6451>] ? xfrm6_output_finish+0x1a/0x1a [ipv6]
[ 54.600710] [<c90cfd17>] ip6_local_out+0x20/0x23 [ipv6]
[ 54.600710] [<c93880a4>] ip6_tnl_xmit2+0x37e/0x41e [ip6_tunnel]
[ 54.600710] [<c9388471>] ip6_tnl_xmit+0xe8/0x29a [ip6_tunnel]
[ 54.600710] [<c04eb93f>] dev_hard_start_xmit+0x28d/0x426
[ 54.600710] [<c015535d>] ? sched_clock_local+0x13/0x178
[ 54.600710] [<c04ebf03>] dev_queue_xmit+0x268/0x31f
[ 54.600710] [<c04f16bc>] neigh_direct_output+0xf/0x11
[ 54.600710] [<c0510eef>] ip_finish_output+0x2f2/0x37c
[ 54.600710] [<c0511ed1>] ip_output+0x78/0xbb
[ 54.600710] [<c0133bd4>] ? irq_exit+0x94/0x96
[ 54.600710] [<c0103ad5>] ? do_IRQ+0x8d/0xa2
[ 54.600710] [<c05117dd>] ip_local_out+0x20/0x23
[ 54.600710] [<c0511a91>] ip_queue_xmit+0x2b1/0x30b
[ 54.600710] [<c0592bb3>] ? common_interrupt+0x33/0x38
[ 54.600710] [<c0523644>] tcp_transmit_skb+0x67f/0x6d9
[ 54.600710] [<c0592bb3>] ? common_interrupt+0x33/0x38
[ 54.600710] [<c0523e5c>] tcp_write_xmit+0x732/0x81f
[ 54.600710] [<c0523f81>] tcp_push_one+0x38/0x3a
[ 54.600710] [<c0519270>] tcp_sendmsg+0x854/0xab1
[ 54.600710] [<c015a0d0>] ? task_tick_fair+0x521/0x5ac
[ 54.600710] [<c0535e6b>] inet_sendmsg+0x54/0x7e
[ 54.600710] [<c04d9864>] sock_aio_write+0xb9/0xd0
[ 54.600710] [<c01dd7e9>] do_sync_write+0x84/0xc4
[ 54.600710] [<c01dde08>] vfs_write+0x9f/0x144
[ 54.600710] [<c01de09b>] sys_write+0x41/0x6c
[ 54.600710] [<c058cb6f>] syscall_call+0x7/0xb
[ 54.600710] Code: d8 e8 e1 02 3f f7 58 5a 5b 5e 5f 5d c3 55 89 e5 57 56 53 8d 64 24 f0 3e 8d 74 26 00 89 45 f0 8b 80 a4 01 00 00 89 55 e8 89 4d ec <f6> 40 62 01 0f 84 c8 00 00 00 31 c9 c7 04 24 ff ff ff ff ba 20
[ 54.600710] EIP: [<c90f0c71>] ipv6_local_error+0x1e/0xf9 [ipv6] SS:ESP 0068:c7949a60
[ 54.600710] CR2: 0000000000000062
[ 56.295790] ---[ end trace b41673c4b7b4e010 ]---
[ 56.298407] Kernel panic - not syncing: Fatal exception in interrupt
[ 56.303504] general protection fault: fffa [#2] SMP
[ 56.303504] Modules linked in: ip6_tunnel esp6 ah6 af_key xfrm6_tunnel tunnel6 xfrm6_mode_ro xfrm6_mode_transport ipv6 xfrm_algo veth crypto_null netconsole
[ 56.303504] Pid: 1396, comm: ssh Tainted: G D 3.8.3 #12 Bochs Bochs
[ 56.303504] EIP: 0060:[<c0585eb6>] EFLAGS: 00200246 CPU: 0
[ 56.303504] EIP is at panic+0x144/0x180
[ 56.303504] EAX: 00000000 EBX: 00000009 ECX: 0000008b EDX: 00200046
[ 56.303504] ESI: 00000000 EDI: 00000000 EBP: c7949910 ESP: c79498f8
[ 56.303504] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 56.303504] CR0: 80050033 CR2: 00000062 CR3: 0797b000 CR4: 00000690
[ 56.303504] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 56.303504] DR6: 00000000 DR7: 00000000
[ 56.303504] Process ssh (pid: 1396, ti=c7948000 task=c7954b00 task.ti=c7948000)
[ 56.303504] Stack:
[ 56.303504] c06fd766 c0b6af40 00000009 00000009 00200246 c7949a24 c7949928 c058d7d6
[ 56.303504] c06f2a97 00000009 00000062 c7949a24 c7949954 c058594c c06fc92f 00000062
[ 56.303504] 00000000 00200246 00000000 c7949954 00000000 c7954b00 c7949a24 c794998c
[ 56.303504] Call Trace:
[ 56.303504] [<c058d7d6>] oops_end+0x95/0xa2
[ 56.303504] [<c058594c>] no_context+0x17d/0x186
[ 56.303504] [<c0585c31>] __bad_area_nosemaphore+0x132/0x13b
[ 56.303504] [<c883d594>] ? veth_xmit+0x4e/0x90 [veth]
[ 56.303504] [<c058f7e2>] ? __do_page_fault+0x436/0x436
[ 56.303504] [<c0585c52>] bad_area_nosemaphore+0x18/0x1a
[ 56.303504] [<c058f777>] __do_page_fault+0x3cb/0x436
[ 56.303504] [<c010727e>] ? sched_clock+0x8/0xb
[ 56.303504] [<c015535d>] ? sched_clock_local+0x13/0x178
[ 56.303504] [<c058f7e2>] ? __do_page_fault+0x436/0x436
[ 56.303504] [<c058f7ef>] do_page_fault+0xd/0xf
[ 56.303504] [<c058d0cf>] error_code+0x67/0x6c
[ 56.303504] [<c90f0c71>] ? ipv6_local_error+0x1e/0xf9 [ipv6]
[ 56.303504] [<c90f622b>] xfrm6_local_error+0x4a/0x64 [ipv6]
[ 56.303504] [<c0133bd4>] ? irq_exit+0x94/0x96
[ 56.303504] [<c90f6330>] __xfrm6_output+0x93/0xf4 [ipv6]
[ 56.303504] [<c90f6475>] xfrm6_output+0x24/0x5f [ipv6]
[ 56.303504] [<c90f6451>] ? xfrm6_output_finish+0x1a/0x1a [ipv6]
[ 56.303504] [<c90cfd17>] ip6_local_out+0x20/0x23 [ipv6]
[ 56.303504] [<c93880a4>] ip6_tnl_xmit2+0x37e/0x41e [ip6_tunnel]
[ 56.303504] [<c9388471>] ip6_tnl_xmit+0xe8/0x29a [ip6_tunnel]
[ 56.303504] [<c04eb93f>] dev_hard_start_xmit+0x28d/0x426
[ 56.303504] [<c015535d>] ? sched_clock_local+0x13/0x178
[ 56.303504] [<c04ebf03>] dev_queue_xmit+0x268/0x31f
[ 56.303504] [<c04f16bc>] neigh_direct_output+0xf/0x11
[ 56.303504] [<c0510eef>] ip_finish_output+0x2f2/0x37c
[ 56.303504] [<c0511ed1>] ip_output+0x78/0xbb
[ 56.303504] [<c0133bd4>] ? irq_exit+0x94/0x96
[ 56.303504] [<c0103ad5>] ? do_IRQ+0x8d/0xa2
[ 56.303504] [<c05117dd>] ip_local_out+0x20/0x23
[ 56.303504] [<c0511a91>] ip_queue_xmit+0x2b1/0x30b
[ 56.303504] [<c0592bb3>] ? common_interrupt+0x33/0x38
[ 56.303504] [<c0523644>] tcp_transmit_skb+0x67f/0x6d9
[ 56.303504] [<c0592bb3>] ? common_interrupt+0x33/0x38
[ 56.303504] [<c0523e5c>] tcp_write_xmit+0x732/0x81f
[ 56.303504] [<c0523f81>] tcp_push_one+0x38/0x3a
[ 56.303504] [<c0519270>] tcp_sendmsg+0x854/0xab1
[ 56.303504] [<c015a0d0>] ? task_tick_fair+0x521/0x5ac
[ 56.303504] [<c0535e6b>] inet_sendmsg+0x54/0x7e
[ 56.303504] [<c04d9864>] sock_aio_write+0xb9/0xd0
[ 56.303504] [<c01dd7e9>] do_sync_write+0x84/0xc4
[ 56.303504] [<c01dde08>] vfs_write+0x9f/0x144
[ 56.303504] [<c01de09b>] sys_write+0x41/0x6c
[ 56.303504] [<c058cb6f>] syscall_call+0x7/0xb
[ 56.303504] Code: e7 65 df ff 8b 55 f0 4a 75 ed 83 c3 64 69 05 34 af b6 c0 e8 03 00 00 39 c3 7c be 83 3d 34 af b6 c0 00 74 05 e8 14 80 bb ff fb 90 <8d> 74 26 00 31 db 39 fb 7c 13 83 f6 01 89 f0 ff 15 28 af b6 c0
[ 56.303504] EIP: [<c0585eb6>] panic+0x144/0x180 SS:ESP 0068:c79498f8
[ 56.303504] ---[ end trace b41673c4b7b4e011 ]---
Meant 'Now "Not tainted"'...
Can you still reproduce this with a fresh kernel?
Created attachment 107030 [details] Photo of the kernel panic with v3.11-rc2

Reproduced with v3.11-rc2 (3b2f64d00c46e1e4e9bd0bb9bb12619adac27a4b). Shall I create a "git bisect"-style script that builds the kernel and tries to reproduce the bug (using QEmu) [semi-]automatically?
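For reference, "git bisect run" interprets the exit status of the command it runs: 0 marks the revision good, 125 tells it to skip the revision, and any other status from 1 to 127 marks it bad. A minimal sketch of the verdict step only, assuming the QEMU serial console output has already been captured into a string. The function name and REPRO_DONE_MARKER are hypothetical; the marker stands for a line the initramfs would have to print once the ssh attempt finished without a crash.

```python
# Strings that appear in the panic logs posted in this report.
PANIC_MARKERS = (
    "BUG: unable to handle kernel NULL pointer dereference",
    "Kernel panic - not syncing",
)

# Hypothetical: printed by the reproduction script inside the guest after
# the ssh attempt has completed and the kernel is still alive.
REPRO_DONE_MARKER = "REPRO-DONE"


def bisect_exit_code(console_log):
    """Map a captured console log to a 'git bisect run' exit status."""
    if any(marker in console_log for marker in PANIC_MARKERS):
        return 1    # bad: the panic was reproduced
    if REPRO_DONE_MARKER in console_log:
        return 0    # good: the reproduction ran to completion
    return 125      # skip: build or boot failed before the test could run


bad = bisect_exit_code("[ 56.298407] Kernel panic - not syncing: Fatal exception in interrupt")
good = bisect_exit_code("login ok\nREPRO-DONE\n")
skip = bisect_exit_code("")
```

Returning 125 for logs with neither marker keeps revisions that fail to build or boot from being misclassified as good.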
Thanks, but I don't think I need more information for now. Please stand by to test patches.
(In reply to _Vi from comment #6)
> Without IPsec it does not panic, but still works poorly (SSH connection
> fails to go beyond initial version exchange).

Can you also still reproduce this? There were recent changes to how MTU updates were propagated.
Discussion on netdev: http://thread.gmane.org/gmane.linux.network/277590
This pull request by Steffen Klassert carries the fixes to stop the panic: <http://article.gmane.org/gmane.linux.network/281469>