Bug 204669 - TLS: TLS record double free
Summary: TLS: TLS record double free
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4
Hardware: All Linux
Importance: P1 blocking
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-08-22 12:34 UTC by Mallesham
Modified: 2019-09-09 04:57 UTC
CC List: 2 users

See Also:
Kernel Version: v5.3-rc4
Subsystem:
Regression: No
Bisected commit-id:


Attachments
tls crash log (11.01 KB, text/plain)
2019-08-22 12:34 UTC, Mallesham

Description Mallesham 2019-08-22 12:34:10 UTC
Created attachment 284565
tls crash log

The TLS module crashes while running SSL record encryption using kTLS send/sendfile.

Preconditions:
1) 5.3-rc4 installed.
2) Nitrox5 card plugged in.


Steps to reproduce the issue:
1) Load n5pf.ko (drivers/crypto/cavium/nitrox).
2) Load tls.ko (net/tls) if it is not loaded by default.
3) Take the uperf tool from git.
   3.1) Modify uperf to use the TLS module via setsockopt().
   3.2) Modify uperf to support sendfile() with SSL.
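Steps 3.1 and 3.2 correspond to the kernel TLS socket API described in the kernel's TLS documentation: attach the "tls" ULP with setsockopt(TCP_ULP), install TX keys with setsockopt(SOL_TLS, TLS_TX), then call sendfile() on the socket. Below is a minimal sketch, assuming TLS 1.2 with AES-GCM-128 and key material produced by a user-space TLS handshake that is not shown. The helper names fill_gcm128(), enable_ktls_tx() and ktls_sendfile() are hypothetical; this is not the actual uperf modification.

```c
#include <assert.h>
#include <linux/tls.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Fallbacks for older libc headers; values from the kernel UAPI. */
#ifndef TCP_ULP
#define TCP_ULP 31
#endif
#ifndef SOL_TLS
#define SOL_TLS 282
#endif

/* Fill TLS 1.2 AES-GCM-128 TX parameters. key/iv/salt/seq are assumed to
 * come from a user-space TLS handshake (not shown). */
static void fill_gcm128(struct tls12_crypto_info_aes_gcm_128 *ci,
			const unsigned char *key, const unsigned char *iv,
			const unsigned char *salt, const unsigned char *seq)
{
	assert(ci && key && iv && salt && seq);
	memset(ci, 0, sizeof(*ci));
	ci->info.version = TLS_1_2_VERSION;
	ci->info.cipher_type = TLS_CIPHER_AES_GCM_128;
	memcpy(ci->key, key, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
	memcpy(ci->iv, iv, TLS_CIPHER_AES_GCM_128_IV_SIZE);
	memcpy(ci->salt, salt, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
	memcpy(ci->rec_seq, seq, TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
}

/* Attach the "tls" ULP to a connected TCP socket and install TX keys.
 * Returns 0 on success, -1 with errno set on failure. */
static int enable_ktls_tx(int sock,
			  const struct tls12_crypto_info_aes_gcm_128 *ci)
{
	if (setsockopt(sock, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
		return -1;
	return setsockopt(sock, SOL_TLS, TLS_TX, ci, sizeof(*ci));
}

/* After TLS_TX is set, the kernel encrypts data sent with sendfile(). */
static ssize_t ktls_sendfile(int sock, int file_fd, size_t count)
{
	return sendfile(sock, file_fd, NULL, count);
}
```

In the call trace below, do_sendfile() reaches tls_sw_sendpage(), which is the path a program like this one exercises. Each uperf thread in the test would use its own socket.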
 

Test:
1) Run uperf with 4 threads.
2) Each thread sends data using sendfile() over SSL.


After a few seconds, the kernel crashes because of TLS record list corruption:


[  270.888952] ------------[ cut here ]------------
[  270.890450] list_del corruption, ffff91cc3753a800->prev is LIST_POISON2 (dead000000000122)
[  270.891194] WARNING: CPU: 1 PID: 7387 at lib/list_debug.c:50 __list_del_entry_valid+0x62/0x90
[  270.892037] Modules linked in: n5pf(OE) netconsole tls(OE) bonding intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iTCO_wdt iTCO_vendor_support irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd mei_me cryptd glue_helper ipmi_si sg mei lpc_ich pcspkr joydev ioatdma i2c_i801 ipmi_devintf ipmi_msghandler wmi ip_tables xfs libcrc32c sd_mod mgag200 drm_vram_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm isci libsas ahci scsi_transport_sas libahci crc32c_intel serio_raw igb libata ptp pps_core dca i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nitrox_drv]
[  270.896836] CPU: 1 PID: 7387 Comm: uperf Kdump: loaded Tainted: G           OE     5.3.0-rc4 #1
[  270.897711] Hardware name: Supermicro SYS-1027R-N3RF/X9DRW, BIOS 3.0c 03/24/2014
[  270.898597] RIP: 0010:__list_del_entry_valid+0x62/0x90
[  270.899478] Code: 00 00 00 c3 48 89 fe 48 89 c2 48 c7 c7 e0 f9 ee 8d e8 b2 cf c8 ff 0f 0b 31 c0 c3 48 89 fe 48 c7 c7 18 fa ee 8d e8 9e cf c8 ff <0f> 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 50 fa ee 8d e8 87 cf c8
[  270.901321] RSP: 0018:ffffb6ea86eb7c20 EFLAGS: 00010282
[  270.902240] RAX: 0000000000000000 RBX: ffff91cc3753c000 RCX: 0000000000000000
[  270.903157] RDX: ffff91bc3f867080 RSI: ffff91bc3f857738 RDI: ffff91bc3f857738
[  270.904074] RBP: ffff91bc36020940 R08: 0000000000000560 R09: 0000000000000000
[  270.904988] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  270.905902] R13: ffff91cc3753a800 R14: ffff91cc37cc6400 R15: ffff91cc3753a800
[  270.906809] FS:  00007f454a88d700(0000) GS:ffff91bc3f840000(0000) knlGS:0000000000000000
[  270.907715] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  270.908606] CR2: 00007f453c00292c CR3: 000000103554e003 CR4: 00000000001606e0
[  270.909490] Call Trace:
[  270.910373]  tls_tx_records+0x138/0x1c0 [tls]
[  270.911262]  tls_sw_sendpage+0x3e0/0x420 [tls]
[  270.912154]  inet_sendpage+0x52/0x90
[  270.913045]  ? direct_splice_actor+0x40/0x40
[  270.913941]  kernel_sendpage+0x1a/0x30
[  270.914831]  sock_sendpage+0x20/0x30
[  270.915714]  pipe_to_sendpage+0x62/0x90
[  270.916592]  __splice_from_pipe+0x80/0x180
[  270.917461]  ? direct_splice_actor+0x40/0x40
[  270.918334]  splice_from_pipe+0x5d/0x90
[  270.919208]  direct_splice_actor+0x35/0x40
[  270.920086]  splice_direct_to_actor+0x103/0x230
[  270.920966]  ? generic_pipe_buf_nosteal+0x10/0x10
[  270.921850]  do_splice_direct+0x9a/0xd0
[  270.922733]  do_sendfile+0x1c9/0x3d0
[  270.923612]  __x64_sys_sendfile64+0x5c/0xc0
                      

(gdb) list *(tls_tx_records+0x138)
0x2d18 is in tls_tx_records (./include/linux/list.h:131).
126	 * Note: list_empty() on entry does not return true after this, the entry is
127	 * in an undefined state.
128	 */
129	static inline void __list_del_entry(struct list_head *entry)
130	{
131		if (!__list_del_entry_valid(entry))
132			return;
133	
134		__list_del(entry->prev, entry->next);
135	}
(gdb) 
(gdb) list *(tls_sw_sendpage+0x3e0)
0x48e0 is in tls_sw_sendpage (/home/mjatharkonda/5_3_rc4/tls/tls_sw.c:1211).
1206	
1207		if (num_async) {
1208			/* Transmit if any encryptions have completed */
1209			if (test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask)) {
1210				cancel_delayed_work(&ctx->tx_work.work);
1211				tls_tx_records(sk, flags);
1212			}
1213		}
1214	sendpage_end:
1215		ret = sk_stream_error(sk, flags, ret);
(gdb) 



The complete crash log is attached.
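The warning at the top of the log comes from the list debugging checks in lib/list_debug.c (CONFIG_DEBUG_LIST, as the `lib/list_debug.c:50` location shows). list_del() overwrites an unlinked entry's pointers with LIST_POISON1/LIST_POISON2, so a later unlink of the same record finds poison instead of valid neighbours. That is consistent with one record being unlinked and freed twice. Below is a simplified user-space model of the mechanism, not the kernel source. The poison constants are the x86-64 values (LIST_POISON2 matches dead000000000122 in the log). The helper names, the poison-only check, and the bool return values are our own simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

/* x86-64 poison values: 0x100/0x122 plus POISON_POINTER_DELTA. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0xdead000000000100ULL)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0xdead000000000122ULL)

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

/* Simplified debug check: refuse to unlink an entry that still carries
 * the poison left by an earlier list_del(). */
static bool list_del_entry_valid(const struct list_head *e)
{
	if (e->next == LIST_POISON1 || e->prev == LIST_POISON2) {
		fprintf(stderr, "list_del corruption, %p already unlinked\n",
			(const void *)e);
		return false;
	}
	return true;
}

/* Unlink and poison. Unlike the kernel's void list_del(), this model
 * returns false when it catches a second unlink. */
static bool list_del(struct list_head *e)
{
	if (!list_del_entry_valid(e))
		return false;
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = LIST_POISON1;
	e->prev = LIST_POISON2;
	return true;
}

/* Queue one record and unlink it twice; true if the second unlink is caught. */
static bool double_del_is_caught(void)
{
	struct list_head head, rec;
	bool first;

	list_init(&head);
	list_add_tail(&rec, &head);
	first = list_del(&rec);            /* normal unlink */
	assert(first && head.next == &head);
	return !list_del(&rec);            /* second unlink of the same entry */
}
```

The kernel's __list_del_entry_valid() also checks that the neighbours still point back at the entry. This model keeps only the poison test, which is enough to show why a second unlink of the same record is reported rather than silently corrupting the list.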
Comment 1 Mallesham 2019-09-02 13:22:23 UTC
Hi All,

I am observing a race condition in the following case (skb memory shortage).
When running the performance test, most of the time we hit this skb memory shortage case in the TLS and TCP layer code:

wait_for_sndbuf:
                set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
wait_for_memory:
                ret = sk_stream_wait_memory(sk, &timeo);
                if (ret) {
                        tls_trim_both_msgs(sk, msg_pl->sg.size);
                        goto sendpage_end;
                }

sk_stream_wait_memory() calls sk_wait_event(), which releases the socket lock while it sleeps and re-acquires it afterwards.

Root cause:
If the enqueue thread hits an skb shortage in the TCP layer while completing a request, it calls release_sock(). The TX work handler is then scheduled, so the enqueue thread and the work handler are both inside the critical section at the same time. This causes the record double free.

Please suggest what other locking mechanisms could be used.
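The race described above can be modeled in user space. Below is a minimal sketch of one possible mitigation, offered as an assumption rather than a tested kernel patch. It adds a per-context mutex (the hypothetical `tx_lock`) that both the sendpage path and the TX work handler take before the socket lock. The sender keeps `tx_lock` held even while it has dropped the socket lock to wait for memory. pthread mutexes stand in for lock_sock()/release_sock(), sched_yield() stands in for sleeping in sk_stream_wait_memory(), and a per-record counter replaces list_del() + kfree(), so a double free shows up as a count of 2 instead of corrupting memory. None of these names are kernel APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

#define NREC 8 /* number of queued records in the model */

/* Hypothetical user-space model of the TLS TX context; not kernel code. */
struct tx_ctx {
	pthread_mutex_t sock_lock; /* stands in for lock_sock()/release_sock() */
	pthread_mutex_t tx_lock;   /* extra lock held across the whole TX path */
	int head;                  /* index of the next record to push */
	int free_count[NREC];      /* how many times each record was "freed" */
};

/* Push every pending record; called with sock_lock held. If drop_lock is
 * true, the socket lock is released mid-record, as in
 * sk_stream_wait_memory(). */
static void tx_records(struct tx_ctx *ctx, bool drop_lock)
{
	while (ctx->head < NREC) {
		int rec = ctx->head;
		if (drop_lock) {
			pthread_mutex_unlock(&ctx->sock_lock);
			sched_yield(); /* another thread may take sock_lock here */
			pthread_mutex_lock(&ctx->sock_lock);
		}
		ctx->free_count[rec]++; /* stands in for list_del() + free */
		ctx->head = rec + 1;
	}
}

/* Both paths take tx_lock before the socket lock, so neither can enter
 * tx_records() while the other is waiting with the socket lock dropped. */
static void tx_path(struct tx_ctx *ctx, bool drop_lock)
{
	pthread_mutex_lock(&ctx->tx_lock);
	pthread_mutex_lock(&ctx->sock_lock);
	tx_records(ctx, drop_lock);
	pthread_mutex_unlock(&ctx->sock_lock);
	pthread_mutex_unlock(&ctx->tx_lock);
}

static void *sender_thread(void *arg) { tx_path(arg, true); return NULL; }
static void *tx_work_thread(void *arg) { tx_path(arg, false); return NULL; }

/* Run a sender and a TX worker concurrently; true if every record was
 * freed exactly once. */
static bool run_model(void)
{
	struct tx_ctx ctx = { .head = 0 };
	pthread_t sender, worker;
	bool ok = true;
	int rc;

	pthread_mutex_init(&ctx.sock_lock, NULL);
	pthread_mutex_init(&ctx.tx_lock, NULL);
	rc = pthread_create(&sender, NULL, sender_thread, &ctx);
	assert(rc == 0);
	rc = pthread_create(&worker, NULL, tx_work_thread, &ctx);
	assert(rc == 0);
	(void)rc;
	pthread_join(sender, NULL);
	pthread_join(worker, NULL);
	for (int i = 0; i < NREC; i++)
		if (ctx.free_count[i] != 1)
			ok = false;
	pthread_mutex_destroy(&ctx.sock_lock);
	pthread_mutex_destroy(&ctx.tx_lock);
	return ok;
}
```

Build with `-pthread`. If you remove the two tx_lock calls from tx_path(), some interleavings reproduce the reported pattern. The sender drops sock_lock, the worker pushes and frees the same record, and the sender then frees it again, leaving a free_count of 2. The design choice is to keep the socket lock semantics unchanged and add a separate sleeping lock that serializes TX, so a writer blocked on memory cannot be overtaken by the work handler.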
