Bug 99301

Summary: socket shutdown of L2CAP ERTM channel causes hung tasks when S or I frame ACK is pending
Product: Drivers Reporter: Dean Jenkins (Dean_Jenkins)
Component: BluetoothAssignee: linux-bluetooth (linux-bluetooth)
Status: NEW ---    
Severity: blocking CC: Dean_Jenkins, szg00000
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: verified on kernels 3.8 to 4.0.4 and bluetooth-next Subsystem:
Regression: No Bisected commit-id:
Attachments: Avoids hung task based on kernel 3.18.14 (RPi)
Uses hold and put functions to protect sk and chan structures based on 3.18.14 (RPi)
l2cap_ertm_shutdown_deadlock_patches.tgz

Description Dean Jenkins 2015-06-01 08:19:13 UTC
We have managed to trigger a L2CAP ERTM hung task deadlock using the Bluez l2test utility.

So far we have caused L2CAP ERTM deadlocks in:
Our own project running a i.MX6 using kernel 3.8 
ARM Raspberry Pi running Raspian kernel 3.18.11+
x86 Dell laptop running Mageia 4 kernel 3.14.32
PC running Fedora with kernel 4.0.4 (this was a test with an iPhone)
sanjc verified the failure occurs on bluetooth-next git repo.

I have sent the kernel.org Bluetooth maintainers instructions on how to trigger the deadlock.
http://marc.info/?l=linux-bluetooth&m=143289425224598&w=2

The hung tasks look like the following traces, wait for about 10 minutes (need to wait for 2 times the hung task detector period plus L2CAP timer expiry time) after the initial hidden failure of __l2cap_wait_ack() occurs, dmesg shows:

From ARM Raspberry Pi:
[  601.116119] INFO: task kworker/0:1:19 blocked for more than 120 seconds.
[  601.116156]       Not tainted 3.18.11+ #781
[  601.116169] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  601.116183] kworker/0:1     D c054cbcc     0    19      2 0x00000000
[  601.116537] Workqueue: events l2cap_retrans_timeout [bluetooth]
[  601.116685] [<c054cbcc>] (__schedule) from [<c054cfa4>] (schedule+0x40/0x8c)
[  601.116727] [<c054cfa4>] (schedule) from [<c054d3c8>] (schedule_preempt_disabled+0x30/0x40)
[  601.116812] [<c054d3c8>] (schedule_preempt_disabled) from [<c054ee14>] (__mutex_lock_slowpath+0xb8/0x18c)
[  601.116856] [<c054ee14>] (__mutex_lock_slowpath) from [<c054ef20>] (mutex_lock+0x38/0x3c)
[  601.117085] [<c054ef20>] (mutex_lock) from [<bf05173c>] (l2cap_retrans_timeout+0x24/0x5c [bluetooth])
[  601.117371] [<bf05173c>] (l2cap_retrans_timeout [bluetooth]) from [<c003aa68>] (process_one_work+0x13c/0x444)
[  601.117410] [<c003aa68>] (process_one_work) from [<c003b954>] (worker_thread+0x190/0x580)
[  601.117445] [<c003b954>] (worker_thread) from [<c003fdb4>] (kthread+0xd8/0xf4)
[  601.117488] [<c003fdb4>] (kthread) from [<c000e8c8>] (ret_from_fork+0x14/0x20)
[  601.117569] INFO: task kworker/u3:2:327 blocked for more than 120 seconds.
[  601.117588]       Not tainted 3.18.11+ #781
[  601.117601] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  601.117614] kworker/u3:2    D c054cbcc     0   327      2 0x00000000
[  601.117806] Workqueue: hci0 hci_rx_work [bluetooth]
[  601.117941] [<c054cbcc>] (__schedule) from [<c054cfa4>] (schedule+0x40/0x8c)
[  601.117981] [<c054cfa4>] (schedule) from [<c054d3c8>] (schedule_preempt_disabled+0x30/0x40)
[  601.118056] [<c054d3c8>] (schedule_preempt_disabled) from [<c054ee14>] (__mutex_lock_slowpath+0xb8/0x18c)
[  601.118092] [<c054ee14>] (__mutex_lock_slowpath) from [<c054ef20>] (mutex_lock+0x38/0x3c)
[  601.118307] [<c054ef20>] (mutex_lock) from [<bf04d2c8>] (l2cap_get_chan_by_scid+0x28/0x98 [bluetooth])
[  601.118696] [<bf04d2c8>] (l2cap_get_chan_by_scid [bluetooth]) from [<bf054590>] (l2cap_recv_frame+0xa4/0x2cec [bluetooth])
[  601.119146] [<bf054590>] (l2cap_recv_frame [bluetooth]) from [<bf058194>] (l2cap_recv_acldata+0x214/0x22c [bluetooth])
[  601.119559] [<bf058194>] (l2cap_recv_acldata [bluetooth]) from [<bf02bf88>] (hci_rx_work+0x180/0x288 [bluetooth])
[  601.119801] [<bf02bf88>] (hci_rx_work [bluetooth]) from [<c003aa68>] (process_one_work+0x13c/0x444)
[  601.119839] [<c003aa68>] (process_one_work) from [<c003b954>] (worker_thread+0x190/0x580)
[  601.119883] [<c003b954>] (worker_thread) from [<c003fdb4>] (kthread+0xd8/0xf4)
[  601.119919] [<c003fdb4>] (kthread) from [<c000e8c8>] (ret_from_fork+0x14/0x20)

From x86 Mageia 4:
[15942.455042] INFO: task kworker/5:1:54 blocked for more than 120 seconds.
[15942.455049]       Tainted: G           O 3.14.32-desktop-1.mga4 #1
[15942.455051] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[15942.455054] kworker/5:1     D ffff88022e352e00     0    54      2 0x00000000
[15942.455102] Workqueue: events l2cap_retrans_timeout [bluetooth]
[15942.455104]  ffff8800cb9dfd70 0000000000000046 ffff880223ec9800 ffff8800cb9dffd8
[15942.455106]  0000000000012e00 0000000000012e00 ffff880223ec9800 ffff8801a66b2390
[15942.455108]  ffff8801a66b2394 ffff880223ec9800 00000000ffffffff ffff8801a66b2398
[15942.455110] Call Trace:
[15942.455115]  [<ffffffff81638969>] schedule_preempt_disabled+0x29/0x70
[15942.455117]  [<ffffffff8163a4a2>] __mutex_lock_slowpath+0x132/0x1b0
[15942.455119]  [<ffffffff8163a53f>] mutex_lock+0x1f/0x2f
[15942.455133]  [<ffffffffa07395d0>] l2cap_retrans_timeout+0x30/0x80 [bluetooth]
[15942.455137]  [<ffffffff8108166c>] process_one_work+0x17c/0x420
[15942.455139]  [<ffffffff81082281>] worker_thread+0x121/0x3b0
[15942.455141]  [<ffffffff81082160>] ? rescuer_thread+0x360/0x360
[15942.455144]  [<ffffffff81088922>] kthread+0xd2/0xf0
[15942.455145]  [<ffffffff81088850>] ? kthread_create_on_node+0x170/0x170
[15942.455148]  [<ffffffff8164488c>] ret_from_fork+0x7c/0xb0
[15942.455150]  [<ffffffff81088850>] ? kthread_create_on_node+0x170/0x170
[15942.455158] INFO: task kworker/u17:3:785 blocked for more than 120 seconds.
[15942.455159]       Tainted: G           O 3.14.32-desktop-1.mga4 #1
[15942.455160] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[15942.455161] kworker/u17:3   D ffff88022e212e00     0   785      2 0x00000000
[15942.455172] Workqueue: hci0 hci_rx_work [bluetooth]
[15942.455173]  ffff880222ec7c10 0000000000000046 ffff88021dce0000 ffff880222ec7fd8
[15942.455175]  0000000000012e00 0000000000012e00 ffff88021dce0000 ffff8801b6421350
[15942.455176]  ffff8801b6421354 ffff88021dce0000 00000000ffffffff ffff8801b6421358
[15942.455178] Call Trace:
[15942.455181]  [<ffffffff81638969>] schedule_preempt_disabled+0x29/0x70
[15942.455183]  [<ffffffff8163a4a2>] __mutex_lock_slowpath+0x132/0x1b0
[15942.455184]  [<ffffffff8163a53f>] mutex_lock+0x1f/0x2f
[15942.455198]  [<ffffffffa07346d7>] l2cap_get_chan_by_scid+0x27/0xa0 [bluetooth]
[15942.455212]  [<ffffffffa073e1be>] l2cap_recv_frame+0x3ae/0x29c0 [bluetooth]
[15942.455215]  [<ffffffff810915c8>] ? update_rq_clock.part.88+0x18/0x40
[15942.455229]  [<ffffffffa0741be3>] l2cap_recv_acldata+0x2c3/0x350 [bluetooth]
[15942.455240]  [<ffffffffa0715e96>] hci_rx_work+0x1b6/0x3c0 [bluetooth]
[15942.455242]  [<ffffffff8108166c>] process_one_work+0x17c/0x420
[15942.455244]  [<ffffffff81082281>] worker_thread+0x121/0x3b0
[15942.455246]  [<ffffffff81082160>] ? rescuer_thread+0x360/0x360
[15942.455248]  [<ffffffff81088922>] kthread+0xd2/0xf0
[15942.455250]  [<ffffffff81088850>] ? kthread_create_on_node+0x170/0x170
[15942.455252]  [<ffffffff8164488c>] ret_from_fork+0x7c/0xb0
[15942.455254]  [<ffffffff81088850>] ? kthread_create_on_node+0x170/0x170

To reproduce the failure do:

USING L2TEST
Raspberry Pi or x86 PC is the l2test client (unit under test)
laptop is the l2test server and has a BT_ADDR of <lap_BT_ADDR>

On the laptop use the command:
l2test -i hci0 -X ertm -d
This command listens for connections with optional L2CAP ERTM supported and dumps out the received data.

On the RPi or 86 PC use the command:
l2test -s -b 1024 -B /dev/urandom -X ertm -P 4113 <lap_BT_ADDR>
This command sends 1024 bytes to the laptop via PSM 4113 used by l2test over an ERTM channel

l2test -s -b 1024 -B /dev/urandom -X ertm -P 4113 5C:F3:70:68:BA:DD
l2test[10366]: Connected to 5C:F3:70:68:BA:DD (bredr, psm 4113, scid 64)
l2test[10366]: Local device 5C:F3:70:68:BA:DA (bredr, psm 4113, scid 64)
l2test[10366]: Options [imtu 672, omtu 672, flush_to 65535, mode 3, handle 11, class 0x000000, priority 0, rcvbuf 163840]
l2test[10366]: Sending ...
l2test[10366]: Closing channel ...

Using L2CAP ERTM, the channel is instructed to close but hangs.
l2test fails to terminate.

The defect is caused by __l2cap_wait_ack() when it waits for all pending L2CAP ACKs (S or I frame) but takes all the locks whilst waiting (sleeping) so blocks processing of received L2CAP frames and blocks the recovery L2CAP timers. It causes a deadlock as __l2cap_wait_ack() runs in an infinite loop which periodically sleeps every 200ms.
Comment 1 Dean Jenkins 2015-06-01 08:56:04 UTC
Usual case:
Unit under test:                              Mobile Phone
Establish ERTM channel
...
I frame: Media Browsing request --->
                                         <--- S (or I) frame ACK
...
                                         <--- I frame: Media data
S (or I) frame ACK --->
...
Close ERTM channel

Normally, the service will wait for the media data to arrive before closing the channel is requested. This case is OK

Failure case:
Unit under test:                              Mobile phone
Establish ERTM channel
...
I frame: Media Browsing request --->
Close ERTM channel
                                         <--- S (or I) frame ACK

The failure occurs when the userland decides to close the ERTM channel before the S (or I) frame ACK is processed. This means that userland is performing a scenario that causes the mediaplayer browsing request to be aborted.

So you can see that the window for failure is small as timing is critical. Processor loading probably influences the size of the failure window.

The l2test testcase mimics the failure case by closing the channel after sending some data and not waiting for the ACK; no return data is used. The failure is repeatable 100% so far using l2test as there is a design flaw in the architecture of the locking mechanism used in the L2CAP ERTM channel closure procedure via sock shutdown.
Comment 2 Dean Jenkins 2015-06-02 07:33:47 UTC
Created attachment 178541 [details]
Avoids hung task based on kernel 3.18.14 (RPi)

This is a proposal but has side effects as memory potentially could be freed whilst the locks are not held.
Comment 3 Dean Jenkins 2015-06-02 07:37:33 UTC
Created attachment 178551 [details]
Uses hold and put functions to protect sk and chan structures based on 3.18.14 (RPi)

This patch improves the solution by preventing sk and chan structures from being freed during the execution of l2cap_sock_shutdown().

However, the conn structure is at potential risk of freeing from l2cap_conn_del() which needs further analysis.
Comment 4 Dean Jenkins 2015-06-05 08:17:27 UTC
Created attachment 178881 [details]
l2cap_ertm_shutdown_deadlock_patches.tgz

Based on bluetooth-next, these are 7 patches proposed to fix the L2CAP ERTM shutdown deadlock issue. These patches are forward-ported and untested.

Testing is ongoing on kernel 3.8 so and so far the deadlock does not occur.

I have put the proposed patches into bugzilla so that people are aware that we are working on a fix.
Comment 5 Dean Jenkins 2015-06-24 08:54:54 UTC
v2 proposal for a solution has now been posted to
http://marc.info/?l=linux-bluetooth&m=143507884817418&w=2