Bug 114611 - queue_delayed_work_on does not preserve order
Summary: queue_delayed_work_on does not preserve order
Status: NEW
Alias: None
Product: Timers
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: john stultz
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-03-14 23:50 UTC by Petr Vandrovec
Modified: 2017-03-03 21:24 UTC (History)
5 users (show)

See Also:
Kernel Version: 4.4.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg (98.89 KB, text/plain)
2016-03-14 23:50 UTC, Petr Vandrovec
Details
Test Patch (864 bytes, patch)
2016-03-26 01:50 UTC, nickkrause
Details | Diff

Description Petr Vandrovec 2016-03-14 23:50:16 UTC
Created attachment 209161 [details]
dmesg

I have no idea whether it is bug in mptsas.c's expectations of workqueues, in workqueues, or in add_timer implementation.

But mptsas for each event it receives from the HBA calls queue_delayed_work_on(smp_processor_id(), ioc->fw_event_q, &fw_event->work, delay), with identical delay of 1ms (msec2jiffies(1)).

Problem is that when two events are generated shortly after each other, workqueue will actually run 2nd work before 1st work, despite both being scheduled with same delay.

You can see below event 12h being delivered first before event 0Fh, but mptsas's mptsas_firmware_event_work will get invoked first for event 0Fh - and fail miserably, as mptsas needs to discover new topology first, and that happens in response to event 12h, which was not processed yet.

[60203.665295] mptbase: ioc0: MPT event:(12h) : SAS PHY Link Status: Phy=3: Rate 3.0 Gbps
[60203.665314] mptbase: Event data:
[60203.665319]  a0039003 de17daf4 50050567

[60203.665330] mptbase: ioc0: Routing Event to event handler #14
[60203.665382] mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff880037dcf9c0)on cpuid 0
[60203.665403] mptbase: ioc0: MPT event:(0Fh) : SAS Device Status Change: Added: id=3 channel=0
[60203.665406] mptbase: Event data:
[60203.665409]  00030003 b0030000 00001c01 0003a003 37139f97 5000c29f ffffffff ffffffff 0000ffff

[60203.665423] mptbase: ioc0: Routing Event to event handler #14
[60203.665430] mptsas: ioc0: mptsas_add_fw_event: add (fw_event=0xffff88003b9dbe00)on cpuid 0
[60203.667640] mptsas: ioc0: mptsas_firmware_event_work: fw_event=(0xffff88003b9dbe00), event = (0x0f)
[60203.680831] mptsas: ioc0: ERROR - mptsas_hotplug_work 4381 phy info is NULL
[60203.680843] mptsas: ioc0: mptsas_free_fw_event: kfree (fw_event=0xffff88003b9dbe00)
[60203.680854] mptsas: ioc0: mptsas_firmware_event_work: fw_event=(0xffff880037dcf9c0), event = (0x12)
[60203.691674]  port-2:3: mptsas: ioc0: add port 3, sas_addr (0x5000c29f37139f97)
[60203.691694]  phy-2:3: mptsas: ioc0: add phy 3, phy-obj (0xffff880034b02c00)
[60203.691756] mptsas: ioc0: mptsas_free_fw_event: kfree (fw_event=0xffff880037dcf9c0)

Is it bug in mptsas, and mptsas should not use queue_delayed_work() as it depends on event ordering, or is it bug in workqueue (which at the end calls add_timer_on, which at the end uses hlist_add_head() in __internal_add_timer)?

It worked correctly year ago - it seems that commit 1bd04bf6f removed 'FIFO guarantee' from timers, without updating code that depends on it :-(
Comment 1 nickkrause 2016-03-26 01:49:23 UTC
See if this patch fixes it.
Comment 2 nickkrause 2016-03-26 01:50:16 UTC
Created attachment 210701 [details]
Test Patch
Comment 3 Alexey Makhalov 2016-08-12 20:40:09 UTC
This patch didn't fix an issue.
Comment 4 Yuhua 2017-03-02 04:17:48 UTC
Any update for this issue ?

Note You need to log in before you can comment on or make changes to this bug.