Bug 71331 - mlock yields processor to lower priority process
Summary: mlock yields processor to lower priority process
Status: NEW
Alias: None
Product: Other
Classification: Unclassified
Component: Other
Hardware: i386 Linux
Importance: P1 low
Assignee: other_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-03-01 15:02 UTC by Bud Davis
Modified: 2014-03-20 15:47 UTC (History)
2 users

See Also:
Kernel Version: 2.6 and up
Subsystem:
Regression: No
Bisected commit-id:


Attachments
C program that demonstrates the defect (1.37 KB, text/plain)
2014-03-01 15:02 UTC, Bud Davis

Description Bud Davis 2014-03-01 15:02:16 UTC
Created attachment 127731 [details]
C program that demonstrates the defect

Description of problem:
A SCHED_FIFO task that calls mlock will yield the processor to a lower priority process.

Version-Release number of selected component (if applicable):
Worked in RHEL 5; fails in RHEL 6 and all later versions.

How reproducible:
Always; test C program attached.

Steps to Reproduce:
1. Compile the attached program:

% gcc -Wall -g x.c -o x -lpthread -lrt

2. Execute the program as root, pinned to a single CPU:

% taskset -c 2 chrt --fifo 50 ./x

3. Review the output.

Actual results:


[root@linux1 bdavis]# taskset -c 2 chrt --fifo 50 ./x
taskA
taskB
main thread starts A and B
taskB wokeup
taskA wokeup
taskA done
taskB done

Note that taskB should run to completion before taskA runs.



Expected results:

These results were obtained with the mlock() call commented out.

[root@linux1 bdavis]# taskset -c 2 chrt --fifo 50 ./x
taskA
taskB
main thread starts A and B
taskB wokeup
taskB done
taskA wokeup
taskA done


Additional info:

The example program creates two threads at different priorities.  When both are made ready to run by posting their semaphores, the higher-priority task should run to completion first.
If mlock() is called inside the high-priority task, the lower-priority task preempts it and runs.  This behaviour is incorrect.
Comment 1 Alexey Dobriyan 2014-03-11 11:28:22 UTC
cond_resched in do_mlockall bugged?
Comment 2 Alexey Dobriyan 2014-03-12 10:33:29 UTC
reproduced in 3.14.0-rc6
Comment 3 Artem Fetishev 2014-03-20 15:47:38 UTC
It seems that the actual problem is in lru_add_drain_all(). The function schedules a work item and makes the current task wait until that work is done. Waiting is done by putting the current task into the TASK_UNINTERRUPTIBLE state.

The full call stack is:
sys_mlockall -> lru_add_drain_all -> flush_work -> wait_for_completion -> wait_for_common -> __wait_for_common -> do_wait_for_common -> schedule_timeout -> schedule.

Before calling schedule_timeout(), do_wait_for_common() changes the task state to TASK_UNINTERRUPTIBLE.

lru_add_drain_all() was added to sys_mlockall() in v2.6.28. RHEL 5 uses v2.6.18, which explains why the older kernels behave correctly and the newer versions do not.

Also note that lru_add_drain_all() is called only when MCL_CURRENT is specified. If one requests the locking of future pages only, mlockall(MCL_FUTURE), the attached program behaves correctly.
