Bug 197939 - schedule_delayed_work off by one jiffy
Summary: schedule_delayed_work off by one jiffy
Status: NEW
Alias: None
Product: Timers
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P1 low
Assignee: john stultz
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-11-20 06:30 UTC by raj
Modified: 2020-10-13 23:12 UTC (History)
1 user (show)

See Also:
Kernel Version: 4.10.0-19-generic
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description raj 2017-11-20 06:30:28 UTC
Hi,

We are using Kernel 4.10.0-19-generic. In one of our test drivers, we assign a task to the global work queue, using schedule_delayed_work().  

The issue we observe, is that when we call schedule_delayed_work(task, n)  with delay of "n" jiffies, the task is getting scheduled after a delay of "n+1" and not "n" jiffies as expected.

This worked fine till the kernel version 4.4.0-87-generic. The observation is on kernel 4.10.0-19-generic and above. 

A sample test code to highlight the difference between the kernel is given below:

        static struct tq_struct g_Task = { routine:interrupt_handler,  data:NULL };

        static void interrupt_handler( struct work_struct *irrelevant )  {
                printk("check 1 Jiffies: %llu \n", get_jiffies_64());	
                schedule_delayed_work( &g_Task, 1);
                printk("Check 2 Jiffies: %llu \n", get_jiffies_64());	
        }

The dmesg output on 4.10 is as below (diff of 2 jiffies)
    [ 1047.975259] check 1 Jiffies: 4295153276
    [ 1047.975263] Check 2 Jiffies: 4295153276
    [ 1047.983255] check 1 Jiffies: 4295153278
    [ 1047.983259] Check 2 Jiffies: 4295153278
    [ 1047.991255] check 1 Jiffies: 4295153280
    [ 1047.991260] Check 2 Jiffies: 4295153280

The dmesg output on 4.4 is as below (diff of 1 jiffies)
    [  830.490657] check 1 Jiffies: 4295098955
    [  830.490664] Check 2 Jiffies: 4295098955
    [  830.494657] check 1 Jiffies: 4295098956
    [  830.494667] Check 2 Jiffies: 4295098956
    [  830.498656] check 1 Jiffies: 4295098957
    [  830.498663] Check 2 Jiffies: 4295098957

Thanks,
Rajendra
Comment 1 john stultz 2017-11-20 19:54:24 UTC
It might be more useful if you can bisect down to the specific commit that caused the change in behavior.

All of the timer/delay interfaces promise "not less then" behavior, so we may not return/fire early, but there's no guarantee to fire close to the requested time. 

Thus it is often the case where timers delay for N+1 the requested number of ticks, since we don't know where in-between the ticks we might be. Thus if half of a tick had passed, firing at the next jiffy would be returning after only half a jiffy, thus we must wait for 2 jiffies.

So as described it doesn't seem to technically be broken, but it would be good to narrow down the change in behavior to make sure it was intentional.
Comment 2 Senthil Kumar Nagappan 2020-10-13 17:46:51 UTC
Root Cause:
Looks like granularity at each level adds up extra delay to the one requested.

root@nskubuntu:~# uname -r
5.4.0-47-generic

When we add a timer to expires after 1 tick, it expires after 2 ticks. The granularity at level 0 is 1 tick.

Oct 13 22:26:57 nskubuntu kernel: [115889.955708] check Jiffies: 4323865246 
Oct 13 22:26:57 nskubuntu kernel: [115889.963723] check Jiffies: 4323865248 
Oct 13 22:26:57 nskubuntu kernel: [115889.971739] check Jiffies: 4323865250 
Oct 13 22:26:57 nskubuntu kernel: [115889.979723] check Jiffies: 4323865252 
Oct 13 22:26:57 nskubuntu kernel: [115889.987737] check Jiffies: 4323865254 
Oct 13 22:26:57 nskubuntu kernel: [115889.995726] check Jiffies: 4323865256 
Oct 13 22:26:57 nskubuntu kernel: [115890.003723] check Jiffies: 4323865258 
Oct 13 22:26:57 nskubuntu kernel: [115890.011739] check Jiffies: 4323865260 
Oct 13 22:26:57 nskubuntu kernel: [115890.019725] check Jiffies: 4323865262 
Oct 13 22:26:57 nskubuntu kernel: [115890.027737] check Jiffies: 4323865264 

When we add a timer to expires after 64 tick, it expires after 72 ticks. The granularity at level 1 is 8 ticks.
Oct 13 22:28:14 nskubuntu kernel: [115967.247050] check Jiffies: 4323884569 
Oct 13 22:28:14 nskubuntu kernel: [115967.535125] check Jiffies: 4323884641 
Oct 13 22:28:15 nskubuntu kernel: [115967.823118] check Jiffies: 4323884713 
Oct 13 22:28:15 nskubuntu kernel: [115968.107091] check Jiffies: 4323884784 
Oct 13 22:28:15 nskubuntu kernel: [115968.395077] check Jiffies: 4323884856 
Oct 13 22:28:15 nskubuntu kernel: [115968.683116] check Jiffies: 4323884928 
Oct 13 22:28:16 nskubuntu kernel: [115968.971041] check Jiffies: 4323885000 
Oct 13 22:28:16 nskubuntu kernel: [115969.263116] check Jiffies: 4323885073 
Oct 13 22:28:16 nskubuntu kernel: [115969.547107] check Jiffies: 4323885144 
Oct 13 22:28:17 nskubuntu kernel: [115969.835098] check Jiffies: 4323885216 

When we add a timer to expires after 512 tick, it expires after 576 ticks. The granularity at level 2 is 64 ticks
Oct 13 22:31:24 nskubuntu kernel: [116157.609664] check Jiffies: 4323932160 
Oct 13 22:31:27 nskubuntu kernel: [116159.913630] check Jiffies: 4323932736 
Oct 13 22:31:29 nskubuntu kernel: [116162.217594] check Jiffies: 4323933312 
Oct 13 22:31:31 nskubuntu kernel: [116164.521617] check Jiffies: 4323933888 
Oct 13 22:31:34 nskubuntu kernel: [116166.825527] check Jiffies: 4323934464 
Oct 13 22:31:36 nskubuntu kernel: [116169.133540] check Jiffies: 4323935041 
Oct 13 22:31:38 nskubuntu kernel: [116171.433534] check Jiffies: 4323935616 
Oct 13 22:31:40 nskubuntu kernel: [116173.741493] check Jiffies: 4323936193 


Changes:
calc_index, calc_wheel_index, LVL_START may need changes. Still working on it to find the right changes.

Test Code:

int count = 1;
int delay_jiffies = 1; //need to change this to 64 and then to 512 for each run
struct timer_list my_timer;

/*
 * TIMER FUNCTION
 * */

static void timer_function(struct timer_list *t){
	unsigned long local_jiffies;
	local_jiffies = jiffies;
        printk("check Jiffies: %lu \n", local_jiffies);
	if(count < 10)
	{
		mod_timer(&my_timer, local_jiffies + delay_jiffies);
		count++;
	}
 
}

static int hello_init(void)
{
	my_timer.expires = jiffies + 1 ;
	my_timer.function = timer_function;
	add_timer(&my_timer);
	return 0;
}

static void hello_exit(void)
{
	del_timer(&my_timer);
}

module_init(hello_init);
module_exit(hello_exit);
Comment 3 Thomas Gleixner 2020-10-13 23:12:28 UTC
On Tue, Oct 13 2020 at 17:46, bugzilla-daemon wrote:
> --- Comment #2 from Senthil Kumar Nagappan (sknagappan@yahoo.com) ---
> Root Cause:
> Looks like granularity at each level adds up extra delay to the one
> requested.

That's not a bug, that's a feature.

> When we add a timer to expires after 1 tick, it expires after 2 ticks. The
> granularity at level 0 is 1 tick.

Correct, because the timer is guaranteed to sleep at least 1 jiffie.

So if the timer is queued right at the edge of the next jiffie then it
would expire immediately, which would violate the guarantee that it
sleeps at least 1 jiffie.

> When we add a timer to expires after 64 tick, it expires after 72 ticks. The
> granularity at level 1 is 8 ticks.

Correct, because the longer sleep times are batched to the granularity
for power saving reasons.

See the big comment on top of kernel/time/timers.c and the changelog of
the commit which introduced the redesign for in depth explanation.

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=500462a9de657f86edaa102f8ab6bff7f7e43fc2

The batching in the outer levels (farther out expiry time) has been
there before this change already. The change made this batching an
implicit property of the wheel storage while the previous variant had to
do an explicit computation to achieve that.

So again, that's not a bug. It's a feature and documented behaviour.

Thanks,

        tglx

Note You need to log in before you can comment on or make changes to this bug.