Bug 199987

Summary: include/trace/events/timer.h:290 suspicious rcu_dereference_check() usage!
Product: Platform Specific/Hardware Reporter: Martin Peres (martin.peres)
Component: x86-64Assignee: platform_x86_64 (platform_x86_64)
Status: RESOLVED INSUFFICIENT_DATA    
Severity: normal CC: lakshminarayana.vudum, rostedt
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.17.0-rc7 Subsystem:
Regression: No Bisected commit-id:

Description Martin Peres 2018-06-08 11:55:42 UTC
We have seen the following WARNING on one of our CI machines. The reproduction rate is currently unknown has this has been the first time we hit it:

[  238.566193] =============================
[  238.566198] WARNING: suspicious RCU usage
[  238.566203] 4.17.0-rc7-g0e963d962be7-drmtip_57+ #1 Tainted: G     U           
[  238.566206] -----------------------------
[  238.566211] ./include/trace/events/timer.h:290 suspicious rcu_dereference_check() usage!
[  238.566215] 
               other info that might help us debug this:

[  238.566222] 
               rcu_scheduler_active = 2, debug_locks = 0
[  238.566228] 1 lock held by swapper/1/0:
[  238.566231]  #0: 0000000044acc168 (hrtimer_bases.lock){-.-.}, at: lock_hrtimer_base.isra.2+0x1f/0x40
[  238.566247] 
               stack backtrace:
[  238.566254] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G     U            4.17.0-rc7-g0e963d962be7-drmtip_57+ #1
[  238.566258] Hardware name: Intel Corporation CannonLake Client Platform/CannonLake Y LPDDR4 RVP, BIOS CNLSFWR1.R00.X122.B01.1801151045 01/15/2018
[  238.566262] Call Trace:
[  238.566270]  dump_stack+0x67/0x9b
[  238.566277]  hrtimer_try_to_cancel+0x1b1/0x1f0
[  238.566285]  hrtimer_cancel+0x10/0x20
[  238.566293]  __tick_nohz_idle_restart_tick+0x4a/0xd0
[  238.566299]  tick_nohz_idle_exit+0x9c/0xb0
[  238.566307]  do_idle+0x14f/0x250
[  238.566316]  cpu_startup_entry+0x6a/0x70
[  238.566324]  start_secondary+0x19d/0x1f0
[  238.566332]  secondary_startup_64+0xa5/0xb0

[  238.566355] ======================================================
[  238.566356] WARNING: possible circular locking dependency detected
[  238.566357] 4.17.0-rc7-g0e963d962be7-drmtip_57+ #1 Tainted: G     U           
[  238.566358] ------------------------------------------------------
[  238.566359] swapper/1/0 is trying to acquire lock:
[  238.566360] 000000005f4cb210 ((console_sem).lock){-.-.}, at: down_trylock+0xa/0x30

[  238.566365] but task is already holding lock:
[  238.566365] 0000000044acc168 (hrtimer_bases.lock){-.-.}, at: lock_hrtimer_base.isra.2+0x1f/0x40

[  238.566370] which lock already depends on the new lock.


[  238.566372] the existing dependency chain (in reverse order) is:

[  238.566374] -> #4 (hrtimer_bases.lock){-.-.}:
[  238.566377]        lock_hrtimer_base.isra.2+0x1f/0x40
[  238.566378]        hrtimer_start_range_ns+0x5a/0x3a0
[  238.566379]        __enqueue_rt_entity+0x209/0x360
[  238.566380]        enqueue_rt_entity+0x28/0x50
[  238.566381]        enqueue_task_rt+0x2a/0xb0
[  238.566382]        __sched_setscheduler+0x2fd/0x920
[  238.566383]        _sched_setscheduler+0x72/0x90
[  238.566384]        watchdog_enable+0x93/0xb0
[  238.566385]        smpboot_thread_fn+0x107/0x280
[  238.566385]        kthread+0x119/0x130
[  238.566386]        ret_from_fork+0x3a/0x50

[  238.566387] -> #3 (&rt_b->rt_runtime_lock){-.-.}:
[  238.566391]        __enqueue_rt_entity+0x1ab/0x360
[  238.566392]        enqueue_rt_entity+0x28/0x50
[  238.566393]        enqueue_task_rt+0x2a/0xb0
[  238.566393]        __sched_setscheduler+0x2fd/0x920
[  238.566394]        _sched_setscheduler+0x72/0x90
[  238.566395]        watchdog_enable+0x93/0xb0
[  238.566396]        smpboot_thread_fn+0x107/0x280
[  238.566397]        kthread+0x119/0x130
[  238.566398]        ret_from_fork+0x3a/0x50

[  238.566399] -> #2 (&rq->lock){-.-.}:
[  238.566402]        task_fork_fair+0x36/0x160
[  238.566403]        sched_fork+0x123/0x280
[  238.566404]        copy_process.part.7+0x5f4/0x1d30
[  238.566405]        _do_fork+0xe2/0x700
[  238.566406]        kernel_thread+0x20/0x30
[  238.566407]        rest_init+0x1d/0x220
[  238.566408]        start_kernel+0x4a2/0x4c2
[  238.566409]        secondary_startup_64+0xa5/0xb0

[  238.566410] -> #1 (&p->pi_lock){-.-.}:
[  238.566414]        try_to_wake_up+0x37/0x650
[  238.566414]        up+0x3b/0x50
[  238.566415]        __up_console_sem+0x2e/0x50
[  238.566416]        console_unlock+0x32f/0x640
[  238.566417]        register_framebuffer+0x24c/0x350
[  238.566418]        __drm_fb_helper_initial_config_and_unlock+0x22a/0x4a0
[  238.566419]        intel_fbdev_initial_config+0xf/0x20 [i915]
[  238.566420]        async_run_entry_fn+0x34/0x160
[  238.566421]        process_one_work+0x229/0x6a0
[  238.566422]        worker_thread+0x35/0x380
[  238.566423]        kthread+0x119/0x130
[  238.566424]        ret_from_fork+0x3a/0x50

[  238.566425] -> #0 ((console_sem).lock){-.-.}:
[  238.566428]        _raw_spin_lock_irqsave+0x33/0x50
[  238.566429]        down_trylock+0xa/0x30
[  238.566430]        __down_trylock_console_sem+0x20/0x80
[  238.566431]        console_trylock+0xe/0x60
[  238.566432]        vprintk_emit+0x22e/0x4d0
[  238.566433]        printk+0x4d/0x69
[  238.566434]        lockdep_rcu_suspicious+0x1b/0xf0
[  238.566434]        hrtimer_try_to_cancel+0x1b1/0x1f0
[  238.566435]        hrtimer_cancel+0x10/0x20
[  238.566436]        __tick_nohz_idle_restart_tick+0x4a/0xd0
[  238.566437]        tick_nohz_idle_exit+0x9c/0xb0
[  238.566438]        do_idle+0x14f/0x250
[  238.566439]        cpu_startup_entry+0x6a/0x70
[  238.566440]        start_secondary+0x19d/0x1f0
[  238.566440]        secondary_startup_64+0xa5/0xb0

[  238.566442] other info that might help us debug this:

[  238.566444] Chain exists of:
[  238.566444]   (console_sem).lock --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock

[  238.566450]  Possible unsafe locking scenario:

[  238.566452]        CPU0                    CPU1
[  238.566452]        ----                    ----
[  238.566453]   lock(hrtimer_bases.lock);
[  238.566455]                                lock(&rt_b->rt_runtime_lock);
[  238.566458]                                lock(hrtimer_bases.lock);
[  238.566460]   lock((console_sem).lock);

[  238.566462]  *** DEADLOCK ***

[  238.566464] 1 lock held by swapper/1/0:
[  238.566465]  #0: 0000000044acc168 (hrtimer_bases.lock){-.-.}, at: lock_hrtimer_base.isra.2+0x1f/0x40

[  238.566469] stack backtrace:
[  238.566471] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G     U            4.17.0-rc7-g0e963d962be7-drmtip_57+ #1
[  238.566472] Hardware name: Intel Corporation CannonLake Client Platform/CannonLake Y LPDDR4 RVP, BIOS CNLSFWR1.R00.X122.B01.1801151045 01/15/2018
[  238.566473] Call Trace:
[  238.566474]  dump_stack+0x67/0x9b
[  238.566475]  print_circular_bug.isra.18+0x1c8/0x2b0
[  238.566476]  __lock_acquire+0x1897/0x1b50
[  238.566477]  ? lock_acquire+0xa6/0x210
[  238.566478]  lock_acquire+0xa6/0x210
[  238.566479]  ? down_trylock+0xa/0x30
[  238.566480]  ? vprintk_emit+0x22e/0x4d0
[  238.566481]  _raw_spin_lock_irqsave+0x33/0x50
[  238.566482]  ? down_trylock+0xa/0x30
[  238.566482]  down_trylock+0xa/0x30
[  238.566483]  __down_trylock_console_sem+0x20/0x80
[  238.566484]  console_trylock+0xe/0x60
[  238.566485]  vprintk_emit+0x22e/0x4d0
[  238.566486]  printk+0x4d/0x69
[  238.566487]  lockdep_rcu_suspicious+0x1b/0xf0
[  238.566488]  hrtimer_try_to_cancel+0x1b1/0x1f0
[  238.566488]  hrtimer_cancel+0x10/0x20
[  238.566489]  __tick_nohz_idle_restart_tick+0x4a/0xd0
[  238.566490]  tick_nohz_idle_exit+0x9c/0xb0
[  238.566491]  do_idle+0x14f/0x250
[  238.566492]  cpu_startup_entry+0x6a/0x70
[  238.566493]  start_secondary+0x19d/0x1f0
[  238.566494]  secondary_startup_64+0xa5/0xb0

Kernel config: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_57/kconfig.txt
Boot log: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_57/fi-cnl-y3/boot1.log
Logs during the execution of the tests: https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_57/fi-cnl-y3/dmesg1.log
Comment 1 Martin Peres 2018-06-08 12:01:36 UTC
Adding Steven Rostedt to the Cc: list, as per Chris Wilson's recommendation.
Comment 2 Lakshminarayana Vudum 2018-09-07 07:47:01 UTC
This issue occurred only once and that too 3 months ago with drmtip_57.
Comment 3 Lakshminarayana Vudum 2018-09-07 08:03:16 UTC
This issue can be closed as the boot log and test results are not found any more.