Latest working kernel version: unknown Earliest failing kernel version: 2.6.26-rc2 Distribution: Gentoo Hardware Environment: Intel 965 chipset, Core 2 Duo Software Environment: mplayer Problem Description: Just lockdep warning in dmesg Steps to reproduce: while playing HD video got this. [52478.405955] ======================================================= [52478.405955] [ INFO: possible circular locking dependency detected ] [52478.405955] 2.6.26-rc2-devel #1 [52478.405955] ------------------------------------------------------- [52478.405955] mplayer/4814 is trying to acquire lock: [52478.405955] (&pcsp_chip.substream_lock){++..}, at: [<f88af285>] pcsp_do_timer+0x51/0x2a8 [snd_pcsp] [52478.405955] [52478.405955] but task is already holding lock: [52478.405955] (&cpu_base->lock){++..}, at: [<c0136ac4>] hrtimer_interrupt+0x6c/0x154 [52478.405955] [52478.405955] which lock already depends on the new lock. [52478.405955] [52478.405955] [52478.405955] the existing dependency chain (in reverse order) is: [52478.405955] [52478.405955] -> #5 (&cpu_base->lock){++..}: [52478.405955] [<c013fc8e>] __lock_acquire+0x982/0xb37 [52478.405955] [<c013feaf>] lock_acquire+0x6c/0x89 [52478.405955] [<c030783a>] _spin_lock_irqsave+0x25/0x55 [52478.405955] [<c01366c2>] lock_hrtimer_base+0x18/0x33 [52478.405955] [<c01367aa>] hrtimer_start+0x19/0x100 [52478.405955] [<c011b656>] enqueue_rt_entity+0xf7/0x10e [52478.405955] [<c011b6cb>] enqueue_task_rt+0x36/0x4a [52478.405955] [<c0118235>] enqueue_task+0x4c/0x58 [52478.405955] [<c011825a>] activate_task+0x19/0x1f [52478.405955] [<c011a9f7>] try_to_wake_up+0x17c/0x1be [52478.405955] [<c011aa61>] wake_up_process+0xf/0x11 [52478.405955] [<c0303060>] migration_call+0xb4/0x403 [52478.405955] [<c0440d1b>] migration_init+0x38/0x45 [52478.405955] [<c043439b>] kernel_init+0x49/0x22a [52478.405955] [<c01045af>] kernel_thread_helper+0x7/0x10 [52478.405955] [<ffffffff>] 0xffffffff [52478.405955] [52478.405955] -> #4 (&rt_b->rt_runtime_lock){++..}: [52478.405955] [<c013fc8e>] __lock_acquire+0x982/0xb37 [52478.405955] [<c013feaf>] lock_acquire+0x6c/0x89 [52478.405955] [<c030748f>] _spin_lock+0x1c/0x49 [52478.405955] [<c011b613>] enqueue_rt_entity+0xb4/0x10e [52478.405955] [<c011b6cb>] enqueue_task_rt+0x36/0x4a [52478.405955] [<c0118235>] enqueue_task+0x4c/0x58 [52478.405955] [<c011825a>] activate_task+0x19/0x1f [52478.405955] [<c011a9f7>] try_to_wake_up+0x17c/0x1be [52478.405955] [<c011aa61>] wake_up_process+0xf/0x11 [52478.405955] [<c0303060>] migration_call+0xb4/0x403 [52478.405955] [<c0440d1b>] migration_init+0x38/0x45 [52478.405955] [<c043439b>] kernel_init+0x49/0x22a [52478.405955] [<c01045af>] kernel_thread_helper+0x7/0x10 [52478.405955] [<ffffffff>] 0xffffffff [52478.405955] [52478.405955] -> #3 (&rq->rq_lock_key){++..}: [52478.405955] [<c013fc8e>] __lock_acquire+0x982/0xb37 [52478.405955] [<c013feaf>] lock_acquire+0x6c/0x89 [52478.405955] [<c030748f>] _spin_lock+0x1c/0x49 [52478.405955] [<c011a634>] task_rq_lock+0x36/0x5d [52478.405955] [<c011a8a5>] try_to_wake_up+0x2a/0x1be [52478.405955] [<c011aa44>] default_wake_function+0xb/0xd [52478.405955] [<c0117dad>] __wake_up_common+0x2f/0x5a [52478.405955] [<c0119c2a>] complete+0x2b/0x3e [52478.405955] [<c0133b93>] kthread+0x1f/0x64 [52478.405955] [<c01045af>] kernel_thread_helper+0x7/0x10 [52478.405955] [<ffffffff>] 0xffffffff [52478.405955] [52478.405955] -> #2 (&q->lock){++..}: [52478.405955] [<c013fc8e>] __lock_acquire+0x982/0xb37 [52478.405955] [<c013feaf>] lock_acquire+0x6c/0x89 [52478.405955] [<c030783a>] _spin_lock_irqsave+0x25/0x55 [52478.405955] [<c0119c96>] __wake_up+0x15/0x3b [52478.405955] [<f88c6f2d>] snd_pcm_post_stop+0x51/0x5a [snd_pcm] [52478.405955] [<f88c6353>] snd_pcm_action_single+0x33/0x4b [snd_pcm] [52478.405955] [<f88c7238>] snd_pcm_action+0x68/0x74 [snd_pcm] [52478.405955] [<f88c7268>] snd_pcm_stop+0x11/0x13 [snd_pcm] [52478.405955] [<f88c7302>] snd_pcm_drop+0x98/0xbb [snd_pcm] [52478.405955] [<f88c9cf0>] snd_pcm_common_ioctl1+0xe7c/0xf6d [snd_pcm] [52478.405955] [<f88ca4a3>] snd_pcm_playback_ioctl1+0x355/0x36c [snd_pcm] [52478.405955] [<f88ca4f2>] snd_pcm_kernel_ioctl+0x38/0x5d [snd_pcm] [52478.405955] [<f89fc93e>] snd_pcm_oss_change_params+0xaee/0xea7 [snd_pcm_oss] [52478.405955] [<f89fd04f>] snd_pcm_oss_get_active_substream+0x34/0x4c [snd_pcm_oss] [52478.405955] [<f89fd075>] snd_pcm_oss_get_format+0xe/0x20 [snd_pcm_oss] [52478.405955] [<f89fe246>] snd_pcm_oss_ioctl+0x3c0/0x9a5 [snd_pcm_oss] [52478.405955] [<c0179202>] vfs_ioctl+0x22/0x67 [52478.405955] [<c01794ab>] do_vfs_ioctl+0x264/0x27b [52478.405955] [<c01794ee>] sys_ioctl+0x2c/0x46 [52478.405955] [<c01038b9>] sysenter_past_esp+0x6a/0xb1 [52478.405955] [<ffffffff>] 0xffffffff [52478.405955] [52478.405955] -> #1 (&substream->self_group.lock){.+..}: [52478.405955] [<c013fc8e>] __lock_acquire+0x982/0xb37 [52478.405955] [<c013feaf>] lock_acquire+0x6c/0x89 [52478.405955] [<c030748f>] _spin_lock+0x1c/0x49 [52478.405955] [<f88af2b3>] pcsp_do_timer+0x7f/0x2a8 [snd_pcsp] [52478.405955] [<c0136372>] run_hrtimer_pending+0x3a/0xcb [52478.405955] [<c0136419>] run_hrtimer_softirq+0x16/0x18 [52478.405955] [<c0127f51>] __do_softirq+0x6f/0xe9 [52478.405955] [<c01059c3>] do_softirq+0x5e/0xa8 [52478.405955] [<ffffffff>] 0xffffffff [52478.405955] [52478.405955] -> #0 (&pcsp_chip.substream_lock){++..}: [52478.405955] [<c013fbb5>] __lock_acquire+0x8a9/0xb37 [52478.405955] [<c013feaf>] lock_acquire+0x6c/0x89 [52478.405955] [<c030783a>] _spin_lock_irqsave+0x25/0x55 [52478.405955] [<f88af285>] pcsp_do_timer+0x51/0x2a8 [snd_pcsp] [52478.405955] [<c0136311>] __run_hrtimer+0x49/0x70 [52478.405955] [<c0136b43>] hrtimer_interrupt+0xeb/0x154 [52478.405955] [<c0110670>] smp_apic_timer_interrupt+0x6c/0x80 [52478.405955] [<c0104417>] apic_timer_interrupt+0x33/0x38 [52478.405955] [<f89ff159>] snd_pcm_plug_write_transfer+0x75/0x9f [snd_pcm_oss] [52478.405955] [<f89fb6d8>] snd_pcm_oss_write2+0x82/0xc0 [snd_pcm_oss] [52478.405955] [<f89fcfb6>] snd_pcm_oss_write+0x165/0x1ca [snd_pcm_oss] [52478.405955] [<c016f557>] vfs_write+0x8c/0x108 [52478.405955] [<c016faee>] sys_write+0x3b/0x60 [52478.405955] [<c01038b9>] sysenter_past_esp+0x6a/0xb1 [52478.405955] [<ffffffff>] 0xffffffff [52478.405955] [52478.405955] other info that might help us debug this: [52478.405955] [52478.405955] 2 locks held by mplayer/4814: [52478.405955] #0: (&runtime->oss.params_lock){--..}, at: [<f89fceb7>] snd_pcm_oss_write+0x66/0x1ca [snd_pcm_oss] [52478.405955] #1: (&cpu_base->lock){++..}, at: [<c0136ac4>] hrtimer_interrupt+0x6c/0x154 [52478.405955] [52478.405955] stack backtrace: [52478.405955] Pid: 4814, comm: mplayer Not tainted 2.6.26-rc2-devel #1 [52478.405955] [<c013de71>] print_circular_bug_tail+0x5b/0x66 [52478.405955] [<c013fbb5>] __lock_acquire+0x8a9/0xb37 [52478.405955] [<c013feaf>] lock_acquire+0x6c/0x89 [52478.405955] [<f88af285>] ? pcsp_do_timer+0x51/0x2a8 [snd_pcsp] [52478.405955] [<c030783a>] _spin_lock_irqsave+0x25/0x55 [52478.405955] [<f88af285>] ? pcsp_do_timer+0x51/0x2a8 [snd_pcsp] [52478.405955] [<f88af234>] ? pcsp_do_timer+0x0/0x2a8 [snd_pcsp] [52478.405955] [<f88af285>] pcsp_do_timer+0x51/0x2a8 [snd_pcsp] [52478.405955] [<c0136134>] ? __remove_hrtimer+0x62/0x6a [52478.405955] [<f88af234>] ? pcsp_do_timer+0x0/0x2a8 [snd_pcsp] [52478.405955] [<c0136311>] __run_hrtimer+0x49/0x70 [52478.405955] [<c0136b43>] hrtimer_interrupt+0xeb/0x154 [52478.405955] [<c0110670>] smp_apic_timer_interrupt+0x6c/0x80 [52478.405955] [<c0104417>] apic_timer_interrupt+0x33/0x38 [52478.405955] [<f8a0022d>] ? linear_transfer+0xce/0x155 [snd_pcm_oss] [52478.405955] [<f89ff159>] snd_pcm_plug_write_transfer+0x75/0x9f [snd_pcm_oss] [52478.405955] [<f89fb6d8>] snd_pcm_oss_write2+0x82/0xc0 [snd_pcm_oss] [52478.405955] [<f89fcfb6>] snd_pcm_oss_write+0x165/0x1ca [snd_pcm_oss] [52478.405955] [<f89fce51>] ? snd_pcm_oss_write+0x0/0x1ca [snd_pcm_oss] [52478.405955] [<c016f557>] vfs_write+0x8c/0x108 [52478.405955] [<c016faee>] sys_write+0x3b/0x60 [52478.405955] [<c01038b9>] sysenter_past_esp+0x6a/0xb1 [52478.405955] =======================
Reply-To: akpm@linux-foundation.org (switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Thu, 15 May 2008 03:43:36 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=10701 > > Summary: snd_pcsp lockdep warning > Product: Drivers > Version: 2.5 > KernelVersion: 2.6.26-rc2 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: Sound(ALSA) > AssignedTo: perex@perex.cz > ReportedBy: nuclearcat@nuclearcat.com > > > Latest working kernel version: unknown > Earliest failing kernel version: 2.6.26-rc2 > Distribution: Gentoo > Hardware Environment: Intel 965 chipset, Core 2 Duo > Software Environment: mplayer > Problem Description: Just lockdep warning in dmesg This looks like a problem in the core hrtimers code? > Steps to reproduce: while playing HD video got this. > [52478.405955] ======================================================= > [52478.405955] [ INFO: possible circular locking dependency detected ] > [52478.405955] 2.6.26-rc2-devel #1 > [52478.405955] ------------------------------------------------------- > [52478.405955] mplayer/4814 is trying to acquire lock: > [52478.405955] (&pcsp_chip.substream_lock){++..}, at: [<f88af285>] > pcsp_do_timer+0x51/0x2a8 [snd_pcsp] > [52478.405955] > [52478.405955] but task is already holding lock: > [52478.405955] (&cpu_base->lock){++..}, at: [<c0136ac4>] > hrtimer_interrupt+0x6c/0x154 > [52478.405955] > [52478.405955] which lock already depends on the new lock. > [52478.405955] > [52478.405955] > [52478.405955] the existing dependency chain (in reverse order) is: > [52478.405955] > [52478.405955] -> #5 (&cpu_base->lock){++..}: > [52478.405955] [<c013fc8e>] __lock_acquire+0x982/0xb37 > [52478.405955] [<c013feaf>] lock_acquire+0x6c/0x89 > [52478.405955] [<c030783a>] _spin_lock_irqsave+0x25/0x55 > [52478.405955] [<c01366c2>] lock_hrtimer_base+0x18/0x33 > [52478.405955] [<c01367aa>] hrtimer_start+0x19/0x100 > [52478.405955] [<c011b656>] enqueue_rt_entity+0xf7/0x10e > [52478.405955] [<c011b6cb>] enqueue_task_rt+0x36/0x4a > [52478.405955] [<c0118235>] enqueue_task+0x4c/0x58 > [52478.405955] [<c011825a>] activate_task+0x19/0x1f > [52478.405955] [<c011a9f7>] try_to_wake_up+0x17c/0x1be > [52478.405955] [<c011aa61>] wake_up_process+0xf/0x11 > [52478.405955] [<c0303060>] migration_call+0xb4/0x403 > [52478.405955] [<c0440d1b>] migration_init+0x38/0x45 > [52478.405955] [<c043439b>] kernel_init+0x49/0x22a > [52478.405955] [<c01045af>] kernel_thread_helper+0x7/0x10 > [52478.405955] [<ffffffff>] 0xffffffff > [52478.405955] > [52478.405955] -> #4 (&rt_b->rt_runtime_lock){++..}: > [52478.405955] [<c013fc8e>] __lock_acquire+0x982/0xb37 > [52478.405955] [<c013feaf>] lock_acquire+0x6c/0x89 > [52478.405955] [<c030748f>] _spin_lock+0x1c/0x49 > [52478.405955] [<c011b613>] enqueue_rt_entity+0xb4/0x10e > [52478.405955] [<c011b6cb>] enqueue_task_rt+0x36/0x4a > [52478.405955] [<c0118235>] enqueue_task+0x4c/0x58 > [52478.405955] [<c011825a>] activate_task+0x19/0x1f > [52478.405955] [<c011a9f7>] try_to_wake_up+0x17c/0x1be > [52478.405955] [<c011aa61>] wake_up_process+0xf/0x11 > [52478.405955] [<c0303060>] migration_call+0xb4/0x403 > [52478.405955] [<c0440d1b>] migration_init+0x38/0x45 > [52478.405955] [<c043439b>] kernel_init+0x49/0x22a > [52478.405955] [<c01045af>] kernel_thread_helper+0x7/0x10 > [52478.405955] [<ffffffff>] 0xffffffff > [52478.405955] > [52478.405955] -> #3 (&rq->rq_lock_key){++..}: > [52478.405955] [<c013fc8e>] __lock_acquire+0x982/0xb37 > [52478.405955] [<c013feaf>] lock_acquire+0x6c/0x89 > [52478.405955] [<c030748f>] _spin_lock+0x1c/0x49 > [52478.405955] [<c011a634>] task_rq_lock+0x36/0x5d > [52478.405955] [<c011a8a5>] try_to_wake_up+0x2a/0x1be > [52478.405955] [<c011aa44>] default_wake_function+0xb/0xd > [52478.405955] [<c0117dad>] __wake_up_common+0x2f/0x5a > [52478.405955] [<c0119c2a>] complete+0x2b/0x3e > [52478.405955] [<c0133b93>] kthread+0x1f/0x64 > [52478.405955] [<c01045af>] kernel_thread_helper+0x7/0x10 > [52478.405955] [<ffffffff>] 0xffffffff > [52478.405955] > [52478.405955] -> #2 (&q->lock){++..}: > [52478.405955] [<c013fc8e>] __lock_acquire+0x982/0xb37 > [52478.405955] [<c013feaf>] lock_acquire+0x6c/0x89 > [52478.405955] [<c030783a>] _spin_lock_irqsave+0x25/0x55 > [52478.405955] [<c0119c96>] __wake_up+0x15/0x3b > [52478.405955] [<f88c6f2d>] snd_pcm_post_stop+0x51/0x5a [snd_pcm] > [52478.405955] [<f88c6353>] snd_pcm_action_single+0x33/0x4b [snd_pcm] > [52478.405955] [<f88c7238>] snd_pcm_action+0x68/0x74 [snd_pcm] > [52478.405955] [<f88c7268>] snd_pcm_stop+0x11/0x13 [snd_pcm] > [52478.405955] [<f88c7302>] snd_pcm_drop+0x98/0xbb [snd_pcm] > [52478.405955] [<f88c9cf0>] snd_pcm_common_ioctl1+0xe7c/0xf6d > [snd_pcm] > [52478.405955] [<f88ca4a3>] snd_pcm_playback_ioctl1+0x355/0x36c > [snd_pcm] > [52478.405955] [<f88ca4f2>] snd_pcm_kernel_ioctl+0x38/0x5d [snd_pcm] > [52478.405955] [<f89fc93e>] snd_pcm_oss_change_params+0xaee/0xea7 > [snd_pcm_oss] > [52478.405955] [<f89fd04f>] snd_pcm_oss_get_active_substream+0x34/0x4c > [snd_pcm_oss] > [52478.405955] [<f89fd075>] snd_pcm_oss_get_format+0xe/0x20 > [snd_pcm_oss] > [52478.405955] [<f89fe246>] snd_pcm_oss_ioctl+0x3c0/0x9a5 > [snd_pcm_oss] > [52478.405955] [<c0179202>] vfs_ioctl+0x22/0x67 > [52478.405955] [<c01794ab>] do_vfs_ioctl+0x264/0x27b > [52478.405955] [<c01794ee>] sys_ioctl+0x2c/0x46 > [52478.405955] [<c01038b9>] sysenter_past_esp+0x6a/0xb1 > [52478.405955] [<ffffffff>] 0xffffffff > [52478.405955] > [52478.405955] -> #1 (&substream->self_group.lock){.+..}: > [52478.405955] [<c013fc8e>] __lock_acquire+0x982/0xb37 > [52478.405955] [<c013feaf>] lock_acquire+0x6c/0x89 > [52478.405955] [<c030748f>] _spin_lock+0x1c/0x49 > [52478.405955] [<f88af2b3>] pcsp_do_timer+0x7f/0x2a8 [snd_pcsp] > [52478.405955] [<c0136372>] run_hrtimer_pending+0x3a/0xcb > [52478.405955] [<c0136419>] run_hrtimer_softirq+0x16/0x18 > [52478.405955] [<c0127f51>] __do_softirq+0x6f/0xe9 > [52478.405955] [<c01059c3>] do_softirq+0x5e/0xa8 > [52478.405955] [<ffffffff>] 0xffffffff > [52478.405955] > [52478.405955] -> #0 (&pcsp_chip.substream_lock){++..}: > [52478.405955] [<c013fbb5>] __lock_acquire+0x8a9/0xb37 > [52478.405955] [<c013feaf>] lock_acquire+0x6c/0x89 > [52478.405955] [<c030783a>] _spin_lock_irqsave+0x25/0x55 > [52478.405955] [<f88af285>] pcsp_do_timer+0x51/0x2a8 [snd_pcsp] > [52478.405955] [<c0136311>] __run_hrtimer+0x49/0x70 > [52478.405955] [<c0136b43>] hrtimer_interrupt+0xeb/0x154 > [52478.405955] [<c0110670>] smp_apic_timer_interrupt+0x6c/0x80 > [52478.405955] [<c0104417>] apic_timer_interrupt+0x33/0x38 > [52478.405955] [<f89ff159>] snd_pcm_plug_write_transfer+0x75/0x9f > [snd_pcm_oss] > [52478.405955] [<f89fb6d8>] snd_pcm_oss_write2+0x82/0xc0 [snd_pcm_oss] > [52478.405955] [<f89fcfb6>] snd_pcm_oss_write+0x165/0x1ca > [snd_pcm_oss] > [52478.405955] [<c016f557>] vfs_write+0x8c/0x108 > [52478.405955] [<c016faee>] sys_write+0x3b/0x60 > [52478.405955] [<c01038b9>] sysenter_past_esp+0x6a/0xb1 > [52478.405955] [<ffffffff>] 0xffffffff > [52478.405955] > [52478.405955] other info that might help us debug this: > [52478.405955] > [52478.405955] 2 locks held by mplayer/4814: > [52478.405955] #0: (&runtime->oss.params_lock){--..}, at: [<f89fceb7>] > snd_pcm_oss_write+0x66/0x1ca [snd_pcm_oss] > [52478.405955] #1: (&cpu_base->lock){++..}, at: [<c0136ac4>] > hrtimer_interrupt+0x6c/0x154 > [52478.405955] > [52478.405955] stack backtrace: > [52478.405955] Pid: 4814, comm: mplayer Not tainted 2.6.26-rc2-devel #1 > [52478.405955] [<c013de71>] print_circular_bug_tail+0x5b/0x66 > [52478.405955] [<c013fbb5>] __lock_acquire+0x8a9/0xb37 > [52478.405955] [<c013feaf>] lock_acquire+0x6c/0x89 > [52478.405955] [<f88af285>] ? pcsp_do_timer+0x51/0x2a8 [snd_pcsp] > [52478.405955] [<c030783a>] _spin_lock_irqsave+0x25/0x55 > [52478.405955] [<f88af285>] ? pcsp_do_timer+0x51/0x2a8 [snd_pcsp] > [52478.405955] [<f88af234>] ? pcsp_do_timer+0x0/0x2a8 [snd_pcsp] > [52478.405955] [<f88af285>] pcsp_do_timer+0x51/0x2a8 [snd_pcsp] > [52478.405955] [<c0136134>] ? __remove_hrtimer+0x62/0x6a > [52478.405955] [<f88af234>] ? pcsp_do_timer+0x0/0x2a8 [snd_pcsp] > [52478.405955] [<c0136311>] __run_hrtimer+0x49/0x70 > [52478.405955] [<c0136b43>] hrtimer_interrupt+0xeb/0x154 > [52478.405955] [<c0110670>] smp_apic_timer_interrupt+0x6c/0x80 > [52478.405955] [<c0104417>] apic_timer_interrupt+0x33/0x38 > [52478.405955] [<f8a0022d>] ? linear_transfer+0xce/0x155 [snd_pcm_oss] > [52478.405955] [<f89ff159>] snd_pcm_plug_write_transfer+0x75/0x9f > [snd_pcm_oss] > [52478.405955] [<f89fb6d8>] snd_pcm_oss_write2+0x82/0xc0 [snd_pcm_oss] > [52478.405955] [<f89fcfb6>] snd_pcm_oss_write+0x165/0x1ca [snd_pcm_oss] > [52478.405955] [<f89fce51>] ? snd_pcm_oss_write+0x0/0x1ca [snd_pcm_oss] > [52478.405955] [<c016f557>] vfs_write+0x8c/0x108 > [52478.405955] [<c016faee>] sys_write+0x3b/0x60 > [52478.405955] [<c01038b9>] sysenter_past_esp+0x6a/0xb1 > [52478.405955] =======================
bugme-daemon@bugzilla.kernel.org wrote: > This looks like a problem in the core hrtimers code? The last time I looked into that, my impression was it is an alsa core problem. We had a very lengthy discussion with Takashi about these locking problems, during which we came up with the current locking technique of snd-pcsp, but I was never convinced that it works right. It seemed to work though, and this was long ago. Some parts of that mailing are in pcsp_lib.c as a comments. Other parts I'll have to dig within a few days and come back with more info...
*** Bug 10703 has been marked as a duplicate of this bug. ***
On Thu, 15 May 2008, Andrew Morton wrote: > (switched to email. Please respond via emailed reply-to-all, not via the > bugzilla web interface). > > On Thu, 15 May 2008 03:43:36 -0700 (PDT) bugme-daemon@bugzilla.kernel.org > wrote: > > > http://bugzilla.kernel.org/show_bug.cgi?id=10701 > > > > Summary: snd_pcsp lockdep warning > > Product: Drivers > > Version: 2.5 > > KernelVersion: 2.6.26-rc2 > > Platform: All > > OS/Version: Linux > > Tree: Mainline > > Status: NEW > > Severity: normal > > Priority: P1 > > Component: Sound(ALSA) > > AssignedTo: perex@perex.cz > > ReportedBy: nuclearcat@nuclearcat.com > > > > > > Latest working kernel version: unknown > > Earliest failing kernel version: 2.6.26-rc2 > > Distribution: Gentoo > > Hardware Environment: Intel 965 chipset, Core 2 Duo > > Software Environment: mplayer > > Problem Description: Just lockdep warning in dmesg > > This looks like a problem in the core hrtimers code? No, this is a problem in the sound code. It uses the hrtimer callback mode HRTIMER_CB_IRQSAFE. This mode was never intended to provide a lock free call mode and was created for very fast callbacks like wakeups and the like. Also I find it pretty amazing that the pscp code has the following comment: /* * We need the hrtimer_start as a tasklet to avoid * the nasty locking problem. :( * The problem: * - The timer handler is called with the cpu_base->lock * already held by hrtimer code. * - snd_pcm_period_elapsed() takes the * substream->self_group.lock. * So far so good. * But the snd_pcsp_trigger() is called with the * substream->self_group.lock held, and it calls * hrtimer_start(), which takes the cpu_base->lock. * You see the problem. We have the code pathes * which take two locks in a reverse order. This * can deadlock and the lock validator complains. * The only solution I could find was to move the * hrtimer_start() into a tasklet. -stsp */ So the author was well aware of locking problem and the whole code is just a stupid hack around the problem without solving it. This code is not at all suited for HRTIMER_CB_IRQSAFE. Thanks, tglx
bugme-daemon@bugzilla.kernel.org wrote: > No, this is a problem in the sound code. Ack. > mode HRTIMER_CB_IRQSAFE. This mode was never intended to provide a > lock free call mode and was created for very fast callbacks like > wakeups and the like. But I indeed need a low-latency callback. That driver is very latency-sensitive. What would you suggest? > Also I find it pretty amazing that the pscp code > has the following comment: You must be thankfull for it being there at all. As otherwise you'd waste a lot more time. :) > So the author was well aware of locking problem and the whole code is Ack. That's actually what I've already admitted in that thread. > just a stupid hack around the problem without solving it. Well, yes. I tried to solve it in alsa core, but that was considered too intrusive and was not applied IIRC, and we ended up with what we have (and yes, it was discussed and stuff). If you can suggest a simple solution to our problem, it would be really great. Though this time I guess it won't get away in a couple of hacks anymore. I was never happy with the current solution too, as you can see. Otherwise I wouldn't document it so verbosely. But I didn't have anything else. And I don't beleive it can be properly solved only in a driver, without touching an alsa core. > This code is > not at all suited for HRTIMER_CB_IRQSAFE. Please suggest an alternative. But I personally think HRTIMER_CB_IRQSAFE can stay, we only need to get the locking problem solved. I'll investigate into that again, within a few days.
Reply-To: akpm@linux-foundation.org On Fri, 16 May 2008 14:46:31 +0200 (CEST) Thomas Gleixner <tglx@linutronix.de> wrote: > On Thu, 15 May 2008, Andrew Morton wrote: > > > (switched to email. Please respond via emailed reply-to-all, not via the > > bugzilla web interface). > > > > On Thu, 15 May 2008 03:43:36 -0700 (PDT) bugme-daemon@bugzilla.kernel.org > wrote: > > > > > http://bugzilla.kernel.org/show_bug.cgi?id=10701 > > > > > > Summary: snd_pcsp lockdep warning > > > Product: Drivers > > > Version: 2.5 > > > KernelVersion: 2.6.26-rc2 > > > Platform: All > > > OS/Version: Linux > > > Tree: Mainline > > > Status: NEW > > > Severity: normal > > > Priority: P1 > > > Component: Sound(ALSA) > > > AssignedTo: perex@perex.cz > > > ReportedBy: nuclearcat@nuclearcat.com > > > > > > > > > Latest working kernel version: unknown > > > Earliest failing kernel version: 2.6.26-rc2 > > > Distribution: Gentoo > > > Hardware Environment: Intel 965 chipset, Core 2 Duo > > > Software Environment: mplayer > > > Problem Description: Just lockdep warning in dmesg > > > > This looks like a problem in the core hrtimers code? > > No, this is a problem in the sound code. It uses the hrtimer callback > mode HRTIMER_CB_IRQSAFE. This mode was never intended to provide a > lock free call mode and was created for very fast callbacks like > wakeups and the like. Also I find it pretty amazing that the pscp code > has the following comment: > > /* > * We need the hrtimer_start as a tasklet to avoid > * the nasty locking problem. :( > * The problem: > * - The timer handler is called with the cpu_base->lock > * already held by hrtimer code. > * - snd_pcm_period_elapsed() takes the > * substream->self_group.lock. > * So far so good. > * But the snd_pcsp_trigger() is called with the > * substream->self_group.lock held, and it calls > * hrtimer_start(), which takes the cpu_base->lock. > * You see the problem. We have the code pathes > * which take two locks in a reverse order. This > * can deadlock and the lock validator complains. > * The only solution I could find was to move the > * hrtimer_start() into a tasklet. -stsp > */ > > So the author was well aware of locking problem and the whole code is > just a stupid hack around the problem without solving it. This code is > not at all suited for HRTIMER_CB_IRQSAFE. You sound impressed! So what's the fix? HRTIMER_CB_SOFTIRQ?
On Fri, 16 May 2008, Andrew Morton wrote: > > So the author was well aware of locking problem and the whole code is > > just a stupid hack around the problem without solving it. This code is > > not at all suited for HRTIMER_CB_IRQSAFE. > > You sound impressed! So what's the fix? HRTIMER_CB_SOFTIRQ? Yeah, impressed by creativity. HRTIMER_CB_SOFTIRQ should be the right thing. Thanks, tglx
*** This bug has been marked as a duplicate of bug 10679 ***
bugme-daemon@bugzilla.kernel.org wrote: > Yeah, impressed by creativity. HRTIMER_CB_SOFTIRQ should be the right > thing. The creativity is a consequence of the lack of documentation. Besides, it seems HRTIMER_CB_IRQSAFE does the callback sometimes from hardirq context and sometimes from the softirq context. At least it was so in the past. So the differences were not obvious from the testings I've done back then. Before the driver started to use the hrtimer code, the same callback was executed inside the timer irq handler, and it was very important to make it the first handler in the chain. Otherwise the latencies were unacceptable. So the most obvious way to convert it was to use HRTIMER_CB_IRQSAFE, as it seemed to me. Is there a reason, besides locking problem, why it should not be used for that? I'll try HRTIMER_CB_SOFTIRQ tomorrow and see if the latencies are tolerable, but is there a real reason for increasing the latencies that way?
bugme-daemon@bugzilla.kernel.org wrote: > Yeah, impressed by creativity. HRTIMER_CB_SOFTIRQ should be the right > thing. Actually, HRTIMER_CB_SOFTIRQ doesn't even help at all. I applied the patch attached to this message, and the bug is still there. And I don't see the increased latencies too, just nothing changed. And it looks like even with HRTIMER_CB_SOFTIRQ, the callback is still called with the cpu_base->lock held. Am I doing something wrong? What was supposed to change? diff --git a/sound/drivers/pcsp/pcsp.c b/sound/drivers/pcsp/pcsp.c index 54a1f90..1899cf0 100644 --- a/sound/drivers/pcsp/pcsp.c +++ b/sound/drivers/pcsp/pcsp.c @@ -96,7 +96,7 @@ static int __devinit snd_card_pcsp_probe(int devnum, struct device *dev) return -EINVAL; hrtimer_init(&pcsp_chip.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); - pcsp_chip.timer.cb_mode = HRTIMER_CB_IRQSAFE; + pcsp_chip.timer.cb_mode = HRTIMER_CB_SOFTIRQ; pcsp_chip.timer.function = pcsp_do_timer; card = snd_card_new(index, id, THIS_MODULE, 0); diff --git a/sound/drivers/pcsp/pcsp_lib.c b/sound/drivers/pcsp/pcsp_lib.c index ac6238e..c30820d 100644 --- a/sound/drivers/pcsp/pcsp_lib.c +++ b/sound/drivers/pcsp/pcsp_lib.c @@ -9,7 +9,6 @@ #include <linux/module.h> #include <linux/moduleparam.h> #include <sound/pcm.h> -#include <linux/interrupt.h> #include <asm/io.h> #include "pcsp.h" @@ -18,31 +17,6 @@ module_param(nforce_wa, bool, 0444); MODULE_PARM_DESC(nforce_wa, "Apply NForce chipset workaround " "(expect bad sound)"); -static void pcsp_start_timer(unsigned long dummy) -{ - hrtimer_start(&pcsp_chip.timer, ktime_set(0, 0), HRTIMER_MODE_REL); -} - -/* - * We need the hrtimer_start as a tasklet to avoid - * the nasty locking problem. :( - * The problem: - * - The timer handler is called with the cpu_base->lock - * already held by hrtimer code. - * - snd_pcm_period_elapsed() takes the - * substream->self_group.lock. - * So far so good. - * But the snd_pcsp_trigger() is called with the - * substream->self_group.lock held, and it calls - * hrtimer_start(), which takes the cpu_base->lock. - * You see the problem. We have the code pathes - * which take two locks in a reverse order. This - * can deadlock and the lock validator complains. - * The only solution I could find was to move the - * hrtimer_start() into a tasklet. -stsp - */ -static DECLARE_TASKLET(pcsp_start_timer_tasklet, pcsp_start_timer, 0); - enum hrtimer_restart pcsp_do_timer(struct hrtimer *handle) { unsigned long flags; @@ -64,8 +38,6 @@ enum hrtimer_restart pcsp_do_timer(struct hrtimer *handle) return HRTIMER_RESTART; } - /* hrtimer calls us from both hardirq and softirq contexts, - * so irqsave :( */ spin_lock_irqsave(&chip->substream_lock, flags); /* Takashi Iwai says regarding this extra lock: @@ -167,7 +139,7 @@ static void pcsp_start_playing(struct snd_pcsp *chip) atomic_set(&chip->timer_active, 1); chip->thalf = 0; - tasklet_schedule(&pcsp_start_timer_tasklet); + hrtimer_start(&pcsp_chip.timer, ktime_set(0, 0), HRTIMER_MODE_REL); } static void pcsp_stop_playing(struct snd_pcsp *chip)
This bug is not the same as #10679.
At Fri, 16 May 2008 21:32:49 +0200 (CEST), Thomas Gleixner wrote: > > On Fri, 16 May 2008, Andrew Morton wrote: > > > So the author was well aware of locking problem and the whole code is > > > just a stupid hack around the problem without solving it. This code is > > > not at all suited for HRTIMER_CB_IRQSAFE. > > > > You sound impressed! So what's the fix? HRTIMER_CB_SOFTIRQ? > > Yeah, impressed by creativity. HRTIMER_CB_SOFTIRQ should be the right > thing. Well, it'd be basically a similar way like snd-pcsp currently does ("a stupid hack" :) But, it's good to have a fix, anyway, since this sounds like a generic problem with a callback in a spinlock. If the callback requires another own lock, this can easily lead to a AB/BA deadlock. Actually, ALSA PCM core had sometimes similar problems, too. thanks, Takashi
On Sat, 17 May 2008, Takashi Iwai wrote: > At Fri, 16 May 2008 21:32:49 +0200 (CEST), > Thomas Gleixner wrote: > > > > On Fri, 16 May 2008, Andrew Morton wrote: > > > > So the author was well aware of locking problem and the whole code is > > > > just a stupid hack around the problem without solving it. This code is > > > > not at all suited for HRTIMER_CB_IRQSAFE. > > > > > > You sound impressed! So what's the fix? HRTIMER_CB_SOFTIRQ? > > > > Yeah, impressed by creativity. HRTIMER_CB_SOFTIRQ should be the right > > thing. > > Well, it'd be basically a similar way like snd-pcsp currently does ("a > stupid hack" :) But, it's good to have a fix, anyway, since this > sounds like a generic problem with a callback in a spinlock. If the > callback requires another own lock, this can easily lead to a AB/BA > deadlock. Actually, ALSA PCM core had sometimes similar problems, > too. HRTIMER_CB_SOFTIRQ is not a stupid hack :) It's the default for hrtimers and it does not hold any locks when calling the callback. We really want to avoid tons of callbacks in the timer interrupt itself. Thanks, tglx
bugme-daemon@bugzilla.kernel.org wrote: > Actually, HRTIMER_CB_SOFTIRQ doesn't even > help at all. > I applied the patch attached to this message, > and the bug is still there. No, that was the wrong testing. The patch actually worked. :) Here's the updated version. diff --git a/sound/drivers/pcsp/pcsp.c b/sound/drivers/pcsp/pcsp.c index 54a1f90..1899cf0 100644 --- a/sound/drivers/pcsp/pcsp.c +++ b/sound/drivers/pcsp/pcsp.c @@ -96,7 +96,7 @@ static int __devinit snd_card_pcsp_probe(int devnum, struct device *dev) return -EINVAL; hrtimer_init(&pcsp_chip.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); - pcsp_chip.timer.cb_mode = HRTIMER_CB_IRQSAFE; + pcsp_chip.timer.cb_mode = HRTIMER_CB_SOFTIRQ; pcsp_chip.timer.function = pcsp_do_timer; card = snd_card_new(index, id, THIS_MODULE, 0); diff --git a/sound/drivers/pcsp/pcsp_lib.c b/sound/drivers/pcsp/pcsp_lib.c index ac6238e..c30820d 100644 --- a/sound/drivers/pcsp/pcsp_lib.c +++ b/sound/drivers/pcsp/pcsp_lib.c @@ -9,7 +9,6 @@ #include <linux/module.h> #include <linux/moduleparam.h> #include <sound/pcm.h> -#include <linux/interrupt.h> #include <asm/io.h> #include "pcsp.h" @@ -18,34 +17,8 @@ module_param(nforce_wa, bool, 0444); MODULE_PARM_DESC(nforce_wa, "Apply NForce chipset workaround " "(expect bad sound)"); -static void pcsp_start_timer(unsigned long dummy) -{ - hrtimer_start(&pcsp_chip.timer, ktime_set(0, 0), HRTIMER_MODE_REL); -} - -/* - * We need the hrtimer_start as a tasklet to avoid - * the nasty locking problem. :( - * The problem: - * - The timer handler is called with the cpu_base->lock - * already held by hrtimer code. - * - snd_pcm_period_elapsed() takes the - * substream->self_group.lock. - * So far so good. - * But the snd_pcsp_trigger() is called with the - * substream->self_group.lock held, and it calls - * hrtimer_start(), which takes the cpu_base->lock. - * You see the problem. We have the code pathes - * which take two locks in a reverse order. This - * can deadlock and the lock validator complains. - * The only solution I could find was to move the - * hrtimer_start() into a tasklet. -stsp - */ -static DECLARE_TASKLET(pcsp_start_timer_tasklet, pcsp_start_timer, 0); - enum hrtimer_restart pcsp_do_timer(struct hrtimer *handle) { - unsigned long flags; unsigned char timer_cnt, val; int periods_elapsed; u64 ns; @@ -64,9 +37,7 @@ enum hrtimer_restart pcsp_do_timer(struct hrtimer *handle) return HRTIMER_RESTART; } - /* hrtimer calls us from both hardirq and softirq contexts, - * so irqsave :( */ - spin_lock_irqsave(&chip->substream_lock, flags); + spin_lock_irq(&chip->substream_lock); /* Takashi Iwai says regarding this extra lock: If the irq handler handles some data on the DMA buffer, it should @@ -132,7 +105,7 @@ enum hrtimer_restart pcsp_do_timer(struct hrtimer *handle) chip->period_ptr %= buffer_bytes; } - spin_unlock_irqrestore(&chip->substream_lock, flags); + spin_unlock_irq(&chip->substream_lock); if (!atomic_read(&chip->timer_active)) return HRTIMER_NORESTART; @@ -146,7 +119,7 @@ enum hrtimer_restart pcsp_do_timer(struct hrtimer *handle) exit_nr_unlock2: snd_pcm_stream_unlock(substream); exit_nr_unlock1: - spin_unlock_irqrestore(&chip->substream_lock, flags); + spin_unlock_irq(&chip->substream_lock); return HRTIMER_NORESTART; } @@ -167,7 +140,7 @@ static void pcsp_start_playing(struct snd_pcsp *chip) atomic_set(&chip->timer_active, 1); chip->thalf = 0; - tasklet_schedule(&pcsp_start_timer_tasklet); + hrtimer_start(&pcsp_chip.timer, ktime_set(0, 0), HRTIMER_MODE_REL); } static void pcsp_stop_playing(struct snd_pcsp *chip)
At Sat, 17 May 2008 17:24:22 +0200 (CEST), Thomas Gleixner wrote: > > On Sat, 17 May 2008, Takashi Iwai wrote: > > At Fri, 16 May 2008 21:32:49 +0200 (CEST), > > Thomas Gleixner wrote: > > > > > > On Fri, 16 May 2008, Andrew Morton wrote: > > > > > So the author was well aware of locking problem and the whole code is > > > > > just a stupid hack around the problem without solving it. This code > is > > > > > not at all suited for HRTIMER_CB_IRQSAFE. > > > > > > > > You sound impressed! So what's the fix? HRTIMER_CB_SOFTIRQ? > > > > > > Yeah, impressed by creativity. HRTIMER_CB_SOFTIRQ should be the right > > > thing. > > > > Well, it'd be basically a similar way like snd-pcsp currently does ("a > > stupid hack" :) But, it's good to have a fix, anyway, since this > > sounds like a generic problem with a callback in a spinlock. If the > > callback requires another own lock, this can easily lead to a AB/BA > > deadlock. Actually, ALSA PCM core had sometimes similar problems, > > too. > > HRTIMER_CB_SOFTIRQ is not a stupid hack :) It's the default for > hrtimers and it does not hold any locks when calling the callback. We > really want to avoid tons of callbacks in the timer interrupt itself. Yes, this would be an easy workaround for snd-pcsp problem, I guess. A better fix would be to make snd-pcsp hrtimer calback just change the pc-speaker port (0x61 and 0x42) and update the position. These should be really fast path. Then, occasionally invoke a tasklet that will call snd_pcm_period_elapsed(), which could be a (relatively) slow path and lead to a spinlock mess. thanks, Takashi
bugme-daemon@bugzilla.kernel.org wrote: > A better fix would be to make snd-pcsp hrtimer calback just change the > pc-speaker port (0x61 and 0x42) and update the position. These should > be really fast path. But even that will require the snd_pcm_stream_lock() IIRC, because of an access to the (fake) DMA buffer and other stuff. And that lock, in turn, requires the pcsp_chip.substream_lock. Which is exactly the lock that triggers a warning. So this will unlikely to help. But I think I don't understand the dependancy chain yet. snd-pcsp does not call hrtimer_start() under any lock (because of the tasklet hack). So I don't see how it could introduce the dependancy pcsp_chip.substream_lock -> cpu_base->lock. Moreso, this problem never happened since the tasklet workaround was added, and now it started to happen again. But the pcsp code did not change. From the traces it seems the dependancy is introduced in an alsa core somewhere. Using HRTIMER_CB_SOFTIRQ will of course get rid of the problem by removing the backward dependancy part, but then we'll never know who recently introduced a forward part. > Then, occasionally invoke a tasklet that will > call snd_pcm_period_elapsed(), which could be a (relatively) slow > path and lead to a spinlock mess. From the traces it seems the problem triggers when the handler is just trying to take the pcsp_chip.substream_lock. It doesn't look like snd_pcm_period_elapsed() gets involved, but I might be wrong...
The HRTIMER_CB_SOFTIRQ is a good workaround indeed. So I would like to take this as a 2.6.26 fix. Could you post this to LKML and me? Anyway, what I meant is that the part touching I/O ports could be a faster path than softirq. And the hrtimer code can use a bit simpler locks. The hrtimer callback needs just one thing about DMA buffer - the buffer is allocated and the address doesn't change during the callback. So, we'd need the lock for the hr callback and for the PCM prepare, hwparams and free callbacks, i.e. to assure that the hrtimer callback is finished when these PCM callbacks may change the DMA buffer. Thus this lock doesn't have to be PCM substream lock. The current lock mess may come from the PCM trigger (STOP) called from snd_pcm_period_elapsed(). So, putting snd_pcm_period_elapsed() out of hrtimer lock is anyway necessary. A tasklet for snd_pcm_period_elapsed() is needed for this reason, too.
Created attachment 16183 [details] patch to try Denys, could you please check and see if the patch fixes the problem for you?
bugme-daemon@bugzilla.kernel.org wrote: > The HRTIMER_CB_SOFTIRQ is a good workaround indeed. So I would like to take > this as a 2.6.26 fix. Could you post this to LKML and me? OK, will do. Lets wait for the reporter to confirm the fix. > Anyway, what I meant is that the part touching I/O ports could be a faster > path > than softirq. Ah, so you are talking about a different problem. Lets move it to alsa-devel then to avoid the confusion. I'll reply to the list.
It is not easy to trigger it again on my PC. It happens only once actually. If you think it is fixed, maybe better if i will close bug?
Interesting. For me it became 100% reproducible on rc2. Lets not close it then for now. Especially, right now no patch is applied yet.
This should be fixed by commit 4b7afb0d0d23b298a7e6d30eaba0679449542d2e Please reopen this bug if it still occurs after this commit (e.g. in 2.6.26-rc4 or later).