Bug 10701

Summary: snd_pcsp lockdep warning
Product: Drivers Reporter: Denys Fedoryshchenko (nuclearcat)
Component: Sound(ALSA)Assignee: Stas Sergeev (stsp2)
Status: CLOSED CODE_FIX    
Severity: normal CC: bunk, perex, rjw, tiwai
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.26-rc2 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 10492    
Attachments: patch to try

Description Denys Fedoryshchenko 2008-05-15 03:43:36 UTC
Latest working kernel version: unknown
Earliest failing kernel version: 2.6.26-rc2
Distribution: Gentoo
Hardware Environment: Intel 965 chipset, Core 2 Duo
Software Environment: mplayer
Problem Description: Just lockdep warning in dmesg

Steps to reproduce: while playing HD video got this.
[52478.405955] =======================================================
[52478.405955] [ INFO: possible circular locking dependency detected ]
[52478.405955] 2.6.26-rc2-devel #1
[52478.405955] -------------------------------------------------------
[52478.405955] mplayer/4814 is trying to acquire lock:
[52478.405955]  (&pcsp_chip.substream_lock){++..}, at: [<f88af285>] pcsp_do_timer+0x51/0x2a8 [snd_pcsp]
[52478.405955]
[52478.405955] but task is already holding lock:
[52478.405955]  (&cpu_base->lock){++..}, at: [<c0136ac4>] hrtimer_interrupt+0x6c/0x154
[52478.405955]
[52478.405955] which lock already depends on the new lock.
[52478.405955]
[52478.405955]
[52478.405955] the existing dependency chain (in reverse order) is:
[52478.405955]
[52478.405955] -> #5 (&cpu_base->lock){++..}:
[52478.405955]        [<c013fc8e>] __lock_acquire+0x982/0xb37
[52478.405955]        [<c013feaf>] lock_acquire+0x6c/0x89
[52478.405955]        [<c030783a>] _spin_lock_irqsave+0x25/0x55
[52478.405955]        [<c01366c2>] lock_hrtimer_base+0x18/0x33
[52478.405955]        [<c01367aa>] hrtimer_start+0x19/0x100
[52478.405955]        [<c011b656>] enqueue_rt_entity+0xf7/0x10e
[52478.405955]        [<c011b6cb>] enqueue_task_rt+0x36/0x4a
[52478.405955]        [<c0118235>] enqueue_task+0x4c/0x58
[52478.405955]        [<c011825a>] activate_task+0x19/0x1f
[52478.405955]        [<c011a9f7>] try_to_wake_up+0x17c/0x1be
[52478.405955]        [<c011aa61>] wake_up_process+0xf/0x11
[52478.405955]        [<c0303060>] migration_call+0xb4/0x403
[52478.405955]        [<c0440d1b>] migration_init+0x38/0x45
[52478.405955]        [<c043439b>] kernel_init+0x49/0x22a
[52478.405955]        [<c01045af>] kernel_thread_helper+0x7/0x10
[52478.405955]        [<ffffffff>] 0xffffffff
[52478.405955]
[52478.405955] -> #4 (&rt_b->rt_runtime_lock){++..}:
[52478.405955]        [<c013fc8e>] __lock_acquire+0x982/0xb37
[52478.405955]        [<c013feaf>] lock_acquire+0x6c/0x89
[52478.405955]        [<c030748f>] _spin_lock+0x1c/0x49
[52478.405955]        [<c011b613>] enqueue_rt_entity+0xb4/0x10e
[52478.405955]        [<c011b6cb>] enqueue_task_rt+0x36/0x4a
[52478.405955]        [<c0118235>] enqueue_task+0x4c/0x58
[52478.405955]        [<c011825a>] activate_task+0x19/0x1f
[52478.405955]        [<c011a9f7>] try_to_wake_up+0x17c/0x1be
[52478.405955]        [<c011aa61>] wake_up_process+0xf/0x11
[52478.405955]        [<c0303060>] migration_call+0xb4/0x403
[52478.405955]        [<c0440d1b>] migration_init+0x38/0x45
[52478.405955]        [<c043439b>] kernel_init+0x49/0x22a
[52478.405955]        [<c01045af>] kernel_thread_helper+0x7/0x10
[52478.405955]        [<ffffffff>] 0xffffffff
[52478.405955]
[52478.405955] -> #3 (&rq->rq_lock_key){++..}:
[52478.405955]        [<c013fc8e>] __lock_acquire+0x982/0xb37
[52478.405955]        [<c013feaf>] lock_acquire+0x6c/0x89
[52478.405955]        [<c030748f>] _spin_lock+0x1c/0x49
[52478.405955]        [<c011a634>] task_rq_lock+0x36/0x5d
[52478.405955]        [<c011a8a5>] try_to_wake_up+0x2a/0x1be
[52478.405955]        [<c011aa44>] default_wake_function+0xb/0xd
[52478.405955]        [<c0117dad>] __wake_up_common+0x2f/0x5a
[52478.405955]        [<c0119c2a>] complete+0x2b/0x3e
[52478.405955]        [<c0133b93>] kthread+0x1f/0x64
[52478.405955]        [<c01045af>] kernel_thread_helper+0x7/0x10
[52478.405955]        [<ffffffff>] 0xffffffff
[52478.405955]
[52478.405955] -> #2 (&q->lock){++..}:
[52478.405955]        [<c013fc8e>] __lock_acquire+0x982/0xb37
[52478.405955]        [<c013feaf>] lock_acquire+0x6c/0x89
[52478.405955]        [<c030783a>] _spin_lock_irqsave+0x25/0x55
[52478.405955]        [<c0119c96>] __wake_up+0x15/0x3b
[52478.405955]        [<f88c6f2d>] snd_pcm_post_stop+0x51/0x5a [snd_pcm]
[52478.405955]        [<f88c6353>] snd_pcm_action_single+0x33/0x4b [snd_pcm]
[52478.405955]        [<f88c7238>] snd_pcm_action+0x68/0x74 [snd_pcm]
[52478.405955]        [<f88c7268>] snd_pcm_stop+0x11/0x13 [snd_pcm]
[52478.405955]        [<f88c7302>] snd_pcm_drop+0x98/0xbb [snd_pcm]
[52478.405955]        [<f88c9cf0>] snd_pcm_common_ioctl1+0xe7c/0xf6d [snd_pcm]
[52478.405955]        [<f88ca4a3>] snd_pcm_playback_ioctl1+0x355/0x36c [snd_pcm]
[52478.405955]        [<f88ca4f2>] snd_pcm_kernel_ioctl+0x38/0x5d [snd_pcm]
[52478.405955]        [<f89fc93e>] snd_pcm_oss_change_params+0xaee/0xea7 [snd_pcm_oss]
[52478.405955]        [<f89fd04f>] snd_pcm_oss_get_active_substream+0x34/0x4c [snd_pcm_oss]
[52478.405955]        [<f89fd075>] snd_pcm_oss_get_format+0xe/0x20 [snd_pcm_oss]
[52478.405955]        [<f89fe246>] snd_pcm_oss_ioctl+0x3c0/0x9a5 [snd_pcm_oss]
[52478.405955]        [<c0179202>] vfs_ioctl+0x22/0x67
[52478.405955]        [<c01794ab>] do_vfs_ioctl+0x264/0x27b
[52478.405955]        [<c01794ee>] sys_ioctl+0x2c/0x46
[52478.405955]        [<c01038b9>] sysenter_past_esp+0x6a/0xb1
[52478.405955]        [<ffffffff>] 0xffffffff
[52478.405955]
[52478.405955] -> #1 (&substream->self_group.lock){.+..}:
[52478.405955]        [<c013fc8e>] __lock_acquire+0x982/0xb37
[52478.405955]        [<c013feaf>] lock_acquire+0x6c/0x89
[52478.405955]        [<c030748f>] _spin_lock+0x1c/0x49
[52478.405955]        [<f88af2b3>] pcsp_do_timer+0x7f/0x2a8 [snd_pcsp]
[52478.405955]        [<c0136372>] run_hrtimer_pending+0x3a/0xcb
[52478.405955]        [<c0136419>] run_hrtimer_softirq+0x16/0x18
[52478.405955]        [<c0127f51>] __do_softirq+0x6f/0xe9
[52478.405955]        [<c01059c3>] do_softirq+0x5e/0xa8
[52478.405955]        [<ffffffff>] 0xffffffff
[52478.405955]
[52478.405955] -> #0 (&pcsp_chip.substream_lock){++..}:
[52478.405955]        [<c013fbb5>] __lock_acquire+0x8a9/0xb37
[52478.405955]        [<c013feaf>] lock_acquire+0x6c/0x89
[52478.405955]        [<c030783a>] _spin_lock_irqsave+0x25/0x55
[52478.405955]        [<f88af285>] pcsp_do_timer+0x51/0x2a8 [snd_pcsp]
[52478.405955]        [<c0136311>] __run_hrtimer+0x49/0x70
[52478.405955]        [<c0136b43>] hrtimer_interrupt+0xeb/0x154
[52478.405955]        [<c0110670>] smp_apic_timer_interrupt+0x6c/0x80
[52478.405955]        [<c0104417>] apic_timer_interrupt+0x33/0x38
[52478.405955]        [<f89ff159>] snd_pcm_plug_write_transfer+0x75/0x9f [snd_pcm_oss]
[52478.405955]        [<f89fb6d8>] snd_pcm_oss_write2+0x82/0xc0 [snd_pcm_oss]
[52478.405955]        [<f89fcfb6>] snd_pcm_oss_write+0x165/0x1ca [snd_pcm_oss]
[52478.405955]        [<c016f557>] vfs_write+0x8c/0x108
[52478.405955]        [<c016faee>] sys_write+0x3b/0x60
[52478.405955]        [<c01038b9>] sysenter_past_esp+0x6a/0xb1
[52478.405955]        [<ffffffff>] 0xffffffff
[52478.405955]
[52478.405955] other info that might help us debug this:
[52478.405955]
[52478.405955] 2 locks held by mplayer/4814:
[52478.405955]  #0:  (&runtime->oss.params_lock){--..}, at: [<f89fceb7>] snd_pcm_oss_write+0x66/0x1ca [snd_pcm_oss]
[52478.405955]  #1:  (&cpu_base->lock){++..}, at: [<c0136ac4>] hrtimer_interrupt+0x6c/0x154
[52478.405955]
[52478.405955] stack backtrace:
[52478.405955] Pid: 4814, comm: mplayer Not tainted 2.6.26-rc2-devel #1
[52478.405955]  [<c013de71>] print_circular_bug_tail+0x5b/0x66
[52478.405955]  [<c013fbb5>] __lock_acquire+0x8a9/0xb37
[52478.405955]  [<c013feaf>] lock_acquire+0x6c/0x89
[52478.405955]  [<f88af285>] ? pcsp_do_timer+0x51/0x2a8 [snd_pcsp]
[52478.405955]  [<c030783a>] _spin_lock_irqsave+0x25/0x55
[52478.405955]  [<f88af285>] ? pcsp_do_timer+0x51/0x2a8 [snd_pcsp]
[52478.405955]  [<f88af234>] ? pcsp_do_timer+0x0/0x2a8 [snd_pcsp]
[52478.405955]  [<f88af285>] pcsp_do_timer+0x51/0x2a8 [snd_pcsp]
[52478.405955]  [<c0136134>] ? __remove_hrtimer+0x62/0x6a
[52478.405955]  [<f88af234>] ? pcsp_do_timer+0x0/0x2a8 [snd_pcsp]
[52478.405955]  [<c0136311>] __run_hrtimer+0x49/0x70
[52478.405955]  [<c0136b43>] hrtimer_interrupt+0xeb/0x154
[52478.405955]  [<c0110670>] smp_apic_timer_interrupt+0x6c/0x80
[52478.405955]  [<c0104417>] apic_timer_interrupt+0x33/0x38
[52478.405955]  [<f8a0022d>] ? linear_transfer+0xce/0x155 [snd_pcm_oss]
[52478.405955]  [<f89ff159>] snd_pcm_plug_write_transfer+0x75/0x9f [snd_pcm_oss]
[52478.405955]  [<f89fb6d8>] snd_pcm_oss_write2+0x82/0xc0 [snd_pcm_oss]
[52478.405955]  [<f89fcfb6>] snd_pcm_oss_write+0x165/0x1ca [snd_pcm_oss]
[52478.405955]  [<f89fce51>] ? snd_pcm_oss_write+0x0/0x1ca [snd_pcm_oss]
[52478.405955]  [<c016f557>] vfs_write+0x8c/0x108
[52478.405955]  [<c016faee>] sys_write+0x3b/0x60
[52478.405955]  [<c01038b9>] sysenter_past_esp+0x6a/0xb1
[52478.405955]  =======================
Comment 1 Anonymous Emailer 2008-05-15 11:03:06 UTC
Reply-To: akpm@linux-foundation.org

(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Thu, 15 May 2008 03:43:36 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=10701
> 
>            Summary: snd_pcsp lockdep warning
>            Product: Drivers
>            Version: 2.5
>      KernelVersion: 2.6.26-rc2
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Sound(ALSA)
>         AssignedTo: perex@perex.cz
>         ReportedBy: nuclearcat@nuclearcat.com
> 
> 
> Latest working kernel version: unknown
> Earliest failing kernel version: 2.6.26-rc2
> Distribution: Gentoo
> Hardware Environment: Intel 965 chipset, Core 2 Duo
> Software Environment: mplayer
> Problem Description: Just lockdep warning in dmesg

This looks like a problem in the core hrtimers code?

> Steps to reproduce: while playing HD video got this.
> [52478.405955] =======================================================
> [52478.405955] [ INFO: possible circular locking dependency detected ]
> [52478.405955] 2.6.26-rc2-devel #1
> [52478.405955] -------------------------------------------------------
> [52478.405955] mplayer/4814 is trying to acquire lock:
> [52478.405955]  (&pcsp_chip.substream_lock){++..}, at: [<f88af285>]
> pcsp_do_timer+0x51/0x2a8 [snd_pcsp]
> [52478.405955]
> [52478.405955] but task is already holding lock:
> [52478.405955]  (&cpu_base->lock){++..}, at: [<c0136ac4>]
> hrtimer_interrupt+0x6c/0x154
> [52478.405955]
> [52478.405955] which lock already depends on the new lock.
> [52478.405955]
> [52478.405955]
> [52478.405955] the existing dependency chain (in reverse order) is:
> [52478.405955]
> [52478.405955] -> #5 (&cpu_base->lock){++..}:
> [52478.405955]        [<c013fc8e>] __lock_acquire+0x982/0xb37
> [52478.405955]        [<c013feaf>] lock_acquire+0x6c/0x89
> [52478.405955]        [<c030783a>] _spin_lock_irqsave+0x25/0x55
> [52478.405955]        [<c01366c2>] lock_hrtimer_base+0x18/0x33
> [52478.405955]        [<c01367aa>] hrtimer_start+0x19/0x100
> [52478.405955]        [<c011b656>] enqueue_rt_entity+0xf7/0x10e
> [52478.405955]        [<c011b6cb>] enqueue_task_rt+0x36/0x4a
> [52478.405955]        [<c0118235>] enqueue_task+0x4c/0x58
> [52478.405955]        [<c011825a>] activate_task+0x19/0x1f
> [52478.405955]        [<c011a9f7>] try_to_wake_up+0x17c/0x1be
> [52478.405955]        [<c011aa61>] wake_up_process+0xf/0x11
> [52478.405955]        [<c0303060>] migration_call+0xb4/0x403
> [52478.405955]        [<c0440d1b>] migration_init+0x38/0x45
> [52478.405955]        [<c043439b>] kernel_init+0x49/0x22a
> [52478.405955]        [<c01045af>] kernel_thread_helper+0x7/0x10
> [52478.405955]        [<ffffffff>] 0xffffffff
> [52478.405955]
> [52478.405955] -> #4 (&rt_b->rt_runtime_lock){++..}:
> [52478.405955]        [<c013fc8e>] __lock_acquire+0x982/0xb37
> [52478.405955]        [<c013feaf>] lock_acquire+0x6c/0x89
> [52478.405955]        [<c030748f>] _spin_lock+0x1c/0x49
> [52478.405955]        [<c011b613>] enqueue_rt_entity+0xb4/0x10e
> [52478.405955]        [<c011b6cb>] enqueue_task_rt+0x36/0x4a
> [52478.405955]        [<c0118235>] enqueue_task+0x4c/0x58
> [52478.405955]        [<c011825a>] activate_task+0x19/0x1f
> [52478.405955]        [<c011a9f7>] try_to_wake_up+0x17c/0x1be
> [52478.405955]        [<c011aa61>] wake_up_process+0xf/0x11
> [52478.405955]        [<c0303060>] migration_call+0xb4/0x403
> [52478.405955]        [<c0440d1b>] migration_init+0x38/0x45
> [52478.405955]        [<c043439b>] kernel_init+0x49/0x22a
> [52478.405955]        [<c01045af>] kernel_thread_helper+0x7/0x10
> [52478.405955]        [<ffffffff>] 0xffffffff
> [52478.405955]
> [52478.405955] -> #3 (&rq->rq_lock_key){++..}:
> [52478.405955]        [<c013fc8e>] __lock_acquire+0x982/0xb37
> [52478.405955]        [<c013feaf>] lock_acquire+0x6c/0x89
> [52478.405955]        [<c030748f>] _spin_lock+0x1c/0x49
> [52478.405955]        [<c011a634>] task_rq_lock+0x36/0x5d
> [52478.405955]        [<c011a8a5>] try_to_wake_up+0x2a/0x1be
> [52478.405955]        [<c011aa44>] default_wake_function+0xb/0xd
> [52478.405955]        [<c0117dad>] __wake_up_common+0x2f/0x5a
> [52478.405955]        [<c0119c2a>] complete+0x2b/0x3e
> [52478.405955]        [<c0133b93>] kthread+0x1f/0x64
> [52478.405955]        [<c01045af>] kernel_thread_helper+0x7/0x10
> [52478.405955]        [<ffffffff>] 0xffffffff
> [52478.405955]
> [52478.405955] -> #2 (&q->lock){++..}:
> [52478.405955]        [<c013fc8e>] __lock_acquire+0x982/0xb37
> [52478.405955]        [<c013feaf>] lock_acquire+0x6c/0x89
> [52478.405955]        [<c030783a>] _spin_lock_irqsave+0x25/0x55
> [52478.405955]        [<c0119c96>] __wake_up+0x15/0x3b
> [52478.405955]        [<f88c6f2d>] snd_pcm_post_stop+0x51/0x5a [snd_pcm]
> [52478.405955]        [<f88c6353>] snd_pcm_action_single+0x33/0x4b [snd_pcm]
> [52478.405955]        [<f88c7238>] snd_pcm_action+0x68/0x74 [snd_pcm]
> [52478.405955]        [<f88c7268>] snd_pcm_stop+0x11/0x13 [snd_pcm]
> [52478.405955]        [<f88c7302>] snd_pcm_drop+0x98/0xbb [snd_pcm]
> [52478.405955]        [<f88c9cf0>] snd_pcm_common_ioctl1+0xe7c/0xf6d
> [snd_pcm]
> [52478.405955]        [<f88ca4a3>] snd_pcm_playback_ioctl1+0x355/0x36c
> [snd_pcm]
> [52478.405955]        [<f88ca4f2>] snd_pcm_kernel_ioctl+0x38/0x5d [snd_pcm]
> [52478.405955]        [<f89fc93e>] snd_pcm_oss_change_params+0xaee/0xea7
> [snd_pcm_oss]
> [52478.405955]        [<f89fd04f>] snd_pcm_oss_get_active_substream+0x34/0x4c
> [snd_pcm_oss]
> [52478.405955]        [<f89fd075>] snd_pcm_oss_get_format+0xe/0x20
> [snd_pcm_oss]
> [52478.405955]        [<f89fe246>] snd_pcm_oss_ioctl+0x3c0/0x9a5
> [snd_pcm_oss]
> [52478.405955]        [<c0179202>] vfs_ioctl+0x22/0x67
> [52478.405955]        [<c01794ab>] do_vfs_ioctl+0x264/0x27b
> [52478.405955]        [<c01794ee>] sys_ioctl+0x2c/0x46
> [52478.405955]        [<c01038b9>] sysenter_past_esp+0x6a/0xb1
> [52478.405955]        [<ffffffff>] 0xffffffff
> [52478.405955]
> [52478.405955] -> #1 (&substream->self_group.lock){.+..}:
> [52478.405955]        [<c013fc8e>] __lock_acquire+0x982/0xb37
> [52478.405955]        [<c013feaf>] lock_acquire+0x6c/0x89
> [52478.405955]        [<c030748f>] _spin_lock+0x1c/0x49
> [52478.405955]        [<f88af2b3>] pcsp_do_timer+0x7f/0x2a8 [snd_pcsp]
> [52478.405955]        [<c0136372>] run_hrtimer_pending+0x3a/0xcb
> [52478.405955]        [<c0136419>] run_hrtimer_softirq+0x16/0x18
> [52478.405955]        [<c0127f51>] __do_softirq+0x6f/0xe9
> [52478.405955]        [<c01059c3>] do_softirq+0x5e/0xa8
> [52478.405955]        [<ffffffff>] 0xffffffff
> [52478.405955]
> [52478.405955] -> #0 (&pcsp_chip.substream_lock){++..}:
> [52478.405955]        [<c013fbb5>] __lock_acquire+0x8a9/0xb37
> [52478.405955]        [<c013feaf>] lock_acquire+0x6c/0x89
> [52478.405955]        [<c030783a>] _spin_lock_irqsave+0x25/0x55
> [52478.405955]        [<f88af285>] pcsp_do_timer+0x51/0x2a8 [snd_pcsp]
> [52478.405955]        [<c0136311>] __run_hrtimer+0x49/0x70
> [52478.405955]        [<c0136b43>] hrtimer_interrupt+0xeb/0x154
> [52478.405955]        [<c0110670>] smp_apic_timer_interrupt+0x6c/0x80
> [52478.405955]        [<c0104417>] apic_timer_interrupt+0x33/0x38
> [52478.405955]        [<f89ff159>] snd_pcm_plug_write_transfer+0x75/0x9f
> [snd_pcm_oss]
> [52478.405955]        [<f89fb6d8>] snd_pcm_oss_write2+0x82/0xc0 [snd_pcm_oss]
> [52478.405955]        [<f89fcfb6>] snd_pcm_oss_write+0x165/0x1ca
> [snd_pcm_oss]
> [52478.405955]        [<c016f557>] vfs_write+0x8c/0x108
> [52478.405955]        [<c016faee>] sys_write+0x3b/0x60
> [52478.405955]        [<c01038b9>] sysenter_past_esp+0x6a/0xb1
> [52478.405955]        [<ffffffff>] 0xffffffff
> [52478.405955]
> [52478.405955] other info that might help us debug this:
> [52478.405955]
> [52478.405955] 2 locks held by mplayer/4814:
> [52478.405955]  #0:  (&runtime->oss.params_lock){--..}, at: [<f89fceb7>]
> snd_pcm_oss_write+0x66/0x1ca [snd_pcm_oss]
> [52478.405955]  #1:  (&cpu_base->lock){++..}, at: [<c0136ac4>]
> hrtimer_interrupt+0x6c/0x154
> [52478.405955]
> [52478.405955] stack backtrace:
> [52478.405955] Pid: 4814, comm: mplayer Not tainted 2.6.26-rc2-devel #1
> [52478.405955]  [<c013de71>] print_circular_bug_tail+0x5b/0x66
> [52478.405955]  [<c013fbb5>] __lock_acquire+0x8a9/0xb37
> [52478.405955]  [<c013feaf>] lock_acquire+0x6c/0x89
> [52478.405955]  [<f88af285>] ? pcsp_do_timer+0x51/0x2a8 [snd_pcsp]
> [52478.405955]  [<c030783a>] _spin_lock_irqsave+0x25/0x55
> [52478.405955]  [<f88af285>] ? pcsp_do_timer+0x51/0x2a8 [snd_pcsp]
> [52478.405955]  [<f88af234>] ? pcsp_do_timer+0x0/0x2a8 [snd_pcsp]
> [52478.405955]  [<f88af285>] pcsp_do_timer+0x51/0x2a8 [snd_pcsp]
> [52478.405955]  [<c0136134>] ? __remove_hrtimer+0x62/0x6a
> [52478.405955]  [<f88af234>] ? pcsp_do_timer+0x0/0x2a8 [snd_pcsp]
> [52478.405955]  [<c0136311>] __run_hrtimer+0x49/0x70
> [52478.405955]  [<c0136b43>] hrtimer_interrupt+0xeb/0x154
> [52478.405955]  [<c0110670>] smp_apic_timer_interrupt+0x6c/0x80
> [52478.405955]  [<c0104417>] apic_timer_interrupt+0x33/0x38
> [52478.405955]  [<f8a0022d>] ? linear_transfer+0xce/0x155 [snd_pcm_oss]
> [52478.405955]  [<f89ff159>] snd_pcm_plug_write_transfer+0x75/0x9f
> [snd_pcm_oss]
> [52478.405955]  [<f89fb6d8>] snd_pcm_oss_write2+0x82/0xc0 [snd_pcm_oss]
> [52478.405955]  [<f89fcfb6>] snd_pcm_oss_write+0x165/0x1ca [snd_pcm_oss]
> [52478.405955]  [<f89fce51>] ? snd_pcm_oss_write+0x0/0x1ca [snd_pcm_oss]
> [52478.405955]  [<c016f557>] vfs_write+0x8c/0x108
> [52478.405955]  [<c016faee>] sys_write+0x3b/0x60
> [52478.405955]  [<c01038b9>] sysenter_past_esp+0x6a/0xb1
> [52478.405955]  =======================
Comment 2 Stas Sergeev 2008-05-15 12:05:07 UTC
bugme-daemon@bugzilla.kernel.org wrote:
> This looks like a problem in the core hrtimers code?
The last time I looked into that,
my impression was it is an alsa core
problem. We had a very lengthy
discussion with Takashi about these
locking problems, during which we
came up with the current locking technique
of snd-pcsp, but I was never convinced
that it works right. It seemed to work
though, and this was long ago.
Some parts of that mailing are in
pcsp_lib.c as a comments. Other parts
I'll have to dig within a few days and
come back with more info...
Comment 3 Adrian Bunk 2008-05-15 15:00:12 UTC
*** Bug 10703 has been marked as a duplicate of this bug. ***
Comment 4 Thomas Gleixner 2008-05-16 05:47:17 UTC
On Thu, 15 May 2008, Andrew Morton wrote:

> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Thu, 15 May 2008 03:43:36 -0700 (PDT) bugme-daemon@bugzilla.kernel.org
> wrote:
> 
> > http://bugzilla.kernel.org/show_bug.cgi?id=10701
> > 
> >            Summary: snd_pcsp lockdep warning
> >            Product: Drivers
> >            Version: 2.5
> >      KernelVersion: 2.6.26-rc2
> >           Platform: All
> >         OS/Version: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Sound(ALSA)
> >         AssignedTo: perex@perex.cz
> >         ReportedBy: nuclearcat@nuclearcat.com
> > 
> > 
> > Latest working kernel version: unknown
> > Earliest failing kernel version: 2.6.26-rc2
> > Distribution: Gentoo
> > Hardware Environment: Intel 965 chipset, Core 2 Duo
> > Software Environment: mplayer
> > Problem Description: Just lockdep warning in dmesg
> 
> This looks like a problem in the core hrtimers code?

No, this is a problem in the sound code. It uses the hrtimer callback
mode HRTIMER_CB_IRQSAFE. This mode was never intended to provide a
lock free call mode and was created for very fast callbacks like
wakeups and the like. Also I find it pretty amazing that the pscp code
has the following comment:

/*
 * We need the hrtimer_start as a tasklet to avoid
 * the nasty locking problem. :(
 * The problem:
 * - The timer handler is called with the cpu_base->lock
 *   already held by hrtimer code.
 * - snd_pcm_period_elapsed() takes the
 *   substream->self_group.lock.
 * So far so good.
 * But the snd_pcsp_trigger() is called with the
 * substream->self_group.lock held, and it calls
 * hrtimer_start(), which takes the cpu_base->lock.
 * You see the problem. We have the code pathes
 * which take two locks in a reverse order. This
 * can deadlock and the lock validator complains.
 * The only solution I could find was to move the
 * hrtimer_start() into a tasklet. -stsp
 */

So the author was well aware of locking problem and the whole code is
just a stupid hack around the problem without solving it. This code is
not at all suited for HRTIMER_CB_IRQSAFE.

Thanks,
	tglx
Comment 5 Stas Sergeev 2008-05-16 10:39:00 UTC
bugme-daemon@bugzilla.kernel.org wrote:
> No, this is a problem in the sound code.
Ack.

> mode HRTIMER_CB_IRQSAFE. This mode was never intended to provide a
> lock free call mode and was created for very fast callbacks like
> wakeups and the like.
But I indeed need a low-latency callback.
That driver is very latency-sensitive.
What would you suggest?

> Also I find it pretty amazing that the pscp code
> has the following comment:
You must be thankfull for it being there
at all. As otherwise you'd waste a lot more
time. :)

> So the author was well aware of locking problem and the whole code is
Ack. That's actually what I've already
admitted in that thread.

> just a stupid hack around the problem without solving it.
Well, yes. I tried to solve it in alsa
core, but that was considered too intrusive
and was not applied IIRC, and we ended
up with what we have (and yes, it was
discussed and stuff).
If you can suggest a simple solution to
our problem, it would be really great.
Though this time I guess it won't get
away in a couple of hacks anymore.
I was never happy with the current solution
too, as you can see. Otherwise I wouldn't
document it so verbosely. But I didn't
have anything else. And I don't beleive
it can be properly solved only in a
driver, without touching an alsa core.

> This code is
> not at all suited for HRTIMER_CB_IRQSAFE.
Please suggest an alternative.
But I personally think HRTIMER_CB_IRQSAFE
can stay, we only need to get the locking
problem solved.
I'll investigate into that again, within
a few days.
Comment 6 Anonymous Emailer 2008-05-16 10:40:11 UTC
Reply-To: akpm@linux-foundation.org

On Fri, 16 May 2008 14:46:31 +0200 (CEST) Thomas Gleixner <tglx@linutronix.de> wrote:

> On Thu, 15 May 2008, Andrew Morton wrote:
> 
> > (switched to email.  Please respond via emailed reply-to-all, not via the
> > bugzilla web interface).
> > 
> > On Thu, 15 May 2008 03:43:36 -0700 (PDT) bugme-daemon@bugzilla.kernel.org
> wrote:
> > 
> > > http://bugzilla.kernel.org/show_bug.cgi?id=10701
> > > 
> > >            Summary: snd_pcsp lockdep warning
> > >            Product: Drivers
> > >            Version: 2.5
> > >      KernelVersion: 2.6.26-rc2
> > >           Platform: All
> > >         OS/Version: Linux
> > >               Tree: Mainline
> > >             Status: NEW
> > >           Severity: normal
> > >           Priority: P1
> > >          Component: Sound(ALSA)
> > >         AssignedTo: perex@perex.cz
> > >         ReportedBy: nuclearcat@nuclearcat.com
> > > 
> > > 
> > > Latest working kernel version: unknown
> > > Earliest failing kernel version: 2.6.26-rc2
> > > Distribution: Gentoo
> > > Hardware Environment: Intel 965 chipset, Core 2 Duo
> > > Software Environment: mplayer
> > > Problem Description: Just lockdep warning in dmesg
> > 
> > This looks like a problem in the core hrtimers code?
> 
> No, this is a problem in the sound code. It uses the hrtimer callback
> mode HRTIMER_CB_IRQSAFE. This mode was never intended to provide a
> lock free call mode and was created for very fast callbacks like
> wakeups and the like. Also I find it pretty amazing that the pscp code
> has the following comment:
> 
> /*
>  * We need the hrtimer_start as a tasklet to avoid
>  * the nasty locking problem. :(
>  * The problem:
>  * - The timer handler is called with the cpu_base->lock
>  *   already held by hrtimer code.
>  * - snd_pcm_period_elapsed() takes the
>  *   substream->self_group.lock.
>  * So far so good.
>  * But the snd_pcsp_trigger() is called with the
>  * substream->self_group.lock held, and it calls
>  * hrtimer_start(), which takes the cpu_base->lock.
>  * You see the problem. We have the code pathes
>  * which take two locks in a reverse order. This
>  * can deadlock and the lock validator complains.
>  * The only solution I could find was to move the
>  * hrtimer_start() into a tasklet. -stsp
>  */
> 
> So the author was well aware of locking problem and the whole code is
> just a stupid hack around the problem without solving it. This code is
> not at all suited for HRTIMER_CB_IRQSAFE.

You sound impressed!  So what's the fix?  HRTIMER_CB_SOFTIRQ?
Comment 7 Thomas Gleixner 2008-05-16 12:34:18 UTC
On Fri, 16 May 2008, Andrew Morton wrote:
> > So the author was well aware of locking problem and the whole code is
> > just a stupid hack around the problem without solving it. This code is
> > not at all suited for HRTIMER_CB_IRQSAFE.
> 
> You sound impressed!  So what's the fix?  HRTIMER_CB_SOFTIRQ?

Yeah, impressed by creativity. HRTIMER_CB_SOFTIRQ should be the right
thing.

Thanks,
	tglx
Comment 8 Rafael J. Wysocki 2008-05-16 13:09:41 UTC

*** This bug has been marked as a duplicate of bug 10679 ***
Comment 9 Stas Sergeev 2008-05-16 13:29:40 UTC
bugme-daemon@bugzilla.kernel.org wrote:
> Yeah, impressed by creativity. HRTIMER_CB_SOFTIRQ should be the right
> thing.
The creativity is a consequence of
the lack of documentation. Besides,
it seems HRTIMER_CB_IRQSAFE does
the callback sometimes from hardirq
context and sometimes from the softirq
context. At least it was so in the past.
So the differences were not obvious from
the testings I've done back then.

Before the driver started to use the
hrtimer code, the same callback was
executed inside the timer irq handler,
and it was very important to make it
the first handler in the chain. Otherwise
the latencies were unacceptable.
So the most obvious way to convert it
was to use HRTIMER_CB_IRQSAFE, as it
seemed to me. Is there a reason, besides
locking problem, why it should not be
used for that?
I'll try HRTIMER_CB_SOFTIRQ tomorrow and
see if the latencies are tolerable, but
is there a real reason for increasing
the latencies that way?
Comment 10 Stas Sergeev 2008-05-17 04:15:08 UTC
bugme-daemon@bugzilla.kernel.org wrote:
> Yeah, impressed by creativity. HRTIMER_CB_SOFTIRQ should be the right
> thing.
Actually, HRTIMER_CB_SOFTIRQ doesn't even
help at all.
I applied the patch attached to this message,
and the bug is still there. And I don't see
the increased latencies too, just nothing
changed.
And it looks like even with HRTIMER_CB_SOFTIRQ,
the callback is still called with the
cpu_base->lock held. Am I doing something
wrong? What was supposed to change?
diff --git a/sound/drivers/pcsp/pcsp.c b/sound/drivers/pcsp/pcsp.c
index 54a1f90..1899cf0 100644
--- a/sound/drivers/pcsp/pcsp.c
+++ b/sound/drivers/pcsp/pcsp.c
@@ -96,7 +96,7 @@ static int __devinit snd_card_pcsp_probe(int devnum, struct device *dev)
 		return -EINVAL;
 
 	hrtimer_init(&pcsp_chip.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-	pcsp_chip.timer.cb_mode = HRTIMER_CB_IRQSAFE;
+	pcsp_chip.timer.cb_mode = HRTIMER_CB_SOFTIRQ;
 	pcsp_chip.timer.function = pcsp_do_timer;
 
 	card = snd_card_new(index, id, THIS_MODULE, 0);
diff --git a/sound/drivers/pcsp/pcsp_lib.c b/sound/drivers/pcsp/pcsp_lib.c
index ac6238e..c30820d 100644
--- a/sound/drivers/pcsp/pcsp_lib.c
+++ b/sound/drivers/pcsp/pcsp_lib.c
@@ -9,7 +9,6 @@
 #include <linux/module.h>
 #include <linux/moduleparam.h>
 #include <sound/pcm.h>
-#include <linux/interrupt.h>
 #include <asm/io.h>
 #include "pcsp.h"
 
@@ -18,31 +17,6 @@ module_param(nforce_wa, bool, 0444);
 MODULE_PARM_DESC(nforce_wa, "Apply NForce chipset workaround "
 		"(expect bad sound)");
 
-static void pcsp_start_timer(unsigned long dummy)
-{
-	hrtimer_start(&pcsp_chip.timer, ktime_set(0, 0), HRTIMER_MODE_REL);
-}
-
-/*
- * We need the hrtimer_start as a tasklet to avoid
- * the nasty locking problem. :(
- * The problem:
- * - The timer handler is called with the cpu_base->lock
- *   already held by hrtimer code.
- * - snd_pcm_period_elapsed() takes the
- *   substream->self_group.lock.
- * So far so good.
- * But the snd_pcsp_trigger() is called with the
- * substream->self_group.lock held, and it calls
- * hrtimer_start(), which takes the cpu_base->lock.
- * You see the problem. We have the code pathes
- * which take two locks in a reverse order. This
- * can deadlock and the lock validator complains.
- * The only solution I could find was to move the
- * hrtimer_start() into a tasklet. -stsp
- */
-static DECLARE_TASKLET(pcsp_start_timer_tasklet, pcsp_start_timer, 0);
-
 enum hrtimer_restart pcsp_do_timer(struct hrtimer *handle)
 {
 	unsigned long flags;
@@ -64,8 +38,6 @@ enum hrtimer_restart pcsp_do_timer(struct hrtimer *handle)
 		return HRTIMER_RESTART;
 	}
 
-	/* hrtimer calls us from both hardirq and softirq contexts,
-	 * so irqsave :( */
 	spin_lock_irqsave(&chip->substream_lock, flags);
 	/* Takashi Iwai says regarding this extra lock:
 
@@ -167,7 +139,7 @@ static void pcsp_start_playing(struct snd_pcsp *chip)
 	atomic_set(&chip->timer_active, 1);
 	chip->thalf = 0;
 
-	tasklet_schedule(&pcsp_start_timer_tasklet);
+	hrtimer_start(&pcsp_chip.timer, ktime_set(0, 0), HRTIMER_MODE_REL);
 }
 
 static void pcsp_stop_playing(struct snd_pcsp *chip)
Comment 11 Adrian Bunk 2008-05-17 04:31:28 UTC
This bug is not the same as #10679.
Comment 12 Takashi Iwai 2008-05-17 06:46:05 UTC
At Fri, 16 May 2008 21:32:49 +0200 (CEST),
Thomas Gleixner wrote:
> 
> On Fri, 16 May 2008, Andrew Morton wrote:
> > > So the author was well aware of locking problem and the whole code is
> > > just a stupid hack around the problem without solving it. This code is
> > > not at all suited for HRTIMER_CB_IRQSAFE.
> > 
> > You sound impressed!  So what's the fix?  HRTIMER_CB_SOFTIRQ?
> 
> Yeah, impressed by creativity. HRTIMER_CB_SOFTIRQ should be the right
> thing.

Well, it'd be basically a similar way like snd-pcsp currently does ("a
stupid hack" :)  But, it's good to have a fix, anyway, since this
sounds like a generic problem with a callback in a spinlock.  If the
callback requires another own lock, this can easily lead to a AB/BA
deadlock.  Actually, ALSA PCM core had sometimes similar problems,
too.


thanks,

Takashi
Comment 13 Thomas Gleixner 2008-05-17 08:25:18 UTC
On Sat, 17 May 2008, Takashi Iwai wrote:
> At Fri, 16 May 2008 21:32:49 +0200 (CEST),
> Thomas Gleixner wrote:
> > 
> > On Fri, 16 May 2008, Andrew Morton wrote:
> > > > So the author was well aware of locking problem and the whole code is
> > > > just a stupid hack around the problem without solving it. This code is
> > > > not at all suited for HRTIMER_CB_IRQSAFE.
> > > 
> > > You sound impressed!  So what's the fix?  HRTIMER_CB_SOFTIRQ?
> > 
> > Yeah, impressed by creativity. HRTIMER_CB_SOFTIRQ should be the right
> > thing.
> 
> Well, it'd be basically a similar way like snd-pcsp currently does ("a
> stupid hack" :)  But, it's good to have a fix, anyway, since this
> sounds like a generic problem with a callback in a spinlock.  If the
> callback requires another own lock, this can easily lead to a AB/BA
> deadlock.  Actually, ALSA PCM core had sometimes similar problems,
> too.

HRTIMER_CB_SOFTIRQ is not a stupid hack :) It's the default for
hrtimers and it does not hold any locks when calling the callback. We
really want to avoid tons of callbacks in the timer interrupt itself.

Thanks,
	tglx
Comment 14 Stas Sergeev 2008-05-17 11:15:12 UTC
bugme-daemon@bugzilla.kernel.org wrote:
> Actually, HRTIMER_CB_SOFTIRQ doesn't even
> help at all.
> I applied the patch attached to this message,
> and the bug is still there.
No, that was the wrong testing.
The patch actually worked. :)
Here's the updated version.
diff --git a/sound/drivers/pcsp/pcsp.c b/sound/drivers/pcsp/pcsp.c
index 54a1f90..1899cf0 100644
--- a/sound/drivers/pcsp/pcsp.c
+++ b/sound/drivers/pcsp/pcsp.c
@@ -96,7 +96,7 @@ static int __devinit snd_card_pcsp_probe(int devnum, struct device *dev)
 		return -EINVAL;
 
 	hrtimer_init(&pcsp_chip.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-	pcsp_chip.timer.cb_mode = HRTIMER_CB_IRQSAFE;
+	pcsp_chip.timer.cb_mode = HRTIMER_CB_SOFTIRQ;
 	pcsp_chip.timer.function = pcsp_do_timer;
 
 	card = snd_card_new(index, id, THIS_MODULE, 0);
diff --git a/sound/drivers/pcsp/pcsp_lib.c b/sound/drivers/pcsp/pcsp_lib.c
index ac6238e..c30820d 100644
--- a/sound/drivers/pcsp/pcsp_lib.c
+++ b/sound/drivers/pcsp/pcsp_lib.c
@@ -9,7 +9,6 @@
 #include <linux/module.h>
 #include <linux/moduleparam.h>
 #include <sound/pcm.h>
-#include <linux/interrupt.h>
 #include <asm/io.h>
 #include "pcsp.h"
 
@@ -18,34 +17,8 @@ module_param(nforce_wa, bool, 0444);
 MODULE_PARM_DESC(nforce_wa, "Apply NForce chipset workaround "
 		"(expect bad sound)");
 
-static void pcsp_start_timer(unsigned long dummy)
-{
-	hrtimer_start(&pcsp_chip.timer, ktime_set(0, 0), HRTIMER_MODE_REL);
-}
-
-/*
- * We need the hrtimer_start as a tasklet to avoid
- * the nasty locking problem. :(
- * The problem:
- * - The timer handler is called with the cpu_base->lock
- *   already held by hrtimer code.
- * - snd_pcm_period_elapsed() takes the
- *   substream->self_group.lock.
- * So far so good.
- * But the snd_pcsp_trigger() is called with the
- * substream->self_group.lock held, and it calls
- * hrtimer_start(), which takes the cpu_base->lock.
- * You see the problem. We have the code pathes
- * which take two locks in a reverse order. This
- * can deadlock and the lock validator complains.
- * The only solution I could find was to move the
- * hrtimer_start() into a tasklet. -stsp
- */
-static DECLARE_TASKLET(pcsp_start_timer_tasklet, pcsp_start_timer, 0);
-
 enum hrtimer_restart pcsp_do_timer(struct hrtimer *handle)
 {
-	unsigned long flags;
 	unsigned char timer_cnt, val;
 	int periods_elapsed;
 	u64 ns;
@@ -64,9 +37,7 @@ enum hrtimer_restart pcsp_do_timer(struct hrtimer *handle)
 		return HRTIMER_RESTART;
 	}
 
-	/* hrtimer calls us from both hardirq and softirq contexts,
-	 * so irqsave :( */
-	spin_lock_irqsave(&chip->substream_lock, flags);
+	spin_lock_irq(&chip->substream_lock);
 	/* Takashi Iwai says regarding this extra lock:
 
 	If the irq handler handles some data on the DMA buffer, it should
@@ -132,7 +105,7 @@ enum hrtimer_restart pcsp_do_timer(struct hrtimer *handle)
 		chip->period_ptr %= buffer_bytes;
 	}
 
-	spin_unlock_irqrestore(&chip->substream_lock, flags);
+	spin_unlock_irq(&chip->substream_lock);
 
 	if (!atomic_read(&chip->timer_active))
 		return HRTIMER_NORESTART;
@@ -146,7 +119,7 @@ enum hrtimer_restart pcsp_do_timer(struct hrtimer *handle)
 exit_nr_unlock2:
 	snd_pcm_stream_unlock(substream);
 exit_nr_unlock1:
-	spin_unlock_irqrestore(&chip->substream_lock, flags);
+	spin_unlock_irq(&chip->substream_lock);
 	return HRTIMER_NORESTART;
 }
 
@@ -167,7 +140,7 @@ static void pcsp_start_playing(struct snd_pcsp *chip)
 	atomic_set(&chip->timer_active, 1);
 	chip->thalf = 0;
 
-	tasklet_schedule(&pcsp_start_timer_tasklet);
+	hrtimer_start(&pcsp_chip.timer, ktime_set(0, 0), HRTIMER_MODE_REL);
 }
 
 static void pcsp_stop_playing(struct snd_pcsp *chip)
Comment 15 Takashi Iwai 2008-05-18 00:55:28 UTC
At Sat, 17 May 2008 17:24:22 +0200 (CEST),
Thomas Gleixner wrote:
> 
> On Sat, 17 May 2008, Takashi Iwai wrote:
> > At Fri, 16 May 2008 21:32:49 +0200 (CEST),
> > Thomas Gleixner wrote:
> > > 
> > > On Fri, 16 May 2008, Andrew Morton wrote:
> > > > > So the author was well aware of locking problem and the whole code is
> > > > > just a stupid hack around the problem without solving it. This code
> is
> > > > > not at all suited for HRTIMER_CB_IRQSAFE.
> > > > 
> > > > You sound impressed!  So what's the fix?  HRTIMER_CB_SOFTIRQ?
> > > 
> > > Yeah, impressed by creativity. HRTIMER_CB_SOFTIRQ should be the right
> > > thing.
> > 
> > Well, it'd be basically a similar way like snd-pcsp currently does ("a
> > stupid hack" :)  But, it's good to have a fix, anyway, since this
> > sounds like a generic problem with a callback in a spinlock.  If the
> > callback requires another own lock, this can easily lead to a AB/BA
> > deadlock.  Actually, ALSA PCM core had sometimes similar problems,
> > too.
> 
> HRTIMER_CB_SOFTIRQ is not a stupid hack :) It's the default for
> hrtimers and it does not hold any locks when calling the callback. We
> really want to avoid tons of callbacks in the timer interrupt itself.

Yes, this would be an easy workaround for snd-pcsp problem, I guess.

A better fix would be to make snd-pcsp hrtimer calback just change the
pc-speaker port (0x61 and 0x42) and update the position.  These should
be really fast path.  Then, occasionally invoke a tasklet that will
call snd_pcm_period_elapsed(), which could be a (relatively) slow
path and lead to a spinlock mess.


thanks,

Takashi
Comment 16 Stas Sergeev 2008-05-18 08:54:21 UTC
bugme-daemon@bugzilla.kernel.org wrote:
> A better fix would be to make snd-pcsp hrtimer calback just change the
> pc-speaker port (0x61 and 0x42) and update the position.  These should
> be really fast path.
But even that will require the
snd_pcm_stream_lock() IIRC, because
of an access to the (fake) DMA buffer
and other stuff.
And that lock, in turn, requires the
pcsp_chip.substream_lock. Which is
exactly the lock that triggers a
warning. So this will unlikely to
help.
But I think I don't understand the
dependancy chain yet. snd-pcsp does
not call hrtimer_start() under any
lock (because of the tasklet hack).
So I don't see how it could introduce
the dependancy pcsp_chip.substream_lock ->
cpu_base->lock. Moreso, this problem
never happened since the tasklet
workaround was added, and now it started
to happen again. But the pcsp code
did not change. From the traces it
seems the dependancy is introduced
in an alsa core somewhere.
Using HRTIMER_CB_SOFTIRQ will of
course get rid of the problem by
removing the backward dependancy part,
but then we'll never know who recently
introduced a forward part.

> Then, occasionally invoke a tasklet that will
> call snd_pcm_period_elapsed(), which could be a (relatively) slow
> path and lead to a spinlock mess.
From the traces it seems the problem
triggers when the handler is just
trying to take the pcsp_chip.substream_lock.
It doesn't look like snd_pcm_period_elapsed()
gets involved, but I might be wrong...
Comment 17 Takashi Iwai 2008-05-18 10:22:58 UTC
The HRTIMER_CB_SOFTIRQ is a good workaround indeed.  So I would like to take this as a 2.6.26 fix.  Could you post this to LKML and me?

Anyway, what I meant is that the part touching I/O ports could be a faster path than softirq.  And the hrtimer code can use a bit simpler locks.  The hrtimer callback needs just one thing about DMA buffer - the buffer is allocated and the address doesn't change during the callback.  So, we'd need the lock for the hr callback and for the PCM prepare, hwparams and free callbacks, i.e. to assure that the hrtimer callback is finished when these PCM callbacks may change the DMA buffer.  Thus this lock doesn't have to be PCM substream lock.

The current lock mess may come from the PCM trigger (STOP) called from snd_pcm_period_elapsed().  So, putting snd_pcm_period_elapsed() out of hrtimer lock is anyway necessary.  A tasklet for snd_pcm_period_elapsed() is needed for this reason, too.
Comment 18 Stas Sergeev 2008-05-18 10:38:19 UTC
Created attachment 16183 [details]
patch to try

Denys, could you please check and see
if the patch fixes the problem for you?
Comment 19 Stas Sergeev 2008-05-18 10:53:27 UTC
bugme-daemon@bugzilla.kernel.org wrote:
> The HRTIMER_CB_SOFTIRQ is a good workaround indeed.  So I would like to take
> this as a 2.6.26 fix.  Could you post this to LKML and me?
OK, will do.
Lets wait for the reporter to confirm
the fix.

> Anyway, what I meant is that the part touching I/O ports could be a faster
> path
> than softirq.
Ah, so you are talking about a different
problem. Lets move it to alsa-devel then
to avoid the confusion. I'll reply to
the list.
Comment 20 Denys Fedoryshchenko 2008-05-18 10:59:31 UTC
It is not easy to trigger it again on my PC. It happens only once actually.

If you think it is fixed, maybe better if i will close bug?
Comment 21 Stas Sergeev 2008-05-18 11:07:28 UTC
Interesting. For me it became 100% reproducible
on rc2. Lets not close it then for now.
Especially, right now no patch is applied yet.
Comment 22 Adrian Bunk 2008-05-20 12:44:26 UTC
This should be fixed by commit 4b7afb0d0d23b298a7e6d30eaba0679449542d2e

Please reopen this bug if it still occurs after this commit (e.g. in 2.6.26-rc4 or later).