Bug 15244

Summary: PROBLEM: hda-intel divide by zero kernel crash in azx_position_ok()
Product: Drivers
Reporter: Maciej Rutecki (maciej.rutecki)
Component: Sound(ALSA)
Assignee: Takashi Iwai (tiwai)
Status: RESOLVED OBSOLETE
Severity: normal
CC: alan, florian, jody, perex, rjw
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.33-rc6
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 14885
Attachments: Test Patch

Description Maciej Rutecki 2010-02-07 07:46:05 UTC
Subject    : PROBLEM: hda-intel divide by zero kernel crash in azx_position_ok()
Submitter  : Jody@tritech <jody@nctritech.com>
Date       : 2010-02-06 0:32
References : http://marc.info/?l=linux-kernel&m=126541276028173&w=2
Handled-By : Takashi Iwai <tiwai@suse.de>

This entry is being used for tracking a regression from 2.6.32.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Maciej Rutecki 2010-02-07 08:36:46 UTC
Patch: http://marc.info/?l=linux-kernel&m=126547145614070&w=2
Comment 2 Jaroslav Kysela 2010-02-07 09:47:04 UTC
It would be better to find the real cause of this problem. azx_position_ok() is called only when the azx_dev->running flag is set, and that flag should be set only after azx_pcm_prepare() has been called.
Comment 3 Rafael J. Wysocki 2010-02-07 09:57:20 UTC
Handled-By : Jody Bruchon <jody@nctritech.com>
Comment 4 Rafael J. Wysocki 2010-02-07 09:58:05 UTC
Submitter  : Jody Bruchon <jody@nctritech.com>
Comment 5 Jody Lee Bruchon 2010-02-07 23:00:31 UTC
The patch I submitted only prevents the system from crashing when this bug is triggered. The actual bug remains, so the status should NOT be set to "resolved, patch available." The program that triggers the bug on my machine is mp3blaster, which uses the old OSS interface instead of ALSA; other applications such as Rhythmbox, Xine, and Swiftfox+Flash, which probably use ALSA, do not seem to trigger it. Because of this I suspect some interaction specific to OSS, but I have no idea whether that is actually the case. As I am not a very experienced coder, I don't know how to go about finding the source of the problem myself.
Comment 6 Rafael J. Wysocki 2010-02-08 00:04:25 UTC
Ignore-Patch : http://marc.info/?l=linux-kernel&m=126547145614070&w=2
Comment 7 Jaroslav Kysela 2010-02-27 10:39:55 UTC
Jody, could you attach the full oops trace to this bug, using a kernel built without the period_size == 0 check? Alternatively, you could add a dump_stack() call inside the period_size == 0 condition to see the call path.
Comment 8 Jody Lee Bruchon 2010-02-27 14:42:40 UTC
divide error: 0000 [#1] PREEMPT SMP
Modules linked in:

Pid: 0, comm: swapper Not tainted 2.6.33-rc6 #6 M3A78-CM/System Product Name
EIP: 0060:[<c13b3b66>] EFLAGS: 00010246 CPU:3
EIP is at azx_position_ok+0x46/0xb0
EAX: 00000000 EBX: f6ecc150 ECX: 00000000 EDX: 00000000
ESI: f6e2e600 EDI: 00000150 EBP: f748de78 ESP: f748de6c
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process swapper (pid: 0, ti=f748c000 task=f7457900 task.ti=f748c000)
Stack:
 f748de78 f6ecc150 00000004 f748de9c c13b5d64 f6e2e600 f6e2e72c f6e2e630
<0> 80000010 f75cde40 00000000 00000000 f748dec0 c1062b66 a0cd6e00 00000019
<0> 000344c3 00000010 f76fe180 f76fe1bc 00000010 f748ded8 c1064a43 f75cd040
Call trace:
 [<c13b5d64>] ? azx_interrupt+0x84/0x160
 [<c1062b66>] ? handle_IRQ_event+0x36/0xc0
 [<c1064a43>] ? handle_fasteoi_irq+0x63/0xd0
 [<c1005298>] ? handle_irq+0x18/0x30
 [<c100480a>] ? do_IRQ+0x4a/0xc0
 [<c104e982>] ? hrtimer_start+0x22/0x30
 [<c104e982>] ? common_interrupt+0x30/0x38
 [<c100a5da>] ? default_idle+0x4a/0x50
 [<c100a7f7>] ? c1e_idle+0x47/0xf0
 [<c1001cc7>] ? cpu_idle+0x87/0xe0
 [<c1637ec4>] ? start_secondary+0x1a0/0x1a7
Code: 53 50 89 da e8 8c fe ff 8b 8e 14 01 00 00 85 c9 74 32 8b 56 08 b9 01 00 00 00 8b 14 95 60 d7 5f c1 85 d2 74 11 8b 4b 24 31 d2 <f7> f1 d1 e9 39 ca 0f 96 c1 0f b6 c9 83 c4 04 89 c8 5b 5e 5d c3
EIP: [<c13b3b66>] azx_position_ok+0x46/0xb0 SS:ESP 0068:f748de6c
---[ end trace a42adfe86b8970e2 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 0, comm: swapper Tainted: G      D    2.6.33-rc6 #6
Call trace:
 [<c1430fcc>] ? printk+0x18/0x1a
 [<c1430fcc>] panic+0x4d/0xf6
 [<c100634f>] oops_end+0x8f/0x90
 [<c100650f>] die+0x4f/0x70
 [<c1003516>] do_trap+0x96/0xc0
 [<c1003a10>] ? do_divide_error+0x0/0xa0
 [<c1003a93>] do_divide_error+0x83/0xa0
 [<c13b3b66>] ? azx_position_ok+0x46/0xb0
 [<c10186f6>] ? lapic_next_event+0x16/0x20
 [<c10564d6>] ? clockevents_program_event+0x86/0x140
 [<c1009b37>] ? native_sched_clock+0x27/0xa0
 [<c104ffb4>] ? sched_clock_local+0xa4/0x180
 [<c143349c>] ? _raw_spin_lock_irqsave+0x1c/0x40
 [<c1433611>] ? _raw_spin_unlock_irqrestore+0x11/0x30
 [<c104e57a>] ? hrtimer_get_next_event+0x10a/0x160
 [<c143416b>] error_code+0x73/0x78
 [<c13b3b66>] ? azx_position_ok+0x46/0xb0
 [<c13b5d64>] ? azx_interrupt+0x84/0x160
 [<c1062b66>] ? handle_IRQ_event+0x36/0xc0
 [<c1064a43>] ? handle_fasteoi_irq+0x63/0xd0
 [<c1005298>] ? handle_irq+0x18/0x30
 [<c100480a>] ? do_IRQ+0x4a/0xc0
 [<c104e982>] ? hrtimer_start+0x22/0x30
 [<c104e982>] ? common_interrupt+0x30/0x38
 [<c100a5da>] ? default_idle+0x4a/0x50
 [<c100a7f7>] ? c1e_idle+0x47/0xf0
 [<c1001cc7>] ? cpu_idle+0x87/0xe0
 [<c1637ec4>] ? start_secondary+0x1a0/0x1a7
Comment 9 Jody Lee Bruchon 2010-02-27 14:52:13 UTC
I have also found the following report from 42 days ago, which shows a similar problem in the same code:

http://sourceforge.net/projects/mp3blaster/forums/forum/518189/topic/3521269

[ 108.168858] divide error: 0000 [#1] SMP
[ 108.172941] last sysfs file: /sys/power/state
[ 108.177425] CPU 1
[ 108.178844] Modules linked in: ppdev vboxnetadp vboxnetflt vboxdrv bridge stp snd_hda_codec_atihdmi snd_hda_codec_via snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_sep
[ 108.178844] Pid: 0, comm: swapper Tainted: G M 2.6.31-17-generic #54-Ubuntu System Product Name
[ 108.178844] RIP: 0010:[<ffffffffa02d61bd>] [<ffffffffa02d61bd>] azx_position_ok+0x4d/0xc0 [snd_hda_intel]
[ 108.178844] RSP: 0018:ffff880028055e58 EFLAGS: 00010246
[ 108.178844] RAX: 0000000000000000 RBX: ffff8802351635e0 RCX: 0000000000000000
[ 108.178844] RDX: 0000000000000000 RSI: ffff8802351635e0 RDI: 0000000000000000
[ 108.178844] RBP: ffff880028055e68 R08: 0000000000000008 R09: ffff880237091018
[ 108.178844] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88023218d400
[ 108.178844] R13: ffff88023218d400 R14: 0000000080000010 R15: 000000000000001c
[ 108.178844] FS: 0000000000000000(0000) GS:ffff880028052000(0000) knlGS:0000000000000000
[ 108.178844] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 108.178844] CR2: 00007ff502cd5380 CR3: 0000000222091000 CR4: 00000000000406e0
[ 108.178844] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 108.178844] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 108.178844] Process swapper (pid: 0, threadinfo ffff8802370b4000, task ffff88023708c410)
[ 108.178844] Stack:
[ 108.178844] ffff8802351635e0 0000000000000004 ffff880028055ec8 ffffffffa02d645a
[ 108.178844] <0> ffff880028055e98 ffffffff81082137 ffff88023218d5a8 ffff88023218d444
[ 108.178844] <0> ffff880028055eb8 ffff8802324d1480 0000000000000000 0000000000000000
[ 108.178844] Call Trace:
[ 108.178844] <IRQ>
[ 108.178844] [<ffffffffa02d645a>] azx_interrupt+0xaa/0x1a0 [snd_hda_intel]
[ 108.178844] [<ffffffff81082137>] ? getnstimeofday+0x57/0xe0
[ 108.178844] [<ffffffff810b3aa8>] handle_IRQ_event+0x58/0x160
[ 108.178844] [<ffffffff81082137>] ? getnstimeofday+0x57/0xe0
[ 108.390930] [<ffffffff810b5c00>] handle_fasteoi_irq+0x80/0x100
Comment 10 Jaroslav Kysela 2010-02-28 07:41:25 UTC
Created attachment 25264
Test Patch
Comment 11 Jaroslav Kysela 2010-02-28 07:42:34 UTC
Thank you. Could you apply the patch in comment #10 and attach the dmesg output from when the problem occurs?
Comment 12 Jody Lee Bruchon 2010-03-01 00:13:49 UTC
I moved to a new but similar system yesterday. I still have access to the old hardware, and when I get the opportunity I will test this on BOTH systems and report back. It may take a few days, so please bear with me.
Comment 13 Jody Lee Bruchon 2010-03-18 17:30:52 UTC
FYI: I have not forgotten or abandoned this issue. I will test on multiple machines, particularly ones with AMD chipsets, to see how widespread this error is.
Comment 14 Florian Mickler 2012-03-13 23:08:58 UTC
Is this still a problem in current Linux kernels (>= 3.2)?