Bug 111481 - [Asus T100 regression] Call trace with snd_soc_sst_mfld_ platform when system suspend to freeze
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Sound(ALSA)
Hardware: All Linux
Importance: P1 high
Assignee: Jaroslav Kysela
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-01-29 15:06 UTC by wendy.wang
Modified: 2018-03-05 13:03 UTC
CC List: 8 users

See Also:
Kernel Version: 4.5-rc1
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
snd_soc_sst_mfld_ platform related call trace (2.16 MB, image/jpeg)
2016-01-29 15:06 UTC, wendy.wang
kernel configure (116.89 KB, application/octet-stream)
2016-01-29 15:07 UTC, wendy.wang
config-mainline-byt-disable-brcmcfac-disable-ALSA.txt (112.69 KB, text/plain)
2016-02-01 06:56 UTC, wendy.wang
Fix suspend regression on Asus T100TA (2.60 KB, patch)
2016-08-15 19:47 UTC, Dmity Karikh
Fix sound regression on Asus T100TA (3.82 KB, patch)
2016-08-16 21:28 UTC, Dmity Karikh
asoc: intel: flip Preferred byt driver from sst-ipc-acpi to soc-intel-sst-acpi (DO NOT UPSTREAM) (4.10 KB, patch)
2017-03-18 17:03 UTC, Hans de Goede

Description wendy.wang 2016-01-29 15:06:04 UTC
Created attachment 202271
snd_soc_sst_mfld_ platform related call trace

This is a regression.
Mainline 4.4.0 is good.
Mainline 4.5.0-rc1 fails.

The BYT-T T100 fails to suspend to freeze, with a call trace related to the snd_soc_sst_mfld_ platform driver.
Please refer to the attached picture.

Reproduce steps:
1. Compile mainline 4.5-rc1 with the attached kernel config, install the kernel on the BYT-T T100, and boot into it.
2. echo freeze > /sys/power/state
3. The system hangs with a call trace.

Test environment:
BAY TRAIL T / T100 (Board T1)
Platform: Bay Trail T / T100TA
CPU: Intel Atom Z3740 (family 6, model 55, stepping 3)
Software:
Linux distribution: Ubuntu 14.04 64-bit
Kernel: 4.4.0
BIOS: T100TA.313
Comment 1 wendy.wang 2016-01-29 15:07:37 UTC
Created attachment 202281
kernel configure
Comment 2 wendy.wang 2016-02-01 06:56:37 UTC
Created attachment 202591
config-mainline-byt-disable-brcmcfac-disable-ALSA.txt
Comment 3 wendy.wang 2016-02-01 07:00:56 UTC
With the "ALSA for SoC audio support" option disabled in "make menuconfig" and mainline 4.5-rc1 recompiled, suspend to freeze via "echo freeze > /sys/power/state" succeeds.

Please refer to the attached config-mainline-byt-disable-brcmcfac-disable-ALSA.txt, which has "ALSA for SoC audio support" disabled in the kernel config.
Comment 4 tianye 2016-02-03 08:48:12 UTC
This issue still exists in mainline kernel 4.5.0-rc2 (tag: v4.5-rc2).
Comment 5 tianye 2016-02-24 03:29:59 UTC
This issue still exists in mainline kernel 4.5.0-rc5 (tag: v4.5-rc5).
Comment 6 Dmity Karikh 2016-08-15 19:45:56 UTC
This issue still exists in mainline kernel 4.7.0 (tag: v4.7).

I have found some commits related to this bug:
1) First bad commit: dc901a3541717ca4963dd017eacf50a4c954609c (reverted in 902c136fe4f72dfc2a616ad755c72f1ee407f79a)
2) Second bad commit: 902c136fe4f72dfc2a616ad755c72f1ee407f79a

This problem can be solved by reverting the second broken commit (tested on 4.5, 4.6.6 and 4.7 kernels).
Comment 7 Dmity Karikh 2016-08-15 19:47:12 UTC
Created attachment 228921
Fix suspend regression on Asus T100TA
Comment 8 Dmity Karikh 2016-08-16 09:57:35 UTC
Update: the first bad commit actually fixes the broken sound, so for now we still have to choose between working suspend and working sound.
Comment 9 Dmity Karikh 2016-08-16 21:28:29 UTC
Created attachment 229181
Fix sound regression on Asus T100TA

I found the commit that introduced the sound problem: a92ea59b74e231cc0a969afa8d71fa314d5860f2.

Reverting it on top of the previous patch makes sound work again. I hope this helps to fix this regression.
Comment 10 Pierre Bossart 2016-11-29 18:33:30 UTC
does this problem still occur with newer kernels?
Comment 11 Hans de Goede 2017-03-18 13:22:49 UTC
Hi,

(In reply to Pierre Bossart from comment #10)
> does this problem still occur with newer kernels?

I bought a second-hand T100TA last week to help improve Linux support for Bay Trail devices, and yes, this still happens. With the right UCM files in place (from https://github.com/plbossart/UCM/tree/master/bytcr-rt5640), if sound has been playing less than 5 seconds before suspend (after 5 seconds PulseAudio closes the device), I get:

Mar 18 14:07:45 localhost.localdomain kernel: PM: Suspending system (freeze)
Mar 18 14:07:45 localhost.localdomain kernel: Suspending console(s)
Mar 18 14:07:45 localhost.localdomain kernel: intel_sst_acpi 80860F28:00: stream 1 is running, can't suspend, abort
Mar 18 14:07:45 localhost.localdomain kernel: dpm_run_callback(): acpi_subsys_suspend+0x0/0x1f returns -16
Mar 18 14:07:45 localhost.localdomain kernel: PM: Device 80860F28:00 failed to suspend: error -16
Mar 18 14:07:45 localhost.localdomain kernel: PM: Some devices failed to suspend, or early wake event detected
...
Mar 18 14:07:45 localhost.localdomain kernel: PM: Finishing wakeup.
Mar 18 14:07:45 localhost.localdomain kernel: Restarting tasks ... done.

The system then never suspends. If I stop all audio and wait 5 seconds, suspend works fine.

I've tried fixing this in two ways: removing the .ignore_suspend flag from the DAIs in bytcr_rt5640.c, to match the DAI in byt-rt5640.c which does not have this flag; and looping over the PCMs and calling snd_pcm_suspend_all() on them from the prepare callback in sst-mfld-platform-pcm.c, as is done in sound/soc/intel/haswell/sst-haswell-pcm.c.

Both methods work to some degree, with the same end result: if I stop all audio but do not wait for PulseAudio to close the device, suspend/resume works, but I get the following in dmesg during resume:

[  114.032504] Restarting tasks ... done.
[  114.072015] intel_sst_acpi 80860F28:00: FW sent error response 0x40010
[  114.072135]  Baytrail Audio Port: ASoC: trigger FE failed -22
[  114.072643] intel_sst_acpi 80860F28:00: FW sent error response 0x40010
[  114.072753]  Baytrail Audio Port: ASoC: trigger FE failed -22
[  116.633853] intel_sst_acpi 80860F28:00: FW sent error response 0x4000e
[  116.634164] intel_sst_acpi 80860F28:00: free stream returned err -1

If I suspend while audio is playing, suspend/resume still works, but in addition to the above errors during resume, I get this during suspend:

[   92.414023] Suspending console(s) (use no_console_suspend to debug)
[   92.414552] ------------[ cut here ]------------
[   92.414566] WARNING: CPU: 1 PID: 2469 at kernel/softirq.c:161 __local_bh_enable_ip+0x6b/0x80
[   92.414567] Modules linked in: bnep(E) fuse(E) xt_CHECKSUM(E) iptable_mangle(E) ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) iptable_nat(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) xt_conntrack(E) nf_conntrack(E) libcrc32c(E) tun(E) bridge(E) stp(E) llc(E) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) g_serial(E) libcomposite(E) udc_core(E) vfat(E) fat(E) snd_soc_sst_bytcr_rt5640(OE) iTCO_wdt(E) iTCO_vendor_support(E) gpio_keys(E) asus_nb_wmi(E) asus_wmi(E) intel_rapl(E) sparse_keymap(E) intel_soc_dts_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) intel_cstate(E) brcmfmac(E) brcmutil(E) cfg80211(E) snd_soc_rt5670(E) joydev(E) snd_soc_rt5645(E) snd_soc_rt5640(E) snd_soc_rl6231(E) snd_intel_sst_acpi(OE) snd_intel_sst_core(OE) snd_soc_sst_atom_hifi2_platform(OE)
[   92.414640]  snd_soc_sst_match(E) mei_txe(E) snd_soc_core(E) mei(E) lpc_ich(E) hci_uart(E) snd_compress(E) btbcm(E) snd_pcm_dmaengine(E) btqca(E) ac97_bus(E) phy_intel_cht_usb(E) extcon_core(E) btintel(E) snd_seq(E) snd_seq_device(E) snd_pcm(E) bluetooth(E) snd_timer(E) tpm_crb(E) soc_button_array(E) int3400_thermal(E) processor_thermal_device(E) int3403_thermal(E) intel_soc_dts_iosf(E) int3402_thermal(E) snd(E) dw_dmac(E) int3406_thermal(E) rfkill(E) dptf_power(E) acpi_thermal_rel(E) int340x_thermal_zone(E) asus_wireless(E) acpi_pad(E) soundcore(E) spi_pxa2xx_platform(E) tpm_tis(E) pwm_lpss_platform(E) pwm_lpss(E) tpm_tis_core(E) tpm(E) binfmt_misc(E) dm_crypt(E) mmc_block(E) hid_multitouch(E) i915(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) i2c_algo_bit(E) drm_kms_helper(E)
[   92.414708]  drm(E) wmi(E) video(E) i2c_hid(E) sdhci_acpi(E) sdhci(E) mmc_core(E) fjes(E) uas(E) usb_storage(E) sunrpc(E) scsi_transport_iscsi(E) i2c_dev(E)
[   92.414731] CPU: 1 PID: 2469 Comm: systemd-sleep Tainted: G        W  OE   4.11.0-rc2+ #21
[   92.414733] Hardware name: ASUSTeK COMPUTER INC. T100TA/T100TA, BIOS T100TA.304 03/14/2014
[   92.414735] Call Trace:
[   92.414747]  dump_stack+0x63/0x86
[   92.414752]  __warn+0xcb/0xf0
[   92.414756]  warn_slowpath_null+0x1d/0x20
[   92.414760]  __local_bh_enable_ip+0x6b/0x80
[   92.414764]  _raw_spin_unlock_bh+0x1a/0x20
[   92.414773]  sst_create_block+0x91/0xe0 [snd_intel_sst_core]
[   92.414779]  sst_create_block_and_ipc_msg+0x56/0x80 [snd_intel_sst_core]
[   92.414784]  sst_prepare_and_post_msg+0x1bd/0x3d0 [snd_intel_sst_core]
[   92.414791]  sst_pause_stream+0xcf/0x160 [snd_intel_sst_core]
[   92.414796]  sst_alloc_stream_mrfld+0xa8a/0xe90 [snd_intel_sst_core]
[   92.414802]  sst_register_dsp+0x20c/0x810 [snd_soc_sst_atom_hifi2_platform]
[   92.414811]  snd_soc_set_runtime_hwparams+0x11e/0x520 [snd_soc_core]
[   92.414816]  ? sst_register_dsp+0x330/0x810 [snd_soc_sst_atom_hifi2_platform]
[   92.414824]  dpcm_be_dai_trigger+0x440/0xd20 [snd_soc_core]
[   92.414828]  ? up+0x32/0x50
[   92.414837]  dpcm_be_dai_trigger+0x5a8/0xd20 [snd_soc_core]
[   92.414845]  snd_pcm_lib_mmap_iomem+0x201/0x240 [snd_pcm]
[   92.414850]  snd_pcm_lib_mmap_iomem+0xaf/0x240 [snd_pcm]
[   92.414856]  snd_pcm_mmap_data+0x5d2/0x5e0 [snd_pcm]
[   92.414861]  ? snd_pcm_stream_lock+0x31/0x50 [snd_pcm]
[   92.414867]  snd_pcm_suspend+0x32/0x50 [snd_pcm]
[   92.414873]  snd_pcm_suspend_all+0x38/0x1a0 [snd_pcm]
[   92.414880]  snd_soc_suspend+0x156/0x750 [snd_soc_core]
[   92.414885]  ? sst_register_dsp+0x330/0x810 [snd_soc_sst_atom_hifi2_platform]
[   92.414890]  sst_register_dsp+0x357/0x810 [snd_soc_sst_atom_hifi2_platform]
[   92.414896]  dpm_prepare+0x207/0x400
[   92.414901]  dpm_suspend_start+0x11/0x60
[   92.414906]  suspend_devices_and_enter+0xd9/0x6f0
[   92.414911]  pm_suspend+0x325/0x390
[   92.414915]  state_store+0x82/0xf0
[   92.414919]  kobj_attr_store+0xf/0x20
[   92.414925]  sysfs_kf_write+0x37/0x40
[   92.414929]  kernfs_fop_write+0x120/0x1b0
[   92.414934]  __vfs_write+0x37/0x160
[   92.414939]  ? selinux_file_permission+0xd7/0x110
[   92.414944]  ? security_file_permission+0x3b/0xc0
[   92.414948]  vfs_write+0xb5/0x1a0
[   92.414952]  SyS_write+0x55/0xc0
[   92.414956]  entry_SYSCALL_64_fastpath+0x1a/0xa9
[   92.414960] RIP: 0033:0x7ff2a3da1c30
[   92.414962] RSP: 002b:00007ffd97a042c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   92.414966] RAX: ffffffffffffffda RBX: 00007ff2a406ab38 RCX: 00007ff2a3da1c30
[   92.414968] RDX: 0000000000000004 RSI: 00005611031893b0 RDI: 0000000000000004
[   92.414970] RBP: 00007ff2a406aae0 R08: 0000561103189260 R09: 00007ff2a48b1180
[   92.414972] R10: 00005611031893b0 R11: 0000000000000246 R12: 00007ff2a406ab38
[   92.414975] R13: 0000000000001010 R14: 00007ff2a406ab38 R15: 000000000000270f
[   92.414979] ---[ end trace e4f935fa8a9c9a39 ]---

Regards,

Hans
Comment 12 Hans de Goede 2017-03-18 17:03:15 UTC
Created attachment 255339
asoc: intel: flip Preferred byt driver from sst-ipc-acpi to soc-intel-sst-acpi (DO NOT UPSTREAM)

OK, some more data (sort of): as reported by other users, flipping the preferred driver for BYT audio back from sst-ipc-acpi to soc-intel-sst-acpi fixes the suspend/resume issue, including while audio is playing.

Two things stood out:

1) In dmesg:

[  423.136241] PM: Suspending system (freeze)
[  423.136246] Suspending console(s) (use no_console_suspend to debug)
[  423.297834] PM: suspend of devices complete after 160.734 msecs
[  423.297843] PM: suspend devices took 0.162 seconds
[  423.298048] baytrail-pcm-audio baytrail-pcm-audio: dropped IPC msg RX=0, TX=2
[  423.337584] PM: late suspend of devices complete after 39.712 msecs
[  423.353896] PM: noirq suspend of devices complete after 16.274 msecs
[  423.353910] PM: suspend-to-idle
[  431.769403] Suspended for 7.704 seconds
[  431.769918] PM: resume from suspend-to-idle
[  431.805404] PM: noirq resume of devices complete after 35.233 msecs
[  432.077990] PM: early resume of devices complete after 272.208 msecs
[  432.161607] PM: resume of devices complete after 83.582 msecs
[  432.162787] PM: resume devices took 0.085 seconds
[  432.162993] PM: Finishing wakeup.
[  432.162997] Restarting tasks ... done.
[  434.504898] baytrail-pcm-audio baytrail-pcm-audio: ipc: --message timeout-- ipcx 0x126 isr 0xf0020 ipcd 0x297 imrx 0x0
[  434.504909] baytrail-pcm-audio baytrail-pcm-audio: ipc: free stream 1 failed

2) With the soc-intel-sst-acpi driver, the left and right channels are swapped before a suspend/resume cycle; after suspend/resume, things are fine.

One other thing I noticed is that the two drivers support different sets of codecs, so maybe we should allow building both and make the probe function return -ENODEV for an unsupported codec? The device core should then try the next compatible driver, which would allow all supported codecs to work with a single kernel build. I can whip up a patch for this if it seems like a good idea.

We could then also swap the default to soc-intel-sst-acpi for the rt5640 codec only. I wonder whether this problem might be codec-specific?
Comment 13 Pierre Bossart 2017-03-20 01:37:04 UTC
There is a known fix upstream for suspend issues, see
"ASoC: Intel: boards: remove .pm_ops in all Atom/DPCM machine drivers"
I believe it's queued for 4.11

That said I have no idea what suspend should do when you're playing audio. There is no way you're going to restart from the same position on resume I can guarantee you that. I am also aware of other suspend/resume issues with Baytrail, can't recall if it's eMMC or something else.
Comment 14 Hans de Goede 2017-03-20 11:36:49 UTC
Hi,

(In reply to Pierre Bossart from comment #13)
> There is a known fix upstream for suspend issues, see
> "ASoC: Intel: boards: remove .pm_ops in all Atom/DPCM machine drivers"
> I believe it's queued for 4.11

I've done my testing with 4.11-rc2 which already has that patch, so that does not help.

Regards,

Hans
Comment 15 Hans de Goede 2017-05-24 16:05:38 UTC
Good news, I can no longer reproduce this with 4.12-rc2 (+ the i2c fixes from Linus' current master), so I believe that this can be closed now.
Comment 16 roman 2018-03-05 13:03:00 UTC
Can't reproduce it with 4.15.7 either.
I think it can be closed now.
