Bug 42615

Summary: BUG: scheduling while atomic: swapper
Product: Process Management Reporter: Yves-Alexis Perez (corsac)
Component: SchedulerAssignee: Ingo Molnar (mingo)
Status: RESOLVED CODE_FIX    
Severity: normal CC: jrnieder, kenyon
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.2 Subsystem:
Regression: No Bisected commit-id:

Description Yves-Alexis Perez 2012-01-20 16:07:55 UTC
Hey,

I have an issue with kernel 3.2 on my Thinkpad x201s. I'm running Debian and first noticed the issue with the upload of 3.2.1 to unstable, but managed to reproduce using vanilla 3.2.1.

When it happens, I have:



Jan 20 13:19:29 scapa kernel: [   62.622372] BUG: scheduling while atomic: swapper/0/0/0x10000100
Jan 20 13:19:29 scapa kernel: [   62.622379] Modules linked in: bnep rfcomm bluetooth deflate zlib_deflate ctr twofish_generic twofish_x86_64_3way twofish_x86_64 nf_conntrack_ipv6 twofish_common nf_defrag_ipv6 camellia ip6table_filter serpent ip6_tables blowfish_generic blowfish_x86_64 nf_conntrack_ipv4 blowfish_common nf_defrag_ipv4 cast5 xt_state nf_conntrack des_generic xcbc rmd160 sha512_generic iptable_filter ip_tables crypto_null x_tables af_key ums_realtek usb_storage uas ext2 acpi_cpufreq mperf snd_hda_codec_hdmi arc4 snd_hda_codec_conexant iwlwifi mac80211 snd_hda_intel snd_pcsp snd_hda_codec snd_hwdep snd_pcm_oss snd_pcm snd_timer e1000e cfg80211 thinkpad_acpi ehci_hcd snd usbcore psmouse serio_raw tpm_tis ac tpm i2c_i801 intel_ips rfkill snd_page_alloc wmi soundcore usb_common tpm_bios battery nvram power_supply processor evdev ext4 mbcache jbd2 crc16 xts gf128mul dm_crypt dm_mod aesni_intel cryptd aes_x86_64 sd_mod crc_t10dif i915 ahci libahci libata drm_kms_helper drm scsi_mod i2c_algo_bit thermal 
Jan 20 13:19:29 scapa kernel: button i2c_core video thermal_sys
Jan 20 13:19:29 scapa kernel: [   62.622489] CPU 0 
Jan 20 13:19:29 scapa kernel: [   62.622490] Modules linked in: bnep rfcomm bluetooth deflate zlib_deflate ctr twofish_generic twofish_x86_64_3way twofish_x86_64 nf_conntrack_ipv6 twofish_common nf_defrag_ipv6 camellia ip6table_filter serpent ip6_tables blowfish_generic blowfish_x86_64 nf_conntrack_ipv4 blowfish_common nf_defrag_ipv4 cast5 xt_state nf_conntrack des_generic xcbc rmd160 sha512_generic iptable_filter ip_tables crypto_null x_tables af_key ums_realtek usb_storage uas ext2 acpi_cpufreq mperf snd_hda_codec_hdmi arc4 snd_hda_codec_conexant iwlwifi mac80211 snd_hda_intel snd_pcsp snd_hda_codec snd_hwdep snd_pcm_oss snd_pcm snd_timer e1000e cfg80211 thinkpad_acpi ehci_hcd snd usbcore psmouse serio_raw tpm_tis ac tpm i2c_i801 intel_ips rfkill snd_page_alloc wmi soundcore usb_common tpm_bios battery nvram power_supply processor evdev ext4 mbcache jbd2 crc16 xts gf128mul dm_crypt dm_mod aesni_intel cryptd aes_x86_64 sd_mod crc_t10dif i915 ahci libahci libata drm_kms_helper drm scsi_mod i2c_algo_bit thermal 
Jan 20 13:19:29 scapa kernel: button i2c_core video thermal_sys
Jan 20 13:19:29 scapa kernel: [   62.622530] 
Jan 20 13:19:29 scapa kernel: [   62.622532] Pid: 0, comm: swapper/0 Not tainted 3.2.1 #1 LENOVO 51434JG/51434JG
Jan 20 13:19:29 scapa kernel: [   62.622535] RIP: 0010:[<ffffffffa01ee398>]  [<ffffffffa01ee398>] arch_local_irq_enable+0x4/0x8 [processor]
Jan 20 13:19:29 scapa kernel: [   62.622544] RSP: 0018:ffffffff81601e90  EFLAGS: 00000206
Jan 20 13:19:29 scapa kernel: [   62.622546] RAX: 0000000000000000 RBX: ffffffff81037eb9 RCX: 00000000000000be
Jan 20 13:19:29 scapa kernel: [   62.622547] RDX: 00000000000000be RSI: 0000000000000000 RDI: 00000000000003e8
Jan 20 13:19:29 scapa kernel: [   62.622549] RBP: 0000000000000001 R08: ffffffff81601fd8 R09: 00000000000000bd
Jan 20 13:19:29 scapa kernel: [   62.622551] R10: ffff88012cd39ec0 R11: ffffffff8160d020 R12: ffff88012cdb09e8
Jan 20 13:19:29 scapa kernel: [   62.622552] R13: ffffffff81014acf R14: 0000000000001000 R15: ffff880137c13500
Jan 20 13:19:29 scapa kernel: [   62.622554] FS:  0000000000000000(0000) GS:ffff880137c00000(0000) knlGS:0000000000000000
Jan 20 13:19:29 scapa kernel: [   62.622556] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jan 20 13:19:29 scapa kernel: [   62.622558] CR2: 00007fcf271383cc CR3: 000000012f48c000 CR4: 00000000000006f0
Jan 20 13:19:29 scapa kernel: [   62.622560] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 20 13:19:29 scapa kernel: [   62.622561] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jan 20 13:19:29 scapa kernel: [   62.622564] Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task ffffffff8160d020)
Jan 20 13:19:29 scapa kernel: [   62.622565] Stack:
Jan 20 13:19:29 scapa kernel: [   62.622566]  ffffffffa01ef0e9 0000000000000000 00000000000000be 0000000000000000
Jan 20 13:19:29 scapa kernel: [   62.622570]  00000000000000be ffff88012cfaf020 ffffffffa01f2190 00000000fffffff0
Jan 20 13:19:29 scapa kernel: [   62.622573]  0000000000000001 ffffffffa01f21f8 ffffffff8125d65b ffffffff81601fd8
Jan 20 13:19:29 scapa kernel: [   62.622575] Call Trace:
Jan 20 13:19:29 scapa kernel: [   62.622581]  [<ffffffffa01ef0e9>] ? acpi_idle_enter_c1+0x8d/0xb3 [processor]
Jan 20 13:19:29 scapa kernel: [   62.622589]  [<ffffffff8125d65b>] ? cpuidle_idle_call+0xec/0x179
Jan 20 13:19:29 scapa kernel: [   62.622595]  [<ffffffff8100d250>] ? cpu_idle+0xa1/0xe8
Jan 20 13:19:29 scapa kernel: [   62.622599]  [<ffffffff816a7b3d>] ? start_kernel+0x3bd/0x3c8
Jan 20 13:19:29 scapa kernel: [   62.622602]  [<ffffffff816a7140>] ? early_idt_handlers+0x140/0x140
Jan 20 13:19:29 scapa kernel: [   62.622604]  [<ffffffff816a73c4>] ? x86_64_start_kernel+0x104/0x111
Jan 20 13:19:29 scapa kernel: [   62.622606] Code: c0 89 c5 74 04 80 63 1c fb 48 83 c4 38 89 e8 5b 5d 41 5c 41 5d c3 90 0f 09 66 66 90 66 90 c3 fa 66 66 90 66 66 90 c3 fb 66 66 90 <66> 66 90 c3 48 8b 15 fd ee 53 e1 48 8d 42 fd 48 83 f8 01 0f 96 
Jan 20 13:19:29 scapa kernel: [   62.622628] Call Trace:
Jan 20 13:19:29 scapa kernel: [   62.622632]  [<ffffffffa01ef0e9>] ? acpi_idle_enter_c1+0x8d/0xb3 [processor]
Jan 20 13:19:29 scapa kernel: [   62.622636]  [<ffffffff8125d65b>] ? cpuidle_idle_call+0xec/0x179
Jan 20 13:19:29 scapa kernel: [   62.622638]  [<ffffffff8100d250>] ? cpu_idle+0xa1/0xe8
Jan 20 13:19:29 scapa kernel: [   62.622641]  [<ffffffff816a7b3d>] ? start_kernel+0x3bd/0x3c8
Jan 20 13:19:29 scapa kernel: [   62.622643]  [<ffffffff816a7140>] ? early_idt_handlers+0x140/0x140
Jan 20 13:19:29 scapa kernel: [   62.622646]  [<ffffffff816a73c4>] ? x86_64_start_kernel+0x104/0x111

I tried to bisect but the first bad commit is a merge commit so I'm not too sure:


corsac@scapa: git bisect log
git bisect start
# bad: [b8ed9e5b8c34dc9fb1882669e45b21e3d0194881] Linux 3.2.1
git bisect bad b8ed9e5b8c34dc9fb1882669e45b21e3d0194881
# good: [c3b92c8787367a8bb53d57d9789b558f1295cc96] Linux 3.1
git bisect good c3b92c8787367a8bb53d57d9789b558f1295cc96
# bad: [68d99b2c8efcb6ed3807a55569300c53b5f88be5] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect bad 68d99b2c8efcb6ed3807a55569300c53b5f88be5
# bad: [efb8d21b2c6db3497655cc6a033ae8a9883e4063] Merge branch 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect bad efb8d21b2c6db3497655cc6a033ae8a9883e4063
# good: [094daf7db7c47861009899ce23f9177d761e20b0] Merge branch 'master' of git://git.infradead.org/users/linville/wireless-next into for-davem
git bisect good 094daf7db7c47861009899ce23f9177d761e20b0
# good: [1be025d3cb40cd295123af2c394f7229ef9b30ca] Merge branch 'usb-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
git bisect good 1be025d3cb40cd295123af2c394f7229ef9b30ca
# good: [4e7e2a2008f5d8c49791c412849d5b0232d39bb3] Merge branch 'for-linus' of git://opensource.wolfsonmicro.com/regmap
git bisect good 4e7e2a2008f5d8c49791c412849d5b0232d39bb3
# good: [1442d1678ca7e53574fd403ba7bee6f4125d920c] Merge branch 'for-3.2' of git://linux-nfs.org/~bfields/linux
git bisect good 1442d1678ca7e53574fd403ba7bee6f4125d920c
# good: [3cb603284b3d256ae9ae9e65887cee8416bfef15] Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
git bisect good 3cb603284b3d256ae9ae9e65887cee8416bfef15
# good: [51808f051ede81865b7af351d6c230a1ac244a22] max3110: add sysrq support
git bisect good 51808f051ede81865b7af351d6c230a1ac244a22
# good: [f37ac5a144027cddfcf1dfab30eb7c2ba765f5ca] tty/n_gsm: fix a bug in gsm_dlci_data_output (adaption = 2 case)
git bisect good f37ac5a144027cddfcf1dfab30eb7c2ba765f5ca
# good: [1177c0efc04d032644895b8d757f55b433912596] TTY: pty, release tty in all ptmx_open fail paths
git bisect good 1177c0efc04d032644895b8d757f55b433912596
# good: [4cbf9f4864bd4fb2e64d555aacb93c24478e4e8d] tty/serial: atmel_serial: auto-enumerate ports
git bisect good 4cbf9f4864bd4fb2e64d555aacb93c24478e4e8d
# good: [a0340703981baa6cc1e9c7c768095a0a4e718daf] Revert "TTY: call tty_driver_lookup_tty unconditionally"
git bisect good a0340703981baa6cc1e9c7c768095a0a4e718daf
# good: [d208a3bf77f902283894f546b6b5383202cf7882] TTY: serial_core: Fix crash if DCD drop during suspend
git bisect good d208a3bf77f902283894f546b6b5383202cf7882

corsac@scapa: git bisect good
efb8d21b2c6db3497655cc6a033ae8a9883e4063 is the first bad commit

I'm not sure this is the correct component to report against, that looks like scheduling but the “swapper” thing might make it more memory related.

If you need any information, please ask.
Comment 1 Yves-Alexis Perez 2012-01-23 14:59:55 UTC
Further debugging and test on bisected kernels can be found on https://bugs.launchpad.net/linux/+bug/917962 too.
Comment 2 Yves-Alexis Perez 2012-01-25 21:19:05 UTC
I've bisected a bit more and found the guilty commit:

a2c76b83fdd763c826f38a55127ccf25708099ce is the first bad commit
commit a2c76b83fdd763c826f38a55127ccf25708099ce
Author: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Date: Tue Oct 18 20:05:50 2011 -0700

    usb: renesas_usbhs: fixup inconsistent return from usbhs_pkt_push()

    usbhs_pkt_push() had inconsistent return under spin lock.
    This patch fix it up.
    Special thanks to Dan

    Cc: Dan Carpenter <dan.carpenter@oracle.com>
    Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

:040000 040000 08d80fa06121151b36ae17dc693c14d58c0d5db3 86b920618ceb575cebf610798c24a104dea36a2c M	drivers

I've confirmed that reverting this one on top of 3.2.1 removes the BUG().

(does someone reads this bugzilla?)
Comment 3 Jonathan Nieder 2012-02-03 18:02:24 UTC
(In reply to comment #2)

>     usb: renesas_usbhs: fixup inconsistent return from usbhs_pkt_push()
[...]
> I've confirmed that reverting this one on top of 3.2.1 removes the BUG().
> 
> (does someone reads this bugzilla?)

No, nobody reads bugzilla. :)  It's way better to contact subsystem mailing lists, cc-ing people in the sign-off chain and linux-kernel@vger for regressions.

Could you attach your .config?  I'm puzzled that the relevant code does not seem to have even been loaded when you got the BUG (renesas_usbhs did not appear in the module list), so I'm worried that the bisection result might not be reproducible.
Comment 4 Jonathan Nieder 2012-02-03 18:07:09 UTC
"scheduling while atomic" points to a driver bug. I hear that for historical reasons idle tasks are named 'swapper' and the name does not signify more than that. I can confirm that there have been several reports of 3.1 -> 3.2 regressions involving "scheduling while atomic", and the only relevant fix I know about is commit b3ef051db763 (USB: Realtek cr: fix autopm scheduling while atomic, 2012-01-26).

See <http://bugs.debian.org/656724>, <http://bugs.debian.org/657526>, <http://bugs.debian.org/657612> for some examples.
Comment 5 Yves-Alexis Perez 2012-02-03 18:19:54 UTC
Yeah, in the end I tried to mail linux-kernel and in the end it was indeed the realtek issue, thanks for replying and sorry for not adding a comment here when I got the solution.