Bug 56191 - Intel Atom N2600 SoC hangs until a key is pressed every now and then when booting without acpi=off
Status: CLOSED INVALID
Alias: None
Product: ACPI
Classification: Unclassified
Component: Config-Other
Hardware: All Linux
Importance: P1 normal
Assignee: Lan Tianyu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-04-03 18:46 UTC by Olivier Diotte
Modified: 2013-04-24 19:08 UTC
CC List: 3 users

See Also:
Kernel Version: 3.9.0-rc4
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Verbose boot log (32.84 KB, text/plain)
2013-04-04 22:03 UTC, Olivier Diotte (CMC)
acpidump output (144.81 KB, application/octet-stream)
2013-04-04 22:03 UTC, Olivier Diotte (CMC)
DSDT table (260.67 KB, text/plain)
2013-04-04 22:03 UTC, Olivier Diotte (CMC)
.config for Gentoo's 3.7.10 (13.81 KB, text/plain)
2013-04-24 19:00 UTC, Olivier Diotte (CMC)
Bootlog (no special option) (22.20 KB, text/plain)
2013-04-24 19:02 UTC, Olivier Diotte (CMC)
Bootlog (with ignore_loglevel) (259.09 KB, text/plain)
2013-04-24 19:03 UTC, Olivier Diotte (CMC)
bootlog with "full" options (340.05 KB, text/plain)
2013-04-24 19:05 UTC, Olivier Diotte (CMC)
Bootlog with "full" options, no hid-core.c messages (396.89 KB, text/plain)
2013-04-24 19:06 UTC, Olivier Diotte (CMC)

Description Olivier Diotte 2013-04-03 18:46:16 UTC
Overview:
ACPI won't work on this board.
I have tried Ubuntu Precise Pangolin's 3.5.7.6 kernel
I have tried recompiling that kernel from source
I have tried recompiling 3.9.0-rc4 from Linus' git tree
None will allow me to boot.
The DSDT compiles without error (though there are a few warnings)

When run with "acpi.debug_layer=0xFFFFFFFF acpi.debug_level=0xFFFFFFFF" with output going through a serial console, the boot seems to be caught in an infinite loop. It's been running for at least 2 days and I have over 700 Mb of initial log. I have stopped logging and only connect (serial console) to it from time to time to see if it will hang.

I have already reported this issue on the linux-acpi mailing list but, getting no response, I took the hint that posting here might be a better idea :)

Steps to Reproduce:
Boot without the acpi=off kernel parameter

Actual Results:
The platform always hangs before invoking login, but there doesn't seem to be a clear hang point across kernel versions. I have a screen and keyboard, a serial console and ssh access to the board. I am unable to get access through any of those (and the board won't respond to ping) without acpi=off.

I have noticed that the board will sometimes display a "udevd started" line, but even when disabling udev and booting in Ubuntu's rescue mode (kernel parameters += "nomodeset rescue") I am still unable to get a login prompt or get an ICMP reply.

Expected Results:
Boot correctly, replies to ICMP Requests/ping.

Build Date & Platform:
I have not yet found a kernel that works.

Additional Information:
I am currently reading the ACPI spec(s) and the source code in drivers/acpi/ to try to do as much as possible on my own, but any hint would be useful.
Comment 1 Lan Tianyu 2013-04-04 03:19:56 UTC
Hi Olivier:
Is your board on the market, or is it a development machine? You said "The DSDT compiles without error (though there are a few warnings)". Does this mean you are writing the DSDT table yourself? Please provide the boot log from a hang and the ACPI DSDT table.
Comment 2 Olivier Diotte (CMC) 2013-04-04 22:02:15 UTC
Hi Lan,

The board is on the market, it is sold by AAEON (COM-CV Rev. B), we are not writing the DSDT table.

Attached are the boot log with "verbose" and without "acpi=off", the output of acpidump and the compiled (readable) DSDT table.
Comment 3 Olivier Diotte (CMC) 2013-04-04 22:03:10 UTC
Created attachment 97341 [details]
Verbose boot log
Comment 4 Olivier Diotte (CMC) 2013-04-04 22:03:36 UTC
Created attachment 97351 [details]
acpidump output
Comment 5 Olivier Diotte (CMC) 2013-04-04 22:03:59 UTC
Created attachment 97361 [details]
DSDT table
Comment 6 Olivier Diotte (CMC) 2013-04-05 13:58:18 UTC
Rereading my last comment, I see I made a confusing typo: the attached DSDT is decompiled (ergo, compiling it doesn't give errors but gives warnings). Just thought I'd mention it to avoid any ambiguity.
Comment 7 Lan Tianyu 2013-04-07 13:27:56 UTC
Hi Olivier:
From the output of your log, I don't see any ACPI error messages, and user space also begins to work. As for the "ICMP Requests/ping", I have no idea, since there may be a lot of reasons for that. How about serial console login? Does it not work? This looks like a hang in user space. Have you tried other distros?
Comment 8 Olivier Diotte (CMC) 2013-04-08 15:26:32 UTC
Hi Tianyu (sorry about mixing up your first and last names, I forgot about the difference in name order):

I understand that you suspect the problem is not with linux-acpi. I have tried various ways of getting access to the system without acpi=off, but Ubuntu is getting in my way. I will try to get Gentoo running on the SoC and will see from there. As for the ping, based on my experiments and your input, at the moment I believe this to be because the networking scripts are not called, which would point to an early userspace failure.

I will also try reporting this bug in Ubuntu's bug tracker (maybe I should have started with that). I will update this bug if I get any relevant info on it.
Comment 9 Olivier Diotte (CMC) 2013-04-08 17:54:31 UTC
Hmm, I am still going to try using Gentoo and reporting the bug in Ubuntu's Launchpad, but I am wondering: I just tried to use the Magic SysRq key (Raising Skinny Elephants Is Never Utterly Boring Boring Boring Boring). It does work with acpi=off (on the first Boring), but not otherwise; I am not sure if that is due to udev/upstart/some userland software in Ubuntu or to the kernel. It is also not possible to get the “Num Lock” LED on the keyboard to light up.

I have also tried Gentoo and found an interesting behaviour:
-If I boot the LiveCD as 'gentoo acpi=off', the boot goes through without any grave problems (aside from a firmware decompression error)
-If I boot the LiveCD as 'gentoo acpi=on', the boot will hang at 'waiting for uevents to be processed' until I hit a key on the keyboard at which point it will respond with some timeout error for something related to udevd trying to run modprobe (the boot process will also hang at one or more other places).

-Even more interesting is this: if I boot the LiveCD as 'gentoo acpi=on' AND hit the Caps Lock (or any other key it seems) every second, the boot proceeds without error. Shutting the SoC down requires the same repetitive hitting of keys to proceed.

Could this be a bug in udev/acpid/whatever else? I have tried to disable udev and acpid in Ubuntu's upstart (by creating a /etc/init/service.override file for each of them), but Ubuntu will still hang at the same place.

Last, but certainly not least, I have found a reproducible test for this bug:
Booting the Gentoo LiveCD as described last above (acpi=on, hitting Caps Lock every second or so), I get a shell prompt (the LiveCD autologins root).

Once there, if I run “while true; do date; done”, I see pages upon pages of output and I see at least one message for every second.

On the other hand, if I run “while true; do date; sleep 1s; done”, the loop only prints a line every time I hit a key (and the time shown matches the time I hit the key, not the second following the previous line). After stopping the acpid and udev daemons (/etc/init.d/acpid stop; /etc/init.d/udev stop), the behaviour stays the same.

I will still try opening a bug in Ubuntu's bug tracker, but I am starting to wonder where the bug lies :S
Comment 10 Olivier Diotte (CMC) 2013-04-08 18:03:27 UTC
Additionally, running “while true; do date; a=$(date +%S); while [ $(date +%S) -eq $a ]; do :; done; done” outputs a message every second, while “sleep 0.001” stops the process until a key is hit.
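
A minimal sketch of how the sleep behaviour described above could be quantified rather than observed by eye. It is not part of the original tests: the helper name measure_sleep is hypothetical, and it assumes GNU date (for the %N nanosecond field) and a sleep that accepts fractional seconds.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): time a single sleep and
# print the requested duration next to the measured one.
# Assumes GNU date (%N = nanoseconds) and a sleep accepting fractions.
measure_sleep() {
    requested=$1
    start=$(date +%s%N)          # wall-clock time before sleeping, in ns
    sleep "$requested"           # relies on a timer wake-up, unlike a busy loop
    end=$(date +%s%N)            # wall-clock time after waking, in ns
    echo "requested=${requested}s elapsed_ms=$(( (end - start) / 1000000 ))"
}
```

Calling, for example, "measure_sleep 0.001" and "measure_sleep 1" a few times without touching the keyboard would show how far the measured time overshoots the requested one; on a healthy boot the two should be close.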
Comment 11 Olivier Diotte 2013-04-11 22:04:24 UTC
Ok, I installed Gentoo on the system and I still get problems without acpi=off.

I am not sure why I lose the keyboard (Num Lock LED goes off) on Ubuntu, but I assume it is due to the same symptom as I get on Gentoo, which is that the system stops working unless it is constantly awakened by hitting keyboard keys.

I don't think this behaviour can relate to userspace daemons.
Comment 12 Olivier Diotte (CMC) 2013-04-12 16:03:52 UTC
Here is some additional information:
When I try to ssh into the SoC, the password prompt doesn't appear, no matter what I do on the client. If, on the other hand, I hit a key (Ctrl for example) on the SoC, the 'Password:' prompt appears. Every input and output needs a key to be pressed on the SoC for it to appear on the SSH client.

Interestingly, if I keep the CPU busy by running "while true; do :; done" in a tty on the SoC for example, everything appears in a timely fashion without needing any typing on the SoC.

perf top -a also reveals the SoC spends a lot of time in native_io_delay() when the system hangs. Stopping the "while true..." command makes native_io_delay() slowly creep up in perf's output (it is currently at 18%), so I am assuming this is the culprit.
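
One further check that could complement the observations in this comment, offered only as a sketch and not something done in the report: comparing the local timer interrupt count before and after a stall. It assumes the x86 /proc/interrupts layout, where the "LOC:" row holds per-CPU local timer interrupt counts; the function name timer_irq_total is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: sum the per-CPU counters on the "LOC:" row
# (local timer interrupts on x86) of a /proc/interrupts-style file.
timer_irq_total() {
    # $1: file to read (defaults to /proc/interrupts)
    awk '$1 == "LOC:" {
        # Counter columns are numeric; the trailing description is not.
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) sum += $i
        print sum + 0
    }' "${1:-/proc/interrupts}"
}
```

Sampling timer_irq_total, waiting through a period where the system appears stuck, and sampling again would indicate whether timer interrupts keep arriving while nothing is scheduled. What the numbers mean on this board is left open here.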
Comment 13 Olivier Diotte (CMC) 2013-04-16 21:12:51 UTC
Looks like I was wrong.

Disabling the udev startup script (Gentoo's "rc-update del udev sysinit") allows me to boot without acpi=off without any issue (user-facing, at least).

perf top still reports about 20% time spent in native_io_delay(), but as long as udev is not started, the system doesn't freeze.
Comment 14 Lan Tianyu 2013-04-17 08:40:17 UTC
Hi, sorry for the late reply. This seems to be a user-space problem, or udev gets some ACPI info and then performs some operations which block the log daemon, so I am marking this bug as invalid. Thanks for your effort. I think you need to analyse the udev config file.
Comment 15 Olivier Diotte (CMC) 2013-04-24 18:58:42 UTC
Hi Tianyu,

I did some more tests and I am starting to believe I was right about this being a kernel bug (well, it could also be a bug in the BIOS/hardware/etc., but my understanding of the goal of linux-acpi is that we try to work around that if that is the case).

It is a reproducible bug, but it seems quite hard to narrow down precisely.

Here is what I have gathered and what I can say about this issue:

-I have trimmed down Gentoo's 3.7.10 kernel as much as possible to try to pin down the problem (attached is my .config file)

-The bug is hard to narrow down because the behaviour changes depending on whether certain debug kernel parameters are present or not (which seems consistent with what I saw from userspace in comment #10)

-Using the following kernel parameters:
*** BEGIN ***
ignore_loglevel debug initcall_debug udev.log_priority=8 earlyprintk=vga,keep print_fatal_signals=1 apm.debug=Y i8042.debug=Y drm.debug=1 scsi_logging_level=1 usbserial.debug=Y option.debug=Y hid.debug=1 pci_hotplug.debug=Y pci_hotplug.debug_acpi=Y apic=debug show_lapic=all hpet=verbose sysrq_always_enabled console=tty0 console=ttyS0,115200n8
*** END ***
the kernel boots and hangs just before my / (sda2) is mounted until I hit keys on the keyboard (can be seen by finding the empty lines at line 5186 in 3.7.10-gentoo_fullOptions_ctrlRatherThanNumLock.bootlog which correspond to me hitting ENTER on the serial console). If I keep hitting keys, I'll end up with a login prompt as described in comment #9.

-The USB keyboard seems to die (as can be seen by the NumLock LED going off) at some point during the boot (regardless of kernel parameters).

-With the full options above, the keyboard comes back (LED lights back up) near the point where it hangs. Even stranger: if a computer is connected to the serial port (but without a program driving it), the keyboard comes back on most of the time, but if I omit the "console=ttyS0,115200n8" part it never comes back (preventing me from doing anything other than a hard reset).
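
Since the behaviour in the list above depends on which kernel parameters are present, a small check of the running command line can help confirm what a given boot actually used. This is a sketch only; the helper name has_param is hypothetical, and it matches whole space-separated tokens.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel command line contains the
# given parameter as an exact, space-separated token.
has_param() {
    # $1: parameter to look for, e.g. "acpi=off"
    # $2: command line string (defaults to the contents of /proc/cmdline)
    cmdline=${2:-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $1 "*) return 0 ;;   # token found
        *)        return 1 ;;   # token absent
    esac
}
```

For example, "has_param acpi=off && echo 'ACPI disabled on this boot'" could be placed at the top of a log-collection script so that each saved bootlog records which configuration produced it.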

All of this (logs included) seems to point to the issue appearing even when userspace isn't loaded yet. Nevertheless, because of the inconsistent behaviour I am unsure whether that is a bug in the kernel or in the hardware (although, to the best of my knowledge, Windows has correct ACPI behaviour on this SoC for what it is worth).

I am leaving all this info here in case someone ever encounters this bug and wants to try to pinpoint/solve/workaround it, but I am not sure whether it is a bug in linux-acpi code, so I suppose this bug can stay closed until someone can confirm the bug (if ever).
Comment 16 Olivier Diotte (CMC) 2013-04-24 19:00:35 UTC
Created attachment 99921 [details]
.config for Gentoo's 3.7.10
Comment 17 Olivier Diotte (CMC) 2013-04-24 19:02:19 UTC
Created attachment 99931 [details]
Bootlog (no special option)

bootline: kernel /bool/kernel-oli-x86-3.7.10-gentoo root=/dev/sda2 console=tty0 console=ttyS0,115200n8
Comment 18 Olivier Diotte (CMC) 2013-04-24 19:03:16 UTC
Created attachment 99941 [details]
Bootlog (with ignore_loglevel)

bootline: kernel /bool/kernel-oli-x86-3.7.10-gentoo root=/dev/sda2 console=tty0
console=ttyS0,115200n8 ignore_loglevel
Comment 19 Olivier Diotte (CMC) 2013-04-24 19:05:02 UTC
Created attachment 99951 [details]
bootlog with "full" options

bootline: kernel /bool/kernel-oli-x86-3.7.10-gentoo root=/dev/sda2 ignore_loglevel debug initcall_debug udev.log_priority=8 earlyprintk=vga,keep
print_fatal_signals=1 apm.debug=Y i8042.debug=Y drm.debug=1
scsi_logging_level=1 usbserial.debug=Y option.debug=Y hid.debug=1
pci_hotplug.debug=Y pci_hotplug.debug_acpi=Y apic=debug show_lapic=all
hpet=verbose sysrq_always_enabled console=tty0 console=ttyS0,115200n8

The hid-core.c message would print every time I pressed the NumLock key
Comment 20 Olivier Diotte (CMC) 2013-04-24 19:06:24 UTC
Created attachment 99961 [details]
Bootlog with "full" options, no hid-core.c messages

bootline: kernel /bool/kernel-oli-x86-3.7.10-gentoo root=/dev/sda2
ignore_loglevel debug initcall_debug udev.log_priority=8 earlyprintk=vga,keep
print_fatal_signals=1 apm.debug=Y i8042.debug=Y drm.debug=1
scsi_logging_level=1 usbserial.debug=Y option.debug=Y hid.debug=1
pci_hotplug.debug=Y pci_hotplug.debug_acpi=Y apic=debug show_lapic=all
hpet=verbose sysrq_always_enabled console=tty0 console=ttyS0,115200n8

This time I hit Ctrl, so no hid-core.c message
Comment 21 Olivier Diotte (CMC) 2013-04-24 19:08:54 UTC
I should also add that even with clocksource=jiffies, clocksource=hpet or clocksource=tsc I was never able to fully boot correctly without acpi=off.
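
When trying the clocksource= options above, it can be worth confirming which clocksource the kernel actually selected, since a requested one may be rejected at boot. A minimal sketch, assuming the standard sysfs clocksource directory; the function name show_clocksource is hypothetical and the directory is a parameter only so it can be pointed elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: print the active and the available clocksources
# as exposed by the kernel through sysfs.
show_clocksource() {
    # $1: sysfs directory (defaults to the usual clocksource0 location)
    dir=${1:-/sys/devices/system/clocksource/clocksource0}
    echo "current:   $(cat "$dir/current_clocksource")"
    echo "available: $(cat "$dir/available_clocksource")"
}
```

Running show_clocksource after each boot variant would show whether the parameter took effect.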
