I've been fiddling with running Fedora on the Dell Venue 8 Pro - a Bay Trail-m based tablet device - for some time. We've made some progress with 3.13 kernels - see e.g. https://bugzilla.kernel.org/show_bug.cgi?id=67861 , https://bugzilla.kernel.org/show_bug.cgi?id=67911 and https://bugzilla.kernel.org/show_bug.cgi?id=65841 . Things seem to have regressed with 3.14 kernels, though. I've built three images with a 3.14 kernel so far, using Fedora Rawhide's kernel and applying a few small vlv-specific patches: simplify-efi-initialization and allow-mapping-bgrt-on-x86-32 from https://bugzilla.kernel.org/show_bug.cgi?id=67911 , and a patch that reverts commit 6c4a8962a4a078cacfc8eb5d4bd79f6343b8cd7a (see https://bugs.freedesktop.org/show_bug.cgi?id=71977#c18 ). With each of these images, the boot process reaches grub fine, but on proceeding from there it just hangs at "Booting a command list", which is a grub message. I don't see anything at all from the kernel - it just gets stuck, apparently indefinitely, right there. Jan-Michael Brummer confirms seeing the same thing. Note that these systems are somewhat notorious for having 32-bit UEFI firmware. I'm doing a 32-bit UEFI boot on them, which is an unusual codepath, but it's known to work (or at least, work better than this) in 3.11, 3.12 and 3.13.
My current kernel build is kernel-3.14.0-0.rc0.git19.1.1.fc21. kernel-3.14.0-0.rc0.git19.1 is the current Rawhide build; the extra '.1' denotes that this is my side build with the three vlv patches added.
Adam, presumably this is a change that went in during the v3.14 merge window? OK, first things first - can you try booting with efi=old_map on the kernel command line? This should be the default for x86-32 anyway, but it's possible that some of the logic is incorrect. There were only two major changes in this EFI merge window: the 1:1 virtual mapping changes (disabled with efi=old_map) and support for kexec on EFI.
Aha - this actually doesn't seem to be Bay Trail specific. One of our releng guys was saying he's seeing the same on his UEFI system with current Rawhide kernels, and indeed I just grabbed a Rawhide nightly, fired it up in a UEFI VM, and hit the same thing three times in a row. efi=old_map doesn't seem to help. So, I guess we're in the kexec-on-EFI stuff?
...and yeesh, even a BIOS VM seems to hang on boot with the latest Rawhide nightly. Clearly we have a major snafu somewhere, but it may be grub or dracut or something and not the kernel, it doesn't seem to be Bay Trail or UEFI specific, and it may be a Fedora thing. Maybe put this on hold for now; I'm gonna file a Fedora bug, since apparently there isn't one yet.
http://happyassassin.net/temp/314_early_trace.png is the stack trace I got from booting with earlyprintk=vga in a BIOS KVM. Josh Boyer thinks he might have something to fix it that he's going to dig up for us later.
Yeah, the call stack looks like the stack protector code is firing. commit a0acda917284 ("acpi, numa, mem_hotplug: mark all nodes the kernel resides un-hotpluggable") looks like a potential culprit? (Of course you could always just disable CONFIG_CC_STACKPROTECTOR to be sure). It would be good to confirm that this is in fact the same issue you're hitting with the UEFI systems by capturing a stack trace, possibly via earlyprintk=efi.
jwb dug up some patches from LKML discussion last week - davej ran into something similar but couldn't quite nail it down. I'm going to build a baytrail kernel with those patches in today and check it boots on the v8p.
I've captured the earlyprintk output and uploaded photos of it: http://de.tabos.org/temp/fedlet-314rc1-1.jpg
http://de.tabos.org/temp/fedlet-314rc1-2.jpg http://de.tabos.org/temp/fedlet-314rc1-3.jpg
Jan-Michael and Thomas report that https://lkml.org/lkml/2014/1/23/190 fixes this. I'll try and test/confirm here today.
I'm still hitting divide errors with that patch applied, but the Intel folks say it works for them. I'm confused. I've tested multiple times, with various combinations of the 'add baytrail' and 'fallback to normal' patches (both Mika's initial version and Thomas' suggested improvement), and with just the 'add baytrail' patch. No dice. Divide error, every damn time. Oh, but hum, now that I look at -3.jpg, my divide error is somewhere else...I'll attach the photo.
Created attachment 125001: photo of my divide error
OK, so intel_pstate=disable gets me past my problem - definitely something exploding in pstate too. Jan-Michael, you don't see that? Odd.
Since I think all the other various bugs discussed here are being handled elsewhere, let's make this the bug for my intel_pstate problem - the one in the photo in c#12. Jan-Michael says he doesn't hit this crash in a vanilla 3.14-rc1 kernel that's stripped down for v8p (he may attach the kernel config he's using). But I don't see anything in Fedora Rawhide's current patch set or kernel spec that touches the intel_pstate driver in any way, nor do I see that anything's changed in intel_pstate in upstream git since rc1 (Fedora's kernels are bumped to new git revs between RCs). Just to be clear: with kernel-3.14.0-0.rc1.git2.1.2.fc21.i686, which is Fedora's kernel-3.14.0-0.rc1.git2.1 with the following patchset:

# v8p
# https://bugzilla.kernel.org/show_bug.cgi?id=67911
Patch26001: simplify-efi-initialization.patch
Patch26002: allow-mapping-bgrt-on-x86-32.patch
# Reverts "drm/i915/vlv: re-enable hotplug detect based probing on VLV/BYT"
# see https://bugs.freedesktop.org/show_bug.cgi?id=71977
Patch26004: vlv-hotplug-revert.patch
# fix tsc calibration, otherwise kernel hangs in early init
# see https://bugzilla.kernel.org/show_bug.cgi?id=69831
# and https://lkml.org/lkml/2014/1/23/190
Patch26005: tsc-fallback-normal.patch
Patch26006: tsc-add-baytrail.patch

every boot on the Venue 8 Pro for me hits the intel_pstate crash shown in c#12. Booting with intel_pstate=disable gives me a successful boot.
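For anyone reproducing this, it can help to confirm which workaround parameters a given boot actually used before comparing results. The following is only a rough sketch: `parse_cmdline`, `pstate_disabled` and `running_cmdline` are hypothetical helper names, and the parser assumes simple whitespace-separated parameters with no quoted values.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}.

    Assumption: parameters are whitespace-separated and unquoted.
    Bare flags such as "quiet" map to None.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def pstate_disabled(cmdline):
    """True if the command line carries intel_pstate=disable."""
    return parse_cmdline(cmdline).get("intel_pstate") == "disable"


def running_cmdline(path="/proc/cmdline"):
    """Read the command line of the currently running kernel."""
    with open(path) as f:
        return f.read().strip()
```

On a booted system, `pstate_disabled(running_cmdline())` tells you whether the boot being tested had the driver disabled; the same parser can be used to check for `efi=old_map` or `earlyprintk=efi`.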
Created attachment 125101: Stripped DV8P kernel config. As requested, I'm attaching the kernel config we are using here. It has some additional modules for our USB + gigabit ethernet hub and some Atheros wifi dongles.
Created attachment 125181: config file being used by Intel guys who do *NOT* hit the pstate divide error. For reference, this is the kernel config the Intel guys working on the V8P are using. They say with this config they don't hit the pstate divide error - i.e. they can boot with the patchset I listed, but with no intel_pstate=disable needed. The config I'm using is the stock Fedora Rawhide kernel config; I'm not modifying it.
Oh, damn, sorry, just noticed I duped Jan's post.
Ah, so the difference is debugging. If I disable debugging in Fedora's kernel, I can boot with pstate enabled.
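A quick way to narrow down which options differ between two kernel configs, such as the Fedora and Intel ones discussed above, is to diff them option by option rather than line by line. This is a minimal sketch, not anyone's actual tooling: `parse_config` and `diff_configs` are hypothetical names, an option absent from a file is assumed to be equivalent to "not set", and the option names in the usage example are illustrative, not the ones that actually differed here.

```python
import re

# "CONFIG_FOO=y" / "CONFIG_FOO=m" / "CONFIG_FOO="string""
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
# "# CONFIG_FOO is not set"
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Parse kernel .config text into {option: value}; unset options map to "n"."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def diff_configs(a, b, pattern=""):
    """Return {option: (value_in_a, value_in_b)} for options whose values differ.

    Assumption: an option missing from one config is treated as "n".
    `pattern` is a plain substring filter on the option name, e.g. "DEBUG".
    """
    result = {}
    for name in sorted(set(a) | set(b)):
        va, vb = a.get(name, "n"), b.get(name, "n")
        if pattern in name and va != vb:
            result[name] = (va, vb)
    return result
```

For example, with two made-up fragments:

```python
fedora = parse_config("CONFIG_X86_INTEL_PSTATE=y\nCONFIG_DEBUG_SPINLOCK=y\n")
intel = parse_config("CONFIG_X86_INTEL_PSTATE=y\n# CONFIG_DEBUG_SPINLOCK is not set\n")
print(diff_configs(fedora, intel, "DEBUG"))
```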
Adam, so is it still a regression -- or did the debug vs non-debug difference also apply to 3.13?
I'm wondering if this helps by chance: https://patchwork.kernel.org/patch/3612801/
len: I would've been booting debug kernels all along, so this is still likely a regression.
rafael: sounds plausible, I'll try and test (once I'm done tracing out a dracut issue the fedlet found: this thing is an *awesome* bug magnet. :>)
For the record, after doing an install to my fedlet's internal storage, I was actually hitting a similar (but not the same) crash on boot even with a non-debug kernel. I've just built a debug kernel with the patch from c#20, and I'll see how that goes shortly.
The patch from c#20 does seem to help. A kernel with debugging enabled and that patch built in boots successfully without intel_pstate=disable both in a live image and in my installed system.
I have sent a patch set to the linux-pm list that contains Baytrail updates and removes power reporting so the patch from comment #20 will no longer be needed.
I've built a kernel with patch 1/5 from your new series to check that it also does the trick, but didn't get enough round tuits to test today.
Looks like it already got upstreamed, but just to confirm, that patch does indeed seem to solve the problem. I believe there's nothing left requiring this to be open. Thanks.
Shipped in Linux-3.14-rc3:

commit 709c078e176bd47227e89bb34de7c64b57aaaeab
Author: Dirk Brandewie <dirk.j.brandewie@intel.com>
Date:   Wed Feb 12 10:01:03 2014 -0800

    intel_pstate: Remove energy reporting from pstate_sample tracepoint