Using this tablet:
3.17 rc0 fails to boot, printing a single line of text:
until the end of the line.
Turning on debug doesn't print more data. This machine has 32-bit UEFI firmware, and I used a 32-bit kernel.
Kernel 3.16.0 boots just fine.
Created attachment 149781 [details]
Confirmed I'm seeing this on my Venue 8 Pro, and (ex-)Intel devs working on the baytrail platform say they've seen it too but haven't got around to debugging yet. I usually get two repeating characters for a full-width line, similar to Bastien's picture. I think for a given kernel it's always the same two characters, though they differ between kernels; I'd have to try it a couple more times to confirm.
From Jan-Michael Brummer:
"You are right, 3.17 is buggy as hell >:( We have the same experience
here but not the time to bisect the character-issue. Hopefully at the
end of the week."
This is of course a definite regression from 3.16, which booted successfully on these systems. It persists up to at least rc2; I'll check the latest Fedora 'rc4' kernel shortly.
Still fails for me with 3.17 rc4.
A git bisect would be invaluable in tracking this down.
Alternatively, applying the patches on the following branch may improve things somewhat,
The latter's easier :P, so I'll try it. If it doesn't help I'll try and get to a bisect at some point, but I'm currently drowning in F21 Alpha, unfortunately, so my side projects get limited time :/
Just tried 3.17-rc4 with both of your patches, and it boots again :) As I do not use an initrd, the GOT patch evidently did it. Thanks.
It fixes it (kind of). It booted twice, out of 3 separate attempts. Might not be related though, graphics is also particularly awful.
Right, it completely failed to boot in my last 3 attempts. 3.16 still works as expected.
Works fine with the patches, no boot problem (DV8P). Are you sure that it is the same issue?
I just rebased the Fedlet kernel to 3.17 and included the two x86-efi patches from Matt's branch. I'll see how that flies. It's building at http://koji.fedoraproject.org/koji/taskinfo?taskID=7575316 .
(In reply to Jan-Michael Brummer from comment #9)
> Works fine with the patches, no boot problem (DV8P). Are you sure that it is
> the same issue?
Same issue as what?
Hum - so a 3.17rc4 kernel I built with Matt's patches - that's the one from c#10 - seems to boot reliably for me, but a 3.17rc5 kernel built with the exact same delta from Fedora's stock (same patch set, same config options) fails and sticks on the 'garbage characters' screen.
I guess it's worth noting the 'garbage characters' are briefly visible in a working boot with 3.17rc4, but are quickly replaced by the normal boot screen.
Note: the garbage characters are produced by efi_printk() and the message should be 'setup_efi_pci() failed!'
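To make it clearer what the firmware should have been handed for that message, here is a rough Python model of the text path. It assumes the boot stub feeds the firmware's text-output call one character at a time, each as a NUL-terminated string of 16-bit code units, with a carriage return added before a line feed. That per-character behaviour and the helper name `efi_printk_units` are assumptions for illustration, not something taken from this report.

```python
def efi_printk_units(message):
    """Model the buffers passed to the firmware, one per output call.

    Assumption: each ASCII character is widened to a 16-bit code unit and
    sent as its own NUL-terminated string, and a line feed is preceded by
    a carriage return.
    """
    calls = []
    for ch in message:
        if ch == "\n":
            calls.append([ord("\r"), 0])  # carriage return before line feed
        calls.append([ord(ch), 0])        # the character plus terminator
    return calls


calls = efi_printk_units("setup_efi_pci() failed!\n")
print(len(calls))   # one firmware call per character, plus one for '\r'
print(calls[0])     # code units for 's' followed by the terminator
```

If the firmware is handed anything other than small buffers like these, what ends up on screen presumably has no relation to the intended message, which would be consistent with the repeating characters seen here.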
Commenting out the efi_printk() of 'setup_efi_pci() failed!' reveals an oops within virt_efi_get_time().
Jan-Michael, but things boot correctly with the patches in my 'urgent' queue, right?
Well, at first they solved my issue. But after recreating my initrd the same symptom happened. That's why I moved on and tried to figure out what the characters mean. Obviously there is something wrong with the EFI part, because as soon as I remove the efi_printk() it continues to boot and shows the framebuffer, followed by an oops in virt_efi_get_time(). See my previous comments.
Could you upload a copy of the oops? Debugging that is going to be much more straightforward than debugging the early efi_printk() issue (assuming they're the same bug).
Also, try turning off CONFIG_RTC_DRV_EFI and see whether you get a working machine. It used to be disabled for x86 because, by and large, the implementations of GetTime() don't work very well on x86. Recent arm64 changes look like they turned it on.
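A quick way to check which way a given build has the option set is to look at its `.config`. The helper below is a minimal sketch in Python; the function name and the sample text are made up for illustration, and it relies only on the usual `.config` conventions (`CONFIG_FOO=y` for enabled, `# CONFIG_FOO is not set` for disabled).

```python
def kconfig_state(config_text, option):
    """Return the value of a kernel config option ('y', 'm', ...),
    or 'n' if the option is explicitly unset or missing entirely."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return "n"
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return "n"


# Hypothetical excerpt of a .config, for illustration only.
sample = """\
CONFIG_EFI=y
CONFIG_EFI_STUB=y
# CONFIG_RTC_DRV_EFI is not set
"""

print(kconfig_state(sample, "CONFIG_RTC_DRV_EFI"))
print(kconfig_state(sample, "CONFIG_EFI_STUB"))
```

To run it against a real build, read the build tree's `.config` into a string and pass that in place of `sample`.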
Created attachment 150911 [details]
oops in efi time functions
As soon as I'm back home (Monday) I'll disable the EFI RTC support. In the meantime I hope that my photo from yesterday helps.
I see something similar - the 3.17rc4 Fedlet kernel I first built with the patches booted, but subsequently I built a 3.17rc5 kernel and that fails. I *think* it's deterministic - the 3.17rc4 kernel always boots, the rc5 one never does - but I haven't done enough boots yet to be absolutely sure, it could just be a very unlikely chance streak (I think I've done ~5 successful boots on rc4, ~5 failed boots on rc5).
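As a rough check on the "chance streak" idea: suppose both kernels actually shared the same per-boot failure probability and boots were independent (both are assumptions, and the counts of 5 are the approximate ones given above). The sketch below computes how likely it would then be to see 5 good boots on one kernel and 5 failed boots on the other.

```python
def streak_probability(p_fail, ok_boots=5, failed_boots=5):
    """Probability of ok_boots successes on one kernel and failed_boots
    failures on the other, if each boot fails independently with the
    same probability p_fail on both kernels."""
    return (1 - p_fail) ** ok_boots * p_fail ** failed_boots


# The expression is largest at p_fail = 0.5, where it equals 1/1024.
print(streak_probability(0.5))

# With a failure rate of one in three it is smaller still.
print(streak_probability(1 / 3))
```

Under this simple model the observed pattern would show up less than once in a thousand tries even in the most favourable case, so a real difference between the two builds looks far more likely than luck.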
Yeah, that looks like a firmware bug to me. The IP is 0xf04e1a36. That's not a kernel address. We can probably just chalk this up to one of the many GetTime() bugs.
So the efi_printk() bug is interesting. Can you try to bisect between -rc4 and -rc5 and see if it's some other regression?
Well, I'm not sure if rc4->rc5 is the thing, it might be the initramfs generation or contents or something - Jan-Michael said regenerating initramfs triggered it for him, and I believe Bastien said the patches didn't completely work for him either. More testing needed I guess...
Jan-Michael, when you re-generated your initramfs did you change something?
I think it is related to the garbage output and this one must be investigated. So we should try to find the "buggy" patch between 3.16 and 3.17rcX. Unfortunately I'm on a trip this weekend... Adam, can you jump in?
I think I just changed the Plymouth startup screen...
"Adam, can you jump in?"
siiigh, I guess the gods really want me to set up a tweaked local kernel build environment :/ (I still build the fedlet kernels as full-fat, two-hour-long builds in Fedora's buildsystem, I don't have a config for doing a stripped local build). I'll try.
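For a rough sense of what a bisect costs when every build is slow, the sketch below models the search as a plain binary search over a linear list of commits. It is a simplification, not how git itself is implemented: real history is a graph, and the commit count and the position of the first bad commit used here are made-up numbers.

```python
def bisect_first_bad(commits, is_bad):
    """Return (first_bad_commit, number_of_tests).

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1   # indices of known-good / known-bad
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1                    # one kernel build + boot attempt
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad], tests


# Hypothetical history: 10,000 commits after the good one, regression at 7321.
commits = list(range(10001))
culprit, tests = bisect_first_bad(commits, lambda c: c >= 7321)
print(culprit, tests)   # tests never exceeds ceil(log2(10000)) = 14
```

So even a range of roughly ten thousand commits needs at most 14 build-and-boot cycles in this model, which is why a stripped-down local build matters far more than the size of the range.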
Guys, could you try the attached patch? It applies on top of f3670394c29f in Linus' tree. It basically reverts a bunch of the EFI boot stub code back to the pre-merge window state.
Also, I can reproduce those unreadable characters on boot on my ASUS T100. I'll take a look at that issue.
Created attachment 151601 [details]
I have a fedlet build with 'big arse revert' running now.
(there's a sentence I never thought I'd write.)
Created attachment 151781 [details]
Fix garbled text output
If someone wants to give the "Fix garbled text output" patch a try, it should fix the issue in the description of this bug.
Sorry for the delay, I got sidelined.
An rc6 kernel with the 'big arse revert' boots successfully on the Fedlet (Venue 8 Pro, running pure 32-bit boot chain). Of course, rc4 booted but rc5 failed with the same patchset, so this seems a sensitive issue, but insofar as I have an indication, it says 'big arses are good'.
Oh, I don't believe I saw the garbled text with the 'big arse revert' either (I didn't use the patch from c#29).
This should all be fixed in v3.17-rc7. Please reopen this bug if you're still having problems.
So, I've been unable to reliably boot anything later than 14.04.1 LTS on my Pipo W2 Bay Trail Windows tablet. I tried the latest Fedlet build: no dice. I got to the desktop once on Fedlet and it crashed there; no other attempt made it to the desktop (out of about 10). I also tried Ubuntu 14.10 and many Ubuntu flavors (Xubuntu, MATE, Kubuntu, Lubuntu, GNOME, etc.). 14.04.1 boots fine, though.
I'm under the impression the latest Fedlet builds are based on a kernel that should be fixed (I don't think the same applies to the official 14.10 releases of the Ubuntu variants), so I thought I would mention that this (or another closely related) kernel bug is still being, well, buggy. I'm not certain how to determine what's actually the problem. I've been using Lubuntu for years, but when it comes to troubleshooting anything serious like this, I don't know where to even look, since I get a garbled screen on boot, so there's no text to post. I assume there's some log file on a live USB drive that would have the information, but I'm not certain where it would be, which greatly limits my ability to provide helpful feedback. I've seen people mention dmesg and other logs of some sort. But I really am a noob in this case, trying to work through things way out of my depth, and learn as much as I can in the process.
If it crashes during early boot there's no magic log file storage, unfortunately, no. We mostly debug early kernel crashes based on screen photos / videos :/
The particular bug covered in this report does seem to be pretty solidly fixed; what you're hitting is likely something else. But it does sound like it might be a bit tricky to figure out what. Try editing the kernel command line and removing 'rhgb quiet' and see if you get more messages that way (that's for Fedlet).
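In case it's unclear what "removing 'rhgb quiet'" amounts to, here is a tiny Python sketch of the edit. The example command line is hypothetical; on a real system you would make the same change by hand in the boot loader's edit screen.

```python
def strip_boot_options(cmdline, drop=("rhgb", "quiet")):
    """Remove whole-word boot options from a kernel command line."""
    return " ".join(tok for tok in cmdline.split() if tok not in drop)


# Hypothetical command line, for illustration only.
before = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro rhgb quiet"
print(strip_boot_options(before))
# prints "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro"
```

Only exact matches are dropped, so an option that merely contains the word "quiet" as part of a longer token is left alone.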
Interesting, thanks for the reply. Now I am perplexed. After 11 unsuccessful boot attempts on Fedlet, I now have 3 successful boots in a row with 'quiet rhgb' deleted. It booted to the desktop and I got a kernel problem message on bootup (clicked to report it, don't know if that ever helps). It's acting laggy and the touchscreen isn't working, but those seem like just the usual expected hardware issues from the many slightly different Bay Trail tablets. Sorry to waste people's time on this, and thank you :D I'll post a couple of questions on your Fedlet site, though, once I figure out what to ask while I tinker around with it. Still no luck on any Ubuntu variant of 14.10.