Created attachment 120031 [details]
photo of the crash/trace

I've been trying to get a Dell Venue 8 Pro (Valley View tablet) to boot Fedora on and off for the last while. Back around 3.12 / early 3.13 it would reliably boot and even start X with 'nomodeset', but almost none of the integrated hardware shows up. With recent 3.13 kernels it seems like it almost always fails to boot, but just occasionally does boot (and like earlier kernels, no X with modesetting, no wifi, no bluetooth, no sound etc etc).

When it fails, I wind up with what's visible in the attachment: a "BUG: unable to handle kernel paging request" and a reference to __add_pin_to_irq_node, then a trace that looks like ACPI / PCI / IRQ stuff. The trace has taint flag W, but I didn't see any other traces scrolling by earlier in boot. Unfortunately, I can't page up once it hangs - it's just stuck like this.

The kernel I'm using is Fedora's current Rawhide build - 3.13.0-0.rc5.git0.1, which is basically 3.13-rc5 - with two patches added that seemed possibly useful to vlv. One is this:

https://github.com/rjwysocki/linux-pm/commit/b2a51a6d0f96308251cfa41b793c43d0316e3b16

and the other hacks the SDIO ID for the tablet's wifi adapter into the ath6kl driver:

--- linux-3.13.0-0.rc2.git2.1.1.fc21.x86_64/drivers/net/wireless/ath/ath6kl/sdio.c 2013-11-03 15:41:51.000000000 -0800
+++ linux-3.13.0-0.rc2.git2.1.1.fc21.x86_64/drivers/net/wireless/ath/ath6kl/sdio.c.new 2013-12-03 16:38:36.109011716 -0800
@@ -1403,6 +1403,7 @@
 	{SDIO_DEVICE(MANUFACTURER_CODE, (MANUFACTURER_ID_AR6003_BASE | 0x1))},
 	{SDIO_DEVICE(MANUFACTURER_CODE, (MANUFACTURER_ID_AR6004_BASE | 0x0))},
 	{SDIO_DEVICE(MANUFACTURER_CODE, (MANUFACTURER_ID_AR6004_BASE | 0x1))},
+	{SDIO_DEVICE(MANUFACTURER_CODE, (MANUFACTURER_ID_AR6004_BASE | 0x18))},
 	{},
 };

I doubt either is relevant here.
Forgot to mention - I've actually tested with the stock current Rawhide kernel with no patches, and it does the same thing.
Testing a build with "pinctrl: baytrail: lock IRQs when starting them" patch from LKML to see if that helps...
Nope, same with that kernel. :/
Hum, looks like this error may not be directly causing the boot failure: if I leave it for a long time, it drops to a dracut emergency shell complaining it can't find some disks. I'll have to look into that. But the error is definitely happening, and I guess it shouldn't be.
Created attachment 120181 [details]
full journal from a successful boot on the affected system, including traceback

So the issue definitely isn't preventing boot (it seems like the device's firmware is kind of wiggy about booting from the same USB stick twice in a row, or something...) but it definitely happens during a 'successful' boot, and may be involved in the fact that almost none of the device's hardware works.

Here is a full journal from booting the affected system successfully with my kernel 3.13.0-0.rc5.git0.1.2.fc21.x86_64, which is 3.13.0-0.rc5.git0.1 with five patches on top, all pulled from various upstream maintainer branches except my hack to ath6kl:

add-baytrail-soc-gpio.patch - https://github.com/rjwysocki/linux-pm/commit/b2a51a6d0f96308251cfa41b793c43d0316e3b16
ath6kl_v8p.patch
use-correct-gmch-register.patch - http://cgit.freedesktop.org/~danvet/drm-intel/commit/?h=drm-intel-fixes&id=a885b3ccc74d8e38074e1c43a47c354c5ea0b01e
rapl-add-valleyview.patch - https://github.com/rjwysocki/linux-pm/commit/ed93b71492da3464b4798613aa8a99bed914251b
baytrail-lock-irqs-when-starting.patch - http://www.gossamer-threads.com/lists/engine?do=post_view_printable;post=1829558;list=linux

I definitely saw this same traceback when booting an unmodified 3.13.0-0.rc5.git0.1; I don't believe any of these patches is implicated or involved.
Looks like the earlier trace that causes this one to be tainted is a warn_slowpath_common in ACPI setup or something. I'll file that separately.
Here's the bug snipped out of the log, for ease of reference:

Dec 29 11:11:34 localhost kernel: BUG: unable to handle kernel paging request at 41003230
Dec 29 11:11:34 localhost kernel: IP: [<c0431329>] __add_pin_to_irq_node+0x29/0xa0
Dec 29 11:11:34 localhost kernel: *pde = 00000000
Dec 29 11:11:34 localhost kernel: Oops: 0000 [#1] SMP
Dec 29 11:11:34 localhost kernel: Modules linked in: i915(+) crc32_pclmul i2c_algo_bit drm_kms_helper crc32c_intel drm i2c_core video usb_storage loop
Dec 29 11:11:34 localhost kernel: CPU: 3 PID: 343 Comm: systemd-udevd Tainted: G W 3.13.0-0.rc5.git0.1.2.fc21.i686 #1
Dec 29 11:11:34 localhost kernel: Hardware name: DellInc. Venue 8 Pro 5830/09RP78, BIOS A02 10/17/2013
Dec 29 11:11:34 localhost kernel: task: f329f080 ti: f33be000 task.ti: f33be000
Dec 29 11:11:34 localhost kernel: EIP: 0060:[<c0431329>] EFLAGS: 00010206 CPU: 3
Dec 29 11:11:34 localhost kernel: EIP is at __add_pin_to_irq_node+0x29/0xa0
Dec 29 11:11:34 localhost kernel: EAX: 41003230 EBX: 00000000 ECX: 00000000 EDX: ffffffff
Dec 29 11:11:34 localhost kernel: ESI: 00000010 EDI: 41003230 EBP: f33bfaf8 ESP: f33bfad8
Dec 29 11:11:34 localhost kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Dec 29 11:11:34 localhost kernel: CR0: 80050033 CR2: 41003230 CR3: 332cc000 CR4: 001007d0
Dec 29 11:11:34 localhost kernel: Stack:
Dec 29 11:11:34 localhost kernel: 00000010 c0d663b0 f33bfae8 c068dd9a ffffffff ffffffff 00000010 f3b0c194
Dec 29 11:11:34 localhost kernel: f33bfb4c c0431c2e 00000010 00000000 f33bfb10 f3a90810 00000000 00000001
Dec 29 11:11:34 localhost kernel: f33bfb44 c071b1d8 c071b638 00000000 00000000 f33bfbb0 00000000 c09fd830
Dec 29 11:11:34 localhost kernel: Call Trace:
Dec 29 11:11:34 localhost kernel: [<c068dd9a>] ? radix_tree_lookup+0xa/0x10
Dec 29 11:11:34 localhost kernel: [<c0431c2e>] io_apic_setup_irq_pin+0x5e/0x340
Dec 29 11:11:34 localhost kernel: [<c071b1d8>] ? acpi_ut_update_object_reference+0xec/0x159
Dec 29 11:11:34 localhost kernel: [<c071b638>] ? acpi_ut_evaluate_object+0x17b/0x185
Dec 29 11:11:34 localhost kernel: [<c043327b>] io_apic_setup_irq_pin_once+0x7b/0xd0
Dec 29 11:11:34 localhost kernel: [<c07003fe>] ? acpi_pci_irq_find_prt_entry+0x1fd/0x217
Dec 29 11:11:34 localhost kernel: [<c0433319>] io_apic_set_pci_routing+0x39/0x60
Dec 29 11:11:34 localhost kernel: [<c0429437>] mp_register_gsi+0x97/0x190
Dec 29 11:11:34 localhost kernel: [<c0429547>] acpi_register_gsi_ioapic+0x17/0x20
Dec 29 11:11:34 localhost kernel: [<c04292b8>] acpi_register_gsi+0x18/0x30
Dec 29 11:11:34 localhost kernel: [<c0700699>] acpi_pci_irq_enable+0x143/0x24c
Dec 29 11:11:34 localhost kernel: [<c06b9ca4>] ? pci_bus_read_config_word+0x74/0x80
Dec 29 11:11:34 localhost kernel: [<c08a9c40>] ? pci_read+0x30/0x40
Dec 29 11:11:34 localhost kernel: [<c08a9ef0>] pcibios_enable_device+0x30/0x40
Dec 29 11:11:34 localhost kernel: [<c06c0c91>] do_pci_enable_device+0x31/0x50
Dec 29 11:11:34 localhost kernel: [<c06c1e23>] pci_enable_device_flags+0xb3/0x100
Dec 29 11:11:34 localhost kernel: [<c06c1ec2>] pci_enable_device+0x12/0x20
Dec 29 11:11:34 localhost kernel: [<f9012466>] drm_get_pci_dev+0x56/0x120 [drm]
Dec 29 11:11:34 localhost kernel: [<c05bad38>] ? sysfs_do_create_link_sd.isra.2+0xa8/0x1b0
Dec 29 11:11:34 localhost kernel: [<f91175ea>] i915_pci_probe+0x3a/0x80 [i915]
Dec 29 11:11:34 localhost kernel: [<c06c3a5f>] pci_device_probe+0x6f/0xc0
Dec 29 11:11:34 localhost kernel: [<c05bae65>] ? sysfs_create_link+0x25/0x40
Dec 29 11:11:34 localhost kernel: [<c07717d5>] driver_probe_device+0x105/0x380
Dec 29 11:11:34 localhost kernel: [<c06c39ae>] ? pci_match_device+0x9e/0xb0
Dec 29 11:11:34 localhost kernel: [<c0771b01>] __driver_attach+0x71/0x80
Dec 29 11:11:34 localhost kernel: [<c0771a90>] ? __device_attach+0x40/0x40
Dec 29 11:11:34 localhost kernel: [<c076fc67>] bus_for_each_dev+0x47/0x80
Dec 29 11:11:34 localhost kernel: [<c077124e>] driver_attach+0x1e/0x20
Dec 29 11:11:34 localhost kernel: [<c0771a90>] ? __device_attach+0x40/0x40
Dec 29 11:11:34 localhost kernel: [<c0770ea7>] bus_add_driver+0x157/0x230
Dec 29 11:11:34 localhost kernel: [<c07720a9>] driver_register+0x59/0xe0
Dec 29 11:11:34 localhost kernel: [<c06c24c2>] __pci_register_driver+0x32/0x40
Dec 29 11:11:34 localhost kernel: [<f9012625>] drm_pci_init+0xf5/0x100 [drm]
Dec 29 11:11:34 localhost kernel: [<f8bb1000>] ? 0xf8bb0fff
Dec 29 11:11:34 localhost kernel: [<f8bb105e>] i915_init+0x5e/0x60 [i915]
Dec 29 11:11:34 localhost kernel: [<c0400482>] do_one_initcall+0xd2/0x190
Dec 29 11:11:34 localhost kernel: [<f8bb1000>] ? 0xf8bb0fff
Dec 29 11:11:34 localhost kernel: [<c043d1f7>] ? set_memory_ro+0x37/0x40
Dec 29 11:11:34 localhost kernel: [<c04b04c3>] load_module+0x1a23/0x2360
Dec 29 11:11:34 localhost kernel: [<c04b0f65>] SyS_finit_module+0x75/0xc0
Dec 29 11:11:34 localhost kernel: [<c0520f2b>] ? vm_mmap_pgoff+0x7b/0xa0
Dec 29 11:11:34 localhost kernel: [<c09b24cd>] sysenter_do_call+0x12/0x28
Dec 29 11:11:34 localhost kernel: Code: 00 00 55 89 e5 57 56 53 83 ec 14 3e 8d 74 26 00 8b 38 8b 75 08 89 55 f0 89 cb 85 ff 75 0d eb 4e 66 90 8b 47 08 85 c0 74 19 89 c7 <39> 1f 75 f3 39 77 04 75 ee 83 c4 14 31 c0 5b 5e 5f 5d c3 8d 74
Dec 29 11:11:34 localhost kernel: EIP: [<c0431329>] __add_pin_to_irq_node+0x29/0xa0 SS:ESP 0068:f33bfad8
Dec 29 11:11:34 localhost kernel: CR2: 0000000041003230
Dec 29 11:11:34 localhost kernel: ---[ end trace 22c53e93c5894589 ]---
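For anyone reading the register dump above: a quick way to see which registers were holding the faulting address is to compare them against CR2. The following is only a rough sketch in Python; the helper names (parse_registers, registers_matching_fault) are made up for illustration, and the embedded text is just the register lines copied from the oops.

```python
import re

# Register lines copied from the oops in this report.
OOPS = """\
BUG: unable to handle kernel paging request at 41003230
EAX: 41003230 EBX: 00000000 ECX: 00000000 EDX: ffffffff
ESI: 00000010 EDI: 41003230 EBP: f33bfaf8 ESP: f33bfad8
CR0: 80050033 CR2: 41003230 CR3: 332cc000 CR4: 001007d0
"""

# Matches 32-bit general-purpose registers and control registers,
# e.g. "EAX: 41003230" or "CR2: 41003230".
REG_RE = re.compile(r"\b(E[A-D]X|E[SD]I|E[BS]P|CR[0-4]):\s+([0-9a-fA-F]+)")


def parse_registers(text):
    """Return a dict mapping register name to its integer value."""
    return {name: int(value, 16) for name, value in REG_RE.findall(text)}


def registers_matching_fault(text):
    """List the general-purpose registers whose value equals CR2."""
    regs = parse_registers(text)
    fault = regs["CR2"]  # CR2 holds the address that caused the page fault
    return sorted(
        name for name, value in regs.items()
        if value == fault and not name.startswith("CR")
    )


print(registers_matching_fault(OOPS))
```

On this oops it reports EAX and EDI, which would be consistent with __add_pin_to_irq_node dereferencing a bad pointer held in those registers, though that is a reading of the dump rather than a confirmed diagnosis.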
Created attachment 120191 [details] acpidump from affected system
The other hardware is mostly enumerated by ACPI identifiers, not PCI. The trace is the graphics driver trying to set up, going into ACPI and then exploding. You may be able to capture the earlier crash with netconsole, or by booting with i915 disabled and sticking while(1); on the end of the Oops dumping code so it hangs the box on the first oops. The other thing to disable is the P-state driver, as we know there are some problems there right now.
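On the netconsole suggestion: netconsole sends kernel messages as plain UDP packets, so the receiving side can be very simple. Below is a minimal sketch of a listener in Python. It assumes the tablet has some working network path to the capture host (the thread doesn't confirm one) and that netconsole is pointed at this host on port 6666, which as far as I know is its default target port. The function names open_listener and capture are hypothetical.

```python
import socket


def open_listener(port=6666, host="0.0.0.0"):
    """Open a UDP socket for incoming netconsole packets (port is an assumption)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def capture(sock, max_packets, timeout=None):
    """Receive up to max_packets datagrams; stop early if the timeout expires."""
    sock.settimeout(timeout)
    chunks = []
    for _ in range(max_packets):
        try:
            data, _addr = sock.recvfrom(65535)
        except socket.timeout:
            break
        # Kernel messages should be text, but don't die on stray bytes.
        chunks.append(data.decode("utf-8", errors="replace"))
    return chunks


def main():
    """Print packets as they arrive until interrupted."""
    with open_listener() as sock:
        while True:
            for chunk in capture(sock, 1):
                print(chunk, end="", flush=True)
```

Calling main() on the capture machine before booting the tablet should show whatever the kernel manages to send before it hangs, including any trace that scrolls off the screen.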
I've already captured the other traceback. Once I realized neither actually prevented boot, but the v8p just has weird issues booting Linux from USB multiple times in succession or something, it was easy - I can't boot my test stick twice in a row easily, I have to entirely disconnect the USB hub and boot once without it, then shut down, reconnect the USB hub and boot again, don't ask me why. Anyway, yeah, that's how I got the logs and everything, and the earlier trace is visible in the logs, if you look. I filed it at https://bugzilla.kernel.org/show_bug.cgi?id=67911 , it's a warn_slowpath_common. If this one is in graphics stuff, could it be related to a later problem I have, the entire system hanging as soon as I try to start X with modesetting? https://bugs.freedesktop.org/show_bug.cgi?id=73133
Someone commented on my G+ (where I've been blogging about this tablet) that there's a Yocto branch with a bunch of baytrail fixes: http://git.yoctoproject.org/cgit.cgi/linux-yocto-contrib/log/?h=rebecca/base-baytrail Just in case this wasn't already generally known. Looks like there's some activity from Intel on baytrail fixups over there, I presume it'll get submitted for merging at some point.
There are several species of Baytrail, the relevant one for the tablets is Baytrail/T. I'm not sure what the Yocto enabling folk are working with but that tree seems to be a general dump of tons of forward porting. There are folks working on Baytrail/T in Intel. I did pass your contact info on - I'll prod them further if need be.
Well, they said they'd got KMS video working with that tree. I haven't tried it myself yet, though.
I see the APIC one here with 3.13-rc7, but not with 3.12, on an Asus TA100T. Trying to pin down the bad patch (bug 68291).
Should we close this as a dupe of that, then, or vice versa?
I'm pretty sure this and Alan's are the same: if I apply the patch from https://bugzilla.kernel.org/show_bug.cgi?id=68291#c5 to my kernel, this trace goes away (but I wind up hitting https://bugs.freedesktop.org/show_bug.cgi?id=71977 again). Marking as a dupe.

*** This bug has been marked as a duplicate of bug 68291 ***