Bug 67861 - Venue 8 Pro (valleyview): 'unable to handle kernel paging request' error and trace in __add_pin_to_irq_node on boot
Status: CLOSED DUPLICATE of bug 68291
Alias: None
Product: ACPI
Classification: Unclassified
Component: Config-Interrupts
Hardware: i386 Linux
Importance: P1 high
Assignee: acpi_config-interrupts
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-12-29 01:47 UTC by Adam Williamson
Modified: 2014-10-20 21:21 UTC

See Also:
Kernel Version: 3.13.0-0.rc5.git0.1.fc21
Subsystem:
Regression: No
Bisected commit-id:


Attachments
photo of the crash/trace (266.52 KB, image/jpeg)
2013-12-29 01:47 UTC, Adam Williamson
full journal from a successful boot on the affected system, including traceback (170.97 KB, text/plain)
2013-12-30 00:21 UTC, Adam Williamson
acpidump from affected system (265.28 KB, text/plain)
2013-12-30 00:36 UTC, Adam Williamson

Description Adam Williamson 2013-12-29 01:47:26 UTC
Created attachment 120031 [details]
photo of the crash/trace

I've been trying, on and off for the last while, to get a Dell Venue 8 Pro (Valley View tablet) to boot Fedora. Back around 3.12 / early 3.13 it would reliably boot and even start X with 'nomodeset', but almost none of the integrated hardware showed up. With recent 3.13 kernels it almost always fails to boot, but just occasionally does boot (and, as with earlier kernels, there is no X with modesetting, no wifi, no bluetooth, no sound, etc.).

When it fails, I wind up with what's visible in the attachment: a "BUG: unable to handle kernel paging request" and a reference to __add_pin_to_irq_node, then a trace that looks like ACPI / PCI / IRQ stuff. The trace has taint flag W, but I didn't see any other traces scrolling by earlier in boot. Unfortunately, I can't page up once it hangs; it's just stuck like this.

The kernel I'm using is Fedora's current Rawhide build, 3.13.0-0.rc5.git0.1, which is basically 3.13-rc5, with two patches added that seemed possibly useful for vlv. One is this:

https://github.com/rjwysocki/linux-pm/commit/b2a51a6d0f96308251cfa41b793c43d0316e3b16

and the other hacks the SDIO ID for the tablet's wifi adapter into the ath6kl driver:

--- linux-3.13.0-0.rc2.git2.1.1.fc21.x86_64/drivers/net/wireless/ath/ath6kl/sdio.c      2013-11-03 15:41:51.000000000 -0800
+++ linux-3.13.0-0.rc2.git2.1.1.fc21.x86_64/drivers/net/wireless/ath/ath6kl/sdio.c.new  2013-12-03 16:38:36.109011716 -0800
@@ -1403,6 +1403,7 @@
        {SDIO_DEVICE(MANUFACTURER_CODE, (MANUFACTURER_ID_AR6003_BASE | 0x1))},
        {SDIO_DEVICE(MANUFACTURER_CODE, (MANUFACTURER_ID_AR6004_BASE | 0x0))},
        {SDIO_DEVICE(MANUFACTURER_CODE, (MANUFACTURER_ID_AR6004_BASE | 0x1))},
+       {SDIO_DEVICE(MANUFACTURER_CODE, (MANUFACTURER_ID_AR6004_BASE | 0x18))},
        {},
 };

I doubt either is relevant here.
Comment 1 Adam Williamson 2013-12-29 01:48:29 UTC
Forgot to mention - I've actually tested with the stock current Rawhide kernel with no patches, and it does the same thing.
Comment 2 Adam Williamson 2013-12-29 02:57:50 UTC
Testing a build with "pinctrl: baytrail: lock IRQs when starting them" patch from LKML to see if that helps...
Comment 3 Adam Williamson 2013-12-29 07:40:21 UTC
Nope, same with that kernel. :/
Comment 4 Adam Williamson 2013-12-29 07:45:15 UTC
Hm, it looks like this error may not be directly causing the boot failure: if I leave it for a long time, it drops to a dracut emergency shell complaining that it can't find some disks. I'll have to look into that. But the error is definitely happening, and I guess it shouldn't be.
Comment 5 Adam Williamson 2013-12-30 00:21:53 UTC
Created attachment 120181 [details]
full journal from a successful boot on the affected system, including traceback

So the issue definitely isn't preventing boot (it seems like the device's firmware is kind of wiggy about booting from the same USB stick twice in a row, or something), but it definitely happens during a 'successful' boot, and may be involved in the fact that almost none of the device's hardware works. Here is a full journal from booting the affected system successfully with my kernel 3.13.0-0.rc5.git0.1.2.fc21.x86_64, which is 3.13.0-0.rc5.git0.1 with five patches on top, all pulled from various upstream maintainer branches except my hack to ath6kl:

add-baytrail-soc-gpio.patch - https://github.com/rjwysocki/linux-pm/commit/b2a51a6d0f96308251cfa41b793c43d0316e3b16
ath6kl_v8p.patch
use-correct-gmch-register.patch - http://cgit.freedesktop.org/~danvet/drm-intel/commit/?h=drm-intel-fixes&id=a885b3ccc74d8e38074e1c43a47c354c5ea0b01e
rapl-add-valleyview.patch - https://github.com/rjwysocki/linux-pm/commit/ed93b71492da3464b4798613aa8a99bed914251b
baytrail-lock-irqs-when-starting.patch - http://www.gossamer-threads.com/lists/engine?do=post_view_printable;post=1829558;list=linux

I definitely saw this same traceback when booting an unmodified 3.13.0-0.rc5.git0.1, so I don't believe any of these patches is implicated or involved.
Comment 6 Adam Williamson 2013-12-30 00:23:19 UTC
Looks like the earlier trace that causes this one to be tainted is a warn_slowpath_common in ACPI setup or something. I'll file that separately.
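
For reference, the "Tainted:" field in an oops header can be decoded letter by letter. The sketch below is only an illustration: the sample line is copied from the journal for this boot, `decode_taint` and `TAINT_MEANINGS` are hypothetical names, and the table covers just a handful of flags as described in the kernel's oops-tracing documentation, not the full set.

```python
# Partial table of taint letters (an illustrative subset, not the full list).
TAINT_MEANINGS = {
    "G": "no proprietary module loaded",
    "P": "proprietary module loaded",
    "F": "module was force-loaded",
    "D": "kernel has already died (oops or BUG)",
    "W": "a warning was issued earlier",
    "C": "staging driver loaded",
    "O": "out-of-tree module loaded",
}

# Oops header line from the journal, with the journal prefix removed.
SAMPLE = ("CPU: 3 PID: 343 Comm: systemd-udevd Tainted: G        W    "
          "3.13.0-0.rc5.git0.1.2.fc21.i686 #1")


def decode_taint(line):
    """Return (letter, meaning) pairs for the taint letters on an oops header line."""
    _, found, rest = line.partition("Tainted:")
    if not found:
        return []  # an untainted kernel prints "Not tainted" instead
    letters = []
    for token in rest.split():
        # Taint letters come first; the kernel version string ends the field.
        if not (token.isalpha() and token.isupper()):
            break
        letters.extend(token)
    return [(c, TAINT_MEANINGS.get(c, "not in this partial table")) for c in letters]


if __name__ == "__main__":
    for letter, meaning in decode_taint(SAMPLE):
        print(letter, "-", meaning)
```

On the sample line this yields G and W, which matches the reading above: nothing proprietary is loaded, and some earlier warning tainted the kernel before this oops.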
Comment 7 Adam Williamson 2013-12-30 00:31:12 UTC
Here's the bug snipped out of the log, for ease of reference:

Dec 29 11:11:34 localhost kernel: BUG: unable to handle kernel paging request at 41003230
Dec 29 11:11:34 localhost kernel: IP: [<c0431329>] __add_pin_to_irq_node+0x29/0xa0
Dec 29 11:11:34 localhost kernel: *pde = 00000000 
Dec 29 11:11:34 localhost kernel: Oops: 0000 [#1] SMP 
Dec 29 11:11:34 localhost kernel: Modules linked in: i915(+) crc32_pclmul i2c_algo_bit drm_kms_helper crc32c_intel drm i2c_core video usb_storage loop
Dec 29 11:11:34 localhost kernel: CPU: 3 PID: 343 Comm: systemd-udevd Tainted: G        W    3.13.0-0.rc5.git0.1.2.fc21.i686 #1
Dec 29 11:11:34 localhost kernel: Hardware name: DellInc. Venue 8 Pro 5830/09RP78, BIOS A02 10/17/2013
Dec 29 11:11:34 localhost kernel: task: f329f080 ti: f33be000 task.ti: f33be000
Dec 29 11:11:34 localhost kernel: EIP: 0060:[<c0431329>] EFLAGS: 00010206 CPU: 3
Dec 29 11:11:34 localhost kernel: EIP is at __add_pin_to_irq_node+0x29/0xa0
Dec 29 11:11:34 localhost kernel: EAX: 41003230 EBX: 00000000 ECX: 00000000 EDX: ffffffff
Dec 29 11:11:34 localhost kernel: ESI: 00000010 EDI: 41003230 EBP: f33bfaf8 ESP: f33bfad8
Dec 29 11:11:34 localhost kernel:  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Dec 29 11:11:34 localhost kernel: CR0: 80050033 CR2: 41003230 CR3: 332cc000 CR4: 001007d0
Dec 29 11:11:34 localhost kernel: Stack:
Dec 29 11:11:34 localhost kernel:  00000010 c0d663b0 f33bfae8 c068dd9a ffffffff ffffffff 00000010 f3b0c194
Dec 29 11:11:34 localhost kernel:  f33bfb4c c0431c2e 00000010 00000000 f33bfb10 f3a90810 00000000 00000001
Dec 29 11:11:34 localhost kernel:  f33bfb44 c071b1d8 c071b638 00000000 00000000 f33bfbb0 00000000 c09fd830
Dec 29 11:11:34 localhost kernel: Call Trace:
Dec 29 11:11:34 localhost kernel:  [<c068dd9a>] ? radix_tree_lookup+0xa/0x10
Dec 29 11:11:34 localhost kernel:  [<c0431c2e>] io_apic_setup_irq_pin+0x5e/0x340
Dec 29 11:11:34 localhost kernel:  [<c071b1d8>] ? acpi_ut_update_object_reference+0xec/0x159
Dec 29 11:11:34 localhost kernel:  [<c071b638>] ? acpi_ut_evaluate_object+0x17b/0x185
Dec 29 11:11:34 localhost kernel:  [<c043327b>] io_apic_setup_irq_pin_once+0x7b/0xd0
Dec 29 11:11:34 localhost kernel:  [<c07003fe>] ? acpi_pci_irq_find_prt_entry+0x1fd/0x217
Dec 29 11:11:34 localhost kernel:  [<c0433319>] io_apic_set_pci_routing+0x39/0x60
Dec 29 11:11:34 localhost kernel:  [<c0429437>] mp_register_gsi+0x97/0x190
Dec 29 11:11:34 localhost kernel:  [<c0429547>] acpi_register_gsi_ioapic+0x17/0x20
Dec 29 11:11:34 localhost kernel:  [<c04292b8>] acpi_register_gsi+0x18/0x30
Dec 29 11:11:34 localhost kernel:  [<c0700699>] acpi_pci_irq_enable+0x143/0x24c
Dec 29 11:11:34 localhost kernel:  [<c06b9ca4>] ? pci_bus_read_config_word+0x74/0x80
Dec 29 11:11:34 localhost kernel:  [<c08a9c40>] ? pci_read+0x30/0x40
Dec 29 11:11:34 localhost kernel:  [<c08a9ef0>] pcibios_enable_device+0x30/0x40
Dec 29 11:11:34 localhost kernel:  [<c06c0c91>] do_pci_enable_device+0x31/0x50
Dec 29 11:11:34 localhost kernel:  [<c06c1e23>] pci_enable_device_flags+0xb3/0x100
Dec 29 11:11:34 localhost kernel:  [<c06c1ec2>] pci_enable_device+0x12/0x20
Dec 29 11:11:34 localhost kernel:  [<f9012466>] drm_get_pci_dev+0x56/0x120 [drm]
Dec 29 11:11:34 localhost kernel:  [<c05bad38>] ? sysfs_do_create_link_sd.isra.2+0xa8/0x1b0
Dec 29 11:11:34 localhost kernel:  [<f91175ea>] i915_pci_probe+0x3a/0x80 [i915]
Dec 29 11:11:34 localhost kernel:  [<c06c3a5f>] pci_device_probe+0x6f/0xc0
Dec 29 11:11:34 localhost kernel:  [<c05bae65>] ? sysfs_create_link+0x25/0x40
Dec 29 11:11:34 localhost kernel:  [<c07717d5>] driver_probe_device+0x105/0x380
Dec 29 11:11:34 localhost kernel:  [<c06c39ae>] ? pci_match_device+0x9e/0xb0
Dec 29 11:11:34 localhost kernel:  [<c0771b01>] __driver_attach+0x71/0x80
Dec 29 11:11:34 localhost kernel:  [<c0771a90>] ? __device_attach+0x40/0x40
Dec 29 11:11:34 localhost kernel:  [<c076fc67>] bus_for_each_dev+0x47/0x80
Dec 29 11:11:34 localhost kernel:  [<c077124e>] driver_attach+0x1e/0x20
Dec 29 11:11:34 localhost kernel:  [<c0771a90>] ? __device_attach+0x40/0x40
Dec 29 11:11:34 localhost kernel:  [<c0770ea7>] bus_add_driver+0x157/0x230
Dec 29 11:11:34 localhost kernel:  [<c07720a9>] driver_register+0x59/0xe0
Dec 29 11:11:34 localhost kernel:  [<c06c24c2>] __pci_register_driver+0x32/0x40
Dec 29 11:11:34 localhost kernel:  [<f9012625>] drm_pci_init+0xf5/0x100 [drm]
Dec 29 11:11:34 localhost kernel:  [<f8bb1000>] ? 0xf8bb0fff
Dec 29 11:11:34 localhost kernel:  [<f8bb105e>] i915_init+0x5e/0x60 [i915]
Dec 29 11:11:34 localhost kernel:  [<c0400482>] do_one_initcall+0xd2/0x190
Dec 29 11:11:34 localhost kernel:  [<f8bb1000>] ? 0xf8bb0fff
Dec 29 11:11:34 localhost kernel:  [<c043d1f7>] ? set_memory_ro+0x37/0x40
Dec 29 11:11:34 localhost kernel:  [<c04b04c3>] load_module+0x1a23/0x2360
Dec 29 11:11:34 localhost kernel:  [<c04b0f65>] SyS_finit_module+0x75/0xc0
Dec 29 11:11:34 localhost kernel:  [<c0520f2b>] ? vm_mmap_pgoff+0x7b/0xa0
Dec 29 11:11:34 localhost kernel:  [<c09b24cd>] sysenter_do_call+0x12/0x28
Dec 29 11:11:34 localhost kernel: Code: 00 00 55 89 e5 57 56 53 83 ec 14 3e 8d 74 26 00 8b 38 8b 75 08 89 55 f0 89 cb 85 ff 75 0d eb 4e 66 90 8b 47 08 85 c0 74 19 89 c7 <39> 1f 75 f3 39 77 04 75 ee 83 c4 14 31 c0 5b 5e 5f 5d c3 8d 74
Dec 29 11:11:34 localhost kernel: EIP: [<c0431329>] __add_pin_to_irq_node+0x29/0xa0 SS:ESP 0068:f33bfad8
Dec 29 11:11:34 localhost kernel: CR2: 0000000041003230
Dec 29 11:11:34 localhost kernel: ---[ end trace 22c53e93c5894589 ]---
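
A rough way to read the trace above is to drop the frames marked "?" (stack addresses the unwinder could not confirm as part of the call chain) and to check which registers hold the faulting address. The following is a minimal sketch, not a general oops parser: it assumes the 32-bit x86 oops layout shown above, uses an abbreviated copy of the log with the journal prefix removed, and the function names `reliable_frames` and `registers_holding_fault_address` are made up for illustration.

```python
import re

# Abbreviated copy of the oops above (journal prefix removed).
OOPS = """\
BUG: unable to handle kernel paging request at 41003230
EAX: 41003230 EBX: 00000000 ECX: 00000000 EDX: ffffffff
ESI: 00000010 EDI: 41003230 EBP: f33bfaf8 ESP: f33bfad8
Call Trace:
 [<c068dd9a>] ? radix_tree_lookup+0xa/0x10
 [<c0431c2e>] io_apic_setup_irq_pin+0x5e/0x340
 [<c043327b>] io_apic_setup_irq_pin_once+0x7b/0xd0
 [<c0433319>] io_apic_set_pci_routing+0x39/0x60
 [<c0429437>] mp_register_gsi+0x97/0x190
 [<c0700699>] acpi_pci_irq_enable+0x143/0x24c
 [<f91175ea>] i915_pci_probe+0x3a/0x80 [i915]
"""

# "[<addr>] [? ]symbol+0xoff/0xsize"; group 1 is set only for "?" frames.
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x")
# 32-bit general-purpose registers as printed in the register dump.
REG = re.compile(r"\b(E[A-Z]{2}): ([0-9a-f]{8})")
FAULT = re.compile(r"paging request at ([0-9a-f]+)")


def reliable_frames(text):
    """Function names from call-trace frames that are not marked '?'."""
    return [m.group(2) for m in FRAME.finditer(text) if m.group(1) is None]


def registers_holding_fault_address(text):
    """Names of registers whose value equals the faulting address."""
    fault = FAULT.search(text)
    if fault is None:
        return []
    addr = int(fault.group(1), 16)
    return [name for name, value in REG.findall(text) if int(value, 16) == addr]


if __name__ == "__main__":
    print("fault address held in:", registers_holding_fault_address(OOPS))
    print("call chain:", " <- ".join(reliable_frames(OOPS)))
```

On this excerpt the faulting address 41003230 shows up in both EAX and EDI, and the confirmed chain runs from io_apic_setup_irq_pin back through acpi_pci_irq_enable to i915_pci_probe, i.e. the fault is hit while the PCI core enables the graphics device's interrupt.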
Comment 8 Adam Williamson 2013-12-30 00:36:59 UTC
Created attachment 120191 [details]
acpidump from affected system
Comment 9 Alan 2013-12-30 15:00:23 UTC
The other hardware is mostly ACPI identifiers not PCI.

The trace is the graphics driver trying to set up, going into ACPI and then exploding. You may be able to capture the earlier crash with netconsole, or by booting with i915 disabled and sticking while(1); on the end of the Oops dumping code so it hangs the box on the first oops.

The other thing to disable is the P State driver as we know there are some problems there right now.
Comment 10 Adam Williamson 2013-12-30 17:18:01 UTC
I've already captured the other traceback. Once I realized that neither trace actually prevents boot, and that the v8p just has weird issues booting Linux from USB multiple times in succession or something, it was easy. I can't easily boot my test stick twice in a row: I have to entirely disconnect the USB hub and boot once without it, then shut down, reconnect the USB hub and boot again. Don't ask me why.

Anyway, yeah, that's how I got the logs and everything, and the earlier trace is visible in the logs, if you look. I filed it as https://bugzilla.kernel.org/show_bug.cgi?id=67911 (it's a warn_slowpath_common).

If this one is in graphics stuff, could it be related to a later problem I have, the entire system hanging as soon as I try to start X with modesetting? https://bugs.freedesktop.org/show_bug.cgi?id=73133
Comment 11 Adam Williamson 2014-01-06 00:34:21 UTC
Someone commented on my G+ (where I've been blogging about this tablet) that there's a Yocto branch with a bunch of baytrail fixes:

http://git.yoctoproject.org/cgit.cgi/linux-yocto-contrib/log/?h=rebecca/base-baytrail

Just in case this wasn't already generally known. Looks like there's some activity from Intel on baytrail fixups over there, I presume it'll get submitted for merging at some point.
Comment 12 Alan 2014-01-06 12:05:51 UTC
There are several species of Baytrail, the relevant one for the tablets is Baytrail/T. I'm not sure what the Yocto enabling folk are working with but that tree seems to be a general dump of tons of forward porting.

There are folks working on Baytrail/T in Intel. I did pass your contact info on - I'll prod them further if need be.
Comment 13 Adam Williamson 2014-01-06 17:28:33 UTC
Well, they said they'd got KMS video working with that tree. I haven't tried it myself yet, though.
Comment 14 Alan 2014-01-08 22:24:50 UTC
I see the APIC one here with 3.13-rc7, but not with 3.12, on an Asus TA100T.

Trying to pin down the bad patch (bug 68291).
Comment 15 Adam Williamson 2014-01-08 23:54:59 UTC
Should we close this as a dupe of that, then, or vice versa?
Comment 16 Adam Williamson 2014-01-17 01:55:37 UTC
I'm pretty sure this and Alan's are the same: if I apply the patch from https://bugzilla.kernel.org/show_bug.cgi?id=68291#c5 to my kernel, this trace goes away (but I wind up hitting https://bugs.freedesktop.org/show_bug.cgi?id=71977 again). Marking as a dupe.

*** This bug has been marked as a duplicate of bug 68291 ***
