Created attachment 152241 [details] journalctl
Hello. When I upgrade the kernel from 3.11.* to 3.16.*, I hit the problem in the subject (radeon 0000:01:00.0: Fatal error during GPU init). With kernel 3.11.* everything works. I attached systemctl --system (last boot with kernel 3.16.*), uname -a, dmesg, and lspci.
Created attachment 152251 [details] dmesg
Created attachment 152261 [details] lspci
Created attachment 152271 [details] uname-a with working kernel
Can you narrow down when the problem started? Even better, can you bisect?
(In reply to Alex Deucher from comment #4)
> Can you narrow down when the problem started? Even better, can you bisect?

As soon as I updated the kernel. With version 3.11 everything was fine; with version 3.16 the problem appears.
(In reply to Zermond from comment #5)
> (In reply to Alex Deucher from comment #4)
> > Can you narrow down when the problem started? Even better, can you bisect?
>
> As soon as I updated the kernel. At version 3.11, everything was fine with
> version 3.16 of the problem.

Can you narrow it down any more than that? Does 3.12 work ok? 3.13? etc.
(In reply to Alex Deucher from comment #6)
> (In reply to Zermond from comment #5)
> > (In reply to Alex Deucher from comment #4)
> > > Can you narrow down when the problem started? Even better, can you bisect?
> >
> > As soon as I updated the kernel. At version 3.11, everything was fine with
> > version 3.16 of the problem.
>
> Can you narrow it down any more than that? Does 3.12 work ok? 3.13? etc.

I'm sorry, I do not know how to install the older kernels. I installed 3.17, but it did not work either. I set the boot loader to boot the 3.17 kernel into runlevel 1, captured dmesg and journalctl -xn, and attached them.
Created attachment 152401 [details] dmesg with 3.17 kernel
Created attachment 152411 [details] journalctl with 3.17 kernel
Hi all, I have a similar problem and tried to find a solution, but with no success. Everything worked fine up to kernel 3.15.10-201, but after upgrading to 3.16.2-200 (and with every later kernel up to 3.16.7-200) the VESA driver is used instead of the radeon driver (low resolution, and the KDE GUI is a bit laggy, probably because GPU acceleration is not used). My description may not be accurate, but I will be happy to answer any of your questions. I have attached the output of journalctl, lsmod, dmesg and Xorg.log for the last working and the first non-working kernel. I am using Fedora 20 x64 on an ASUS M51Se notebook with ATI Radeon HD3470 graphics.
Created attachment 157331 [details] logs for last working kernel
Created attachment 157341 [details] logs for first not working kernel
Marek, can you bisect or otherwise narrow down what kernel change caused the problem for you?
I can try, but this will be the first time I am going to do this. I have read this article: https://wiki.ubuntu.com/Kernel/KernelBisection and I am going to proceed accordingly.
Hi, I also tried Kubuntu with a new kernel (newer than 3.15) and it did not work (earlier kernel versions worked with Kubuntu too), so it is not a Fedora-specific problem.

The result of the bisection is that the first bad commit is:

e5558d1a516fa6924fa8d53152b665d4c26f142e Merge branches 'dma-api', 'pci/virtualization', 'pci/msi', 'pci/misc' and 'pci/resource' into next

I took a look at the code that was changed, but it is (yet) far beyond my abilities to come to any conclusion or guess. I am a Java developer and in the past I have also written a few small C programs, so if needed I could help with some testing/debugging.
(In reply to Marek from comment #15)
> The result of bisection is that the first bad commit is:
>
> e5558d1a516fa6924fa8d53152b665d4c26f142e Merge branches 'dma-api',
> 'pci/virtualization', 'pci/msi', 'pci/misc' and 'pci/resource' into next

In general, if the result of a bisection is a merge commit, it indicates something might have gone wrong during the bisection. In this case, I suspect the problem might not happen every time even with affected kernels, so you need to test several times before declaring a commit good.

You can double-check this by testing commit e5558d1a516fa6924fa8d53152b665d4c26f142e again several times. Does it happen every time? If yes, test its parent commit(s) again several times. Does it never happen? If the answer to either question is no, I'm afraid you need to start the bisection again.
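The "test several times before declaring a commit good" rule can be written down as a small helper. This is only a sketch in Python: classify_commit and boots_ok are hypothetical names, and boots_ok stands in for whatever manual or scripted check decides that one boot brought the radeon driver up.

```python
from typing import Callable


def classify_commit(boots_ok: Callable[[], bool], attempts: int = 5) -> str:
    """Classify one commit for a bisection with an intermittent failure.

    A single failed boot is enough to call the commit bad; it is called
    good only if every one of the attempts succeeds.
    """
    for _ in range(attempts):
        if not boots_ok():
            return "bad"
    return "good"


if __name__ == "__main__":
    # Simulated boot results: the failure only shows up on the third try.
    results = iter([True, True, False])
    print(classify_commit(lambda: next(results)))
```

With a single attempt the simulated commit above would have been marked good, which is the kind of mistake that can steer a bisection to a merge commit.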
This looks like a PCI regression:

Oct 03 03:49:35 localhost.localdomain kernel: pci 0000:01:00.0: can't claim BAR 0 [mem 0xc0000000-0xcfffffff pref]: no compatible bridge window
Oct 03 03:49:35 localhost.localdomain kernel: pci 0000:01:00.0: BAR 0: can't assign mem pref (size 0x10000000)
Oct 03 03:49:35 localhost.localdomain kernel: pci 0000:01:00.0: BAR 0: trying firmware assignment [mem size 0x10000000 pref]
Oct 03 03:49:35 localhost.localdomain kernel: pci 0000:01:00.0: BAR 0: [mem size 0x10000000 pref] conflicts with PCI Bus 0000:02 [mem 0xc0000000-0xc01fffff]

Bjorn?

Dave.
It also seemed weird to me that the result of the bisect was a merge, so (before Michael's comment) I built one of the parents of this bad merge commit and then merged the other parents into it one by one:

Parent: 518a6a34f645897ec3440e5cbcf53ced3493ee1c - good - I started with this one
Parent: 14574674e461077a9f4dd5eae050f622e8b8c084 - good - I merged this into the commit above
Parent: 3cb30b73ad71b384c6289243d4ccd31ab90bce6f - good - I merged this into the commit above
Parent: 034cd97ebda4062eb4402a6cf963ccd262caa86a - good - I merged this into the commit above
Parent: 9edbcd2252b5ef148177c9f2c11a56469cf5db52 - good - I merged this into the commit above
Parent: 67d29b5c6c40e91b124695e9250c2fd24915e24a - bad

After Dave's comment I decided to merge commit 67d29b5c6c40e91b124695e9250c2fd24915e24a last. Based on this, I think the commit we are looking for may be one of the commits between 67d29b5c6c40e91b124695e9250c2fd24915e24a and 0b2d70764bb39242dcc49c0ebd10fcb8258ce5fa.
[ 0.113672] pci 0000:01:00.0: reg 0x10: [mem 0xc0000000-0xcfffffff pref]
[ 0.113683] pci 0000:01:00.0: reg 0x14: [io 0xa000-0xa0ff]
[ 0.113695] pci 0000:01:00.0: reg 0x18: [mem 0xfdff0000-0xfdffffff]
[ 0.113729] pci 0000:01:00.0: reg 0x30: [mem 0xfdfc0000-0xfdfdffff pref]
[ 0.113776] pci 0000:01:00.0: supports D1 D2
[ 0.115016] pci 0000:00:01.0: PCI bridge to [bus 01]
[ 0.115022] pci 0000:00:01.0: bridge window [io 0x8000-0xafff]
[ 0.115027] pci 0000:00:01.0: bridge window [mem 0xfdf00000-0xfdffffff]
[ 0.115034] pci 0000:00:01.0: bridge window [mem 0xbdf00000-0xddefffff 64bit pref]

So the kernel rejects the firmware assignment for BAR 0 of 01:00.0, and later cannot allocate a new one:

[ 0.169448] pci 0000:01:00.0: BAR 0: no space for [mem size 0x10000000 pref]
[ 0.169453] pci 0000:01:00.0: BAR 0: trying firmware assignment [mem size 0x10000000 pref]
[ 0.169458] pci 0000:01:00.0: BAR 0: [mem size 0x10000000 pref] conflicts with PCI Bus 0000:02 [mem 0xc0000000-0xc01fffff]
[ 0.169461] pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x10000000 pref]

and it says

[ 0.169657] pci_bus 0000:00: Some PCI device resources are unassigned, try booting with pci=realloc

So will booting with pci=realloc fix the problem?

Also, in the old kernel:

[ 0.187601] pci 0000:00:01.0: BAR 15: assigned [mem 0xc0000000-0xcfffffff pref]
[ 0.187634] pci 0000:00:01.0: BAR 15: can't assign mem pref (size 0x10000000)
[ 0.187638] pci 0000:00:01.0: failed to add 10000000 res[15]=[mem 0xc0000000-0xcfffffff pref]
[ 0.187643] pci 0000:01:00.0: BAR 0: assigned [mem 0xc0000000-0xcfffffff pref]

That means it has pci=realloc enabled by default.
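The ranges in these log lines can be sanity-checked with a little interval arithmetic. The following is a minimal Python sketch, not kernel code: the helper names are invented for illustration, and the address ranges are copied from the dmesg lines in this comment.

```python
def parse_range(text):
    """Parse 'START-END' in hex (as printed by the kernel) into an inclusive tuple."""
    start, end = (int(part, 16) for part in text.split("-"))
    return start, end


def size(rng):
    """Number of bytes covered by an inclusive range."""
    return rng[1] - rng[0] + 1


def contains(outer, inner):
    """True if inner lies completely inside outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def overlaps(a, b):
    """True if the two inclusive ranges share at least one address."""
    return a[0] <= b[1] and b[0] <= a[1]


bar0 = parse_range("0xc0000000-0xcfffffff")         # Radeon BAR 0 (reg 0x10)
pref_window = parse_range("0xbdf00000-0xddefffff")  # 00:01.0 prefetchable window
bus02 = parse_range("0xc0000000-0xc01fffff")        # range given to PCI Bus 0000:02

if __name__ == "__main__":
    print(hex(size(bar0)))              # size requested by BAR 0
    print(contains(pref_window, bar0))  # firmware window covers the BAR
    print(overlaps(bar0, bus02))        # why the firmware assignment conflicts
```

The BAR needs 0x10000000 bytes (256 MB) and fits inside the window the firmware programmed, but once that window is rejected the same addresses overlap the range handed to PCI Bus 0000:02, which matches the "conflicts with" message.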
Created attachment 159601 [details] save_big_align for later
This avoids 0xc0000000 being taken early by a device other than 00:01.0. It needs to be used together with booting with "pci=realloc".
Zermond, Marek, is there any chance you could test these two commits:

5b28541552ef PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources
14c8530dbc1b PCI: Support BAR sizes up to 8GB

I think 5b28541552ef is probably what broke this. 14c8530dbc1b is the preceding commit and I suspect it will work.
I will test that, but probably not before sunday evening
Created attachment 159861 [details] clear mmio64 flags when children device does not support it Please apply this patch only on 3.17 or 3.18.
Bjorn: you were right - commit 14c8530dbc1b works and 5b28541552ef does not.

Yinghai: I applied your patch "clear mmio64 flags when..." to versions 3.17.0 and 3.18.0_rc7, and both patched versions work correctly.
Hi, Zermond & Marek

I am willing to take a look at this one. If possible, would you mind providing some information on both the bad one and the good one?

What I need is:

cat /proc/iomem
lspci -vvv

Thanks for your efforts in advance.
Created attachment 160081 [details] "cat /proc/iomem" and "lspci -vvv"
Created attachment 160091 [details] "cat /proc/iomem" and "lspci -vvv" for not working kernel 3.16.2
Hi Wei, of course - attachment https://bugzilla.kernel.org/attachment.cgi?id=160081 is for working kernel 3.15.10 (sorry, I didn't add a sufficient comment on that attachment)
(In reply to Marek from comment #26)
> Created attachment 160081 [details]
> "cat /proc/iomem" and "lspci -vvv"

Thanks, give me some time to take a look. Hope I could help :-)
(In reply to Marek from comment #28)
> Hi Wei, of course - attachment
> https://bugzilla.kernel.org/attachment.cgi?id=160081 is for working kernel
> 3.15.10 (sorry, I did't add sufficient comment on that attachment)

Hmm... could I ask for one more piece of information?

The output of lspci -t would help me understand the topology of the PCI tree :-)
(In reply to Wei Yang from comment #30)
> (In reply to Marek from comment #28)
> > Hi Wei, of course - attachment
> > https://bugzilla.kernel.org/attachment.cgi?id=160081 is for working kernel
> > 3.15.10 (sorry, I did't add sufficient comment on that attachment)
>
> hmm... could I ask for one more information?
>
> The output of lspci -t would be helpful for me to understand the topology of
> the pci tree :-)

Sure - it is from version 3.15.10.
Created attachment 160271 [details] output of "lspci -t" for kernel 3.15.10
Created attachment 160281 [details] output of "dmesg" from branch from Yinghai Lu git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git branch: for-pci-allocate-fit-3.18
(In reply to Marek from comment #33)
> Created attachment 160281 [details]
> output of "dmesg" from branch from Yinghai Lu
>
> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
> branch: for-pci-allocate-fit-3.18

Thanks Marek,

That is working as expected.
The problem is that BIOS programmed an invalid Root Port window leading to the Radeon device. The window contains the Radeon device, so the configuration actually *works* fine, but the window is invalid because it either overlaps system RAM or starts below the upstream host bridge window, so Linux discards it:

Zermond's system:

acpi PNP0A08:00: host bridge window [mem 0xc0000000-0xffffffff] (ignored)
pci_bus 0000:00: root bus resource [mem 0x00000000-0xfffffffff]
pci 0000:00:01.0: bridge window [mem 0xbdf00000-0xddefffff 64bit pref] # invalid Root Port window
pci 0000:00:01.0: can't claim BAR 15 [mem 0xbdf00000-0xddefffff 64bit pref]: address conflict with System RAM [mem 0x00100000-0xbff9ffff]
pci 0000:01:00.0: can't claim BAR 0 [mem 0xc0000000-0xcfffffff pref]: no compatible bridge window # Radeon

Marek's system:

pci_bus 0000:00: root bus resource [mem 0xc0000000-0xffffffff] (from _CRS)
pci 0000:00:01.0: bridge window [mem 0xbdf00000-0xddefffff 64bit pref] # invalid Root Port window
pci 0000:00:01.0: can't claim BAR 15 [mem 0xbdf00000-0xddefffff 64bit pref]: no compatible bridge window
pci 0000:01:00.0: can't claim BAR 0 [mem 0xc0000000-0xcfffffff pref]: no compatible bridge window # Radeon

The Root Port window had the same problem before 5b28541552ef, of course, since BIOS set it up. But before 5b28541552ef, Linux assigned a valid window big enough for the Radeon:

pci 0000:00:01.0: bridge window [mem 0xc0000000-0xcfffffff pref]

After 5b28541552ef, we won't put a 64-bit window below 4GB, so we assign space above 4GB:

pci 0000:00:01.0: bridge window [mem 0x140000000-0x1401fffff 64bit pref]

which is not usable by Radeon, since it only has a 32-bit BAR.
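The three conditions in this analysis (window overlaps RAM, window starts below the host bridge window, window above 4GB is unreachable for a 32-bit BAR) can be checked numerically. This is a rough Python sketch using the ranges from the logs quoted above; the function and variable names are made up for illustration and do not correspond to kernel functions.

```python
FOUR_GB = 1 << 32


def overlaps(a, b):
    """True if two inclusive (start, end) ranges share any address."""
    return a[0] <= b[1] and b[0] <= a[1]


def contains(outer, inner):
    """True if inner lies completely inside outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def usable_by_32bit_bar(window, bar_size):
    """A 32-bit BAR needs a window that ends below 4GB and is large enough."""
    window_size = window[1] - window[0] + 1
    return window[1] < FOUR_GB and window_size >= bar_size


root_port_window = (0xBDF00000, 0xDDEFFFFF)  # programmed by BIOS
system_ram = (0x00100000, 0xBFF9FFFF)        # Zermond's system
host_bridge = (0xC0000000, 0xFFFFFFFF)       # Marek's system (from _CRS)
old_window = (0xC0000000, 0xCFFFFFFF)        # assigned before 5b28541552ef
new_window = (0x140000000, 0x1401FFFFF)      # assigned after 5b28541552ef
radeon_bar0_size = 0x10000000

if __name__ == "__main__":
    print(overlaps(root_port_window, system_ram))             # conflict with RAM
    print(contains(host_bridge, root_port_window))            # not inside host bridge window
    print(usable_by_32bit_bar(old_window, radeon_bar0_size))  # old assignment works
    print(usable_by_32bit_bar(new_window, radeon_bar0_size))  # new one does not
```

Note that the post-5b28541552ef window fails on both counts in this sketch: it lies above 4GB and, at 0x200000 bytes, is also smaller than the 256 MB BAR.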
Zermond, Marek, do either of you have Windows on this box? If so, I'm interested in an AIDA64 dump. You can get a free trial version at http://www.aida64.com/downloads My suspicion is that Windows trims the Root Port window to fit inside the host bridge window. That would leave everything working almost identically to how the BIOS configured it, so it would be sort of a minimal change.
Created attachment 161621 [details] AIDA64 report
Here you are Bjorn, I am not sure whether this is the dump you have in mind - if it's not then let me know and I will give you what you need.
Created attachment 161761 [details] clip resource under bridge Please test it on top v3.18.
(In reply to Marek from comment #37)
> Created attachment 161621 [details]
> AIDA64 report

So Windows changes the window from [mem 0xbdf00000-0xddefffff 64bit pref] to C0000000-DFFFFFFF?

Driver Description Mobile Intel(R) PM965/GM965/GL960/GS965 Express PCI Express Root Port - 2A01
Driver Date 6/21/2006
Driver Version 6.1.7601.17514
Driver Provider Microsoft
INF File machine.inf
Hardware ID PCI\VEN_8086&DEV_2A01&SUBSYS_2A018086&REV_03
Location Information PCI bus 0, device 1, function 0
PCI Device Intel GL960/GM965/PM965 Chipset - PCI Express Root Port [C-0]

Device Resources:
IRQ 131071
Memory 000A0000-000BFFFF
Memory C0000000-DFFFFFFF
Memory FDF00000-FDFFFFFF
Port 03B0-03BB
Port 03C0-03DF
Port 8000-AFFF
Created attachment 161801 [details] photo of screen with kernel panic
After applying the second version of Yinghai's patch (pci_bridge_clip_v2.patch) to versions 3.18.0 and 3.18.1, the boot process stops with a kernel panic. I attached a photo of the screen with the log. (I need to find a way to get the full log.)
Created attachment 161841 [details] clip resource under bridge
Please check the updated patch.
Created attachment 161851 [details] dmesg log for kernel 3.18.1 and pci_bridge_clip_v3.patch Yinghai's patch v3 was tested on kernel version 3.18.1 and it works fine. I attached output of dmesg.
I've just submitted a bug report, https://bugzilla.kernel.org/show_bug.cgi?id=90831, regarding my kernel's inability to claim BAR 0, although under slightly different circumstances and with an NVidia card. Would you be able to tell me if the patch will help? To get to the stage at which the kernel reports that it can't claim BAR 0, I had to set pci=use_crs on the grub cmdline. Thanks.
This should be resolved by the following commits, which appeared in v3.19:

3f2f4dc456e9 ("PCI: Pass bridge device, not bus, when updating bridge windows")
0f7e7aee2f37 ("PCI: Add pci_bus_clip_resource() to clip to fit upstream window")
8505e729a2f6 ("PCI: Add pci_claim_bridge_resource() to clip window if necessary")
851b09369255 ("x86/PCI: Clip bridge windows to fit in upstream windows")

There are commits similar to 851b09369255 for arches other than x86.
Hi all. Please advise whether this bug was fixed permanently or whether it is present again since the 4.1 kernel. I see the following with an RV610 on an ASUS F7SR laptop during kernel boot:

cat journalctlb | grep -Ei 'radeon|drm'
Sep 06 16:00:59 h4os kernel: [drm] Initialized drm 1.1.0 20060810
Sep 06 16:00:59 h4os kernel: [drm] radeon kernel modesetting enabled.
Sep 06 16:00:59 h4os kernel: [drm] initializing kernel modesetting (RV610 0x1002:0x94C9 0x1043:0x15B2).
Sep 06 16:00:59 h4os kernel: [drm] register mmio base: 0xFDEF0000
Sep 06 16:00:59 h4os kernel: [drm] register mmio size: 65536
Sep 06 16:00:59 h4os kernel: radeon 0000:01:00.0: VRAM: 256M 0x0000000000000000 - 0x000000000FFFFFFF (256M used)
Sep 06 16:00:59 h4os kernel: radeon 0000:01:00.0: GTT: 512M 0x0000000010000000 - 0x000000002FFFFFFF
Sep 06 16:00:59 h4os kernel: [drm] Detected VRAM RAM=256M, BAR=0M
Sep 06 16:00:59 h4os kernel: [drm] RAM width 64bits DDR
Sep 06 16:00:59 h4os kernel: radeon 0000:01:00.0: Fatal error during GPU init
Sep 06 16:00:59 h4os kernel: [drm] radeon: finishing device.
Sep 06 16:00:59 h4os kernel: [drm] radeon: ttm finalized
Sep 06 16:00:59 h4os kernel: radeon: probe of 0000:01:00.0 failed with error -12

All details are gathered on the Arch forum: https://bbs.archlinux.org/viewtopic.php?id=202078
Hi Niam,

Please post a boot log taken with "debug ignore_loglevel", so we can find out whether the PCI resource allocation causes the problem.
Created attachment 188021 [details] Arch linux 4.2-4 and rv610 dmesg
I have a problem with RV610 on all kernels from 4.1 onward, while 3.19 works perfectly. The dmesg is attached; it was captured with the kernel booted with:

debug ignore_loglevel log_buf_len=10M print_fatal_signals=1 LOGLEVEL=8 earlyprintk=vga,keep sched_debug

The problem is the following:

radeon 0000:01:00.0: Fatal error during GPU init
[ 0.694918] [drm] radeon: finishing device.
[ 0.702587] [drm] radeon: ttm finalized
[ 0.702776] radeon: probe of 0000:01:00.0 failed with error -12
(In reply to Yinghai Lu from comment #48)
> Hi Niam,
>
> Please post boot log with "debug ignore_loglevel", so we can find out if
> the pci resource allocation cause the problem.

Please see https://bugzilla.kernel.org/attachment.cgi?id=188021
Actually, probably these lines:

ACPI : EC: GPE = 0x1c, I/O: command/status = 0x66, data = 0x62
[ 0.221255] vgaarb: setting as boot device: PCI:0000:01:00.0
[ 0.221255] vgaarb: device added: PCI:0000:01:00.0,decodes=io+mem,owns=io+mem,locks=none
[ 0.221255] vgaarb: loaded
[ 0.221255] vgaarb: bridge control possible 0000:01:00.0
[ 0.221255] PCI: Using ACPI for IRQ routing
[ 0.228084] PCI: pci_cache_line_size set to 64 bytes
[ 0.228100] pci 0000:00:01.0: can't claim BAR 15 [mem 0xbdf00000-0xddefffff 64bit pref]: no compatible bridge window
[ 0.228119] pci 0000:00:01.0: [mem size 0x20000000 64bit pref] clipped to [mem size 0x1df00000 64bit pref]
[ 0.228137] pci 0000:00:01.0: bridge window [mem size 0x1df00000 64bit pref]
[ 0.228155] pci 0000:00:01.0: can't claim BAR 15 [mem size 0x1df00000 64bit pref]: no address assigned
[ 0.228180] pci 0000:01:00.0: can't claim BAR 0 [mem 0xc0000000-0xcfffffff pref]: no compatible bridge window

show that something goes wrong with the initialization of the GPU. The question is what to do and how to fix this in new kernels.
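The "clipped to" sizes in this log can be reproduced with a simple intersection. The sketch below is plain Python, not the kernel's pci_bus_clip_resource(); the upstream window 0xc0000000-0xffffffff is an assumption borrowed from the host bridge window reported earlier in this thread, since this log excerpt does not print it.

```python
def clip(window, upstream):
    """Intersect an inclusive (start, end) window with its upstream window.

    Returns None when the two ranges do not overlap at all.
    """
    start = max(window[0], upstream[0])
    end = min(window[1], upstream[1])
    if start > end:
        return None
    return start, end


def size(rng):
    """Number of bytes covered by an inclusive range."""
    return rng[1] - rng[0] + 1


bios_window = (0xBDF00000, 0xDDEFFFFF)  # BAR 15 of 00:01.0 as set by BIOS
upstream = (0xC0000000, 0xFFFFFFFF)     # assumed host bridge window

if __name__ == "__main__":
    clipped = clip(bios_window, upstream)
    print(hex(size(bios_window)))   # size before clipping
    print([hex(x) for x in clipped])
    print(hex(size(clipped)))       # size after clipping
```

Under that assumption the sizes come out as 0x20000000 before and 0x1df00000 after clipping, the same values the kernel prints, and the clipped range 0xc0000000-0xddefffff would still cover the Radeon's BAR 0. The clipping itself therefore looks right; what fails is the subsequent claim ("no address assigned").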
Please check this patch:

commit a4ad03352739c96842af5d06387595665cdd875e
Author: Bjorn Helgaas <bhelgaas@google.com>
Date: Fri Sep 18 17:15:01 2015 -0500

    PCI: Clear IORESOURCE_UNSET when clipping a bridge window

    c770cb4cb505 ("PCI: Mark invalid BARs as unassigned") sets IORESOURCE_UNSET
    if we fail to claim a resource. If we tried to claim a bridge window,
    failed, clipped the window, and tried to claim the clipped window, we
    failed again because of IORESOURCE_UNSET.

    When pci_bus_clip_resource() clips a bridge window to fit inside an
    upstream window, we're reassigning the window, so clear the
    IORESOURCE_UNSET flag. Also clear IORESOURCE_UNSET in our copy of the
    unclipped window so we can see exactly what the original window was and how
    it now fits inside the upstream window.

    Fixes: c770cb4cb505 ("PCI: Mark invalid BARs as unassigned")
    Based-on-patch-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    CC: stable@vger.kernel.org # 4.1+

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 6fbd3f2..d3346d2 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -256,6 +256,8 @@ bool pci_bus_clip_resource(struct pci_dev *dev, int idx)

 	res->start = start;
 	res->end = end;
+	res->flags &= ~IORESOURCE_UNSET;
+	orig_res.flags &= ~IORESOURCE_UNSET;
 	dev_printk(KERN_DEBUG, &dev->dev, "%pR clipped to %pR\n",
 				 &orig_res, res);
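The claim, fail, clip, claim-again sequence described in the commit message can be modelled in a few lines. This is a deliberately simplified Python toy, not the kernel implementation: the Resource class, claim(), clip() and claim_with_clip() are invented for illustration, the flag value is only a placeholder bit, and the upstream window is the 0xc0000000-0xffffffff range assumed from earlier comments.

```python
from dataclasses import dataclass

IORESOURCE_UNSET = 0x20000000  # placeholder bit for this toy model


@dataclass
class Resource:
    start: int
    end: int
    flags: int = 0


def claim(res, upstream):
    """Toy claim: refuse resources marked unset or lying outside upstream."""
    if res.flags & IORESOURCE_UNSET:
        return False
    if res.start < upstream.start or res.end > upstream.end:
        res.flags |= IORESOURCE_UNSET  # a failed claim marks it unassigned
        return False
    return True


def clip(res, upstream, clear_unset):
    """Shrink res to fit upstream; optionally clear the unset flag (the fix)."""
    res.start = max(res.start, upstream.start)
    res.end = min(res.end, upstream.end)
    if clear_unset:
        res.flags &= ~IORESOURCE_UNSET


def claim_with_clip(clear_unset):
    """Try to claim the BIOS window, clipping and retrying on failure."""
    upstream = Resource(0xC0000000, 0xFFFFFFFF)
    window = Resource(0xBDF00000, 0xDDEFFFFF)
    if claim(window, upstream):
        return True
    clip(window, upstream, clear_unset)
    return claim(window, upstream)


if __name__ == "__main__":
    print(claim_with_clip(clear_unset=False))  # behaviour described as broken
    print(claim_with_clip(clear_unset=True))   # behaviour with the flag cleared
```

In this model the retry fails without the flag being cleared even though the clipped range now fits, which mirrors the "no address assigned" message in the 4.2 log above; with the flag cleared the second claim succeeds.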
(In reply to Yinghai Lu from comment #52)
> please check patch:
>
> commit a4ad03352739c96842af5d06387595665cdd875e
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date: Fri Sep 18 17:15:01 2015 -0500
>
> PCI: Clear IORESOURCE_UNSET when clipping a bridge window
>
> c770cb4cb505 ("PCI: Mark invalid BARs as unassigned") sets
> IORESOURCE_UNSET
> if we fail to claim a resource. If we tried to claim a bridge window,
> failed, clipped the window, and tried to claim the clipped window, we
> failed again because of IORESOURCE_UNSET.
>
> When pci_bus_clip_resource() clips a bridge window to fit inside an
> upstream window, we're reassigning the window, so clear the
> IORESOURCE_UNSET flag. Also clear IORESOURCE_UNSET in our copy of the
> unclipped window so we can see exactly what the original window was and
> how
> it now fits inside the upstream window.
>
> Fixes: c770cb4cb505 ("PCI: Mark invalid BARs as unassigned")
> Based-on-patch-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> CC: stable@vger.kernel.org # 4.1+
>
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 6fbd3f2..d3346d2 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -256,6 +256,8 @@ bool pci_bus_clip_resource(struct pci_dev *dev, int idx)
>
> res->start = start;
> res->end = end;
> + res->flags &= ~IORESOURCE_UNSET;
> + orig_res.flags &= ~IORESOURCE_UNSET;
> dev_printk(KERN_DEBUG, &dev->dev, "%pR clipped to %pR\n",
> &orig_res, res);

Thank you, but the main question is whether it will be included in the mainline vanilla kernel, and from which version. If you need me to test it first, I will have to build a custom kernel with it, because the kernels I mentioned before were stock prebuilt kernels for Arch Linux.
Niam,

Please try building a custom kernel with that patch. It should be applied on top of a kernel after v4.1.
Created attachment 189381 [details] Vanilla 4.2.2 patch test = WORKS! commit a4ad03352739c96842af5d06387595665cdd875e commit a4ad03352739c96842af5d06387595665cdd875e testing on Vanilla 4.2.2 under ARCH = WORKS!
(In reply to Yinghai Lu from comment #54)
> Niam,
>
> Please try to build one customized kernel with that patch.
> should be on top of kernel after v4.1

I have tested with the 4.2.2 vanilla kernel + .config from Arch 4.2.2-1 = WORKS! The GPU is now detected; attached is the dmesg with ignore_loglevel confirming this.

Please note that this patch is still NOT included in the mainline kernel! :(
> Please note that this patch is still NOT included in the main line kernel! :(

This patch appeared in v4.3-rc3: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b838b39e930a

It will be released in v4.3, and it's marked for stable, so it will likely be backported to the stable kernels for v4.1 and later.