Bug 12861
Summary: | Xorg fails to start "Failed to allocate space for kernel memory manager" | ||
---|---|---|---|
Product: | Drivers | Reporter: | Emil Karlson (jekarlson) |
Component: | Video(DRI - non Intel) | Assignee: | Jesse Barnes (jbarnes) |
Status: | CLOSED INVALID | ||
Severity: | normal | CC: | airlied, eric, jbarnes, rjw, serge |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.29-rc5 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 12398 | ||
Attachments: |
dmesg from 2.6.29-rc7
Xorg log
strace output
linux config from the first broken commit
strace output for good 2.6.29-rc4-wl (wireless testing)
xorg strace on libdrm-2.4.5 (broken 2.6.29-rc8) |
Description
Emil Karlson
2009-03-12 12:06:31 UTC
Created attachment 20507 [details]
dmesg from 2.6.29-rc7
Created attachment 20508 [details]
Xorg log
Thanks for bisecting. The result is a bit perplexing... do you get any other meaningful information in /var/log/syslog or /var/log/messages? Can you try running X under strace (maybe strace -f -o xoutput startx)? Also, especially if this isn't 100% reproducible, it might be worth going to the previous commit and booting with that a few times to make sure this is the bad commit.

Bug is still present in 2.6.29-rc8

I can actually get X to start if I set i915.modeset=1, but not in a useable state (no cursor and no keyboard)

I did explicitly confirm bisect, on second run oldconfig asked about dma remapping (set to N)

kant ~ # diff -u /boot/config /boot/config.old
-# CONFIG_DMAR_DEFAULT_ON is not set

With 99cbb86180bccd77f331f6e8eb7ce26aeea2cb72 I got normal operation, the other reproduces the bug.

491 git checkout 99cbb86180bccd77f331f6e8eb7ce26aeea2cb72
492 cp /boot/config-2.6.29-rc4 .config
493 make oldconfig
494 make -j3 all && make install modules_install && make clean mrproper && init 6
495 cd /usr/src/linux-2.6
496 git checkout fb5ae64fdde29236e1a15e0366946df7060f41f2
497 make oldconfig
498 make -j3 all && make install modules_install

Created attachment 20525 [details]
strace output
Sorry no other useful logs available.
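
Note: once a trace has been captured with the strace command suggested above, the failing calls can be pulled out with a small filter. The following is only a sketch; the function name failed_syscalls is made up for illustration, and the pattern assumes the usual strace result format of "= -1 ERRNO (description)".

```shell
# List the syscalls in an strace log that returned -1, with their errno.
# Reads the named files, or standard input when no file is given.
# failed_syscalls is a hypothetical helper name, not part of strace.
failed_syscalls() {
    grep -E '= -1 E[A-Z0-9]+' "$@"
}
```

For example, failed_syscalls xoutput would list the failures from the log written by strace -f -o xoutput startx. Many of the reported failures are likely to be harmless probes (for instance files that do not exist), so the output still needs to be read by hand.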
Please attach your .config.

Created attachment 20529 [details]
linux config from the first broken commit
Thanks, so both cgroups (and therefore config_user_sched) and user namespaces are turned off. Meaning put_user_ns(), which is the only thing moved in this commit, is an empty static inline function. Suspicious :)

Sorry I had missed the fact that you'd appended strace output last night. Could you also upload the result of a successful strace?

thanks,
-serge

Created attachment 20533 [details]
strace output for good 2.6.29-rc4-wl (wireless testing)
This may be a bit different since I upgraded to libdrm-2.4.5 and ~-intel-2.6.3 ...
Which btw brings up only black screen on UXA when starting X
[drm:i915_setparam] *ERROR* unknown parameter 4
[drm:i915_getparam] *ERROR* Unknown parameter 6
[drm:i915_getparam] *ERROR* Unknown parameter 6
Probably irrelevant, ask if you want matching strace on broken revision.
Created attachment 20534 [details]
xorg strace on libdrm-2.4.5 (broken 2.6.29-rc8)
Well generated one anyways...
Yeah, please do give a new bad strace output. I don't know enough about X to know which differences are relevant. Based on what I'm seeing so far, though, it looks as though the key lines would be:

3491 open("/dev/agpgart", O_RDWR) = 10
3491 ioctl(10, AGPIOC_ACQUIRE or APM_IOC_STANDBY, 0) = -1 EBUSY (Device or resource busy)

According to drivers/char/agp/frontend.c this means that agp_fe.current_controller is already set or &agp_bridge->agp_in_use is not 0. Is X already running on some other console when it fails???

No other X processes were running.

startx outputs btw
(EE) GARTInit: AGPIOC_INFO failed (Invalid argument)
(EE) intel(0): Failed to allocate space for kernel memory manager
(EE) intel(0): Failed to allocate framebuffer. Is your VideoRAM set too low?
(EE) intel(0): Couldn't allocate video memory

Already attached new strace.

Right. AGPIOC_INFO returns -EINVAL only if agp_fe.current_controller==NULL. AGPIOC_ACQUIRE returns -EBUSY only if agp_fe.current_controller!=NULL or &agp_bridge->agp_in_use!=0. So agp_in_use is not 0. We need someone who knows about X and agp to tell us under what conditions that can happen. I don't know who that is.

One more question - does the 'No AGP bridge found' early in dmesg also show up when you boot with a kernel that works?

Yes, "No AGP bridge found" can also be found in dmesg on working kernel 2.6.29-rc4-wl

Hi David, could you look over the comments in this post and tell me if it rings any bells about what sorts of situations would cause the case where, when running startx, agp_bridge->agp_in_use > 0? The kernel bisect results claim that this started happening as of a patch (which I wrote) which, in the test machine, moves some calls to a static inline empty function (put_user_ns) around.

You can always question the bisect. Still haven't seen anything newer than 99cbb86180bccd77f331f6e8eb7ce26aeea2cb72 (which I retested) work properly.
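
Note: the ioctl reasoning earlier in this thread (AGPIOC_INFO fails with EINVAL, AGPIOC_ACQUIRE fails with EBUSY, so agp_in_use must be non-zero) can be checked with a toy model. This is a sketch and not kernel code: the shell function names and the "null"/"set" encoding are invented for illustration, and it treats the "only if" conditions quoted from drivers/char/agp/frontend.c as the complete rule, which is an assumption.

```shell
# Toy model of the two conditions quoted in the thread. NOT kernel code:
# function names and the "null"/"set" encoding are hypothetical.
# $1 = current_controller ("null" or "set"), $2 = agp_in_use (integer)
agpioc_info() {
    if [ "$1" = "null" ]; then echo EINVAL; else echo OK; fi
}

agpioc_acquire() {
    if [ "$1" != "null" ] || [ "$2" -ne 0 ]; then echo EBUSY; else echo OK; fi
}

# Print every state that yields the observed pair of errors
# (AGPIOC_INFO -> EINVAL and AGPIOC_ACQUIRE -> EBUSY).
matching_states() {
    for controller in null set; do
        for in_use in 0 1; do
            if [ "$(agpioc_info "$controller" "$in_use")" = "EINVAL" ] &&
               [ "$(agpioc_acquire "$controller" "$in_use")" = "EBUSY" ]; then
                echo "current_controller=$controller agp_in_use=$in_use"
            fi
        done
    done
}
```

Under these assumptions, matching_states prints a single state, the one with no current controller and a non-zero agp_in_use, which is the conclusion reached in the comment.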
I also find curious how oldconfig asks about dma remapping being on by default, even though there is nothing new in kernel configuration system between these 2 revisions afaik. This particular trait seems to correlate with xorg being able to start. Perhaps I just suck at git, still I am quite confident that there actually is a bug ;) - without git the regression is limited to rc4...rc5.

Also tried not compiling alsa and not compiling dma remapping as well as setting it on as default, none of which has any effect.

Sorry - are you saying that without git, rc6 works fine?

I was not saying anything radically new or relevant, anything >=rc5 fails, sorry for the poor phrasing.

(In reply to comment #4)
> Bug is still present in 2.6.29-rc8
>
> I can actually get X to start if I set i915.modeset=1, but not in a useable
> state (no cursor and no keyboard)
>
> I did explicitly confirm bisect, on second run oldconfig asked about dma
> remapping (set to N)
>
> kant ~ # diff -u /boot/config /boot/config.old
> -# CONFIG_DMAR_DEFAULT_ON is not set
>
> With 99cbb86180bccd77f331f6e8eb7ce26aeea2cb72 I got normal operation, the
> other
> reproduces the bug.
>
> 491 git checkout 99cbb86180bccd77f331f6e8eb7ce26aeea2cb72

Wait, what is this one? Was this one successful? Because this commit is from Mar 13 and is actually not applied in Linus' main branch (IIUC).

> 492 cp /boot/config-2.6.29-rc4 .config
> 493 make oldconfig
> 494 make -j3 all && make install modules_install && make clean mrproper &&
> init 6
> 495 cd /usr/src/linux-2.6
> 496 git checkout fb5ae64fdde29236e1a15e0366946df7060f41f2

You also didn't copy /boot/config-2.6.29-rc4 .config between the two builds. Which could be a problem precisely because the first tree you compiled was a month newer than the second one.
So I think you want to try the following to really confirm whether X starts failing at the commit being blamed:

make clean && make distclean && make mrproper && clean this bad boy out ferreal
git checkout 1d7b33f77b2d8b0b1ee767e6f8f05cbd9d72cb7c
cp /boot/config-2.6.29-rc4 .config
make oldconfig && make -j3 all && make install modules_install && init 6
[ confirm that X works ]
git checkout fb5ae64fdde29236e1a15e0366946df7060f41f2
cp /boot/config-2.6.29-rc4 .config
make oldconfig && make -j3 all && make install modules_install && init 6
[ confirm that X does not work ]

Sorry, the first commit id should be:
14fa43f53ff3a9c3d8b9662574b7369812a31a97
not
1d7b33f77b2d8b0b1ee767e6f8f05cbd9d72cb7c

In fact, it might not hurt to start in a fresh directory doing
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
and testing there.

One more note - if you do a:
git diff 99cbb86180bccd77f331f6e8eb7ce26aeea2cb72 fb5ae64fdde29236e1a15e0366946df7060f41f2
you see there are in fact a lot of changes, including agp and drm related changes.

Ok did monte carlo on new bisect:

ab657db12d7020629f26f30d287558a8d0e32b41 is first bad commit
commit ab657db12d7020629f26f30d287558a8d0e32b41
Author: Eric Anholt <eric@anholt.net>
Date: Fri Jan 23 12:57:47 2009 -0800

    drm/i915: Set up an MTRR covering the GTT at driver load.

    We'd love to just be using PAT, but even on chips with PAT it gets disabled sometimes due to an errata. It would probably be better to have pat_enabled exported and only bother with this when !pat_enabled.

    Signed-off-by: Eric Anholt <eric@anholt.net>
    Signed-off-by: Dave Airlie <airlied@linux.ie>

:040000 040000 38f014e90bbc5787444f7be042af496b5ffc4048 ac7d4a1c36531cb76bdff3f32fefbbf918ec89e5 M drivers

I am still not comfortable of getting rc3 on uname -a, wonder why.
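
Note: the rebuild steps suggested earlier in this thread (clean the tree, check out a commit, copy the same reference config every time, then build and install) can be wrapped in a helper so that no step is skipped between builds. This is a rough sketch under assumptions: the names run, rebuild_at and DRY_RUN are made up for illustration, the commands are the ones quoted in the thread, and the reboot is deliberately left out.

```shell
# Run a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Rebuild the kernel tree at one commit from a fixed reference config.
# $1 = commit id, $2 = path to the reference config
# Must be called from the top of the kernel source tree.
rebuild_at() {
    run make mrproper &&          # drop stale build output and any old .config
    run git checkout "$1" &&
    run cp "$2" .config &&        # same starting config for every commit
    run make oldconfig &&
    run make -j3 all &&
    run make install modules_install
}
```

With DRY_RUN=1 set, rebuild_at 14fa43f53ff3a9c3d8b9662574b7369812a31a97 /boot/config-2.6.29-rc4 only prints the six steps, which is a cheap way to review them before running the real build. Copying the config after the checkout on every run addresses the concern raised above about the config not being copied between the two builds.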
Isn't http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=99cbb86180bccd77f331f6e8eb7ce26aeea2cb72 a proper commit - listed in http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=shortlog;h=v2.6.29-rc5 just before the commit blamed earlier...

CONFIG_X86_PAT=y
In case you are wondering.

Needless to say I used a fresh clone from Linus's tree.

build command
479 cd /usr/src/linux-2.6/ && cp /boot/config-2.6.29-rc4 .config && make oldconfig && make -j3 all && make install modules_install && make clean mrproper distclean && init 6

Eric Anholt, Dave Airlie, can you take a quick look at this regression introduced by ab657db12d7020629f26f30d287558a8d0e32b41 ?

On Tuesday 07 April 2009, John Emil Karlson wrote:
> Yes the bug is still present in 2.6.29.1.
>
> Venkatesh Pallipadi fixed an apparently similar bug there. I could ask him
> for advice once I have more time.
>
>
> http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.29.y.git;a=commit;h=e1b427acc979431fc7f57a06d0c636c542fdffcc

First-Bad-Commit : ab657db12d7020629f26f30d287558a8d0e32b41
Notify-Also : DRI <dri-devel@lists.sourceforge.net>

This is the only case I've heard of this problem, and this:
(==) intel(0): VideoRam: 16124 KB
is a bit worrying. I'd expect problems with a videoram setting this low. Are you forcing it by hand? Can you attach your xorg.conf?

Note that forcing it isn't generally necessary. Physical RAM will only be used when the server actually binds the memory. The "videoram" line in the log just reports the aperture size (which is the maximum amount the X server is allowed to use). The memory manager in recent kernels assumes it has the full aperture to play with; if you want to limit its memory usage we'd need to patch it instead.

Oh nevermind, looking at the report I see that your AGP init is failing so the driver is just trying to use stolen memory (which explains the 16M videoram).
I guess we just need to figure out why your AGP driver is returning EBUSY when the driver tries to initialize it... Dave?

Also, does this still happen with 2.6.30-rc?

Bug is still present in linux-2.6.30-rc4.

Still present in 2.6.30-rc6. Also the test hardware is being removed so you may want to close this bug, until someone else does reproduce it.

Ok, thanks. Yeah I haven't heard other reports of this; I suspect a config issue or a related bug, several of which have been fixed now.