Bug 15469

Summary: [Intel Graphics HD] Kernel panic on boot with certain BIOS options
Product: Drivers
Reporter: Artem S. Tashkinov (aros)
Component: Video(DRI - Intel)
Assignee: drivers_video-dri-intel (drivers_video-dri-intel)
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Severity: blocking
CC: airlied, akpm, chris, chrisw, daniel, drivers_video-dri, jbarnes, nanericwang, torvalds, yakui.zhao
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.34
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Display shots showing kernel panic information
dmesg
trim stolen space if needed
dmesg with the patch applied
Fail to load KMS without GEM.

Description Artem S. Tashkinov 2010-03-07 14:43:22 UTC
I own an ASRock H55DE3 motherboard with BIOS v2.0 installed (the newest release).

There are two options concerning embedded Intel Graphics on this motherboard: "DVMT Fixed Memory" and "Share Memory".

When both of these options are set to 128MB, the kernel boots successfully and everything works fine.

When I set "DVMT Fixed Memory" to "Max" and "Share Memory" to 256MB, the kernel dies while trying to initialize the i915 kernel module.

I cannot give any meaningful information about the error, because as soon as the module loads, the screen becomes garbled and I can't see the error messages.

My hardware is:

GPU: 00:01.0 PCI bridge [0604]: Intel Corporation Auburndale/Havendale PCI Express x16 Root Port [8086:0041] (rev 12)
CPU: Intel Core i5 650 CPU.
MB: ASRock H55DE3
Comment 1 Artem S. Tashkinov 2010-03-08 12:44:09 UTC
Created attachment 25406 [details]
Display shots showing kernel panic information
Comment 2 Artem S. Tashkinov 2010-03-08 12:47:11 UTC
The kernel also panics with "DVMT Fixed Memory" and "Share Memory" both set to 256MB.
Comment 3 Artem S. Tashkinov 2010-04-21 09:59:42 UTC
Who else should I subscribe to this bug so that kernel developers notice it?

I'm now manually transcribing those screenshots:

EIP: 0060:[<f845ff1b>] EFLAGS: 00010206 CPU: 1
EIP is at drm_mm_search_free+0x6b/0x90 [drm]
EAX: 00002000 EBX: f6798de0 ECX: 00000000 EDX: 00001000
ESI: f76a86c0 EDI: f6d93300 EBP: f6739cec ESP: f6739cc4
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068


...

Call Trace:
[<f87519ab>] ? i915_gem_object_bind_to_gtt+0x6b/0x1e0 [i915]
[<f8751ba4>] ? i915_gem_object_pin+0x84/0x90 [i915]
[<f8751c00>] ? i915_gem_init_ringbuffer+0x50/0x460 [i915]
[<f845fe0e>] ? drm_mm_create_tail_node+0x1e/0x70 [drm]

...

Kernel 2.6.34-rc5 is also affected.
Comment 4 Artem S. Tashkinov 2010-05-05 19:18:57 UTC
It's kind of stupid to report (no i915 patches have been merged after rc5), but kernel 2.6.34-rc6 is also affected.

I can post the entire backtrace in a text form if anyone's interested.
Comment 5 Daniel Vetter 2010-05-05 19:47:37 UTC
> --- Comment #4 from Artem S. Tashkinov <t.artem@mailcity.com>  2010-05-05
> 19:18:57 ---
> It's kind of stupid to report (no i915 patches have been merged after rc5),
> but
> kernel 2.6.34-rc6 is also affected.
> 
> I can post the entire backtrace in a text form if anyone's interested.

I have an idea of what's going on + an idea of how to prove it ;)

Can you please boot your box with the crashing bios options, but don't load
the i915 module (delete it if nothing else helps). Manually load the
intel-agp module and post the dmesg output here (the messages by the
intel-agp module are the important stuff). Thanks.
Comment 6 Artem S. Tashkinov 2010-05-05 20:38:35 UTC
Created attachment 26245 [details]
dmesg

(In reply to comment #5)
> > --- Comment #4 from Artem S. Tashkinov <t.artem@mailcity.com>  2010-05-05
> 19:18:57 ---
> > It's kind of stupid to report (no i915 patches have been merged after rc5),
> but
> > kernel 2.6.34-rc6 is also affected.
> > 
> > I can post the entire backtrace in a text form if anyone's interested.
> 
> I have an idea of what's going on + an idea of how to prove it ;)
> 
> Can you please boot your box with the crashing bios options, but don't load
> the i915 module (delete it if nothing else helps). Manually load the
> intel-agp module and post the dmesg output here (the messages by the
> intel-agp module are the important stuff). Thanks.

Here it is.
Comment 7 Artem S. Tashkinov 2010-05-07 22:25:07 UTC
I hate to be annoying but this time I have to be, because I don't like the idea of not being able to use all the system memory Intel HD Graphics can potentially claim and use.

Besides, the next stable kernel is closing in and it will be the second stable Linux release I won't be able to use without crippling my hardware.

I ain't a Linux kernel expert but these messages seem bogus to me:

> [drm:i915_driver_load] *ERROR* Detected broken video BIOS with
> 262140/262144kB of video memory stolen.
> [drm:i915_driver_load] *ERROR* Disabling GEM. (try reducing stolen memory or
> updating the BIOS to fix).

My BIOS is certainly not broken and all future Intel on-die GPUs may use up to 1.7GB(1) of the system RAM.

References:

1. http://www.tomshardware.com/reviews/intel-clarkdale-core-i5-661,2514-4.html

Intel makes up to 1.7GB of system memory available to graphics, as with its previous-generation integrated graphics core, but there’s really no reason to dedicate that much RAM in light of the GPU’s performance characteristics.
Comment 8 Daniel Vetter 2010-05-08 13:10:24 UTC
Well, these BIOS settings _are_ broken. They essentially steal 256MB of system memory for the Intel IGD, which has two effects:

- Your usable system memory is decreased by 256MB (Linux can't use this memory).
- You've reduced the memory available to the IGD to zero (which then caused the kernel oops later on).

So please change the BIOS settings to something sensible (32MB of so-called stolen space is enough); Linux will allocate anything it needs dynamically anyway. This stolen space is only used to support certain special hardware features that absolutely require it.

I can't close this as "not a bug" due to not having sufficient bz permissions. Can somebody else please take care of that?
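
The accounting described in this comment can be sketched roughly as follows. This is only an illustration, not the driver's actual code: the function name is hypothetical, and the sizes are the ones printed in the i915 error message quoted in comment 7 (262140kB stolen out of a 262144kB aperture).

```python
# Illustrative accounting only; not the i915/intel-agp implementation.
def remaining_aperture_kb(aperture_kb: int, stolen_kb: int) -> int:
    """Aperture space left for dynamically allocated objects, in kB."""
    return max(aperture_kb - stolen_kb, 0)


# Values from the error message in comment 7: almost nothing is left.
print(remaining_aperture_kb(262144, 262140))     # 4

# With the suggested 32MB of stolen space, most of the aperture stays free.
print(remaining_aperture_kb(262144, 32 * 1024))  # 229376
```

With the crashing settings only 4kB of the aperture remains, which matches the "reduced to zero" description above for all practical purposes.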
Comment 9 Artem S. Tashkinov 2010-05-08 13:40:16 UTC
OK, the *default* BIOS settings for H55/H57 chipset on all ASRock motherboards (and probably other vendors too) are:

Share memory:          Auto
DVMT/FIXED Memory:     Maximum DVMT

These are "broken" settings as you say, so, people should refrain from using Linux and should only use Windows because Microsoft OS works beautifully with this setup and all 3D intensive applications run much smoother and faster.

P.S. Available values for these two options are:

Share Memory:          Auto/32MB/64MB/128MB/256MB
DVMT/FIXED Memory:     128MB/256MB/Maximum DVMT
Comment 10 Jesse Barnes 2010-05-08 21:03:36 UTC
Created attachment 26287 [details]
trim stolen space if needed

I was hoping this patch wouldn't be necessary, but your BIOS seems to indicate that it is.  Can you give it a try and see if it helps your stolen memory problem?
Comment 11 Artem S. Tashkinov 2010-05-09 07:54:53 UTC
Created attachment 26294 [details]
dmesg with the patch applied

Thank you, the patch seems to work, feel free to close this bug report as soon as this patch is committed to the mainline.

$ cat /proc/iomem
00000000-0000ffff : reserved
00010000-0009f7ff : System RAM
0009f800-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000cd000-000ce7ff : Adapter ROM
000e4000-000fffff : reserved
  000f0000-000fffff : System ROM
00100000-ab76ffff : System RAM
  01000000-01280681 : Kernel code
  01280682-01368487 : Kernel data
  013c0000-01426e73 : Kernel bss
ab770000-ab77ffff : ACPI Tables
ab780000-ab7cffff : ACPI Non-volatile Storage
ab7d0000-ab7dffff : reserved
ab7e0000-ab7eb3ff : RAM buffer
ab7eb400-bfffffff : reserved
c0000000-c03fffff : PCI Bus 0000:03
c0400000-c07fffff : PCI Bus 0000:02
c0800000-c0800fff : Intel Flush Page
d0000000-dfffffff : 0000:00:02.0
e0000000-efffffff : PCI MMCONFIG 0000 [bus 00-ff]
  e0000000-efffffff : pnp 00:0d
fad00000-fadfffff : PCI Bus 0000:01
  fadf8000-fadfbfff : 0000:01:00.0
    fadf8000-fadfbfff : r8169
  fadff000-fadfffff : 0000:01:00.0
    fadff000-fadfffff : r8169
fae00000-faefffff : PCI Bus 0000:02
faf00000-faffffff : PCI Bus 0000:03
fb800000-fbbfffff : 0000:00:02.0
fbdf6000-fbdf63ff : 0000:00:1a.0
  fbdf6000-fbdf63ff : ehci_hcd
fbdf8000-fbdfbfff : 0000:00:1b.0
  fbdf8000-fbdfbfff : ICH HD audio
fbdfc000-fbdfc3ff : 0000:00:1d.0
  fbdfc000-fbdfc3ff : ehci_hcd
fbdffc00-fbdffcff : 0000:00:1f.3
fbe00000-fbefffff : PCI Bus 0000:01
  fbee0000-fbefffff : 0000:01:00.0
fbf00000-fbffffff : PCI Bus 0000:04
  fbfc0000-fbfdffff : 0000:04:01.0
    fbfc0000-fbfdffff : e100
  fbfe0000-fbfeffff : 0000:04:01.0
  fbffe000-fbffefff : 0000:04:01.0
    fbffe000-fbffefff : e100
fc000000-fcffffff : pnp 00:01
fd000000-fdffffff : pnp 00:01
fe000000-febfffff : pnp 00:01
fec00000-fec003ff : IOAPIC 0
fed00000-fed003ff : HPET 0
fed14000-fed19fff : pnp 00:01
fed1c000-fed1ffff : pnp 00:07
fed20000-fed3ffff : pnp 00:07
fed40000-fed8ffff : pnp 00:07
fee00000-fee00fff : Local APIC
  fee00000-fee00fff : reserved
    fee00000-fee00fff : pnp 00:09
ffa00000-ffffffff : reserved
100000000-13fffffff : System RAM

$ free
             total       used       free     shared    buffers     cached
Mem:       3811104     695212    3115892          0      39032     364240
-/+ buffers/cache:     291940    3519164
Swap:            0          0          0
Comment 12 Artem S. Tashkinov 2010-05-09 08:49:43 UTC
I have a small question: with or without this patch, approximately 384MB of system RAM is missing. Can anyone shed light on where this fat piece of RAM has gone? I understand that some of the system RAM is claimed by the embedded GPU, but even so roughly 128MB is unaccounted for.

(4*1024*1024-3811104)/1024 ~ 374MB + kernel RAM ~ 384MB.
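
The arithmetic above can be reproduced with a few lines of Python. The only inputs are the 4GB of installed RAM and the "Mem: total" value from the `free` output in comment 11; the extra ~10MB attributed to "kernel RAM" is the reporter's estimate and is not computed here.

```python
installed_kb = 4 * 1024 * 1024  # 4GB of installed RAM, expressed in kB
mem_total_kb = 3811104          # "Mem: total" from the free output in comment 11

# RAM that is installed but not reported as usable by the kernel.
missing_mb = (installed_kb - mem_total_kb) / 1024
print(f"{missing_mb:.1f} MB not visible in MemTotal")  # 374.2 MB not visible in MemTotal
```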
Comment 13 Jesse Barnes 2010-05-09 21:34:01 UTC
My patch is only a partial fix; it won't actually reclaim the stolen space allocated by the BIOS, so it will appear to be missing.
Comment 14 Artem S. Tashkinov 2010-05-10 10:06:22 UTC
Funnily enough, the 64-bit kernel sees even less RAM:

$ free
             total       used       free     shared    buffers     cached
Mem:       3724300     401916    3322384          0      30668     189920
-/+ buffers/cache:     181328    3542972
Swap:            0          0          0

$ cat /proc/iomem
00000000-0000ffff : reserved
00010000-0009f7ff : System RAM
0009f800-0009ffff : reserved
000c0000-000cffff : pnp 00:0e
000e4000-000fffff : reserved
00100000-ab76ffff : System RAM
  01000000-012d98bd : Kernel code
  012d98be-014147ff : Kernel data
  0147d000-014e9c77 : Kernel bss
ab770000-ab77ffff : ACPI Tables
ab780000-ab7cffff : ACPI Non-volatile Storage
ab7d0000-ab7dffff : reserved
ab7e0000-ab7eb3ff : RAM buffer
ab7eb400-bfffffff : reserved
c0000000-c03fffff : PCI Bus 0000:03
c0400000-c07fffff : PCI Bus 0000:02
c0800000-c0800fff : Intel Flush Page
d0000000-dfffffff : 0000:00:02.0
e0000000-efffffff : PCI MMCONFIG 0000 [bus 00-ff]
  e0000000-efffffff : pnp 00:0d
fad00000-fadfffff : PCI Bus 0000:01
  fadf8000-fadfbfff : 0000:01:00.0
    fadf8000-fadfbfff : r8169
  fadff000-fadfffff : 0000:01:00.0
    fadff000-fadfffff : r8169
fae00000-faefffff : PCI Bus 0000:02
faf00000-faffffff : PCI Bus 0000:03
fb800000-fbbfffff : 0000:00:02.0
fbdf6000-fbdf63ff : 0000:00:1a.0
  fbdf6000-fbdf63ff : ehci_hcd
fbdf8000-fbdfbfff : 0000:00:1b.0
  fbdf8000-fbdfbfff : ICH HD audio
fbdfc000-fbdfc3ff : 0000:00:1d.0
  fbdfc000-fbdfc3ff : ehci_hcd
fbdffc00-fbdffcff : 0000:00:1f.3
fbe00000-fbefffff : PCI Bus 0000:01
  fbee0000-fbefffff : 0000:01:00.0
fbf00000-fbffffff : PCI Bus 0000:04
  fbfc0000-fbfdffff : 0000:04:01.0
    fbfc0000-fbfdffff : e100
  fbfe0000-fbfeffff : 0000:04:01.0
  fbffe000-fbffefff : 0000:04:01.0
    fbffe000-fbffefff : e100
fc000000-fcffffff : pnp 00:01
fd000000-fdffffff : pnp 00:01
fe000000-febfffff : pnp 00:01
fec00000-fec003ff : IOAPIC 0
fed00000-fed003ff : HPET 0
fed14000-fed19fff : pnp 00:01
fed1c000-fed1ffff : pnp 00:07
fed20000-fed3ffff : pnp 00:07
fed40000-fed8ffff : pnp 00:07
fee00000-fee00fff : Local APIC
  fee00000-fee00fff : reserved
    fee00000-fee00fff : pnp 00:09
ffa00000-ffffffff : reserved
100000000-13fffffff : System RAM

$ cat /proc/mtrr
reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
reg01: base=0x080000000 ( 2048MB), size=  512MB, count=1: write-back
reg02: base=0x0a0000000 ( 2560MB), size=  128MB, count=1: write-back
reg03: base=0x0a8000000 ( 2688MB), size=   64MB, count=1: write-back
reg04: base=0x100000000 ( 4096MB), size= 1024MB, count=1: write-back
reg05: base=0x0d0000000 ( 3328MB), size=  256MB, count=1: write-combining

$ uname -a
Linux localhost.localdomain 2.6.33.3-ic64 #2 SMP PREEMPT Mon May 10 09:06:02 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
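
For anyone comparing these listings, the "System RAM" ranges in /proc/iomem can be totaled with a small script. This is a minimal sketch: the helper name is hypothetical, and the sample text contains only the three System RAM lines that appear in the listings in this report. On recent kernels, reading /proc/iomem without root may show zeroed addresses, so the script may need to be fed a listing captured as root.

```python
import re

# Matches top-level lines such as "00100000-ab76ffff : System RAM".
RANGE_RE = re.compile(r"^([0-9a-f]+)-([0-9a-f]+) : System RAM\s*$")


def system_ram_kb(iomem_text: str) -> int:
    """Sum the sizes of all top-level 'System RAM' ranges, in kB."""
    total_bytes = 0
    for line in iomem_text.splitlines():
        match = RANGE_RE.match(line)
        if match:
            start, end = (int(value, 16) for value in match.groups())
            total_bytes += end - start + 1  # ranges are inclusive
    return total_bytes // 1024


# The three System RAM ranges shown in the listings in this report.
sample = """\
00010000-0009f7ff : System RAM
00100000-ab76ffff : System RAM
100000000-13fffffff : System RAM
"""
print(system_ram_kb(sample))  # 3857406
```

The total is what the firmware memory map hands to the kernel as RAM; the kernel's own reservations then reduce it further before it shows up in `free`.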
Comment 15 Chris Wilson 2010-05-17 08:29:22 UTC
Created attachment 26408 [details]
Fail to load KMS without GEM.

The other half of this is to prevent the OOPS should we ever disable GEM but still attempt to use KMS.
Comment 16 Jesse Barnes 2010-05-17 14:54:27 UTC
*** Bug 15754 has been marked as a duplicate of this bug. ***
Comment 17 Artem S. Tashkinov 2010-05-17 16:13:46 UTC
(In reply to comment #15)
> Created an attachment (id=26408) [details]
> Fail to load KMS without GEM.
> 
> The other half of this is to prevent the OOPS should we ever disable GEM but
> still attempt to use KMS.

Chris, I need advice: if I'm running 2.6.34, should I apply this patch too, or this patch only?
Comment 18 Chris Wilson 2010-05-17 17:26:57 UTC
(In reply to comment #17)
> Chris, I need advice: if I'm running 2.6.34, should I apply this patch too,
> or this patch only?

Jesse's is an attempt to work around the misreported aperture and continue working. My patch just makes sure that if we do fail, then we fail gracefully and do not OOPS - so there's no need to apply it if your machine boots.
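
The difference between the two patches can be sketched as a small decision function. This is a simplified model and not the driver code: the function name and the `max_stolen_fraction` threshold are assumptions made for illustration, and the outcomes are paraphrased from the error messages quoted elsewhere in this report.

```python
def load_outcome(aperture_kb: int, stolen_kb: int, kms_enabled: bool,
                 max_stolen_fraction: float = 0.75) -> str:
    """Model what happens at driver load for a given stolen/aperture split.

    max_stolen_fraction is a hypothetical threshold, not a value taken
    from the kernel sources.
    """
    gem_enabled = stolen_kb <= aperture_kb * max_stolen_fraction
    if gem_enabled:
        return "driver loads with GEM"
    if kms_enabled:
        # Behaviour with the "fail gracefully" patch: refuse to load, no OOPS.
        return "driver load fails cleanly (KMS requires GEM)"
    return "driver loads without GEM"


print(load_outcome(262144, 262140, kms_enabled=True))
print(load_outcome(262144, 32 * 1024, kms_enabled=True))
```

Under this model the stolen-space trim changes the inputs so that the first branch is taken, while the graceful-failure patch only changes what happens in the second branch.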
Comment 19 Artem S. Tashkinov 2010-06-26 22:29:19 UTC
With 2.6.35-rc3, i915 doesn't work (it loads but doesn't let me run the X server, so I'm now running vesa), but at least the kernel no longer panics:

[    3.699679] Linux agpgart interface v0.103
[    3.743327] agpgart-intel 0000:00:00.0: Intel HD Graphics Chipset
[    3.744086] agpgart-intel 0000:00:00.0: detected 262140K stolen memory
[    3.789925] agpgart-intel 0000:00:00.0: AGP aperture is 256M @ 0xd0000000
[    4.047069] i915 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[    4.047073] i915 0000:00:02.0: setting latency timer to 64
[    4.057692] [drm:i915_driver_load] *ERROR* Detected broken video BIOS with 262140/262144kB of video memory stolen.
[    4.057695] [drm:i915_driver_load] *ERROR* Disabling GEM. (try reducing stolen memory or updating the BIOS to fix).
[    4.057697] [drm:i915_driver_load] *ERROR* kernel modesetting requires GEM, disabling driver.
[    4.067968] i915 0000:00:02.0: PCI INT A disabled

I cannot update the BIOS because I'm already running the latest one (http://www.asrock.com/mb/download.asp?Model=H55DE3&o=BIOS):

# dmidecode:
BIOS Information
        Vendor: American Megatrends Inc.
        Version: P2.20
        Release Date: 04/13/2010
        Address: 0xF0000
        Runtime Size: 64 kB
        ROM Size: 2048 kB
Comment 20 Artem S. Tashkinov 2010-07-05 20:43:52 UTC
Chris, I would be very grateful if I could use the upcoming 2.6.35 without patches; after all, the bug has now been known for *four* months.
Comment 21 Chris Wilson 2010-07-05 21:01:27 UTC
The "don't crash" patch is upstream. That is all we can expect for 2.6.35, as we don't yet have a method for returning the range reserved by the BIOS to the kernel.
Comment 22 Jesse Barnes 2010-07-07 21:41:58 UTC
Just sent out a new version of the stolen space trim patch.  Can you reply with your tested-by so we can get it upstream?  If you want it in older kernels as well you can request that it be cc'd to stable@kernel.org.  Not sure if Linus will take it for 2.6.35 or not, but we should be able to get it into 2.6.34.x and 2.6.35.x once it hits his tree.
Comment 23 Artem S. Tashkinov 2010-07-08 09:35:59 UTC
(In reply to comment #22)
> Just sent out a new version of the stolen space trim patch.  Can you reply
> with
> your tested-by so we can get it upstream?  If you want it in older kernels as
> well you can request that it be cc'd to stable@kernel.org.  Not sure if Linus
> will take it for 2.6.35 or not, but we should be able to get it into 2.6.34.x
> and 2.6.35.x once it hits his tree.

Jesse, where can I find this new patch, and how can I add myself as "tested-by"? I'm not subscribed to LKML, but I can subscribe if necessary.
Comment 24 Jesse Barnes 2010-07-08 15:58:55 UTC
I cc'd you on the patch, subject "[PATCH] drm/agp/i915: trim stolen space to 32M".  It also got sent to the intel-gfx@lists.freedesktop.org mailing list.  Archives are available at lists.freedesktop.org.

If it works for you, just reply to the message with "Tested-by: ..." including your name & email addr.

Thanks.
Comment 25 Artem S. Tashkinov 2010-07-09 05:26:52 UTC
I decided not to add this paragraph to my e-mail, so I'm expressing myself here:

---

It seems like the only difference from the older patch is 32 vs 16, so could you please explain in layman's terms how this patch works and why you changed 16MB to 32MB?

---

My only grief right now is `free` output (I have 4GB of RAM):

$ free
             total       used       free     shared    buffers     cached
Mem:       3748132     328152    3419980          0      32760     160052
-/+ buffers/cache:     135340    3612792
Swap:            0          0          0

$ uname -a
Linux localhost.localdomain 2.6.34.1-ic #1 SMP PREEMPT Tue Jul 6 02:36:45 YEKST 2010 i686 i686 i386 GNU/Linux
Comment 26 Jesse Barnes 2010-07-09 07:18:50 UTC
Our newer chips run at higher resolution, so we need more space for the compressed framebuffer.
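
As a rough illustration of why the resolution matters, the size of an uncompressed 32-bit framebuffer can be taken as an upper bound on the space a compressed copy would need. The resolutions below are arbitrary examples chosen for the arithmetic, not a statement about what any particular chip supports, and the helper name is hypothetical.

```python
def framebuffer_mib(width: int, height: int, bytes_per_pixel: int = 4) -> float:
    """Size of one uncompressed framebuffer in MiB (32 bits per pixel by default)."""
    return width * height * bytes_per_pixel / (1024 * 1024)


# Example resolutions only; both are assumptions made for illustration.
for width, height in [(1920, 1200), (2560, 1600)]:
    print(f"{width}x{height}: {framebuffer_mib(width, height):.2f} MiB")
```

Under these assumptions a single 2560x1600 buffer already comes close to 16MiB, which gives a feel for why a 16MB limit could be too tight.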

Comment 27 Artem S. Tashkinov 2010-07-13 08:33:21 UTC
Jesse, your patch hasn't been included in mainline 2.6.35-rc5; I hope it will be included before 2.6.35 is released.

And like you asked I've sent my "Tested-by:" e-mail.
Comment 28 Jesse Barnes 2010-07-13 15:52:14 UTC
> Jesse, your patch hasn't been included in mainline 2.6.35-rc5; I hope it
> will be included before 2.6.35 is released.
> 
> And like you asked I've sent my "Tested-by:" e-mail.

Thanks, just waiting for Eric to pick it up now.
Comment 29 Artem S. Tashkinov 2010-07-23 09:09:32 UTC
Somehow someone has forgotten to push this patch into the mainline.

OK, let's wait until 2.6.36 comes out :(
Comment 30 Jesse Barnes 2010-07-23 19:36:16 UTC
Yeah, sorry. Eric has been busy with other things; the patch should be applied soon, and then you can request that it be merged to the stable tree as well.
Comment 31 Artem S. Tashkinov 2010-08-14 07:58:07 UTC
Is there any chance that these patches will be pushed to stable? (2.6.35.x/2.6.32.x)