Bug 16148

Summary: page allocation failure. order:1, mode:0x50d0
Product: Memory Management Reporter: Tobias (devnull)
Component: Page AllocatorAssignee: Andrew Morton (akpm)
Status: RESOLVED OBSOLETE    
Severity: normal CC: alan, devnull, mikko.cal
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.35-rc1 Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg

Description Tobias 2010-06-07 17:32:00 UTC
Created attachment 26687 [details]
dmesg

Never seen this before.

2.6.35-rc1 #1 SMP Mon May 31 21:31:02 CEST 2010 x86_64 GNU/Linux

[48126.787684] Xorg: page allocation failure. order:1, mode:0x50d0
[48126.787691] Pid: 1895, comm: Xorg Tainted: G        W   2.6.35-rc1 #1
[48126.787694] Call Trace:
[48126.787709]  [<ffffffff811192f5>] __alloc_pages_nodemask+0x5f5/0x6f0
[48126.787716]  [<ffffffff81148695>] alloc_pages_current+0x95/0x100
[48126.787720]  [<ffffffff8114e04a>] new_slab+0x2ba/0x2c0
[48126.787724]  [<ffffffff8114ed0b>] __slab_alloc+0x14b/0x4e0
[48126.787730]  [<ffffffff81403f91>] ? security_vm_enough_memory_kern+0x21/0x30
[48126.787736]  [<ffffffff81556e6a>] ? agp_alloc_page_array+0x5a/0x70
[48126.787740]  [<ffffffff8115087f>] __kmalloc+0x11f/0x1c0
[48126.787744]  [<ffffffff81556e6a>] agp_alloc_page_array+0x5a/0x70
[48126.787747]  [<ffffffff81556ee4>] agp_generic_alloc_user+0x64/0x140
[48126.787750]  [<ffffffff8155717a>] agp_allocate_memory+0x9a/0x140
[48126.787755]  [<ffffffff8156c179>] drm_agp_allocate_memory+0x9/0x10
[48126.787758]  [<ffffffff8156c1d7>] drm_agp_bind_pages+0x57/0x100
[48126.787765]  [<ffffffff81627fe4>] i915_gem_object_bind_to_gtt+0x144/0x340
[48126.787768]  [<ffffffff81628295>] i915_gem_object_pin+0xb5/0xd0
[48126.787772]  [<ffffffff81629a4c>] i915_gem_do_execbuffer+0x6cc/0x14f0
[48126.787777]  [<ffffffff81091ba0>] ? __is_ram+0x0/0x10
[48126.787783]  [<ffffffff8106c76e>] ? lookup_memtype+0xce/0xe0
[48126.787787]  [<ffffffff8162ab11>] ? i915_gem_execbuffer+0x91/0x390
[48126.787790]  [<ffffffff8162ac55>] i915_gem_execbuffer+0x1d5/0x390
[48126.787794]  [<ffffffff816255b0>] ? i915_gem_sw_finish_ioctl+0x90/0xc0
[48126.787799]  [<ffffffff81565a0a>] drm_ioctl+0x32a/0x4b0
[48126.787802]  [<ffffffff8162aa80>] ? i915_gem_execbuffer+0x0/0x390
[48126.787807]  [<ffffffff8116c248>] vfs_ioctl+0x38/0xd0
[48126.787810]  [<ffffffff8116c87a>] do_vfs_ioctl+0x8a/0x580
[48126.787814]  [<ffffffff8116cdf1>] sys_ioctl+0x81/0xa0
[48126.787820]  [<ffffffff8103af02>] system_call_fastpath+0x16/0x1b
Comment 1 Andrew Morton 2010-06-10 22:39:38 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Mon, 7 Jun 2010 17:32:04 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=16148
> 
>            Summary: page allocation failure. order:1, mode:0x50d0
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 2.6.35-rc1
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Page Allocator
>         AssignedTo: akpm@linux-foundation.org
>         ReportedBy: devnull@plzk.org
>         Regression: No
> 
> 
> Created an attachment (id=26687)
>  --> (https://bugzilla.kernel.org/attachment.cgi?id=26687)
> dmesg
> 
> Never seen this before.
> 
> 2.6.35-rc1 #1 SMP Mon May 31 21:31:02 CEST 2010 x86_64 GNU/Linux
> 
> [48126.787684] Xorg: page allocation failure. order:1, mode:0x50d0
> [48126.787691] Pid: 1895, comm: Xorg Tainted: G        W   2.6.35-rc1 #1
> [48126.787694] Call Trace:
> [48126.787709]  [<ffffffff811192f5>] __alloc_pages_nodemask+0x5f5/0x6f0
> [48126.787716]  [<ffffffff81148695>] alloc_pages_current+0x95/0x100
> [48126.787720]  [<ffffffff8114e04a>] new_slab+0x2ba/0x2c0
> [48126.787724]  [<ffffffff8114ed0b>] __slab_alloc+0x14b/0x4e0
> [48126.787730]  [<ffffffff81403f91>] ?
> security_vm_enough_memory_kern+0x21/0x30
> [48126.787736]  [<ffffffff81556e6a>] ? agp_alloc_page_array+0x5a/0x70
> [48126.787740]  [<ffffffff8115087f>] __kmalloc+0x11f/0x1c0
> [48126.787744]  [<ffffffff81556e6a>] agp_alloc_page_array+0x5a/0x70
> [48126.787747]  [<ffffffff81556ee4>] agp_generic_alloc_user+0x64/0x140
> [48126.787750]  [<ffffffff8155717a>] agp_allocate_memory+0x9a/0x140
> [48126.787755]  [<ffffffff8156c179>] drm_agp_allocate_memory+0x9/0x10
> [48126.787758]  [<ffffffff8156c1d7>] drm_agp_bind_pages+0x57/0x100
> [48126.787765]  [<ffffffff81627fe4>] i915_gem_object_bind_to_gtt+0x144/0x340
> [48126.787768]  [<ffffffff81628295>] i915_gem_object_pin+0xb5/0xd0
> [48126.787772]  [<ffffffff81629a4c>] i915_gem_do_execbuffer+0x6cc/0x14f0
> [48126.787777]  [<ffffffff81091ba0>] ? __is_ram+0x0/0x10
> [48126.787783]  [<ffffffff8106c76e>] ? lookup_memtype+0xce/0xe0
> [48126.787787]  [<ffffffff8162ab11>] ? i915_gem_execbuffer+0x91/0x390
> [48126.787790]  [<ffffffff8162ac55>] i915_gem_execbuffer+0x1d5/0x390
> [48126.787794]  [<ffffffff816255b0>] ? i915_gem_sw_finish_ioctl+0x90/0xc0
> [48126.787799]  [<ffffffff81565a0a>] drm_ioctl+0x32a/0x4b0
> [48126.787802]  [<ffffffff8162aa80>] ? i915_gem_execbuffer+0x0/0x390
> [48126.787807]  [<ffffffff8116c248>] vfs_ioctl+0x38/0xd0
> [48126.787810]  [<ffffffff8116c87a>] do_vfs_ioctl+0x8a/0x580
> [48126.787814]  [<ffffffff8116cdf1>] sys_ioctl+0x81/0xa0
> [48126.787820]  [<ffffffff8103af02>] system_call_fastpath+0x16/0x1b
> 

David, I have a vague feeling that we've been round this loop before.. 

Why does agp_alloc_page_array() use __GFP_NORETRY?  It's pretty unusual
and it's what caused this spew.

There's nothing in the changelog and the only relevant commentary
appears to be "This speeds things up and also saves memory for small
AGP regions", which is inscrutable.  Can you please add a usable
comment there?

Presumably this was added in response to some observed behaviour, but
what was it??


If the __GFP_NORETRY is indeed useful and legitimate and given that we
have a vmalloc fallback, I'd suggest that we add __GFP_NOWARN there as
well to keep the bug reports away.

btw, agp_memory.vmalloc_flag can be done away with - it's conventional
to use is_vmalloc_addr() for this.
Comment 2 Anonymous Emailer 2010-06-11 01:26:29 UTC
Reply-To: airlied@redhat.com

On Thu, 2010-06-10 at 15:38 -0700, Andrew Morton wrote:
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Mon, 7 Jun 2010 17:32:04 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=16148
> > 
> >            Summary: page allocation failure. order:1, mode:0x50d0
> >            Product: Memory Management
> >            Version: 2.5
> >     Kernel Version: 2.6.35-rc1
> >           Platform: All
> >         OS/Version: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Page Allocator
> >         AssignedTo: akpm@linux-foundation.org
> >         ReportedBy: devnull@plzk.org
> >         Regression: No
> > 
> > 
> > Created an attachment (id=26687)
> >  --> (https://bugzilla.kernel.org/attachment.cgi?id=26687)
> > dmesg
> > 
> > Never seen this before.
> > 
> > 2.6.35-rc1 #1 SMP Mon May 31 21:31:02 CEST 2010 x86_64 GNU/Linux
> > 
> > [48126.787684] Xorg: page allocation failure. order:1, mode:0x50d0
> > [48126.787691] Pid: 1895, comm: Xorg Tainted: G        W   2.6.35-rc1 #1
> > [48126.787694] Call Trace:
> > [48126.787709]  [<ffffffff811192f5>] __alloc_pages_nodemask+0x5f5/0x6f0
> > [48126.787716]  [<ffffffff81148695>] alloc_pages_current+0x95/0x100
> > [48126.787720]  [<ffffffff8114e04a>] new_slab+0x2ba/0x2c0
> > [48126.787724]  [<ffffffff8114ed0b>] __slab_alloc+0x14b/0x4e0
> > [48126.787730]  [<ffffffff81403f91>] ?
> security_vm_enough_memory_kern+0x21/0x30
> > [48126.787736]  [<ffffffff81556e6a>] ? agp_alloc_page_array+0x5a/0x70
> > [48126.787740]  [<ffffffff8115087f>] __kmalloc+0x11f/0x1c0
> > [48126.787744]  [<ffffffff81556e6a>] agp_alloc_page_array+0x5a/0x70
> > [48126.787747]  [<ffffffff81556ee4>] agp_generic_alloc_user+0x64/0x140
> > [48126.787750]  [<ffffffff8155717a>] agp_allocate_memory+0x9a/0x140
> > [48126.787755]  [<ffffffff8156c179>] drm_agp_allocate_memory+0x9/0x10
> > [48126.787758]  [<ffffffff8156c1d7>] drm_agp_bind_pages+0x57/0x100
> > [48126.787765]  [<ffffffff81627fe4>]
> i915_gem_object_bind_to_gtt+0x144/0x340
> > [48126.787768]  [<ffffffff81628295>] i915_gem_object_pin+0xb5/0xd0
> > [48126.787772]  [<ffffffff81629a4c>] i915_gem_do_execbuffer+0x6cc/0x14f0
> > [48126.787777]  [<ffffffff81091ba0>] ? __is_ram+0x0/0x10
> > [48126.787783]  [<ffffffff8106c76e>] ? lookup_memtype+0xce/0xe0
> > [48126.787787]  [<ffffffff8162ab11>] ? i915_gem_execbuffer+0x91/0x390
> > [48126.787790]  [<ffffffff8162ac55>] i915_gem_execbuffer+0x1d5/0x390
> > [48126.787794]  [<ffffffff816255b0>] ? i915_gem_sw_finish_ioctl+0x90/0xc0
> > [48126.787799]  [<ffffffff81565a0a>] drm_ioctl+0x32a/0x4b0
> > [48126.787802]  [<ffffffff8162aa80>] ? i915_gem_execbuffer+0x0/0x390
> > [48126.787807]  [<ffffffff8116c248>] vfs_ioctl+0x38/0xd0
> > [48126.787810]  [<ffffffff8116c87a>] do_vfs_ioctl+0x8a/0x580
> > [48126.787814]  [<ffffffff8116cdf1>] sys_ioctl+0x81/0xa0
> > [48126.787820]  [<ffffffff8103af02>] system_call_fastpath+0x16/0x1b
> > 
> 
> David, I have a vague feeling that we've been round this loop before.. 
> 
> Why does agp_alloc_page_array() use __GFP_NORETRY?  It's pretty unusual
> and it's what caused this spew.
> 
> There's nothing in the changelog and the only relevant commentary
> appears to be "This speeds things up and also saves memory for small
> AGP regions", which is inscrutable.  Can you please add a usable
> comment there?

cc'ing Thomas, who added this, I expect we could drop the NORETRY or
just add NOWARN. Though an order 1 page alloc failure isn't a pretty
sight, not sure how a vmalloc fallback could save us.

> 
> Presumably this was added in response to some observed behaviour, but
> what was it??
> 
> 
> If the __GFP_NORETRY is indeed useful and legitimate and given that we
> have a vmalloc fallback, I'd suggest that we add __GFP_NOWARN there as
> well to keep the bug reports away.
> 
> btw, agp_memory.vmalloc_flag can be done away with - it's conventional
> to use is_vmalloc_addr() for this.

Lols, conventional my ass, we wanted to add that thing years ago for
this purpose and got told that would be an insane interface, then the
same person added the interface a year later and never fixed AGP to use
it.

I'll try and write a patch.

Dave.
Comment 3 Thomas Hellstrom 2010-06-11 09:23:59 UTC
On 06/11/2010 01:15 AM, Dave Airlie wrote:
> On Thu, 2010-06-10 at 15:38 -0700, Andrew Morton wrote:
>    
>> (switched to email.  Please respond via emailed reply-to-all, not via the
>> bugzilla web interface).
>>
>> On Mon, 7 Jun 2010 17:32:04 GMT
>> bugzilla-daemon@bugzilla.kernel.org wrote:
>>
>>      
>>> https://bugzilla.kernel.org/show_bug.cgi?id=16148
>>>
>>>             Summary: page allocation failure. order:1, mode:0x50d0
>>>             Product: Memory Management
>>>             Version: 2.5
>>>      Kernel Version: 2.6.35-rc1
>>>            Platform: All
>>>          OS/Version: Linux
>>>                Tree: Mainline
>>>              Status: NEW
>>>            Severity: normal
>>>            Priority: P1
>>>           Component: Page Allocator
>>>          AssignedTo: akpm@linux-foundation.org
>>>          ReportedBy: devnull@plzk.org
>>>          Regression: No
>>>
>>>
>>> Created an attachment (id=26687)
>>>   -->  (https://bugzilla.kernel.org/attachment.cgi?id=26687)
>>> dmesg
>>>
>>> Never seen this before.
>>>
>>> 2.6.35-rc1 #1 SMP Mon May 31 21:31:02 CEST 2010 x86_64 GNU/Linux
>>>
>>> [48126.787684] Xorg: page allocation failure. order:1, mode:0x50d0
>>> [48126.787691] Pid: 1895, comm: Xorg Tainted: G        W   2.6.35-rc1 #1
>>> [48126.787694] Call Trace:
>>> [48126.787709]  [<ffffffff811192f5>] __alloc_pages_nodemask+0x5f5/0x6f0
>>> [48126.787716]  [<ffffffff81148695>] alloc_pages_current+0x95/0x100
>>> [48126.787720]  [<ffffffff8114e04a>] new_slab+0x2ba/0x2c0
>>> [48126.787724]  [<ffffffff8114ed0b>] __slab_alloc+0x14b/0x4e0
>>> [48126.787730]  [<ffffffff81403f91>] ?
>>> security_vm_enough_memory_kern+0x21/0x30
>>> [48126.787736]  [<ffffffff81556e6a>] ? agp_alloc_page_array+0x5a/0x70
>>> [48126.787740]  [<ffffffff8115087f>] __kmalloc+0x11f/0x1c0
>>> [48126.787744]  [<ffffffff81556e6a>] agp_alloc_page_array+0x5a/0x70
>>> [48126.787747]  [<ffffffff81556ee4>] agp_generic_alloc_user+0x64/0x140
>>> [48126.787750]  [<ffffffff8155717a>] agp_allocate_memory+0x9a/0x140
>>> [48126.787755]  [<ffffffff8156c179>] drm_agp_allocate_memory+0x9/0x10
>>> [48126.787758]  [<ffffffff8156c1d7>] drm_agp_bind_pages+0x57/0x100
>>> [48126.787765]  [<ffffffff81627fe4>]
>>> i915_gem_object_bind_to_gtt+0x144/0x340
>>> [48126.787768]  [<ffffffff81628295>] i915_gem_object_pin+0xb5/0xd0
>>> [48126.787772]  [<ffffffff81629a4c>] i915_gem_do_execbuffer+0x6cc/0x14f0
>>> [48126.787777]  [<ffffffff81091ba0>] ? __is_ram+0x0/0x10
>>> [48126.787783]  [<ffffffff8106c76e>] ? lookup_memtype+0xce/0xe0
>>> [48126.787787]  [<ffffffff8162ab11>] ? i915_gem_execbuffer+0x91/0x390
>>> [48126.787790]  [<ffffffff8162ac55>] i915_gem_execbuffer+0x1d5/0x390
>>> [48126.787794]  [<ffffffff816255b0>] ? i915_gem_sw_finish_ioctl+0x90/0xc0
>>> [48126.787799]  [<ffffffff81565a0a>] drm_ioctl+0x32a/0x4b0
>>> [48126.787802]  [<ffffffff8162aa80>] ? i915_gem_execbuffer+0x0/0x390
>>> [48126.787807]  [<ffffffff8116c248>] vfs_ioctl+0x38/0xd0
>>> [48126.787810]  [<ffffffff8116c87a>] do_vfs_ioctl+0x8a/0x580
>>> [48126.787814]  [<ffffffff8116cdf1>] sys_ioctl+0x81/0xa0
>>> [48126.787820]  [<ffffffff8103af02>] system_call_fastpath+0x16/0x1b
>>>
>>>        
>> David, I have a vague feeling that we've been round this loop before..
>>
>> Why does agp_alloc_page_array() use __GFP_NORETRY?  It's pretty unusual
>> and it's what caused this spew.
>>
>> There's nothing in the changelog and the only relevant commentary
>> appears to be "This speeds things up and also saves memory for small
>> AGP regions", which is inscrutable.  Can you please add a usable
>> comment there?
>>      
> cc'ing Thomas, who added this, I expect we could drop the NORETRY or
> just add NOWARN. Though an order 1 page alloc failure isn't a pretty
> sight, not sure how a vmalloc fallback could save us.
>
>    

Hmm. IIRC that was an untested speed optimization back from the time 
when I was
reading ldd3. I think the idea was to avoid slow allocations of (order > 
0) if they weren't
immediately available and fall back to vmalloc single page allocations.
It might be that that functionality is no longer preserved and only the 
__GFP_NORETRY remains.
I think it should be safe to remove the NORETRY if it's annoying, but it 
should probably be equally safe to add a NOWARN and keep the vmalloc 
fallback.

Now if we still get a "definitive" page allocation failure in this 
codepath, that's not good, but hardly the AGP driver's fault.  Has Intel 
added some kind of accounting for pinned pages yet?


>> Presumably this was added in response to some observed behaviour, but
>> what was it??
>>
>>
>> If the __GFP_NORETRY is indeed useful and legitimate and given that we
>> have a vmalloc fallback, I'd suggest that we add __GFP_NOWARN there as
>> well to keep the bug reports away.
>>
>> btw, agp_memory.vmalloc_flag can be done away with - it's conventional
>> to use is_vmalloc_addr() for this.
>>      
> Lols, conventional my ass, we wanted to add that thing years ago for
> this purpose and got told that would be an insane interface, then the
> same person added the interface a year later and never fixed AGP to use
> it.
>
>    


Indeed. I even recall the phrase "Too ugly to live" :).

> I'll try and write a patch.
>
> Dave.
>
>    
/Thomas
Comment 4 Andrew Morton 2010-06-11 17:25:20 UTC
On Fri, 11 Jun 2010 10:46:07 +0200 Thomas Hellstrom <thellstrom@vmware.com> wrote:

> >>>
> >>>        
> >> David, I have a vague feeling that we've been round this loop before..
> >>
> >> Why does agp_alloc_page_array() use __GFP_NORETRY?  It's pretty unusual
> >> and it's what caused this spew.
> >>
> >> There's nothing in the changelog and the only relevant commentary
> >> appears to be "This speeds things up and also saves memory for small
> >> AGP regions", which is inscrutable.  Can you please add a usable
> >> comment there?
> >>      
> > cc'ing Thomas, who added this, I expect we could drop the NORETRY or
> > just add NOWARN. Though an order 1 page alloc failure isn't a pretty
> > sight, not sure how a vmalloc fallback could save us.
> >
> >    
> 
> Hmm. IIRC that was an untested speed optimization back from the time 
> when I was
> reading ldd3. I think the idea was to avoid slow allocations of (order > 
> 0) if they weren't
> immediately available and fall back to vmalloc single page allocations.
> It might be that that functionality is no longer preserved and only the 
> __GFP_NORETRY remains.
> I think it should be safe to remove the NORETRY if it's annoying, but it 
> should probably be equally safe to add a NOWARN and keep the vmalloc 
> fallback.

An order-1 GFP_KERNEL allocation is a breeze - slub does them often, we
use them for kernel stacks all the time.  I'd say just remove the
__GFP_NORETRY and be happy.

In fact if the allocations are always this small I'd say we can remove
the vmalloc fallback too.  However if under some circumstances the
allocations can be "large", say order-4 or higher then allocation
failures are still a risk.
Comment 5 Thomas Hellstrom 2010-06-11 20:23:47 UTC
On 06/11/2010 07:24 PM, Andrew Morton wrote:
> On Fri, 11 Jun 2010 10:46:07 +0200 Thomas Hellstrom<thellstrom@vmware.com> 
> wrote:
>
>    
>>>>>
>>>>>            
>>>> David, I have a vague feeling that we've been round this loop before..
>>>>
>>>> Why does agp_alloc_page_array() use __GFP_NORETRY?  It's pretty unusual
>>>> and it's what caused this spew.
>>>>
>>>> There's nothing in the changelog and the only relevant commentary
>>>> appears to be "This speeds things up and also saves memory for small
>>>> AGP regions", which is inscrutable.  Can you please add a usable
>>>> comment there?
>>>>
>>>>          
>>> cc'ing Thomas, who added this, I expect we could drop the NORETRY or
>>> just add NOWARN. Though an order 1 page alloc failure isn't a pretty
>>> sight, not sure how a vmalloc fallback could save us.
>>>
>>>
>>>        
>> Hmm. IIRC that was an untested speed optimization back from the time
>> when I was
>> reading ldd3. I think the idea was to avoid slow allocations of (order>
>> 0) if they weren't
>> immediately available and fall back to vmalloc single page allocations.
>> It might be that that functionality is no longer preserved and only the
>> __GFP_NORETRY remains.
>> I think it should be safe to remove the NORETRY if it's annoying, but it
>> should probably be equally safe to add a NOWARN and keep the vmalloc
>> fallback.
>>      
> An order-1 GFP_KERNEL allocation is a breeze - slub does them often, we
> use them for kernel stacks all the time.  I'd say just remove the
> __GFP_NORETRY and be happy.
>
> In fact if the allocations are always this small I'd say we can remove
> the vmalloc fallback too.  However if under some circumstances the
> allocations can be "large", say order-4 or higher then allocation
> failures are still a risk.
>
>    

Actually, At that time I was working with a SiS GPU (128MiB system), and 
was getting persistent failures for order 1 GFP_KERNEL page allocations 
(albeit not in this codepath). So while they are highly unlikely for 
modern systems, it might be worthwhile keeping the fallback.

/Thomas
Comment 6 Andrew Morton 2010-06-11 20:39:00 UTC
On Fri, 11 Jun 2010 22:22:28 +0200
Thomas Hellstrom <thellstrom@vmware.com> wrote:

> >>> cc'ing Thomas, who added this, I expect we could drop the NORETRY or
> >>> just add NOWARN. Though an order 1 page alloc failure isn't a pretty
> >>> sight, not sure how a vmalloc fallback could save us.
> >>>
> >>>
> >>>        
> >> Hmm. IIRC that was an untested speed optimization back from the time
> >> when I was
> >> reading ldd3. I think the idea was to avoid slow allocations of (order>
> >> 0) if they weren't
> >> immediately available and fall back to vmalloc single page allocations.
> >> It might be that that functionality is no longer preserved and only the
> >> __GFP_NORETRY remains.
> >> I think it should be safe to remove the NORETRY if it's annoying, but it
> >> should probably be equally safe to add a NOWARN and keep the vmalloc
> >> fallback.
> >>      
> > An order-1 GFP_KERNEL allocation is a breeze - slub does them often, we
> > use them for kernel stacks all the time.  I'd say just remove the
> > __GFP_NORETRY and be happy.
> >
> > In fact if the allocations are always this small I'd say we can remove
> > the vmalloc fallback too.  However if under some circumstances the
> > allocations can be "large", say order-4 or higher then allocation
> > failures are still a risk.
> >
> >    
> 
> Actually, At that time I was working with a SiS GPU (128MiB system), and 
> was getting persistent failures for order 1 GFP_KERNEL page allocations 
> (albeit not in this codepath). So while they are highly unlikely for 
> modern systems, it might be worthwhile keeping the fallback.

128MB total RAM?  Those were the days.

Various page reclaim changes have been made in the past year or so
which _should_ improve that (eg, lumpy reclaim) but yeah, it's by no
means a certainty.

The vmalloc fallback hardly hurts anyone.  But it does mean that hardly
anyone ever executes that codepath, so it won't get tested much.

There was a patch recently which added an API formalising the
alloc_pages-then-vmalloc fallback approach.  It didn't get merged,
although there weren't strong feelings either way really.  One benefit
of that approach is that the alloc/free code itself would get more
testing coverage, but callers can still screw things up by failing to
handle vmalloc memory correctly for DMA mapping purposes.

Oh well, where were we?  Remove that __GFP_NORETRY?
Comment 7 Thomas Hellstrom 2010-06-11 21:40:36 UTC
On 06/11/2010 10:37 PM, Andrew Morton wrote:
> On Fri, 11 Jun 2010 22:22:28 +0200
> Thomas Hellstrom<thellstrom@vmware.com>  wrote:
>
>    
>>>>> cc'ing Thomas, who added this, I expect we could drop the NORETRY or
>>>>> just add NOWARN. Though an order 1 page alloc failure isn't a pretty
>>>>> sight, not sure how a vmalloc fallback could save us.
>>>>>
>>>>>
>>>>>
>>>>>            
>>>> Hmm. IIRC that was an untested speed optimization back from the time
>>>> when I was
>>>> reading ldd3. I think the idea was to avoid slow allocations of (order>
>>>> 0) if they weren't
>>>> immediately available and fall back to vmalloc single page allocations.
>>>> It might be that that functionality is no longer preserved and only the
>>>> __GFP_NORETRY remains.
>>>> I think it should be safe to remove the NORETRY if it's annoying, but it
>>>> should probably be equally safe to add a NOWARN and keep the vmalloc
>>>> fallback.
>>>>
>>>>          
>>> An order-1 GFP_KERNEL allocation is a breeze - slub does them often, we
>>> use them for kernel stacks all the time.  I'd say just remove the
>>> __GFP_NORETRY and be happy.
>>>
>>> In fact if the allocations are always this small I'd say we can remove
>>> the vmalloc fallback too.  However if under some circumstances the
>>> allocations can be "large", say order-4 or higher then allocation
>>> failures are still a risk.
>>>
>>>
>>>        
>> Actually, At that time I was working with a SiS GPU (128MiB system), and
>> was getting persistent failures for order 1 GFP_KERNEL page allocations
>> (albeit not in this codepath). So while they are highly unlikely for
>> modern systems, it might be worthwhile keeping the fallback.
>>      
> 128MB total RAM?  Those were the days.
>
> Various page reclaim changes have been made in the past year or so
> which _should_ improve that (eg, lumpy reclaim) but yeah, it's by no
> means a certainty.
>
> The vmalloc fallback hardly hurts anyone.  But it does mean that hardly
> anyone ever executes that codepath, so it won't get tested much.
>
> There was a patch recently which added an API formalising the
> alloc_pages-then-vmalloc fallback approach.  It didn't get merged,
> although there weren't strong feelings either way really.  One benefit
> of that approach is that the alloc/free code itself would get more
> testing coverage, but callers can still screw things up by failing to
> handle vmalloc memory correctly for DMA mapping purposes.
>
> Oh well, where were we?  Remove that __GFP_NORETRY?
>    

Yeah, I think that's the sanest thing to do!

/Thomas
Comment 8 Mikko C. 2010-06-13 13:01:53 UTC
I have been getting this with 2.6.35-rc2 and rc3.
Could it be the same problem?


X: page allocation failure. order:0, mode:0x4                                                                           
Pid: 1514, comm: X Not tainted 2.6.35-rc3 #1                                                                            
Call Trace:                                                                                                             
 [<ffffffff8108ce49>] ? __alloc_pages_nodemask+0x629/0x680                                                              
 [<ffffffff8108c920>] ? __alloc_pages_nodemask+0x100/0x680                                                              
 [<ffffffffa00db8f3>] ? ttm_get_pages+0x2c3/0x448 [ttm]                                                                 
 [<ffffffffa00d4658>] ? __ttm_tt_get_page+0x98/0xc0 [ttm]                                                               
 [<ffffffffa00d4988>] ? ttm_tt_populate+0x48/0x90 [ttm]                                                                 
 [<ffffffffa00d4a26>] ? ttm_tt_bind+0x56/0xa0 [ttm]                                                                     
 [<ffffffffa00d5230>] ? ttm_bo_handle_move_mem+0x1d0/0x430 [ttm]                                                        
 [<ffffffffa00d76d6>] ? ttm_bo_move_buffer+0x166/0x180 [ttm]                                                            
 [<ffffffffa00b9736>] ? drm_mm_kmalloc+0x26/0xc0 [drm]                                                                  
 [<ffffffff81030ea9>] ? get_parent_ip+0x9/0x20                                                                          
 [<ffffffffa00d7786>] ? ttm_bo_validate+0x96/0x130 [ttm]                                                                
 [<ffffffffa00d7b35>] ? ttm_bo_init+0x315/0x390 [ttm]                                                                   
 [<ffffffffa0122eb8>] ? radeon_bo_create+0x118/0x210 [radeon]                                                           
 [<ffffffffa0122fb0>] ? radeon_ttm_bo_destroy+0x0/0xb0 [radeon]                                                         
 [<ffffffffa013704c>] ? radeon_gem_object_create+0x8c/0x110 [radeon]                                                    
 [<ffffffffa013711f>] ? radeon_gem_create_ioctl+0x4f/0xe0 [radeon]                                                      
 [<ffffffffa00b10e6>] ? drm_ioctl+0x3d6/0x470 [drm]                                                                     
 [<ffffffffa01370d0>] ? radeon_gem_create_ioctl+0x0/0xe0 [radeon]                                                       
 [<ffffffff810b965f>] ? do_sync_read+0xbf/0x100                                                                         
 [<ffffffff810c8965>] ? vfs_ioctl+0x35/0xd0                                                                             
 [<ffffffff810c8b28>] ? do_vfs_ioctl+0x88/0x530                                                                         
 [<ffffffff81031ed7>] ? sub_preempt_count+0x87/0xb0                                                                     
 [<ffffffff810c9019>] ? sys_ioctl+0x49/0x80                                                                             
 [<ffffffff810ba4fe>] ? sys_read+0x4e/0x90                                                                              
 [<ffffffff810024ab>] ? system_call_fastpath+0x16/0x1b                                                                  
Mem-Info:                                                                                                               
DMA per-cpu:                                                                                                            
CPU    0: hi:    0, btch:   1 usd:   0                                                                                  
CPU    1: hi:    0, btch:   1 usd:   0                                                                                  
CPU    2: hi:    0, btch:   1 usd:   0                                                                                  
CPU    3: hi:    0, btch:   1 usd:   0                                                                                  
DMA32 per-cpu:                                                                                                          
CPU    0: hi:  186, btch:  31 usd: 156                                                                                  
CPU    1: hi:  186, btch:  31 usd: 164                                                                                  
CPU    2: hi:  186, btch:  31 usd: 192                                                                                  
CPU    3: hi:  186, btch:  31 usd: 175                                                                                  
Normal per-cpu:                                                                                                         
CPU    0: hi:  186, btch:  31 usd: 177                                                                                  
CPU    1: hi:  186, btch:  31 usd: 153                                                                                  
CPU    2: hi:  186, btch:  31 usd: 147                                                                                  
CPU    3: hi:  186, btch:  31 usd: 168                                                                                  
active_anon:248943 inactive_anon:70653 isolated_anon:0                                                                  
 active_file:231008 inactive_file:258837 isolated_file:0                                                                
 unevictable:1 dirty:5 writeback:0 unstable:0                                                                           
 free:5083 slab_reclaimable:53052 slab_unreclaimable:16244
 mapped:45401 shmem:4420 pagetables:6333 bounce:0
DMA free:13036kB min:28kB low:32kB high:40kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15704kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 3255 3760 3760
DMA32 free:5704kB min:6780kB low:8472kB high:10168kB active_anon:915704kB inactive_anon:184904kB active_file:789940kB inactive_file:901472kB unevictable:4kB isolated(anon):0kB isolated(file):0kB present:3333920kB mlocked:4kB dirty:4kB writeback:0kB mapped:114500kB shmem:17136kB slab_reclaimable:197620kB slab_unreclaimable:38500kB kernel_stack:776kB pagetables:12928kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 505 505
Normal free:1840kB min:1048kB low:1308kB high:1572kB active_anon:80068kB inactive_anon:97708kB active_file:134092kB inactive_file:133876kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:517120kB mlocked:0kB dirty:16kB writeback:0kB mapped:67104kB shmem:544kB slab_reclaimable:14588kB slab_unreclaimable:26468kB kernel_stack:1936kB pagetables:12404kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 1*4kB 1*8kB 0*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 3*4096kB = 13036kB
DMA32: 200*4kB 10*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 5952kB
Normal: 232*4kB 2*8kB 0*16kB 2*32kB 3*64kB 1*128kB 2*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1840kB
494223 total pagecache pages
66 pages in swap cache
Swap cache stats: add 475, delete 409, find 6/7
Free swap  = 2094600kB
Total swap = 2096476kB
983024 pages RAM
33615 pages reserved
397983 pages shared
655130 pages non-shared
[TTM] Unable to allocate page.
radeon 0000:01:05.0: object_init failed for (7827456, 0x00000002)
[drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (7827456, 2, 4096, -12)
SysRq : HELP : loglevel(0-9) reBoot Crash terminate-all-tasks(E) memory-full-oom-kill(F) kill-all-tasks(I) thaw-filesystems(J) saK show-backtrace-all-active-cpus(L) show-memory-usage(M) nice-all-RT-tasks(N) powerOff show-registers(P) show-all-timers(Q) unRaw Sync show-task-states(T) Unmount force-fb(V) show-blocked-tasks(W)
SysRq : HELP : loglevel(0-9) reBoot Crash terminate-all-tasks(E) memory-full-oom-kill(F) kill-all-tasks(I) thaw-filesystems(J) saK show-backtrace-all-active-cpus(L) show-memory-usage(M) nice-all-RT-tasks(N) powerOff show-registers(P) show-all-timers(Q) unRaw Sync show-task-states(T) Unmount force-fb(V) show-blocked-tasks(W)
SysRq : HELP : loglevel(0-9) reBoot Crash terminate-all-tasks(E) memory-full-oom-kill(F) kill-all-tasks(I) thaw-filesystems(J) saK show-backtrace-all-active-cpus(L) show-memory-usage(M) nice-all-RT-tasks(N) powerOff show-registers(P) show-all-timers(Q) unRaw Sync show-task-states(T) Unmount force-fb(V) show-blocked-tasks(W)
SysRq : HELP : loglevel(0-9) reBoot Crash terminate-all-tasks(E) memory-full-oom-kill(F) kill-all-tasks(I) thaw-filesystems(J) saK show-backtrace-all-active-cpus(L) show-memory-usage(M) nice-all-RT-tasks(N) powerOff show-registers(P) show-all-timers(Q) unRaw Sync show-task-states(T) Unmount force-fb(V) show-blocked-tasks(W)
SysRq : HELP : loglevel(0-9) reBoot Crash terminate-all-tasks(E) memory-full-oom-kill(F) kill-all-tasks(I) thaw-filesystems(J) saK show-backtrace-all-active-cpus(L) show-memory-usage(M) nice-all-RT-tasks(N) powerOff show-registers(P) show-all-timers(Q) unRaw Sync show-task-states(T) Unmount force-fb(V) show-blocked-tasks(W)
SysRq : HELP : loglevel(0-9) reBoot Crash terminate-all-tasks(E) memory-full-oom-kill(F) kill-all-tasks(I) thaw-filesystems(J) saK show-backtrace-all-active-cpus(L) show-memory-usage(M) nice-all-RT-tasks(N) powerOff show-registers(P) show-all-timers(Q) unRaw Sync show-task-states(T) Unmount force-fb(V) show-blocked-tasks(W)
Comment 9 Andrew Morton 2010-06-15 22:42:21 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

(switching back to email, actually)

On Sun, 13 Jun 2010 13:01:57 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=16148
>
>
> Mikko C. <mikko.cal@gmail.com> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |mikko.cal@gmail.com
>
>
>
>
> --- Comment #8 from Mikko C. <mikko.cal@gmail.com>  2010-06-13 13:01:53 ---
> I have been getting this with 2.6.35-rc2 and rc3.
> Could it be the same problem?
>
>
> X: page allocation failure. order:0, mode:0x4
> Pid: 1514, comm: X Not tainted 2.6.35-rc3 #1
> Call Trace:
>  [<ffffffff8108ce49>] ? __alloc_pages_nodemask+0x629/0x680
>  [<ffffffff8108c920>] ? __alloc_pages_nodemask+0x100/0x680
>  [<ffffffffa00db8f3>] ? ttm_get_pages+0x2c3/0x448 [ttm]
>  [<ffffffffa00d4658>] ? __ttm_tt_get_page+0x98/0xc0 [ttm]
>  [<ffffffffa00d4988>] ? ttm_tt_populate+0x48/0x90 [ttm]
>  [<ffffffffa00d4a26>] ? ttm_tt_bind+0x56/0xa0 [ttm]
>  [<ffffffffa00d5230>] ? ttm_bo_handle_move_mem+0x1d0/0x430 [ttm]
>  [<ffffffffa00d76d6>] ? ttm_bo_move_buffer+0x166/0x180 [ttm]
>  [<ffffffffa00b9736>] ? drm_mm_kmalloc+0x26/0xc0 [drm]
>  [<ffffffff81030ea9>] ? get_parent_ip+0x9/0x20
>  [<ffffffffa00d7786>] ? ttm_bo_validate+0x96/0x130 [ttm]
>  [<ffffffffa00d7b35>] ? ttm_bo_init+0x315/0x390 [ttm]
>  [<ffffffffa0122eb8>] ? radeon_bo_create+0x118/0x210 [radeon]
>  [<ffffffffa0122fb0>] ? radeon_ttm_bo_destroy+0x0/0xb0 [radeon]
>  [<ffffffffa013704c>] ? radeon_gem_object_create+0x8c/0x110 [radeon]
>  [<ffffffffa013711f>] ? radeon_gem_create_ioctl+0x4f/0xe0 [radeon]
>  [<ffffffffa00b10e6>] ? drm_ioctl+0x3d6/0x470 [drm]
>  [<ffffffffa01370d0>] ? radeon_gem_create_ioctl+0x0/0xe0 [radeon]
>  [<ffffffff810b965f>] ? do_sync_read+0xbf/0x100
>  [<ffffffff810c8965>] ? vfs_ioctl+0x35/0xd0
>  [<ffffffff810c8b28>] ? do_vfs_ioctl+0x88/0x530
>  [<ffffffff81031ed7>] ? sub_preempt_count+0x87/0xb0
>  [<ffffffff810c9019>] ? sys_ioctl+0x49/0x80
>  [<ffffffff810ba4fe>] ? sys_read+0x4e/0x90
>  [<ffffffff810024ab>] ? system_call_fastpath+0x16/0x1b

That's different.  ttm_get_pages() looks pretty busted to me.  It's not
using __GFP_WAIT and it's not using __GFP_FS.  It's using a plain
GFP_DMA32 so it's using atomic allocations even though it doesn't need
to.  IOW, it's shooting itself in the head.

Given that it will sometimes use GFP_HIGHUSER which includes __GFP_FS
and __GFP_WAIT, I assume it can always include __GFP_FS and __GFP_WAIT.
If so, it should very much do so.  If not then the function is
misdesigned and should be altered to take a gfp_t argument so the
caller can tell ttm_get_pages() which is the strongest allocation mode
which it may use.

> [TTM] Unable to allocate page.
> radeon 0000:01:05.0: object_init failed for (7827456, 0x00000002)
> [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object
> (7827456,
> 2, 4096, -12)

This bug actually broke stuff for you.

Something like this:

--- a/drivers/gpu/drm/ttm/ttm_page_alloc.c~a
+++ a/drivers/gpu/drm/ttm/ttm_page_alloc.c
@@ -677,7 +677,7 @@ int ttm_get_pages(struct list_head *page
 	/* No pool for cached pages */
 	if (pool == NULL) {
 		if (flags & TTM_PAGE_FLAG_DMA32)
-			gfp_flags |= GFP_DMA32;
+			gfp_flags |= GFP_KERNEL|GFP_DMA32;
 		else
 			gfp_flags |= GFP_HIGHUSER;
 
_

although I wonder whether it should be using pool->gfp_flags.


It's a shame that this code was developed and merged in secret :( Had
we known, we could have looked at enhancing mempools to cover the
requirement, or at implementing this in some generic fashion rather
than hiding it down in drivers/gpu/drm.
Comment 10 Anonymous Emailer 2010-06-15 23:04:38 UTC
Reply-To: airlied@redhat.com

On Tue, 2010-06-15 at 15:41 -0700, Andrew Morton wrote:
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> (switching back to email, actually)
> 
> On Sun, 13 Jun 2010 13:01:57 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=16148
> >
> >
> > Mikko C. <mikko.cal@gmail.com> changed:
> >
> >            What    |Removed                     |Added
> >
> ----------------------------------------------------------------------------
> >                  CC|                            |mikko.cal@gmail.com
> >
> >
> >
> >
> > --- Comment #8 from Mikko C. <mikko.cal@gmail.com>  2010-06-13 13:01:53 ---
> > I have been getting this with 2.6.35-rc2 and rc3.
> > Could it be the same problem?
> >
> >
> > X: page allocation failure. order:0, mode:0x4
> > Pid: 1514, comm: X Not tainted 2.6.35-rc3 #1
> > Call Trace:
> >  [<ffffffff8108ce49>] ? __alloc_pages_nodemask+0x629/0x680
> >  [<ffffffff8108c920>] ? __alloc_pages_nodemask+0x100/0x680
> >  [<ffffffffa00db8f3>] ? ttm_get_pages+0x2c3/0x448 [ttm]
> >  [<ffffffffa00d4658>] ? __ttm_tt_get_page+0x98/0xc0 [ttm]
> >  [<ffffffffa00d4988>] ? ttm_tt_populate+0x48/0x90 [ttm]
> >  [<ffffffffa00d4a26>] ? ttm_tt_bind+0x56/0xa0 [ttm]
> >  [<ffffffffa00d5230>] ? ttm_bo_handle_move_mem+0x1d0/0x430 [ttm]
> >  [<ffffffffa00d76d6>] ? ttm_bo_move_buffer+0x166/0x180 [ttm]
> >  [<ffffffffa00b9736>] ? drm_mm_kmalloc+0x26/0xc0 [drm]
> >  [<ffffffff81030ea9>] ? get_parent_ip+0x9/0x20
> >  [<ffffffffa00d7786>] ? ttm_bo_validate+0x96/0x130 [ttm]
> >  [<ffffffffa00d7b35>] ? ttm_bo_init+0x315/0x390 [ttm]
> >  [<ffffffffa0122eb8>] ? radeon_bo_create+0x118/0x210 [radeon]
> >  [<ffffffffa0122fb0>] ? radeon_ttm_bo_destroy+0x0/0xb0 [radeon]
> >  [<ffffffffa013704c>] ? radeon_gem_object_create+0x8c/0x110 [radeon]
> >  [<ffffffffa013711f>] ? radeon_gem_create_ioctl+0x4f/0xe0 [radeon]
> >  [<ffffffffa00b10e6>] ? drm_ioctl+0x3d6/0x470 [drm]
> >  [<ffffffffa01370d0>] ? radeon_gem_create_ioctl+0x0/0xe0 [radeon]
> >  [<ffffffff810b965f>] ? do_sync_read+0xbf/0x100
> >  [<ffffffff810c8965>] ? vfs_ioctl+0x35/0xd0
> >  [<ffffffff810c8b28>] ? do_vfs_ioctl+0x88/0x530
> >  [<ffffffff81031ed7>] ? sub_preempt_count+0x87/0xb0
> >  [<ffffffff810c9019>] ? sys_ioctl+0x49/0x80
> >  [<ffffffff810ba4fe>] ? sys_read+0x4e/0x90
> >  [<ffffffff810024ab>] ? system_call_fastpath+0x16/0x1b
> 
> That's different.  ttm_get_pages() looks pretty busted to me.  It's not
> using __GFP_WAIT and it's not using __GFP_FS.  It's using a plain
> GFP_DMA32 so it's using atomic allocations even though it doesn't need
> to.  IOW, it's shooting itself in the head.
> 
> Given that it will sometimes use GFP_HIGHUSER which includes __GFP_FS
> and __GFP_WAIT, I assume it can always include __GFP_FS and __GFP_WAIT.
> If so, it should very much do so.  If not then the function is
> misdesigned and should be altered to take a gfp_t argument so the
> caller can tell ttm_get_pages() which is the strongest allocation mode
> which it may use.
> 
> > [TTM] Unable to allocate page.
> > radeon 0000:01:05.0: object_init failed for (7827456, 0x00000002)
> > [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object
> (7827456,
> > 2, 4096, -12)
> 
> This bug actually broke stuff for you.
> 
> Something like this:
> 
> --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c~a
> +++ a/drivers/gpu/drm/ttm/ttm_page_alloc.c
> @@ -677,7 +677,7 @@ int ttm_get_pages(struct list_head *page
>       /* No pool for cached pages */
>       if (pool == NULL) {
>               if (flags & TTM_PAGE_FLAG_DMA32)
> -                     gfp_flags |= GFP_DMA32;
> +                     gfp_flags |= GFP_KERNEL|GFP_DMA32;
>               else
>                       gfp_flags |= GFP_HIGHUSER;
>  
> _
> 
> although I wonder whether it should be using pool->gfp_flags.
> 
> 
> It's a shame that this code was developed and merged in secret :( Had
> we known, we could have looked at enhancing mempools to cover the
> requirement, or at implementing this in some generic fashion rather
> than hiding it down in drivers/gpu/drm.

Its been post to lkml at least once or twice over the past few years
though not as much as it was posted to dri-devel, but that was because
we had never seen anyone show any interest in it outside of kernel
hackers. Originally I was going to use the generic allocator stuff ia64
uses for uncached allocations but it allocates memory ranges not pages
so it wasn't useful. I also suggested getting a page flag for uncached
allocator stuff, I was told to go write the code in my own corner and
prove it was required. So I did, and it was cleaned up and worked on by
others and I merged it. So can we lay off with the "in secret", the
original code is nearly 2 years old at this point and just because -mm
hackers choose to ignore it isn't our fault. Patches welcome.

So now back to the bug:

So the page pools are setup with gfp flags, in the normal case, 4 pools,
one WC GFP_HIGHUSER pages, one UC HIGHUSER pages, one WC GFP_USER|
GFP_DMA32, one UC GFP_USER|GFP_DMA32, so the pools are all fine, the
problem here is the same as before we added the pools, which is the
normal page allocation path, which needs the GFP_USER added instead of
GFP_KERNEL.

That said I've noticed a lot more page allocation failure reports in
2.6.35-rcX than we've gotten for a long time, in code that hasn't
changed (the AGP ones the other day for example) has something in the
core MM regressed (again... didn't this happen back in 2.6.31 or
something).

(cc'ing Mel who tracked these down before).

Dave.
Comment 11 Anonymous Emailer 2010-06-16 14:49:21 UTC
Reply-To: mel@csn.ul.ie

Hi Dave,

On Wed, Jun 16, 2010 at 08:57:09AM +1000, Dave Airlie wrote:
> > (switching back to email, actually)
> > 
> > On Sun, 13 Jun 2010 13:01:57 GMT
> > bugzilla-daemon@bugzilla.kernel.org wrote:
> > 
> > > https://bugzilla.kernel.org/show_bug.cgi?id=16148
> > >
> > >
> > > Mikko C. <mikko.cal@gmail.com> changed:
> > >
> > >            What    |Removed                     |Added
> > >
> ----------------------------------------------------------------------------
> > >                  CC|                            |mikko.cal@gmail.com
> > >
> > >
> > >
> > >
> > > --- Comment #8 from Mikko C. <mikko.cal@gmail.com>  2010-06-13 13:01:53
> ---
> > > I have been getting this with 2.6.35-rc2 and rc3.
> > > Could it be the same problem?
> > >
> > >
> > > X: page allocation failure. order:0, mode:0x4
> > > Pid: 1514, comm: X Not tainted 2.6.35-rc3 #1
> > > Call Trace:
> > >  [<ffffffff8108ce49>] ? __alloc_pages_nodemask+0x629/0x680
> > >  [<ffffffff8108c920>] ? __alloc_pages_nodemask+0x100/0x680
> > >  [<ffffffffa00db8f3>] ? ttm_get_pages+0x2c3/0x448 [ttm]
> > >  [<ffffffffa00d4658>] ? __ttm_tt_get_page+0x98/0xc0 [ttm]
> > >  [<ffffffffa00d4988>] ? ttm_tt_populate+0x48/0x90 [ttm]
> > >  [<ffffffffa00d4a26>] ? ttm_tt_bind+0x56/0xa0 [ttm]
> > >  [<ffffffffa00d5230>] ? ttm_bo_handle_move_mem+0x1d0/0x430 [ttm]
> > >  [<ffffffffa00d76d6>] ? ttm_bo_move_buffer+0x166/0x180 [ttm]
> > >  [<ffffffffa00b9736>] ? drm_mm_kmalloc+0x26/0xc0 [drm]
> > >  [<ffffffff81030ea9>] ? get_parent_ip+0x9/0x20
> > >  [<ffffffffa00d7786>] ? ttm_bo_validate+0x96/0x130 [ttm]
> > >  [<ffffffffa00d7b35>] ? ttm_bo_init+0x315/0x390 [ttm]
> > >  [<ffffffffa0122eb8>] ? radeon_bo_create+0x118/0x210 [radeon]
> > >  [<ffffffffa0122fb0>] ? radeon_ttm_bo_destroy+0x0/0xb0 [radeon]
> > >  [<ffffffffa013704c>] ? radeon_gem_object_create+0x8c/0x110 [radeon]
> > >  [<ffffffffa013711f>] ? radeon_gem_create_ioctl+0x4f/0xe0 [radeon]
> > >  [<ffffffffa00b10e6>] ? drm_ioctl+0x3d6/0x470 [drm]
> > >  [<ffffffffa01370d0>] ? radeon_gem_create_ioctl+0x0/0xe0 [radeon]
> > >  [<ffffffff810b965f>] ? do_sync_read+0xbf/0x100
> > >  [<ffffffff810c8965>] ? vfs_ioctl+0x35/0xd0
> > >  [<ffffffff810c8b28>] ? do_vfs_ioctl+0x88/0x530
> > >  [<ffffffff81031ed7>] ? sub_preempt_count+0x87/0xb0
> > >  [<ffffffff810c9019>] ? sys_ioctl+0x49/0x80
> > >  [<ffffffff810ba4fe>] ? sys_read+0x4e/0x90
> > >  [<ffffffff810024ab>] ? system_call_fastpath+0x16/0x1b
> > 
> > That's different.  ttm_get_pages() looks pretty busted to me.  It's not
> > using __GFP_WAIT and it's not using __GFP_FS.  It's using a plain
> > GFP_DMA32 so it's using atomic allocations even though it doesn't need
> > to.  IOW, it's shooting itself in the head.
> > 

This is an accurate assessment.

> > Given that it will sometimes use GFP_HIGHUSER which includes __GFP_FS
> > and __GFP_WAIT, I assume it can always include __GFP_FS and __GFP_WAIT.
> > If so, it should very much do so.  If not then the function is
> > misdesigned and should be altered to take a gfp_t argument so the
> > caller can tell ttm_get_pages() which is the strongest allocation mode
> > which it may use.
> > 
> > > [TTM] Unable to allocate page.
> > > radeon 0000:01:05.0: object_init failed for (7827456, 0x00000002)
> > > [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object
> (7827456,
> > > 2, 4096, -12)
> > 
> > This bug actually broke stuff for you.
> > 
> > Something like this:
> > 
> > --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c~a
> > +++ a/drivers/gpu/drm/ttm/ttm_page_alloc.c
> > @@ -677,7 +677,7 @@ int ttm_get_pages(struct list_head *page
> >     /* No pool for cached pages */
> >     if (pool == NULL) {
> >             if (flags & TTM_PAGE_FLAG_DMA32)
> > -                   gfp_flags |= GFP_DMA32;
> > +                   gfp_flags |= GFP_KERNEL|GFP_DMA32;
> >             else
> >                     gfp_flags |= GFP_HIGHUSER;
> >  
> > _
> > 
> > although I wonder whether it should be using pool->gfp_flags.
> > 
> > 
> > It's a shame that this code was developed and merged in secret :( Had
> > we known, we could have looked at enhancing mempools to cover the
> > requirement, or at implementing this in some generic fashion rather
> > than hiding it down in drivers/gpu/drm.
> 
> Its been post to lkml at least once or twice over the past few years
> though not as much as it was posted to dri-devel, but that was because
> we had never seen anyone show any interest in it outside of kernel
> hackers. Originally I was going to use the generic allocator stuff ia64
> uses for uncached allocations but it allocates memory ranges not pages
> so it wasn't useful. I also suggested getting a page flag for uncached
> allocator stuff, I was told to go write the code in my own corner and
> prove it was required. So I did, and it was cleaned up and worked on by
> others and I merged it. So can we lay off with the "in secret", the
> original code is nearly 2 years old at this point and just because -mm
> hackers choose to ignore it isn't our fault. Patches welcome.
> 

No comment on this aspect of things.

> So now back to the bug:
> 
> So the page pools are setup with gfp flags, in the normal case, 4 pools,
> one WC GFP_HIGHUSER pages, one UC HIGHUSER pages, one WC GFP_USER|
> GFP_DMA32, one UC GFP_USER|GFP_DMA32, so the pools are all fine, the
> problem here is the same as before we added the pools, which is the
> normal page allocation path, which needs the GFP_USER added instead of
> GFP_KERNEL.
> 
> That said I've noticed a lot more page allocation failure reports in
> 2.6.35-rcX than we've gotten for a long time, in code that hasn't
> changed (the AGP ones the other day for example) has something in the
> core MM regressed (again... didn't this happen back in 2.6.31 or
> something).
> 
> (cc'ing Mel who tracked these down before).
> 

Ok, so there has been very little changed in the page allocator that
might have caused this. The only possibility I can think of is
[6dda9d: page allocator: reduce fragmentation in buddy allocator by
adding buddies that are merging to the tail of the free lists] but I
would have expected the patch to help, not hinder, this situation.

In the last few kernel releases when there has been a spike in high-order
allocation, it has been largely due to an increased source of high-order
callers from somewhere. I am 99% sure the last time I encountered this, I used
the following script to debug it even though it's a bit of a hack. It uses
ftrace to track high-order allocators and reports the number of high-order
allocations (either atomic or normal) and the count of unique backtraces
(to point the finger at bad callers)

Unfortunately, I don't have access to a suitable machine to test this with
right now (my laptop isn't making high-order allocations to confirm yet)
but I'm pretty sure it's the right one!

If you do some fixed set of tests with 2.6.34 and 2.6.35-rc2 with this script
running and compare the reports, it might point the finger at a new source
of frequent high-order allocations that might be leading to more failures.
Do something like

./watch-highorder.pl -o highorder-2.6.34-report.txt
PID=$!
Do your tests
# Two sigints in quick succession to make the monitor exit
kill $PID
kill $PID 

==== CUT HERE ====
#!/usr/bin/perl
# This is a tool that analyses the trace output related to page allocation,
# sums up the number of high-order allocations taking place and who the
# callers are
#
# Example usage: trace-pagealloc-highorder.pl -o output-report.txt
# options
# -o, --output	Where to store the report
#
# Copyright (c) IBM Corporation 2009
# Author: Mel Gorman <mel@csn.ul.ie>
use strict;
use Getopt::Long;

# Tracepoint events
use constant MM_PAGE_ALLOC		=> 1;
use constant EVENT_UNKNOWN		=> 7;

use constant HIGH_NORMAL_HIGHORDER_ALLOC	=> 10;
use constant HIGH_ATOMIC_HIGHORDER_ALLOC	=> 11;

my $opt_output;
my %stats;
my %unique_events;
my $last_updated = 0;

$stats{HIGH_NORMAL_HIGHORDER_ALLOC} = 0;
$stats{HIGH_ATOMIC_HIGHORDER_ALLOC} = 0;

# Catch sigint and exit on request
my $sigint_report = 0;
my $sigint_exit = 0;
my $sigint_pending = 0;
my $sigint_received = 0;
sub sigint_handler {
	my $current_time = time;
	if ($current_time - 2 > $sigint_received) {
		print "SIGINT received, report pending. Hit ctrl-c again to exit\n";
		$sigint_report = 1;
	} else {
		if (!$sigint_exit) {
			print "Second SIGINT received quickly, exiting\n";
		}
		$sigint_exit++;
	}

	if ($sigint_exit > 3) {
		print "Many SIGINTs received, exiting now without report\n";
		exit;
	}

	$sigint_received = $current_time;
	$sigint_pending = 1;
}
$SIG{INT} = "sigint_handler";

# Parse command line options
GetOptions(
	'output=s'    => \$opt_output,
);

# Defaults for dynamically discovered regex's
my $regex_pagealloc_default = 'page=([0-9a-f]*) pfn=([0-9]*) order=([-0-9]*) migratetype=([-0-9]*) gfp_flags=([A-Z_|]*)';

# Dyanically discovered regex
my $regex_pagealloc;

# Static regex used. Specified like this for readability and for use with /o
#                      (process_pid)     (cpus      )   ( time  )   (tpoint    ) (details)
my $regex_traceevent = '\s*([a-zA-Z0-9-]*)\s*(\[[0-9]*\])\s*([0-9.]*):\s*([a-zA-Z_]*):\s*(.*)';
my $regex_statname = '[-0-9]*\s\((.*)\).*';
my $regex_statppid = '[-0-9]*\s\(.*\)\s[A-Za-z]\s([0-9]*).*';

sub generate_traceevent_regex {
	my $event = shift;
	my $default = shift;
	my $regex;

	# Read the event format or use the default
	if (!open (FORMAT, "/sys/kernel/debug/tracing/events/$event/format")) {
		$regex = $default;
	} else {
		my $line;
		while (!eof(FORMAT)) {
			$line = <FORMAT>;
			if ($line =~ /^print fmt:\s"(.*)",.*/) {
				$regex = $1;
				$regex =~ s/%p/\([0-9a-f]*\)/g;
				$regex =~ s/%d/\([-0-9]*\)/g;
				$regex =~ s/%lu/\([0-9]*\)/g;
				$regex =~ s/%s/\([A-Z_|]*\)/g;
				$regex =~ s/\(REC->gfp_flags\).*/REC->gfp_flags/;
				$regex =~ s/\",.*//;
			}
		}
	}

	# Verify fields are in the right order
	my $tuple;
	foreach $tuple (split /\s/, $regex) {
		my ($key, $value) = split(/=/, $tuple);
		my $expected = shift;
		if ($key ne $expected) {
			print("WARNING: Format not as expected '$key' != '$expected'");
			$regex =~ s/$key=\((.*)\)/$key=$1/;
		}
	}

	if (defined shift) {
		die("Fewer fields than expected in format");
	}

	return $regex;
}
$regex_pagealloc = generate_traceevent_regex("kmem/mm_page_alloc",
			$regex_pagealloc_default,
			"page", "pfn",
			"order", "migratetype", "gfp_flags");

sub process_events {
	my $traceevent;
	my $process_pid = 0;
	my $cpus;
	my $timestamp;
	my $tracepoint;
	my $details;
	my $statline;
	my $nextline = 1;

	# Read each line of the event log
EVENT_PROCESS:
	while (($traceevent = <TRACING>) && !$sigint_exit) {
SKIP_LINEREAD:

		if ($traceevent eq "") {
			last EVENT_PROCSS;
		}

		if ($traceevent =~ /$regex_traceevent/o) {
			$process_pid = $1;
			$tracepoint = $4;
		} else {
			next;
		}

		# Perl Switch() sucks majorly
		if ($tracepoint eq "mm_page_alloc") {
			my ($page, $order, $gfp_flags, $type);
			my ($atomic);
			my $details = $5;

			if ($details !~ /$regex_pagealloc/o) {
				print "WARNING: Failed to parse mm_page_alloc as expected\n";
				print "$details\n";
				print "$regex_pagealloc\n";
				print "\n";
				next;
			}
			$page = $1;
			$order = $3;
			$gfp_flags = $5;

			# Only concerned with high-order allocs
			if ($order == 0) {
				next;
			}

			$stats{MM_PAGE_ALLOC}++;

			if ($gfp_flags =~ /ATOMIC/) {
				$stats{HIGH_ATOMIC_HIGHORDER_ALLOC}++;
				$type = "atomic";
			} else {
				$stats{HIGH_NORMAL_HIGHORDER_ALLOC}++;
				$type = "normal";
			}

			# Record the stack trace
			$traceevent = <TRACING>;
			if ($traceevent !~ /stack trace/) {
				goto SKIP_LINEREAD;
			}
			my $event = "order=$order $type gfp_flags=$gfp_flags\n";;
			while ($traceevent = <TRACING>) {
				if ($traceevent !~ /^ =>/) {
					$unique_events{$event}++;
					my $current = time;

					if ($current - $last_updated > 60) {
						$last_updated = $current;
						update_report();
					}
					goto SKIP_LINEREAD;
				}
				$event .= $traceevent;
			}
		} else {
			$stats{EVENT_UNKNOWN}++;
		}

		if ($sigint_pending) {
			last EVENT_PROCESS;
		}
	}
}

sub update_report() {
	my $event;
	open (REPORT, ">$opt_output") || die ("Failed to open $opt_output for writing");

	foreach $event (keys %unique_events) {
		print REPORT $unique_events{$event} . " instances $event\n";
	}
	print REPORT "High-order normal allocations: " . $stats{HIGH_NORMAL_HIGHORDER_ALLOC} . "\n";
	print REPORT "High-order atomic allocations: " . $stats{HIGH_ATOMIC_HIGHORDER_ALLOC} . "\n";

	close REPORT;
}

sub print_report() {
	print "\nReport\n";
	open (REPORT, $opt_output) || die ("Failed to open $opt_output for reading");
	while (<REPORT>) {
		print $_;
	}
	close REPORT;
}

# Process events or signals until neither is available
sub signal_loop() {
	my $sigint_processed;
	do {
		$sigint_processed = 0;
		process_events();

		# Handle pending signals if any
		if ($sigint_pending) {
			my $current_time = time;

			if ($sigint_exit) {
				print "Received exit signal\n";
				$sigint_pending = 0;
			}
			if ($sigint_report) {
				if ($current_time >= $sigint_received + 2) {
					update_report();
					print_report();
					$sigint_report = 0;
					$sigint_pending = 0;
					$sigint_processed = 1;
					$sigint_exit = 0;
				}
			}
		}
	} while ($sigint_pending || $sigint_processed);
}

sub set_traceoption($) {
	my $option = shift;

	open(TRACEOPT, ">/sys/kernel/debug/tracing/trace_options") || die("Failed to open trace_options");
	print TRACEOPT $option;
	close TRACEOPT;
}

sub enable_tracing($) {
	my $tracing = shift;

	open(TRACING, ">/sys/kernel/debug/tracing/events/$tracing/enable") || die("Failed to open tracing event $tracing");
	print TRACING "1";
	close TRACING;
}

sub disable_tracing($) {
	my $tracing = shift;

	open(TRACING, ">/sys/kernel/debug/tracing/events/$tracing/enable") || die("Failed to open tracing event $tracing");
	print TRACING "0";
	close TRACING;

}

set_traceoption("stacktrace");
set_traceoption("sym-offset");
set_traceoption("sym-addr");
enable_tracing("kmem/mm_page_alloc");
open(TRACING, "/sys/kernel/debug/tracing/trace_pipe") || die("Failed to open trace_pipe");
signal_loop();
close TRACING;
disable_tracing("kmem/mm_page_alloc");
update_report();
print_report();