Bug 8464

Summary: autoreconf: page allocation failure. order:2, mode:0x84020
Product: Alternate Trees
Reporter: Nicolas Mailhot (Nicolas.Mailhot)
Component: mm
Assignee: Christoph Lameter (clameter)
Status: CLOSED PATCH_ALREADY_AVAILABLE
Severity: normal
CC: akpm
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.21-mm2 with SLUB
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: full kernel logs, kernel config, lspci

Description Nicolas Mailhot 2007-05-10 14:28:17 UTC
Most recent kernel where this bug did *NOT* occur: 2.6.21-rc6.mm1 with SLAB
Distribution: Fedora Devel
Hardware Environment: AMD X2 on CK804
Software Environment: N/A
Problem Description:

Just noticed this in the kernel logs:

 autoreconf: page allocation failure. order:2, mode:0x84020
May 10 20:13:13 rousalka kernel: 
May 10 20:13:13 rousalka kernel: Call Trace:
May 10 20:13:13 rousalka kernel: [<ffffffff8025b56a>] __alloc_pages+0x2aa/0x2c3
May 10 20:13:13 rousalka kernel: [<ffffffff8029c05f>] bio_alloc+0x10/0x1f
May 10 20:13:13 rousalka kernel: [<ffffffff8027519d>] __slab_alloc+0x196/0x586
May 10 20:13:13 rousalka kernel: [<ffffffff80300d21>] radix_tree_node_alloc+0x36/0x7e
May 10 20:13:13 rousalka kernel: [<ffffffff80275922>] kmem_cache_alloc+0x32/0x4e
May 10 20:13:13 rousalka kernel: [<ffffffff80300d21>] radix_tree_node_alloc+0x36/0x7e
May 10 20:13:13 rousalka kernel: [<ffffffff803011a4>] radix_tree_insert+0xcb/0x18c
May 10 20:13:13 rousalka kernel: [<ffffffff88029bd0>] :ext3:ext3_get_block+0x0/0xe4
May 10 20:13:13 rousalka kernel: [<ffffffff80256ac4>] add_to_page_cache+0x3d/0x95
May 10 20:13:13 rousalka kernel: [<ffffffff8029fe29>] mpage_readpages+0x85/0x12c
May 10 20:13:13 rousalka kernel: [<ffffffff88029bd0>] :ext3:ext3_get_block+0x0/0xe4
May 10 20:13:13 rousalka kernel: [<ffffffff8025cde1>] __do_page_cache_readahead+0x158/0x22d
May 10 20:13:13 rousalka kernel: [<ffffffff88084aa7>] :dm_mod:dm_table_any_congested+0x46/0x63
May 10 20:13:13 rousalka kernel: [<ffffffff88082ce8>] :dm_mod:dm_any_congested+0x3b/0x42
May 10 20:13:13 rousalka kernel: [<ffffffff80258802>] filemap_fault+0x162/0x347
May 10 20:13:13 rousalka kernel: [<ffffffff80261c66>] __do_fault+0x66/0x446
May 10 20:13:13 rousalka kernel: [<ffffffff80263ca9>] __handle_mm_fault+0x4b1/0x8f5
May 10 20:13:13 rousalka kernel: [<ffffffff80419e84>] do_page_fault+0x39a/0x7b7
May 10 20:13:13 rousalka kernel: [<ffffffff80419f31>] do_page_fault+0x447/0x7b7
May 10 20:13:13 rousalka kernel: [<ffffffff8041847d>] error_exit+0x0/0x84
Comment 1 Nicolas Mailhot 2007-05-10 14:30:55 UTC
Created attachment 11469 [details]
full kernel logs
Comment 2 Nicolas Mailhot 2007-05-10 14:31:26 UTC
Created attachment 11470 [details]
kernel config
Comment 3 Nicolas Mailhot 2007-05-10 14:31:56 UTC
Created attachment 11471 [details]
lspci
Comment 4 Nicolas Mailhot 2007-05-10 14:32:34 UTC
May be a continuation of bug #8460.
Comment 5 Christoph Lameter 2007-05-10 14:42:54 UTC
The issue with higher order allocs is Mel Gorman's area. I cannot add him to the CC though.

His email is Mel Gorman <mel@skynet.ie>

Could someone fix up the bug tracking system please?

If you want to avoid higher order allocs (for some reason we run out... Mel
needs to know about this!!) then boot with

slub_max_order=1 slub_min_objects=4
Comment 6 Christoph Lameter 2007-05-10 14:49:35 UTC
On Thu, 10 May 2007, Andrew Morton wrote:

> Christoph, can we please take a look at /proc/slabinfo and its slub
> equivalent (I forget what that is?) and review any and all changes to the
> underlying allocation size for each cache?
> 
> Because this is *not* something we should change lightly.

It was changed specially for mm in order to stress the antifrag code. If 
this causes trouble then do not merge the patches against SLUB that 
exploit the antifrag methods. This failure should help see how effective 
Mel's antifrag patches are. He needs to get in on this discussion.

Upstream has slub_max_order=1.

Comment 7 Nicolas Mailhot 2007-05-10 14:50:39 UTC
Well the point is to test and report. If I wanted to avoid all problems, I
wouldn't be running mm.

So no kernel boot argument cheating for me :)
Comment 8 Christoph Lameter 2007-05-10 14:53:06 UTC
> ------- Additional Comments From Nicolas.Mailhot@LaPoste.net  2007-05-10
> 14:50 -------
> Well the point is to test and report. If I wanted to avoid all problems, I
> wouldn't be running mm.
> 
> So no kernel boot argument cheating for me :)

Then please assign the bug to Mel.... The only thing I can do is to 
switch off the code in mm that exercises the antifrag patches.
Comment 9 Anonymous Emailer 2007-05-10 15:07:04 UTC
Reply-To: mel@skynet.skynet.ie

On (10/05/07 14:49), Christoph Lameter didst pronounce:
> On Thu, 10 May 2007, Andrew Morton wrote:
> 
> > Christoph, can we please take a look at /proc/slabinfo and its slub
> > equivalent (I forget what that is?) and review any and all changes to the
> > underlying allocation size for each cache?
> > 
> > Because this is *not* something we should change lightly.
> 
> It was changed specially for mm in order to stress the antifrag code. If 
> this causes trouble then do not merge the patches against SLUB that 
> exploit the antifrag methods. This failure should help see how effective 
> Mel's antifrag patches are. He needs to get in on this discussion.
> 

The antifrag mechanism depends on the caller being able to sleep and reclaim
pages if necessary to get the contiguous allocation. No attempt is currently
being made to keep pages free at a particular order.

I see the gfpmask was 0x84020. That doesn't look like __GFP_WAIT was set,
right? Does that mean that SLUB is trying to allocate pages atomically? If so,
it would explain why this situation could still occur even though high-order
allocations that could sleep would succeed.

> Upstream has slub_max_order=1.

Comment 10 Christoph Lameter 2007-05-10 15:11:55 UTC
On Thu, 10 May 2007, Mel Gorman wrote:

> I see the gfpmask was 0x84020. That doesn't look like __GFP_WAIT was set,
> right? Does that mean that SLUB is trying to allocate pages atomically? If so,
> it would explain why this situation could still occur even though high-order
> allocations that could sleep would succeed.

SLUB is following the gfp mask of the caller like all well behaved slab 
allocators do. If the caller does not set __GFP_WAIT then the page 
allocator also cannot wait.
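
For reference, a rough decode of the reported mode:0x84020, written up as a C
comment. The bit values are the 2.6.21-era ones, and the 0x80000 bit only
exists with the grouping-by-mobility patches in -mm, so treat this as a
best-effort reading rather than an authoritative one:

/*
 * mode:0x84020 -- assumed bit assignments:
 *
 *   0x00020  __GFP_HIGH      may dip into the emergency reserves
 *   0x04000  __GFP_COMP      compound page, i.e. a higher-order SLUB slab
 *   0x80000  mobility hint   -mm only (__GFP_RECLAIMABLE in the version
 *                            later merged upstream)
 *
 * __GFP_WAIT (0x10), __GFP_IO (0x40) and __GFP_FS (0x80) are all clear, so
 * the request can neither sleep nor reclaim: an atomic order-2 allocation,
 * matching the observation above.
 */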

Comment 11 Nicolas Mailhot 2007-05-10 15:13:39 UTC
> Then please assign the bug to Mel.... The only thing I can do is to 
> switch off the code in mm that exercises the antifrag patches.

No one can assign the bug to Mel if Mel has not registered himself in bugzilla :(
Comment 12 Anonymous Emailer 2007-05-10 15:16:12 UTC
Reply-To: mel@skynet.skynet.ie

On (10/05/07 15:11), Christoph Lameter didst pronounce:
> On Thu, 10 May 2007, Mel Gorman wrote:
> 
> > I see the gfpmask was 0x84020. That doesn't look like __GFP_WAIT was set,
> > right? Does that mean that SLUB is trying to allocate pages atomically? If so,
> > it would explain why this situation could still occur even though high-order
> > allocations that could sleep would succeed.
> 
> SLUB is following the gfp mask of the caller like all well behaved slab 
> allocators do. If the caller does not set __GFP_WAIT then the page 
> allocator also cannot wait.

Then SLUB should not use the higher orders for slab allocations that cannot
sleep during allocations. What could be done in the longer term is decide
how to tell kswapd to keep pages free at an order other than 0 when it is
known there are a large number of high-order long-lived allocations like this.

Comment 13 Christoph Lameter 2007-05-10 15:27:22 UTC
On Thu, 10 May 2007, Mel Gorman wrote:

> On (10/05/07 15:11), Christoph Lameter didst pronounce:
> > On Thu, 10 May 2007, Mel Gorman wrote:
> > 
> > > I see the gfpmask was 0x84020. That doesn't look like __GFP_WAIT was set,
> > > right? Does that mean that SLUB is trying to allocate pages atomically? If so,
> > > it would explain why this situation could still occur even though high-order
> > > allocations that could sleep would succeed.
> > 
> > SLUB is following the gfp mask of the caller like all well behaved slab 
> > allocators do. If the caller does not set __GFP_WAIT then the page 
> > allocator also cannot wait.
> 
> Then SLUB should not use the higher orders for slab allocations that cannot
> sleep during allocations. What could be done in the longer term is decide
> how to tell kswapd to keep pages free at an order other than 0 when it is
> known there are a large number of high-order long-lived allocations like this.

I cannot predict how allocations on a slab will be performed. In order
to avoid the higher order allocations we would have to add a flag
that tells SLUB at slab creation time that this cache will be
used for atomic allocs, and thus we can avoid configuring slabs in such a
way that they use higher order allocs.

The other solution is not to use higher order allocations by dropping the 
antifrag patches in mm that allow SLUB to use higher order allocations. 
But then there would be no higher order allocations at all that would use 
the benefits of antifrag measures.

Comment 14 Anonymous Emailer 2007-05-10 15:44:47 UTC
Reply-To: mel@skynet.skynet.ie

On (10/05/07 15:27), Christoph Lameter didst pronounce:
> On Thu, 10 May 2007, Mel Gorman wrote:
> 
> > On (10/05/07 15:11), Christoph Lameter didst pronounce:
> > > On Thu, 10 May 2007, Mel Gorman wrote:
> > > 
> > > > I see the gfpmask was 0x84020. That doesn't look like __GFP_WAIT was set,
> > > > right? Does that mean that SLUB is trying to allocate pages atomically? If so,
> > > > it would explain why this situation could still occur even though high-order
> > > > allocations that could sleep would succeed.
> > > 
> > > SLUB is following the gfp mask of the caller like all well behaved slab 
> > > allocators do. If the caller does not set __GFP_WAIT then the page 
> > > allocator also cannot wait.
> > 
> > Then SLUB should not use the higher orders for slab allocations that cannot
> > sleep during allocations. What could be done in the longer term is decide
> > how to tell kswapd to keep pages free at an order other than 0 when it is
> > known there are a large number of high-order long-lived allocations like this.
> 
> I cannot predict how allocations on a slab will be performed. In order
> to avoid the higher order allocations we would have to add a flag
> that tells SLUB at slab creation time that this cache will be
> used for atomic allocs, and thus we can avoid configuring slabs in such a
> way that they use higher order allocs.
> 

It is an option. I had the gfp flags passed in to kmem_cache_create() in
mind for determining this but SLUB creates slabs differently and different
flags could be passed into kmem_cache_alloc() of course.

> The other solution is not to use higher order allocations by dropping the 
> antifrag patches in mm that allow SLUB to use higher order allocations. 
> But then there would be no higher order allocations at all that would
> use the benefits of antifrag measures.

That would be an immediate solution.

Another alternative is that anti-frag used to also group high-order
allocations together and make it hard to fall back to those areas
for non-atomic allocations. That behaviour is currently backed out by the
patch dont-group-high-order-atomic-allocations.patch because
it was intended for rare high-order short-lived allocations
such as e1000 that are currently dealt with by MIGRATE_RESERVE
(bias-the-location-of-pages-freed-for-min_free_kbytes-in-the-same-max_order_nr_pages-blocks.patch).
The high-order atomic groupings may help here because the high-order
allocations are long-lived and would claim contiguous areas.

The last alternative, which I think I mentioned already, is to have the
minimum order kswapd reclaims at be the same order SLUB uses instead of 0,
so that min_free_kbytes worth of memory is kept free at higher orders than
it is currently.
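
A minimal sketch of that last idea, with entirely hypothetical names (no such
helper or variable exists in 2.6.21-mm2): SLUB would report the largest slab
order it has configured, and kswapd would then balance against that order
rather than order 0.

/* Hypothetical sketch only -- not an existing patch or API. */
static int kswapd_min_order;	/* highest slab order reported, default 0 */

/* SLUB (or any other high-order slab user) would call this when it sizes
 * a cache with order > 0. */
void note_high_order_slab(int order)
{
	if (order > kswapd_min_order)
		kswapd_min_order = order;
}

/* kswapd's balancing loop would then check watermarks at
 * max(order, kswapd_min_order) instead of the order it was woken for,
 * keeping min_free_kbytes worth of memory free at the higher order. */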

Comment 15 Christoph Lameter 2007-05-10 15:49:19 UTC
On Thu, 10 May 2007, Mel Gorman wrote:

> > I cannot predict how allocations on a slab will be performed. In order
> > to avoid the higher order allocations we would have to add a flag
> > that tells SLUB at slab creation time that this cache will be
> > used for atomic allocs, and thus we can avoid configuring slabs in such a
> > way that they use higher order allocs.
> > 
> 
> It is an option. I had the gfp flags passed in to kmem_cache_create() in
> mind for determining this but SLUB creates slabs differently and different
> flags could be passed into kmem_cache_alloc() of course.

So we have a collection of flags to add

SLAB_USES_ATOMIC
SLAB_TEMPORARY
SLAB_PERSISTENT
SLAB_RECLAIMABLE
SLAB_MOVABLE

?
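
As a rough illustration (not an existing API) of what a creation-time hint
like SLAB_USES_ATOMIC could look like, modelled on how the radix_tree_node
cache is created in lib/radix-tree.c around 2.6.21:

/* Hypothetical: a cache that is filled from contexts that may not sleep
 * opts out of higher-order slabs at creation time. */
radix_tree_node_cachep = kmem_cache_create("radix_tree_node",
			sizeof(struct radix_tree_node), 0,
			SLAB_USES_ATOMIC | SLAB_PANIC,
			radix_tree_node_ctor, NULL);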

> Another alternative is that anti-frag used to also group high-order
> allocations together and make it hard to fall back to those areas
> for non-atomic allocations. It is currently backed out by the
> patch dont-group-high-order-atomic-allocations.patch because
> it was intended for rare high-order short-lived allocations
> such as e1000 that are currently dealt with by MIGRATE_RESERVE
> (bias-the-location-of-pages-freed-for-min_free_kbytes-in-the-same-max_order_nr_pages-blocks.patch)
>  The high-order atomic groupings may help here because the high-order
> allocations are long-lived and would claim contiguous areas.
> 
> The last alternative I think I mentioned already is to have the minimum
> order kswapd reclaims as the same order SLUB uses instead of 0 so that
> min_free_kbytes is kept at higher orders than current.

Would you get a patch to Nicolas to test either of these solutions?

Comment 16 Anonymous Emailer 2007-05-10 16:00:50 UTC
Reply-To: mel@skynet.skynet.ie

On (10/05/07 15:49), Christoph Lameter didst pronounce:
> On Thu, 10 May 2007, Mel Gorman wrote:
> 
> > > I cannot predict how allocations on a slab will be performed. In order
> > > to avoid the higher order allocations we would have to add a flag
> > > that tells SLUB at slab creation time that this cache will be
> > > used for atomic allocs, and thus we can avoid configuring slabs in such a
> > > way that they use higher order allocs.
> > > 
> > 
> > It is an option. I had the gfp flags passed in to kmem_cache_create() in
> > mind for determining this but SLUB creates slabs differently and different
> > flags could be passed into kmem_cache_alloc() of course.
> 
> So we have a collection of flags to add
> 
> SLAB_USES_ATOMIC

This is a possibility.

> SLAB_TEMPORARY

I have a patch for this sitting in a queue waiting for testing

> SLAB_PERSISTENT
> SLAB_RECLAIMABLE
> SLAB_MOVABLE

I don't think these are required because the necessary information is
available from the GFP flags.

> 
> ?
> 
> > Another alternative is that anti-frag used to also group high-order
> > allocations together and make it hard to fall back to those areas
> > for non-atomic allocations. It is currently backed out by the
> > patch dont-group-high-order-atomic-allocations.patch because
> > it was intended for rare high-order short-lived allocations
> > such as e1000 that are currently dealt with by MIGRATE_RESERVE
> > (bias-the-location-of-pages-freed-for-min_free_kbytes-in-the-same-max_order_nr_pages-blocks.patch)
> >  The high-order atomic groupings may help here because the high-order
> > allocations are long-lived and would claim contiguous areas.
> > 
> > The last alternative I think I mentioned already is to have the minimum
> > order kswapd reclaims as the same order SLUB uses instead of 0 so that
> > min_free_kbytes is kept at higher orders than current.
> 
> Would you get a patch to Nicolas to test either of these solutions?

I do not have a kswapd related patch ready but the first alternative is
readily available.

Nicolas, could you back out the patch
dont-group-high-order-atomic-allocations.patch and test again please?
The following patch has the same effect. Thanks

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-clean/include/linux/mmzone.h linux-2.6.21-mm2-grouphigh/include/linux/mmzone.h
--- linux-2.6.21-mm2-clean/include/linux/mmzone.h	2007-05-09 10:21:28.000000000 +0100
+++ linux-2.6.21-mm2-grouphigh/include/linux/mmzone.h	2007-05-10 23:54:45.000000000 +0100
@@ -38,8 +38,9 @@ extern int page_group_by_mobility_disabl
 #define MIGRATE_UNMOVABLE     0
 #define MIGRATE_RECLAIMABLE   1
 #define MIGRATE_MOVABLE       2
-#define MIGRATE_RESERVE       3
-#define MIGRATE_TYPES         4
+#define MIGRATE_HIGHATOMIC    3
+#define MIGRATE_RESERVE       4
+#define MIGRATE_TYPES         5
 
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-clean/include/linux/pageblock-flags.h linux-2.6.21-mm2-grouphigh/include/linux/pageblock-flags.h
--- linux-2.6.21-mm2-clean/include/linux/pageblock-flags.h	2007-05-09 10:21:28.000000000 +0100
+++ linux-2.6.21-mm2-grouphigh/include/linux/pageblock-flags.h	2007-05-10 23:54:45.000000000 +0100
@@ -31,7 +31,7 @@
 
 /* Bit indices that affect a whole block of pages */
 enum pageblock_bits {
-	PB_range(PB_migrate, 2), /* 2 bits required for migrate types */
+	PB_range(PB_migrate, 3), /* 3 bits required for migrate types */
 	NR_PAGEBLOCK_BITS
 };
 
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.21-mm2-clean/mm/page_alloc.c linux-2.6.21-mm2-grouphigh/mm/page_alloc.c
--- linux-2.6.21-mm2-clean/mm/page_alloc.c	2007-05-09 10:21:28.000000000 +0100
+++ linux-2.6.21-mm2-grouphigh/mm/page_alloc.c	2007-05-10 23:54:45.000000000 +0100
@@ -167,6 +167,11 @@ static inline int allocflags_to_migratet
 	if (unlikely(page_group_by_mobility_disabled))
 		return MIGRATE_UNMOVABLE;
 
+	/* Cluster high-order atomic allocations together */
+	if (unlikely(order > 0) &&
+			(!(gfp_flags & __GFP_WAIT) || in_interrupt()))
+		return MIGRATE_HIGHATOMIC;
+
 	/* Cluster based on mobility */
 	return (((gfp_flags & __GFP_MOVABLE) != 0) << 1) |
 		((gfp_flags & __GFP_RECLAIMABLE) != 0);
@@ -713,10 +718,11 @@ static struct page *__rmqueue_smallest(s
  * the free lists for the desirable migrate type are depleted
  */
 static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = {
-	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_RESERVE },
-	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_RESERVE },
-	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
-	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE,     MIGRATE_RESERVE,   MIGRATE_RESERVE }, /* Never used */
+	[MIGRATE_UNMOVABLE]   = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE,   MIGRATE_HIGHATOMIC, MIGRATE_RESERVE },
+	[MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE,   MIGRATE_MOVABLE,   MIGRATE_HIGHATOMIC, MIGRATE_RESERVE },
+	[MIGRATE_MOVABLE]     = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_HIGHATOMIC, MIGRATE_RESERVE },
+	[MIGRATE_HIGHATOMIC]  = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_MOVABLE,    MIGRATE_RESERVE },
+	[MIGRATE_RESERVE]     = { MIGRATE_RESERVE,     MIGRATE_RESERVE,   MIGRATE_RESERVE,    MIGRATE_RESERVE }, /* Never used */
 };
 
 /*
@@ -810,7 +816,9 @@ static struct page *__rmqueue_fallback(s
 	int current_order;
 	struct page *page;
 	int migratetype, i;
+	int nonatomic_fallback_atomic = 0;
 
+retry:
 	/* Find the largest possible block of pages in the other list */
 	for (current_order = MAX_ORDER-1; current_order >= order;
 						--current_order) {
@@ -820,6 +828,14 @@ static struct page *__rmqueue_fallback(s
 			/* MIGRATE_RESERVE handled later if necessary */
 			if (migratetype == MIGRATE_RESERVE)
 				continue;
+			/*
+			 * Make it hard to fallback to blocks used for
+			 * high-order atomic allocations
+			 */
+			if (migratetype == MIGRATE_HIGHATOMIC &&
+				start_migratetype != MIGRATE_UNMOVABLE &&
+				!nonatomic_fallback_atomic)
+				continue;
 
 			area = &(zone->free_area[current_order]);
 			if (list_empty(&area->free_list[migratetype]))
@@ -845,7 +861,8 @@ static struct page *__rmqueue_fallback(s
 								start_migratetype);
 
 				/* Claim the whole block if over half of it is free */
-				if ((pages << current_order) >= (1 << (MAX_ORDER-2)))
+				if ((pages << current_order) >= (1 << (MAX_ORDER-2)) &&
+						migratetype != MIGRATE_HIGHATOMIC)
 					set_pageblock_migratetype(page,
 								start_migratetype);
 
@@ -867,6 +884,12 @@ static struct page *__rmqueue_fallback(s
 		}
 	}
 
+	/* Allow fallback to high-order atomic blocks if memory is that low */
+	if (!nonatomic_fallback_atomic) {
+		nonatomic_fallback_atomic = 1;
+		goto retry;
+	}
+
 	/* Use MIGRATE_RESERVE rather than fail an allocation */
 	return __rmqueue_smallest(zone, order, MIGRATE_RESERVE);
 }
Comment 17 Christoph Lameter 2007-05-10 16:01:42 UTC
On Fri, 11 May 2007, Mel Gorman wrote:

> Nicolas, could you back out the patch
> dont-group-high-order-atomic-allocations.patch and test again please?
> The following patch has the same effect. Thanks

Great! Thanks.

Comment 18 Nicolas Mailhot 2007-05-10 22:56:53 UTC
On Thursday 10 May 2007 at 16:01 -0700, Christoph Lameter wrote:
> On Fri, 11 May 2007, Mel Gorman wrote:
> 
> > Nicolas, could you back out the patch
> > dont-group-high-order-atomic-allocations.patch and test again please?
> > The following patch has the same effect. Thanks
> 
> Great! Thanks.

The proposed patch did not apply

+ cd /builddir/build/BUILD
+ rm -rf linux-2.6.21
+ /usr/bin/bzip2 -dc /builddir/build/SOURCES/linux-2.6.21.tar.bz2
+ tar -xf -
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ cd linux-2.6.21
++ /usr/bin/id -u
+ '[' 499 = 0 ']'
++ /usr/bin/id -u
+ '[' 499 = 0 ']'
+ /bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ echo 'Patch #2 (2.6.21-mm2.bz2):'
Patch #2 (2.6.21-mm2.bz2):
+ /usr/bin/bzip2 -d
+ patch -p1 -s
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ echo 'Patch #3 (md-improve-partition-detection-in-md-array.patch):'
Patch #3 (md-improve-partition-detection-in-md-array.patch):
+ patch -p1 -R -s
+ echo 'Patch #4 (bug-8464.patch):'
Patch #4 (bug-8464.patch):
+ patch -p1 -s
1 out of 1 hunk FAILED -- saving rejects to file include/linux/pageblock-flags.h.rej
6 out of 6 hunks FAILED -- saving rejects to file mm/page_alloc.c.rej

Backing out dont-group-high-order-atomic-allocations.patch worked and
seems to have cured the system so far (need to load it a bit longer to
be sure).

Comment 19 Anonymous Emailer 2007-05-11 02:08:31 UTC
Reply-To: mel@skynet.ie

On (11/05/07 07:56), Nicolas Mailhot didst pronounce:
> On Thursday 10 May 2007
Comment 20 Nicolas Mailhot 2007-05-11 04:51:36 UTC
On Friday 11 May 2007 at 10:08 +0100, Mel Gorman wrote:

> > seems to have cured the system so far (need to load it a bit longer to
> > be sure)
> > 
> 
> The longer it runs the better, particularly under load and after
> updatedb has run. Thanks a lot for testing

After a few hours of load testing still nothing in the logs, so the
revert was probably the right thing to do

Comment 21 Anonymous Emailer 2007-05-11 10:38:19 UTC
Reply-To: mel@skynet.ie

On (11/05/07 13:51), Nicolas Mailhot didst pronounce:
> On Friday 11 May 2007
Comment 22 Nicolas Mailhot 2007-05-11 10:45:53 UTC
On Friday 11 May 2007 at 18:38 +0100, Mel Gorman wrote:
> On (11/05/07 13:51), Nicolas Mailhot didst pronounce:
> > On Friday 11 May 2007 at 10:08 +0100, Mel Gorman wrote:
> > 
> > > > seems to have cured the system so far (need to load it a bit longer to
> > > > be sure)
> > > > 
> > > 
> > > The longer it runs the better, particularly under load and after
> > > updatedb has run. Thanks a lot for testing
> > 
> > After a few hours of load testing still nothing in the logs, so the
> > revert was probably the right thing to do
> 
> Excellent. I am somewhat surprised by the result

And you're probably right, it blew up again after a day of working fine:

19:20:00  tar: page allocation failure. order:2, mode:0x84020
19:20:00  
19:20:00  Call Trace:
19:20:00  [<ffffffff8025b5c3>] __alloc_pages+0x2aa/0x2c3
19:20:00  [<ffffffff802751f5>] __slab_alloc+0x196/0x586
19:20:00  [<ffffffff80300d79>] radix_tree_node_alloc+0x36/0x7e
19:20:00  [<ffffffff8027597a>] kmem_cache_alloc+0x32/0x4e
19:20:00  [<ffffffff80300d79>] radix_tree_node_alloc+0x36/0x7e
19:20:00  [<ffffffff8030118e>] radix_tree_insert+0x5d/0x18c
19:20:00  [<ffffffff80256ac4>] add_to_page_cache+0x3d/0x95
19:20:00  [<ffffffff80257aa4>] generic_file_buffered_write+0x222/0x7c8
19:20:00  [<ffffffff88013c74>] :jbd:do_get_write_access+0x506/0x53d
19:20:00  [<ffffffff8022c7d5>] current_fs_time+0x3b/0x40
19:20:00  [<ffffffff8025838c>] __generic_file_aio_write_nolock+0x342/0x3ac
19:20:00  [<ffffffff80416ac1>] __mutex_lock_slowpath+0x216/0x221
19:20:00  [<ffffffff80258457>] generic_file_aio_write+0x61/0xc1
19:20:00  [<ffffffff880271be>] :ext3:ext3_file_write+0x16/0x94
19:20:00  [<ffffffff8027938c>] do_sync_write+0xc9/0x10c
19:20:00  [<ffffffff80239c56>] autoremove_wake_function+0x0/0x2e
19:20:00  [<ffffffff80279ba7>] vfs_write+0xce/0x177
19:20:00  [<ffffffff8027a16a>] sys_write+0x45/0x6e
19:20:00  [<ffffffff8020955c>] tracesys+0xdc/0xe1
19:20:00  
19:20:00  Mem-info:
19:20:00  DMA per-cpu:
19:20:00  CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
19:20:00  CPU    1: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
19:20:00  DMA32 per-cpu:
19:20:00  CPU    0: Hot: hi:  186, btch:  31 usd: 149   Cold: hi:   62, btch:  15 usd:  19
19:20:00  CPU    1: Hot: hi:  186, btch:  31 usd: 147   Cold: hi:   62, btch:  15 usd:   2
19:20:00  Active:348968 inactive:105561 dirty:23054 writeback:0 unstable:0
19:20:00  free:9776 slab:28092 mapped:23015 pagetables:10226 bounce:0
19:20:00  DMA free:7960kB min:20kB low:24kB high:28kB active:0kB inactive:0kB present:7648kB pages_scanned:0 all_unreclaimable? yes
19:20:00  lowmem_reserve[]: 0 1988 1988 1988
19:20:00  DMA32 free:31144kB min:5692kB low:7112kB high:8536kB active:1395872kB inactive:422244kB present:2036004kB pages_scanned:0 all_unreclaimable? no
19:20:00  lowmem_reserve[]: 0 0 0 0
19:20:00  DMA: 6*4kB 6*8kB 7*16kB 3*32kB 8*64kB 8*128kB 6*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 7960kB
19:20:00  DMA32: 7560*4kB 0*8kB 8*16kB 0*32kB 1*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 31072kB
19:20:00  Swap cache: add 1527, delete 1521, find 216/286, race 397+0
19:20:00  Free swap  = 4192824kB
19:20:00  Total swap = 4192944kB
19:20:00  Free swap:       4192824kB
19:20:00  524272 pages of RAM
19:20:00  14123 reserved pages
19:20:00  252562 pages shared
19:20:00  6 pages swap cached
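
(For reference, each term in the buddy lines above is count*block-size on
4 kB pages, so the 16 kB column is order-2: the DMA32 line shows 8*16kB,
i.e. eight free order-2 blocks, plus single 64 kB, 128 kB and 512 kB blocks
that could be split. That is why the discussion below asks how the order-2
allocation could fail even though order-2 and larger blocks appear free.)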

> so I'd like to look at the
> alternative option with kswapd as well. Could you put that patch back in again
> please and try the following patch instead? 

I'll try this one now (if it applies)

Regards,

Comment 23 Christoph Lameter 2007-05-11 10:46:31 UTC
On Fri, 11 May 2007, Mel Gorman wrote:

> Excellent. I am somewhat surprised by the result so I'd like to look at the
> alternative option with kswapd as well. Could you put that patch back in again
> please and try the following patch instead? The patch causes kswapd to reclaim
> at higher orders if it's requested to.  Christoph, can you look at the patch
> as well and make sure it's doing the right thing with respect to SLUB please?

Well, this gives the impression that SLUB depends on larger orders. It 
*can* take advantage of higher order allocations, but it does not have to.
It may be a performance benefit to be able to do higher order allocs though
(it is not really established yet what kind of tradeoffs there are).

Looks fine to me. If this is stable then I want this to be merged ASAP 
(deal with the issues later???) .... Good stuff.

Comment 24 Nicolas Mailhot 2007-05-11 11:30:23 UTC
On Friday 11 May 2007 at 19:45 +0200, Nicolas Mailhot wrote:
> On Friday 11 May 2007 at 18:38 +0100, Mel Gorman wrote:

> > so I'd like to look at the
> > alternative option with kswapd as well. Could you put that patch back in again
> > please and try the following patch instead? 
> 
> I'll try this one now (if it applies)

Well it doesn't seem to apply. Are you sure you have a clean tree?
(I have vanilla mm2 + revert of
md-improve-partition-detection-in-md-array.patch for another bug)

+ umask 022
+ cd /builddir/build/BUILD
+ LANG=C
+ export LANG
+ unset DISPLAY
+ cd /builddir/build/BUILD
+ rm -rf linux-2.6.21
+ /usr/bin/bzip2 -dc /builddir/build/SOURCES/linux-2.6.21.tar.bz2
+ tar -xf -
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ cd linux-2.6.21
++ /usr/bin/id -u
+ '[' 499 = 0 ']'
++ /usr/bin/id -u
+ '[' 499 = 0 ']'
+ /bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ echo 'Patch #2 (2.6.21-mm2.bz2):'
Patch #2 (2.6.21-mm2.bz2):
+ /usr/bin/bzip2 -d
+ patch -p1 -s
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ echo 'Patch #3 (md-improve-partition-detection-in-md-array.patch):'
Patch #3 (md-improve-partition-detection-in-md-array.patch):
+ patch -p1 -R -s
+ echo 'Patch #4 (bug-8464.patch):'
Patch #4 (bug-8464.patch):
+ patch -p1 -s
1 out of 1 hunk FAILED -- saving rejects to file mm/slub.c.rej
2 out of 3 hunks FAILED -- saving rejects to file mm/vmscan.c.r
Comment 25 Anonymous Emailer 2007-05-11 13:36:18 UTC
Reply-To: mel@skynet.ie

On (11/05/07 20:30), Nicolas Mailhot didst pronounce:
> On Friday 11 May 2007
Comment 26 Nicolas Mailhot 2007-05-12 01:11:49 UTC
On Friday 11 May 2007 at 21:36 +0100, Mel Gorman wrote:

> I'm pretty sure I have. I recreated the tree and reverted the same patch as
> you and regenerated the diff below. I sent it to myself and it appeared ok
> and another automated system was able to use it.
> 
> In case it's a mailer problem, the patch can be downloaded from
> http://www.csn.ul.ie/~mel/kswapd-minorder.patch . 

This one applies, but the kernel still has allocation failures (I just
found that rpm -Va is a good trigger). So far we have two proposed fixes,
neither of which works.

Comment 27 Anonymous Emailer 2007-05-12 09:42:45 UTC
Reply-To: mel@skynet.ie

On (12/05/07 10:11), Nicolas Mailhot didst pronounce:
> On Friday 11 May 2007
Comment 28 Nicolas Mailhot 2007-05-12 11:09:18 UTC
On Saturday 12 May 2007 at 17:42 +0100, Mel Gorman wrote:

> order-2 (at least 19 pages but more are there) and higher pages were free
> and this was a NORMAL allocation. It should also be above watermarks so
> something screwy is happening
> 
> *peers suspiciously*
> 
> Can you try the following patch on top of the kswapd patch please? It is
> also available from http://www.csn.ul.ie/~mel/watermarks.patch

Ok, testing now

Comment 29 Nicolas Mailhot 2007-05-12 11:58:52 UTC
On Saturday 12 May 2007 at 20:09 +0200, Nicolas Mailhot wrote:
> On Saturday 12 May 2007 at 17:42 +0100, Mel Gorman wrote:
> 
> > order-2 (at least 19 pages but more are there) and higher pages were free
> > and this was a NORMAL allocation. It should also be above watermarks so
> > something screwy is happening
> > 
> > *peers suspiciously*
> > 
> > Can you try the following patch on top of the kswapd patch please? It is
> > also available from http://www.csn.ul.ie/~mel/watermarks.patch
> 
> Ok, testing now

And this one failed testing too 

Comment 30 Anonymous Emailer 2007-05-12 12:24:15 UTC
Reply-To: mel@skynet.ie

On (12/05/07 20:58), Nicolas Mailhot didst pronounce:
> On Saturday 12 May 2007
Comment 31 Nicolas Mailhot 2007-05-13 01:18:06 UTC
On Saturday 12 May 2007 at 20:24 +0100, Mel Gorman wrote:
> On (12/05/07 20:58), Nicolas Mailhot didst pronounce:
> > On Saturday 12 May 2007 at 20:09 +0200, Nicolas Mailhot wrote:
> > > On Saturday 12 May 2007 at 17:42 +0100, Mel Gorman wrote:
> > > 
> > > > order-2 (at least 19 pages but more are there) and higher pages were free
> > > > and this was a NORMAL allocation. It should also be above watermarks so
> > > > something screwy is happening
> > > > 
> > > > *peers suspiciously*
> > > > 
> > > > Can you try the following patch on top of the kswapd patch please? It is
> > > > also available from http://www.csn.ul.ie/~mel/watermarks.patch

> > And this one failed testing too 
> 
> And same thing, you have suitable free memory. The last patch was
> wrong because I forgot the !in_interrupt() part which was careless
> and dumb.  Please try the following, again on top of the kswapd patch -
> http://www.csn.ul.ie/~mel/watermarks-v2.patch

This one survived 12h of testing so far.

Regards,