Bug 5859 - cpusets: BUG: cpuset_excl_nodes_overlap() may sleep under tasklist_lock
Summary: cpusets: BUG: cpuset_excl_nodes_overlap() may sleep under tasklist_lock
Status: CLOSED CODE_FIX
Alias: None
Product: Other
Classification: Unclassified
Component: Other
Hardware: i386 Linux
Importance: P2 normal
Assignee: Paul Jackson
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-01-09 14:51 UTC by Darrick J. Wong
Modified: 2006-04-22 12:34 UTC

See Also:
Kernel Version: 2.6.15
Subsystem:
Regression: ---
Bisected commit-id:



Description Darrick J. Wong 2006-01-09 14:51:26 UTC
Most recent kernel where this bug did not occur: 2.6.13
Distribution: Ubuntu 6.04
Hardware Environment: IBM IntelliStation Z20, 2x 3.4GHz Xeons, 2G RAM, 2x 80GB
SATA disks
Software Environment: 
Problem Description: cpuset_excl_nodes_overlap() may sleep (it takes a
semaphore), but it is called from atomic context, in select_bad_process() under
tasklist_lock. BUG. Found by Denis Lunev.

Steps to reproduce: Run pounder2 stress tests overnight until OOM occurs, and
then watch the debug messages pour off the screen:

Debug: sleeping function called from invalid context at include/asm/semaphore.h:105
in_atomic():1, irqs_disabled():0

Call Trace:<ffffffff80130640>{__might_sleep+179}
<ffffffff8015bb68>{cpuset_excl_nodes_overlap+31}
       <ffffffff80164d9f>{out_of_memory+123} <ffffffff801676c4>{__alloc_pages+564}
       <ffffffff8017ec41>{alloc_pages_current+160}
<ffffffff8016873f>{__do_page_cache_readahead+197}
       <ffffffff8010e5c3>{__switch_to+50}
<ffffffff8031c547>{_spin_unlock_irqrestore+52}
       <ffffffff80166604>{free_pages_bulk+674}
<ffffffff80168a00>{do_page_cache_readahead+82}
       <ffffffff8016355a>{filemap_nopage+332}
<ffffffff8017378f>{__handle_mm_fault+1162}
       <ffffffff8031c547>{_spin_unlock_irqrestore+52}
<ffffffff8031df02>{do_page_fault+1096}
       <ffffffff801991ee>{sys_select+839} <ffffffff8011078d>{error_exit+0}

Eventually it becomes a never-ending stream of this:

scheduling while atomic: dd/0x00000001/510

Call Trace:<ffffffff8031a847>{schedule+122} <ffffffff8031c31f>{__down+229}
       <ffffffff80132ef7>{default_wake_function+0}
<ffffffff8031bf88>{__down_failed+53}
       <ffffffff8015cfbe>{.text.lock.cpuset+85}
<ffffffff80164d9f>{out_of_memory+123}
       <ffffffff801676c4>{__alloc_pages+564} <ffffffff8017ed26>{alloc_page_vma+221}
       <ffffffff8017ae57>{read_swap_cache_async+74}
<ffffffff801717b4>{swapin_readahead+97}
       <ffffffff8031c48c>{_read_unlock_irq+46}
<ffffffff80173a9b>{__handle_mm_fault+1942}
       <ffffffff8031c547>{_spin_unlock_irqrestore+52}
<ffffffff8031df02>{do_page_fault+1096}
       <ffffffff802050ab>{_raw_spin_trylock+8}
<ffffffff801acb0c>{inotify_dentry_parent_queue_event+136}
       <ffffffff8011078d>{error_exit+0}
Comment 1 Darrick J. Wong 2006-01-09 14:52:22 UTC
This has already been reported to LKML:

http://marc.theaimsgroup.com/?l=linux-kernel&m=113577406107958&w=2
Comment 2 Adrian Bunk 2006-01-09 15:45:08 UTC
Paul, can you look into this issue?
Comment 3 Paul Jackson 2006-01-09 16:35:25 UTC
The report makes sense.  I am back from vacation now, and should be able
to provide a fix later this week.  I'll need to rework the semantics
a little, and pull the evaluation of the enclosing cpuset outside the
oom tasklist loop.

A workaround, which could result in killing a task in a non-overlapping
cpuset, would be to stub out the cpuset_excl_nodes_overlap() call.

Thanks for reporting this.
Comment 4 Adrian Bunk 2006-04-22 10:01:23 UTC
Paul, what is the status of this issue?
Comment 5 Paul Jackson 2006-04-22 12:34:37 UTC
The fix for this went into Linus's tree in the following change:
  date:        Sun Jan 15 10:27:10 2006 +0800
  summary:     [PATCH] cpuset oom lock fix

Grep for 'cpuset_lock' in the kernel file mm/oom_kill.c to see the fix.

Thanks for the reminder to update the status of this bug - I should have
closed this bug in January.
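
For readers without a kernel tree at hand, the lock-ordering idea discussed in
this bug can be sketched in plain userspace C. This is an illustrative model
under stated assumptions, not the kernel code: struct task, sleepable_lock(),
nodes_overlap() and select_victim() are hypothetical names, the two flags only
stand in for a sleeping lock and for tasklist_lock, and the badness values are
made up. The point it shows is that the lock that may sleep is taken before
entering the atomic section, so the overlap check inside the loop never has to
take it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; not the kernel's task_struct. */
struct task {
    int pid;
    unsigned long mems_allowed; /* bitmask of permitted memory nodes */
    unsigned long badness;      /* higher means a better OOM victim */
};

static int sleepable_lock_held; /* models a lock that may sleep */
static int in_atomic_section;   /* models holding a spinning lock */

static void sleepable_lock(void)
{
    /* A lock that may sleep must not be taken in atomic context. */
    assert(!in_atomic_section);
    sleepable_lock_held = 1;
}

static void sleepable_unlock(void)
{
    sleepable_lock_held = 0;
}

/* The overlap test takes no lock itself; the caller must already hold it. */
static int nodes_overlap(unsigned long a, unsigned long b)
{
    assert(sleepable_lock_held);
    return (a & b) != 0;
}

/* Return the pid of the worst task sharing a node with current_mems, or -1. */
static int select_victim(const struct task *tasks, size_t n,
                         unsigned long current_mems)
{
    int victim = -1;
    unsigned long worst = 0;
    size_t i;

    sleepable_lock();      /* may sleep: taken first */
    in_atomic_section = 1; /* then enter the atomic section */

    for (i = 0; i < n; i++) {
        if (!nodes_overlap(tasks[i].mems_allowed, current_mems))
            continue;      /* skip tasks on non-overlapping nodes */
        if (tasks[i].badness > worst) {
            worst = tasks[i].badness;
            victim = tasks[i].pid;
        }
    }

    in_atomic_section = 0; /* release in reverse order */
    sleepable_unlock();
    return victim;
}
```

Calling sleepable_lock() from inside the loop instead would trip the first
assert, which is the userspace analogue of the "sleeping function called from
invalid context" message in the description.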
