Most recent kernel where this bug did not occur: 2.6.13
Distribution: Ubuntu 6.04
Hardware Environment: IBM IntelliStation Z20, 2x 3.4GHz Xeons, 2G RAM, 2x 80GB SATA disks
Software Environment:

Problem Description:
cpuset_excl_nodes_overlap() may sleep (as it takes semaphore), but is called from atomic context - select_bad_process() under tasklist_lock. BUG. Found by Denis Lunev.

Steps to reproduce:
Run pounder2 stress tests overnight until OOM occurs, and then watch the debug messages pour off the screen:

Debug: sleeping function called from invalid context at include/asm/semaphore.h:105
in_atomic():1, irqs_disabled():0

Call Trace:
<ffffffff80130640>{__might_sleep+179}
<ffffffff8015bb68>{cpuset_excl_nodes_overlap+31}
<ffffffff80164d9f>{out_of_memory+123}
<ffffffff801676c4>{__alloc_pages+564}
<ffffffff8017ec41>{alloc_pages_current+160}
<ffffffff8016873f>{__do_page_cache_readahead+197}
<ffffffff8010e5c3>{__switch_to+50}
<ffffffff8031c547>{_spin_unlock_irqrestore+52}
<ffffffff80166604>{free_pages_bulk+674}
<ffffffff80168a00>{do_page_cache_readahead+82}
<ffffffff8016355a>{filemap_nopage+332}
<ffffffff8017378f>{__handle_mm_fault+1162}
<ffffffff8031c547>{_spin_unlock_irqrestore+52}
<ffffffff8031df02>{do_page_fault+1096}
<ffffffff801991ee>{sys_select+839}
<ffffffff8011078d>{error_exit+0}

Eventually it becomes a never-ending stream of this:

scheduling while atomic: dd/0x00000001/510

Call Trace:
<ffffffff8031a847>{schedule+122}
<ffffffff8031c31f>{__down+229}
<ffffffff80132ef7>{default_wake_function+0}
<ffffffff8031bf88>{__down_failed+53}
<ffffffff8015cfbe>{.text.lock.cpuset+85}
<ffffffff80164d9f>{out_of_memory+123}
<ffffffff801676c4>{__alloc_pages+564}
<ffffffff8017ed26>{alloc_page_vma+221}
<ffffffff8017ae57>{read_swap_cache_async+74}
<ffffffff801717b4>{swapin_readahead+97}
<ffffffff8031c48c>{_read_unlock_irq+46}
<ffffffff80173a9b>{__handle_mm_fault+1942}
<ffffffff8031c547>{_spin_unlock_irqrestore+52}
<ffffffff8031df02>{do_page_fault+1096}
<ffffffff802050ab>{_raw_spin_trylock+8}
<ffffffff801acb0c>{inotify_dentry_parent_queue_event+136}
<ffffffff8011078d>{error_exit+0}
This has already been reported to LKML: http://marc.theaimsgroup.com/?l=linux-kernel&m=113577406107958&w=2
Paul, can you look into this issue?
The report makes sense. I am back from vacation now, and should be able to provide a fix later this week. I'll need to rework the semantics a little, and pull the evaluation of the enclosing cpuset outside the oom tasklist loop. A workaround, which could result in killing a task in a non-overlapping cpuset, would be to stub out the cpuset_excl_nodes_overlap() call. Thanks for reporting this.
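
To illustrate the locking reorder described above, here is a minimal userspace sketch. It is not kernel code: every name in it (struct task, model_down(), select_victim() and so on) is hypothetical, the sleeping semaphore and tasklist_lock are modeled with plain flags, and the cpuset overlap test is reduced to a bitmask AND, which is an assumption rather than what cpuset_excl_nodes_overlap() actually computes. It only shows why taking the sleeping lock before entering the atomic section avoids the "sleeping function called from invalid context" condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct task {
    int pid;
    unsigned long mems_allowed; /* bitmask of memory nodes (simplified) */
    long badness;               /* higher means a better OOM victim */
};

static bool in_atomic;        /* models holding tasklist_lock */
static bool cpuset_sem_held;  /* models the sleeping cpuset semaphore */
static int atomic_sleep_bugs; /* counts sleeps attempted in atomic context */

/* Models down(): may sleep, so it is a bug to call it while atomic. */
static void model_down(void)
{
    if (in_atomic)
        atomic_sleep_bugs++;
    cpuset_sem_held = true;
}

static void model_up(void)
{
    cpuset_sem_held = false;
}

/* Overlap check that expects the caller to already hold the semaphore. */
static bool nodes_overlap_locked(const struct task *a, const struct task *b)
{
    assert(cpuset_sem_held);
    return (a->mems_allowed & b->mems_allowed) != 0;
}

/* Old shape: the overlap check takes the semaphore itself. */
static bool nodes_overlap_buggy(const struct task *a, const struct task *b)
{
    bool overlap;

    model_down();
    overlap = nodes_overlap_locked(a, b);
    model_up();
    return overlap;
}

/*
 * Pick the overlapping task with the highest badness.
 * fixed == true:  take the semaphore once, before the atomic section.
 * fixed == false: take it per task, inside the atomic section.
 */
static const struct task *select_victim(const struct task *cur,
                                        const struct task *tasks, size_t n,
                                        bool fixed)
{
    const struct task *victim = NULL;
    size_t i;

    if (fixed)
        model_down();
    in_atomic = true; /* models read_lock(&tasklist_lock) */
    for (i = 0; i < n; i++) {
        const struct task *p = &tasks[i];
        bool overlap = fixed ? nodes_overlap_locked(cur, p)
                             : nodes_overlap_buggy(cur, p);

        if (!overlap)
            continue;
        if (victim == NULL || p->badness > victim->badness)
            victim = p;
    }
    in_atomic = false; /* models read_unlock(&tasklist_lock) */
    if (fixed)
        model_up();
    return victim;
}
```

In this model both orderings select the same victim, but only the per-task locking inside the loop trips the atomic-context counter. The cost of the reordering is that the semaphore is held for the whole scan rather than briefly per task.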
Paul, what is the status of this issue?
The fix for this went into Linus's tree in the following change:

date: Sun Jan 15 10:27:10 2006 +0800
summary: [PATCH] cpuset oom lock fix

Grep for 'cpuset_lock' in the kernel file mm/oom_kill.c to see the fix.

Thanks for the reminder to update the status of this bug - I should have closed this bug in January.