Bug 11562 (unplug) - unplugging a processor results in invalid context error
Summary: unplugging a processor results in invalid context error
Status: CLOSED PATCH_ALREADY_AVAILABLE
Alias: unplug
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64
Hardware: All Linux
Importance: P1 normal
Assignee: Mathieu Desnoyers
URL:
Keywords:
Depends on:
Blocks:
Reported: 2008-09-13 03:56 UTC by raz ben yehuda
Modified: 2008-09-16 05:37 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.26.5
Subsystem:
Regression: Yes
Bisected commit-id:


Description raz ben yehuda 2008-09-13 03:56:18 UTC
Latest working kernel version: unknown
Earliest failing kernel version: 2.6.26.5
Distribution: CentOS 5.0 64-bit
Hardware Environment: Lenovo T61, dual-core
Software Environment:
Problem Description:
1. Configure the kernel as described in Documentation/SubmitChecklist, section 12.
2. Boot this kernel.
3. Run: echo 0 > /sys/devices/system/node/node0/cpu1/online

Result:

SMP alternatives: switching to UP code
BUG: sleeping function called from invalid context at mm/slab.c:3052
in_atomic():1, irqs_disabled():0
INFO: lockdep is turned off.
Pid: 2463, comm: bash Not tainted 2.6.26.5 #28

Call Trace:
 [<ffffffff80251c61>] ? __debug_show_held_locks+0x1b/0x24
 [<ffffffff8022aae6>] __might_sleep+0x108/0x10a
 [<ffffffff802962fd>] kmem_cache_alloc_node+0x39/0x1a3
 [<ffffffff80289432>] ? __get_vm_area_node+0xa4/0x1d7
 [<ffffffff80289432>] __get_vm_area_node+0xa4/0x1d7
 [<ffffffff8020a3d7>] ? disable_TSC+0x17/0x53
 [<ffffffff802895c5>] get_vm_area_caller+0x2f/0x31
 [<ffffffff804b2653>] ? text_poke+0x11d/0x19a
 [<ffffffff80289d40>] vmap+0x31/0x63
 [<ffffffff804b7b62>] ? _etext+0x0/0xe
 [<ffffffff804b2653>] text_poke+0x11d/0x19a
 [<ffffffff804b7b62>] ? _etext+0x0/0xe
 [<ffffffff802118d4>] alternatives_smp_unlock+0x4f/0x63
 [<ffffffff80211b77>] alternatives_smp_switch+0x161/0x19e
 [<ffffffff8021b80c>] __cpu_die+0x5c/0x86
 [<ffffffff8049d047>] _cpu_down+0x1b5/0x28d
 [<ffffffff8049d145>] cpu_down+0x26/0x36
 [<ffffffff8049e306>] store_online+0x32/0x75
 [<ffffffff8037402e>] sysdev_store+0x24/0x26
 [<ffffffff802e3134>] sysfs_write_file+0xe5/0x121
 [<ffffffff8029dddc>] vfs_write+0xae/0x124
 [<ffffffff8029e320>] sys_write+0x47/0x70
 [<ffffffff8020bffb>] system_call_after_swapgs+0x7b/0x80

Comment 1 Andrew Morton 2008-09-13 11:14:58 UTC
I'll reassign this regression to x86 - looks like something borked
in the smp->up text rewriting.
Comment 2 raz ben yehuda 2008-09-13 11:41:56 UTC
Already patched and posted to LKML: I simply replaced the spinlock with
a semaphore.
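To illustrate why swapping the lock type makes the warning go away, here is a small userspace C model. It assumes, as the reporter's fix implies, that a spinlock is held across the text_poke() call seen in the trace, so the allocation inside vmap() runs in atomic context (in_atomic():1). Every name prefixed with model_ is hypothetical, not a kernel function; a global counter plays the role of the preempt count that __might_sleep() inspects.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Userspace model only (hypothetical names, not kernel APIs).
 * A global counter stands in for the kernel's preempt count: in this
 * model, holding a spinlock makes the context atomic, while holding a
 * sleeping lock (semaphore or mutex) does not.
 */
static int model_preempt_count;
static int model_sleep_bugs; /* number of simulated BUG reports */

static void model_spin_lock(void) { model_preempt_count++; }

static void model_spin_unlock(void)
{
    assert(model_preempt_count > 0);
    model_preempt_count--;
}

/* Models might_sleep(): report if called from atomic context. */
static void model_might_sleep(const char *where)
{
    if (model_preempt_count > 0) {
        model_sleep_bugs++;
        fprintf(stderr,
                "BUG: sleeping function called from invalid context at %s\n",
                where);
    }
}

/* Stand-in for the allocation that vmap() performs inside text_poke(). */
static void *model_alloc_may_sleep(size_t size)
{
    model_might_sleep("model_alloc_may_sleep");
    return malloc(size);
}

/* SMP-to-UP patching with a spinlock held: triggers the report. */
static void model_switch_with_spinlock(void)
{
    model_spin_lock();
    free(model_alloc_may_sleep(64));
    model_spin_unlock();
}

/*
 * Same path under a sleeping lock. Waiting for a semaphore may block,
 * but holding it does not make the context atomic, so the count stays 0.
 */
static void model_switch_with_semaphore(void)
{
    /* semaphore down() would go here; it does not touch the count */
    free(model_alloc_may_sleep(64));
    /* semaphore up() would go here */
}
```

Calling model_switch_with_spinlock() prints a message modelled on the report above and increments model_sleep_bugs, while model_switch_with_semaphore() prints nothing. The trade-off is that a sleeping lock can only be used where the caller itself is allowed to sleep, which holds for the sysfs write path shown in the trace.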

Comment 3 Mathieu Desnoyers 2008-09-13 13:32:44 UTC
We vmap two RW pages in text_poke to make sure we can write to the
kernel text, even if it is read-only. Two possible solutions:

- Either we use a two-page fixmap, so we don't have to use vmap,
- or we find out why the CPU hotplug code disables preemption or interrupts
  and fix that instead.
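The need for two pages can be checked with a little address arithmetic. This sketch assumes a 4096-byte page and that the bytes being patched are shorter than a page but may start anywhere within one; under those assumptions a patch touches at most two pages, so mapping the page holding the start address plus the following page always covers it. MODEL_PAGE_SIZE and pages_spanned() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for this sketch (4 KiB base pages). */
#define MODEL_PAGE_SIZE 4096UL

/* Number of pages touched by a write of len bytes starting at addr. */
static size_t pages_spanned(uintptr_t addr, size_t len)
{
    assert(len > 0);
    uintptr_t first = addr / MODEL_PAGE_SIZE;
    uintptr_t last = (addr + len - 1) / MODEL_PAGE_SIZE;
    return (size_t)(last - first + 1);
}
```

For example, a 5-byte patch starting at offset 0xffe of a page spans two pages, while the same patch starting at offset 0xffb fits in one.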

I'll be traveling next week (Kernel Summit and Plumbers Conference), but I'll try to have a look soon.

Mathieu
Comment 4 raz ben yehuda 2008-09-13 13:36:46 UTC
I simply replaced the spinlocks with semaphores.
Comment 5 Mathieu Desnoyers 2008-09-13 13:43:46 UTC
Hrm, the comment from raz ben yehuda is right: 2.6.26 should include the fix already posted to LKML.

The commit ID of this fix in mainline is:

2f1dafe50cc4e58a239fd81bd47f87f32042a1ee

I think the patch will need some trivial modification to apply to 2.6.26. The original patch I posted on LKML would probably apply more cleanly:

http://lkml.org/lkml/2008/4/19/139

Can you try either of these two and see whether it fixes the issue? If it does, the fix should be merged into 2.6.26.x.

Thanks,

Mathieu
