Bug 7249 - "write error: Invalid argument" when onlining already onlined CPU
Summary: "write error: Invalid argument" when onlining already onlined CPU
Status: REJECTED WILL_NOT_FIX
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64
Hardware: i386 Linux
Importance: P2 normal
Assignee: Andi Kleen
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-10-02 15:08 UTC by Bryce Harrington
Modified: 2008-09-05 04:09 UTC
2 users

See Also:
Kernel Version: 2.6.18
Subsystem:
Regression: ---
Bisected commit-id:



Description Bryce Harrington 2006-10-02 15:08:11 UTC
Most recent kernel where this bug did not occur:  Has been present in the 2.6.17
and 2.6.18 releases, -rc, -mm, and -git trees
Distribution:  Gentoo
Hardware Environment:  x86_64, 2x AMD Opteron
Software Environment:  See http://crucible.osdl.org/runs/2328/sysinfo/amd01.1/

Problem Description:

While testing CPU hotplug recently, I noticed a discrepancy in how it is
handled on x86_64 vs. other architectures.  Normally, if you attempt to
online an already onlined CPU, it returns an exit code of 1 but no error
message.  However, on x86_64 it produces this error message:

x86_64:
# echo 1 > /sys/devices/system/cpu/cpu1/online
-bash: echo: write error: Invalid argument
# echo 0 > /sys/devices/system/cpu/cpu1/online
# echo 0 > /sys/devices/system/cpu/cpu1/online
-bash: echo: write error: Device or resource busy
# echo 1 > /sys/devices/system/cpu/cpu1/online
# echo 1 > /sys/devices/system/cpu/cpu1/online
-bash: echo: write error: Invalid argument

I think it should not be printing an error message in this case.
Here is sample output on a couple other architectures:

ia64, ppc64:
# echo 1 > /sys/devices/system/cpu/cpu1/online
# echo 1 > /sys/devices/system/cpu/cpu1/online
# echo $?
1
# echo 0 > /sys/devices/system/cpu/cpu1/online
# echo $?
0
# echo 1 > /sys/devices/system/cpu/cpu1/online
# echo $?
0

Test output that identified this issue:
    http://crucible.osdl.org/runs/2271/test_output/lhcs_regression.log
Comment 1 Andi Kleen 2006-10-02 15:22:00 UTC
No error would mean returning 0.

But the problem is that the high-level code in cpu.c actually
triggers a "uevent" whenever 0 is returned, so on architectures
that don't return errors, these events will be triggered multiple times.
That looks like a bug to me.
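The following is a minimal userspace sketch of the concern above. It assumes the store handler sends an ONLINE uevent whenever the low-level call returns 0, which is the handler shape this comment describes. Every name below is hypothetical and is not a kernel API, including `struct model_state`, `lenient_cpu_up` and `model_store_online_1`. If a redundant online request simply reported success, two writes of "1" would produce two uevents for a single real transition:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define MODEL_NR_CPUS 2

struct model_state {
	bool online[MODEL_NR_CPUS];
	int uevents; /* ONLINE uevents emitted so far */
};

/* Hypothetical arch behaviour: report success even if the CPU is
 * already online, i.e. treat a redundant request as "no error". */
static int lenient_cpu_up(struct model_state *s, unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	s->online[cpu] = true;
	return 0;
}

/* Store handler that sends a uevent whenever the low-level call
 * returns 0, without checking whether the CPU's state changed. */
static int model_store_online_1(struct model_state *s, unsigned int cpu)
{
	int ret = lenient_cpu_up(s, cpu);

	if (ret == 0)
		s->uevents++;
	return ret;
}
```

Starting with cpu1 offline, calling `model_store_online_1(&s, 1)` twice leaves `s.uevents` at 2, even though the CPU came online only once.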
 
Comment 2 Markus Rechberger 2007-02-08 11:08:58 UTC
I just investigated that issue and I don't see any bug here.

cpu1 # echo 1 > online
bash: echo: write error: Invalid argument
cpu1 # echo 1 > online
bash: echo: write error: Invalid argument
cpu1 # echo 0 > online
cpu1 # echo 0 > online
bash: echo: write error: Device or resource busy


Here is a small trace (lines with two colons are divided into function, line,
errno):

argument: 31 0a 
returning: -22 (OS error code  22:  Invalid argument)

        if (cpu_online(cpu) || !cpu_present(cpu)) (cpu.c line 224)
                return -EINVAL;

args: 2
argument: 31 0a 
returning: -22
args: 2
argument: 31 0a 
returning: -22
args: 2
argument: 30 0a 
_cpu_down: 158: 1
CPU 1 is now offline
SMP alternatives: switching to UP code
_cpu_down: 194: 0
cpu_down: 210 0
returning: 0
args: 2
argument: 30 0a 
_cpu_down: 130: -EBUSY
        if (num_online_cpus() == 1){
                printk("%s: %d: -EBUSY\n",__FUNCTION__,__LINE__);
                return -EBUSY;
        }

cpu_down: 210 -16 (OS error code  16:  Device or resource busy)
returning: -16


So if someone doesn't get these messages on PPC, either the kernel version used
there returns different return values, or glibc differs.
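The trace can be reproduced with a small userspace model of the two checks quoted in it. `_cpu_up` rejects a CPU that is already online or not present with -EINVAL. `_cpu_down` returns -EBUSY when only one CPU is online.

This is a sketch, not the kernel code. The `model_*` names are hypothetical. The -EINVAL check for an already-offline CPU in `model_cpu_down` is an assumption that the trace does not show.

On a two-CPU machine the model yields the same sequence as the trace:
- -22 for each redundant `echo 1`;
- 0 for the first `echo 0`;
- -16 for the second `echo 0`, because in the model the last-CPU check runs before the already-offline check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 2 /* two Opterons, as in the report */

/* Hypothetical model state: both CPUs present and online at boot. */
static bool model_present[MODEL_NR_CPUS] = { true, true };
static bool model_online[MODEL_NR_CPUS] = { true, true };

static unsigned int model_num_online_cpus(void)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < MODEL_NR_CPUS; i++)
		if (model_online[i])
			n++;
	return n;
}

static int model_cpu_up(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	/* Mirrors the check at cpu.c line 224 quoted in the trace. */
	if (model_online[cpu] || !model_present[cpu])
		return -EINVAL;
	model_online[cpu] = true;
	return 0;
}

static int model_cpu_down(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	/* Mirrors the instrumented check: never offline the last CPU. */
	if (model_num_online_cpus() == 1)
		return -EBUSY;
	/* Assumption: otherwise an already-offline CPU is rejected. */
	if (!model_online[cpu])
		return -EINVAL;
	model_online[cpu] = false;
	return 0;
}
```

Calling `model_cpu_up(1)` three times, then `model_cpu_down(1)` twice, returns -EINVAL, -EINVAL, -EINVAL, 0, -EBUSY.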
Comment 3 Andi Kleen 2007-05-07 09:45:37 UTC
Setting a sysfs value to the same value shouldn't produce EINVAL.
But does it actually break something? 
I don't know why it behaves differently from other architectures.
Is it also different from i386? 

I guess the easy fix, if it really were a problem, would be
to add a few lines to _cpu_up that return 0 when the state is already the same.
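The "easy fix" suggested here can be sketched in the same hypothetical userspace style. The extra `changed` out-parameter is an assumption added for illustration; nobody in the thread proposed it. It lets the store handler send the ONLINE uevent only on a real transition. That way, returning 0 for a redundant request does not cause the duplicate uevents raised in Comment 1. All names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 2

struct model_cpu {
	bool present;
	bool online;
};

/* Hypothetical fixed _cpu_up: a redundant request is a no-op success.
 * *changed reports whether a state transition actually happened. */
static int model_cpu_up_fixed(struct model_cpu *cpus, unsigned int cpu,
			      bool *changed)
{
	assert(cpu < MODEL_NR_CPUS);
	*changed = false;
	if (!cpus[cpu].present)
		return -EINVAL;     /* still an error: nothing to online */
	if (cpus[cpu].online)
		return 0;           /* state already matches: nothing to do */
	cpus[cpu].online = true;
	*changed = true;
	return 0;
}

/* Store handler: send the ONLINE uevent only on a real transition. */
static int model_store_online_fixed(struct model_cpu *cpus, unsigned int cpu,
				    int *uevents)
{
	bool changed;
	int ret = model_cpu_up_fixed(cpus, cpu, &changed);

	if (ret == 0 && changed)
		(*uevents)++;
	return ret;
}
```

With this shape, writing "1" twice to an offline CPU returns 0 both times but produces a single uevent. A CPU that is not present is still rejected with -EINVAL.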
Comment 4 Natalie Protasevich 2008-03-30 14:30:21 UTC
Any update on this problem? Should the bug just be closed, since there is no actual bug here? If this point needs more consideration, maybe it should rather be posted to lkml so other archs can discuss it.
Comment 5 Andi Kleen 2008-03-30 23:46:20 UTC
I think it should be closed and the regression test should be fixed.
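Following this suggestion, a regression test could accept the x86_64 errors for redundant requests instead of flagging them. The sketch below is hypothetical; neither `write_online` nor `redundant_request_ok` comes from the test suite in question.
- `write_online` performs the same write as `echo 1 > /sys/devices/system/cpu/cpu1/online`. It returns the negated errno that bash reports as "write error".
- `redundant_request_ok` encodes one possible policy. A real transition must succeed. A redundant request may succeed or fail with -EINVAL. A redundant offline may also fail with -EBUSY, as in the x86_64 transcript above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Write "1\n" or "0\n" to a CPU online file, as `echo` does.
 * Returns 0 on success or -errno on failure. */
static int write_online(const char *path, int value)
{
	char buf[2] = { value ? '1' : '0', '\n' };
	ssize_t n;
	int fd, err = 0;

	assert(path != NULL);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;
	n = write(fd, buf, sizeof buf);
	if (n < 0)
		err = -errno;       /* capture before close() can change errno */
	else if ((size_t)n != sizeof buf)
		err = -EIO;         /* treat a short write as an I/O error */
	close(fd);
	return err;
}

/* Hypothetical test policy; was_online is the CPU state before the write. */
static bool redundant_request_ok(bool was_online, int requested, int err)
{
	bool redundant = ((requested != 0) == was_online);

	if (!redundant)
		return err == 0;    /* a real transition must succeed */
	if (err == 0 || err == -EINVAL)
		return true;        /* success, or the -EINVAL seen on x86_64 */
	/* A redundant offline may hit the last-online-CPU check first. */
	return requested == 0 && err == -EBUSY;
}
```

A test would read the CPU's state first, call `write_online`, and then pass the result to `redundant_request_ok` instead of treating every write error as a failure.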
