Most recent kernel where this bug did not occur: Has been present in the 2.6.17 and 2.6.18 releases, -rc, -mm, and -git trees
Distribution: Gentoo
Hardware Environment: x86_64, 2x AMD Opteron
Software Environment: See http://crucible.osdl.org/runs/2328/sysinfo/amd01.1/
Problem Description:

In testing hotplug cpu recently, I notice a discrepancy in how it's handled on x86_64 vs. other architectures. Normally, if you attempt to online an already onlined cpu, it returns an exit code of 1 but no error message. However, on x86_64 it produces this error message:

x86_64:
    # echo 1 > /sys/devices/system/cpu/cpu1/online
    -bash: echo: write error: Invalid argument
    # echo 0 > /sys/devices/system/cpu/cpu1/online
    # echo 0 > /sys/devices/system/cpu/cpu1/online
    -bash: echo: write error: Device or resource busy
    # echo 1 > /sys/devices/system/cpu/cpu1/online
    # echo 1 > /sys/devices/system/cpu/cpu1/online
    -bash: echo: write error: Invalid argument

I think it should not be printing an error message in this case. Here is sample output on a couple other architectures:

ia64, ppc64:
    # echo 1 > /sys/devices/system/cpu/cpu1/online
    # echo 1 > /sys/devices/system/cpu/cpu1/online
    # echo $?
    1
    # echo 0 > /sys/devices/system/cpu/cpu1/online
    # echo $?
    0
    # echo 1 > /sys/devices/system/cpu/cpu1/online
    # echo $?
    0

Test output that identified this issue:
http://crucible.osdl.org/runs/2271/test_output/lhcs_regression.log
No error would mean returning 0. But the problem is that the high level code in cpu.c for this actually triggers an "uevent" when 0 is returned and on those architectures that don't return errors these events will be triggered multiple times. That looks like a bug to me.
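To make the concern above concrete, here is a minimal userspace sketch, not the kernel source. It assumes a store handler that emits an event whenever the low-level online call returns 0. All names (`store_online`, `cpu_up_strict`, `cpu_up_lenient`, `uevents_sent`, and `NR_CPUS` as a small constant) are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 4                 /* hypothetical size for this model */

bool online[NR_CPUS] = { true, true, false, false };
int uevents_sent;                 /* counts simulated uevents */

/* Strict variant: refuse to online a CPU that is already online. */
int cpu_up_strict(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	if (online[cpu])
		return -EINVAL;
	online[cpu] = true;
	return 0;
}

/* Lenient variant: a redundant request is reported as success. */
int cpu_up_lenient(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	online[cpu] = true;
	return 0;
}

/*
 * Stand-in for the high-level handler: an event is sent every time
 * the low-level call returns 0, with no state check of its own.
 */
int store_online(unsigned int cpu, int (*up)(unsigned int))
{
	int ret = up(cpu);

	if (ret == 0) {
		uevents_sent++;
		printf("uevent: cpu%u online\n", cpu);
	}
	return ret;
}
```

In this model, onlining the same CPU twice through `cpu_up_strict` sends one event and then returns `-EINVAL`, while going through `cpu_up_lenient` sends a second, redundant event.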
I just investigated that issue and I don't see any bug here.

    cpu1 # echo 1 > online
    bash: echo: write error: Invalid argument
    cpu1 # echo 1 > online
    bash: echo: write error: Invalid argument
    cpu1 # echo 0 > online
    cpu1 # echo 0 > online
    bash: echo: write error: Device or resource busy

Here is a small trace (lines with two colons are divided into function, line, errno):

    argument: 31 0a
    returning: -22 (OS error code 22: Invalid argument)
        if (cpu_online(cpu) || !cpu_present(cpu)) (cpu.c line 224)
            return -EINVAL;
    args: 2
    argument: 31 0a
    returning: -22
    args: 2
    argument: 31 0a
    returning: -22
    args: 2
    argument: 30 0a
    _cpu_down: 158: 1
    CPU 1 is now offline
    SMP alternatives: switching to UP code
    _cpu_down: 194: 0
    cpu_down: 210 0
    returning: 0
    args: 2
    argument: 30 0a
    _cpu_down: 130: -EBUSY
        if (num_online_cpus() == 1){
            printk("%s: %d: -EBUSY\n",__FUNCTION__,__LINE__);
            return -EBUSY;
        }
    cpu_down: 210 -16 (OS error code 16: Device or resource busy)
    returning: -16

So if someone doesn't get these messages on PPC, either the kernel version in use returns different return values, or the glibc differs.
Setting a sysfs value to the same value shouldn't produce EINVAL. But does it actually break something? I don't know why it behaves differently from other architectures. Is it also different from i386? If it really is a problem, I guess the easy fix would be to add a few lines in _cpu_up that return 0 when the requested state is the same as the current one.
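A rough userspace sketch of that suggestion follows. It is not a patch against `_cpu_up`. The first function mirrors the check quoted in the trace, the second shows the proposed early return. The function names, the two boolean arrays, and `NR_CPUS` as a constant of 2 are assumptions made for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS 2   /* hypothetical: two CPUs, as on the reported machine */

bool cpu_present_map[NR_CPUS] = { true, true };
bool cpu_online_map[NR_CPUS]  = { true, true };

/* Behaviour seen in the trace: an already-online CPU yields -EINVAL. */
int model_cpu_up_current(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	if (cpu_online_map[cpu] || !cpu_present_map[cpu])
		return -EINVAL;
	cpu_online_map[cpu] = true;
	return 0;
}

/* Suggested behaviour: requesting the current state is a no-op success. */
int model_cpu_up_idempotent(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	if (!cpu_present_map[cpu])
		return -EINVAL;   /* still an error: no such CPU */
	if (cpu_online_map[cpu])
		return 0;         /* same state requested, nothing to do */
	cpu_online_map[cpu] = true;
	return 0;
}
```

With both CPUs online, `model_cpu_up_current(1)` returns `-EINVAL` and `model_cpu_up_idempotent(1)` returns 0. As noted earlier in the thread, a caller that sends a uevent on every 0 return would then need its own state check to avoid duplicate events.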
Any update on this problem? Should the bug just be closed, since there is no actual bug here? If this point needs more consideration, maybe it should instead be posted to lkml so the other archs can discuss it.
I think it should be closed and the regression test fixed.