After updating ACPI patch version from 21-Nov-2003 to 11-Dec-2003 (acpi-20031203-2.4.22.diff), a problem with booting on an SMP (P4 HT) machine was reported: booting takes about 3 minutes, and kernel complains about not responding processors: Booting processor 2/32 eip 3000 Not responding. Booting processor 2/33 eip 3000 Not responding. Booting processor 2/64 eip 3000 Not responding. Booting processor 2/65 eip 3000 Not responding. Booting processor 2/96 eip 3000 Not responding. Booting processor 2/97 eip 3000 Not responding. Booting processor 2/128 eip 3000 Not responding. Booting processor 2/129 eip 3000 Not responding. Booting processor 2/160 eip 3000 Not responding. Booting processor 2/161 eip 3000 Not responding. Booting processor 2/192 eip 3000 Not responding. Booting processor 2/193 eip 3000 Not responding. Booting processor 2/224 eip 3000 Not responding. Booting processor 2/225 eip 3000 Not responding. Total of 2 processors activated (9620.68 BogoMIPS). (full dmesg output is attached separately). Even acpi=off does not fix this problem (so the common SMP boot code seems to be broken by this patch). With maxcpus=2 the kernel boots as expected. It seems that the problem is caused by this chunk of the patch: diff -Nru a/arch/i386/kernel/smpboot.c b/arch/i386/kernel/smpboot.c --- a/arch/i386/kernel/smpboot.c Fri Dec 12 00:22:07 2003 +++ b/arch/i386/kernel/smpboot.c Fri Dec 12 00:22:07 2003 @@ -1106,7 +1106,7 @@ */ Dprintk("CPU present map: %lx\n", phys_cpu_present_map); - for (bit = 0; bit < NR_CPUS; bit++) { + for (bit = 0; bit < MAX_APICS; bit++) { apicid = cpu_present_to_apicid(bit); /* * Don't even attempt to start the boot CPU! MAX_APICS is 256 (the kernel was compiled with CONFIG_X86_CLUSTERED_APIC), but the hardware does not use clustered APIC mode, therefore cpu_present_to_apicid(bit) == bit, and later in the loop (1ul << bit) value becomes undefined when bit >= 32. Actually the CPU calculates (1UL << (bit & 0x1F)), therefore the code attempts to activate processors 32 and 33, 64 and 65, etc. In 2.4.23 the code was slightly changed; however, if the clustered APIC mode is not used, the same shift with out-of-range number of bits would be performed in apicid_to_phys_cpu_present(), with the same results (not tested actually). The latest ACPI patch (acpi-20031203-2.4.22.diff from 18-Dec-2003) was not tested too, but the problematic part of code was not changed...
Created attachment 1699 [details] dmesg from booting with 2.4.22 and acpi-20031203 (ACPI mode)
Created attachment 1700 [details] dmesg from booting wit 2.4.22 and acpi-20031203 (acpi=off)
Created attachment 1701 [details] dmesg from booting with 2.4.22 and previous version of the ACPI patch With the previous ACPI patch the problem does not happen
Created attachment 1814 [details] a proposal patch This problem seems to be introduced by GCC: #cat foo.c main() { int bit; for (bit=28; bit < 34; bit++){ printf("%.4x \n", (1ul << bit))); } } #gcc foo.c #./a.out 10000000 20000000 40000000 80000000 0001 ( 1<< 34 , It's overflowed, but it should be 0 ) 0002 ( 1<< 35 , It's overflowed, but it should be 0 ) # gcc -v Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/3.2/specs Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --disable-checking --host=i386-redhat-linux --with-system-zlib --enable-__cxa_atexit Thread model: posix gcc version 3.2 20020903 (Red Hat Linux 8.0 3.2-7)
sizeof(unsigned long) in the patch should probably be BITS_PER_LONG. The behavior of gcc-generated code is correct - according to the C standard, the result of a shift operation with an out-of-range number of bits is undefined.
woops, this one got away... now fixed: http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/patches/test/2.4.22/20040109024148-smpboot.patch Luming -- please mark this bug closed. thanks, -Len