Most recent kernel where this bug did *NOT* occur: 2.6.20.11
Distribution: kernel.org 2.6.21.1
Hardware Environment: Dell 6300 SMP, 4x Pentium3, 4G memory, Adaptec SCSI controller
Software Environment: compiled with gcc-4.1.3 (gcc-4.1 branch)
Problem Description: Recognizes the Adaptec controller and then seems to try to go to init on successive processors.
Code: Bad EIP value
EIP: [<00000000>] _stext+0x3fefff/0x20 SS:ESP 0068:c295fe58
Kernel panic - not syncing: Attempted to kill init!
BUG: at arch/i386/kernel/smp.c:546 smp_call_function()
[<c010e00b>] smp_call_function+0x12b/0x130
.....
(above hand copied from screen; apparently one message like this for each CPU)
Steps to reproduce: Recompiled both 2.6.20.11 and 2.6.21.1 and the fault is consistent. The same kernel (2.6.21.1) boots fine on an SMP G4 (Mac) machine. Will try various release candidate versions. Will need help to characterize further.
Well! The boot bug has been present ever since kernel-2.6.21-rc1. However, kernel-2.6.21.1 boots fine on a single-processor Pentium3 machine. Thus, the bug appears to be restricted to SMP i386 machines, starting with release candidate 1.
Thanks. A digital photo of the screen might help us to get a look at that oops. Or set up netconsole - it's pretty easy: Documentation/networking/netconsole.txt
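For the receiving side of netconsole, any UDP listener on the second machine will do. The following is a minimal sketch in Python, not part of the kernel documentation: the function names `open_receiver` and `capture` and the log file name are hypothetical, and port 6666 is assumed to match the target port given in the `netconsole=` boot parameter on the machine under test (see Documentation/networking/netconsole.txt for the parameter syntax).

```python
import socket


def open_receiver(bind_addr="0.0.0.0", port=6666):
    """Open a UDP socket for incoming netconsole datagrams.

    The port is an assumption; it must match the target port configured
    on the machine that is sending its console output.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((bind_addr, port))
    return sock


def capture(sock, log_path, max_packets=None):
    """Append each received datagram to log_path.

    Runs until max_packets datagrams have arrived, or forever when
    max_packets is None. Returns the number of datagrams written.
    """
    count = 0
    with open(log_path, "ab") as log:
        while max_packets is None or count < max_packets:
            data, _sender = sock.recvfrom(65535)
            log.write(data)
            log.flush()  # keep the file current in case the sender panics
            count += 1
    return count


# Typical use on the logging machine (runs until interrupted):
#     capture(open_receiver(), "netconsole.log")
```

Flushing after every datagram is deliberate: a panicking kernel stops sending without warning, so whatever arrived last should already be on disk.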
Thanks for the direction. I will set up and debug netconsole on another machine, as boot on that server is quite slow and the SCSI drives do not like to stop and spin up repeatedly. There is no reset button on that machine. I do not have a digital camera, and I do not believe what appears on the screen would be helpful, as it is one set of consequential calls like "panic, do_exit, die, do_page_fault, do_page_fault, error code+, acpi_nmi_disable" etc. At the top of the screen is what appears to be the tail end of another message sequence like the one that appears whole. I therefore surmise that there are three or four equivalent message sequences, referring either to all four CPUs or to just the additional three not used during the initial boot. After the three Adaptec controllers are correctly identified, things scroll too fast to capture either by eye or by camera, until everything locks up with the last message sequence.
Created attachment 11488: Netconsole dumps of three boot attempts (1 panicked, 2 working)
These three boot attempts show that 2.6.21.1 fails on i386 SMP but works on a single processor; 2.6.20.11 boots fine on SMP. The corresponding SMP .configs are equivalent except for differences introduced by menuconfig. The build logs from 'make V=1 2>&1 | tee .Build' are available if needed, as are the .configs.
I have some suggestions about the netconsole documentation, if they are wanted; please tell me whom to send them to.
Post netconsole Doc. comments here or send them to me or to the netconsole owner: Matt Mackall <mpm@selenic.com>
Created attachment 11493: Another panicked SMP boot using kernel-2.6.22-rc1
I am afraid that this is more bad news. I had not noticed before that in going from 2.6.20.11 to 2.6.21 both aacraid and aic7xxx had undergone significant changes. I was too fixed on SMP. As 2.6.22 has more changes in aacraid, I am submitting another failed-boot netconsole dump using 2.6.22-rc1. I will try 2.6.22-rc1 on the Mac SMP G4, on which I have also installed a SCSI drive with an Adaptec APD-29160N Ultra160 controller. The good news is that netconsole is a fantastic tool that deserves much more prominence instead of being practically hidden. I will review the 2.6.22 documentation and configuration before submitting my comments.
This coincides with the introduction of the adapter_comm and adapter_deliver platform functions. I need to know which aacraid based adapters are installed in the system. The panic appears to occur with an uninitialized adapter_deliver platform function pointer. I can see an oversight in the sa style adapters, but it would affect all kernel configurations, not just SMP. That may in fact be the case here, because it appears the UP boot did NOT load the aacraid driver (!). The sa style adapters are the Adaptec 5400S and HP NetRAID; these cards were last produced in 2000. Inspection has not turned up any other holes, as this is part of a single threaded initialization of the adapters. I am aware that the aac_command_thread has started up by then, but it is inert. If there is an Adaptec 5400S or HP NetRAID in the system, please pull it to confirm that it is the cause of the panic.
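To illustrate the suspected failure mode, here is a toy model in Python, not the aacraid code. Only the hook names adapter_comm and adapter_deliver come from the comment above; the class `AdapterOps`, the two init functions and `missing_hooks` are hypothetical. The idea is that each adapter family fills in a table of platform functions during initialization, and a family that forgets one leaves a null entry. In C, calling through such a null function pointer jumps to address 0, which would be consistent with the EIP of 00000000 in the original report.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AdapterOps:
    """Toy stand-in for a per-adapter table of platform functions."""
    adapter_comm: Optional[Callable[[int], str]] = None
    adapter_deliver: Optional[Callable[[str], str]] = None


def init_complete_family() -> AdapterOps:
    # Hypothetical init path that assigns both hooks.
    return AdapterOps(
        adapter_comm=lambda mode: f"comm mode {mode}",
        adapter_deliver=lambda fib: f"delivered {fib}",
    )


def init_forgetful_family() -> AdapterOps:
    # Hypothetical init path with the suspected oversight:
    # adapter_deliver is never assigned and stays None.
    return AdapterOps(adapter_comm=lambda mode: f"comm mode {mode}")


def missing_hooks(ops: AdapterOps) -> List[str]:
    """Return the names of hooks that were left unset."""
    names = ("adapter_comm", "adapter_deliver")
    return [name for name in names if getattr(ops, name) is None]
```

A check like `missing_hooks` at the end of initialization would surface the problem before the first command is delivered; in this model, calling the unset hook raises a TypeError, the Python counterpart of the kernel's jump to address 0.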
Thanks for the prompt action! The machine in question, a Dell 6300/550, is now working with the patch applied to 2.6.22-rc1. It worked on the second try, because a change in configuration was required. The details and answers to Mark's question follow. The controller is neither an HP NetRAID nor an Adaptec 5400S. It is an OEM Adaptec ASSY 1790106-01 with an Adaptec ASSY 1790206-01. It sports two Adaptec AIC-7897. In an earlier query Adaptec claimed no residual responsibility and referred me to Dell, who declared it obsolete. I am using that Dell 6300 not as a server but as a fantastic development machine, with its four processors and three gigabytes of memory. I am not even using it as a RAID machine but as a plain SCSI machine. When I first tried to bring it up with Linux I had a rather steep learning curve, and ended up with the old aic7xxx_old driver but also had to activate the aacraid driver. However, I never selected the RAID setup option. With the new drivers I also had to select the RAID option, otherwise it would not find the root on /dev/sda3. As I am not familiar with the kernel/osdl/bugzilla arrangement, I only realized the existence of the patch when checking my mail, as there was no mention of it in the bugzilla problem report. It seems that I am very spoiled by the excellent quality of the kernel releases. I only hacked the kernel in 1993 to get it to read the "old" SCO Xenix formatted hard drives. Luckily I refrained from publishing my work, given the legal encumbrances imposed even then by SCO. Had I published it, it could have been fodder for the "bad-new" SCO and their legal manoeuvrings. I am quite willing to act as a tester using the Dell and a Mac with dual G4. Just to introduce myself a little: I am 72 and retired but still active, trying to preserve about 30 G of workstation packages in peril of ending in bit-buckets.
I started programming with unit-record machine plug-boards and progressed to real-time assembly language programmer on central office telephone switches. I then went on to system designer and international telecommunications consultant, ending up in the satellite industry (COMSAT, INTELSAT). I have had more exposure to bugzilla as operated by GCC-GNU.ORG, where I filed about ten problem reports. Oh yes, I was also an airline pilot and want to take flightgear to become an instrument flying trainer, to prevent unnecessary deaths like the one that befell the young Kennedy and his wife. Testing kernels and compilers just fits in with these activities. PS I could add Mark as mark_salyzyn to the CC. Please forward the info to him.