Bug 9131 - booting linux kernel results in kernel panic
Summary: booting linux kernel results in kernel panic
Status: REJECTED INVALID
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: i386
Hardware: All Linux
Importance: P1 normal
Assignee: platform_i386
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-10-07 13:59 UTC by movgp0
Modified: 2007-12-13 03:58 UTC

See Also:
Kernel Version: 2.6.22.5
Subsystem:
Regression: ---
Bisected commit-id:



Description movgp0 2007-10-07 13:59:25 UTC
Most recent kernel where this bug did not occur: 2.6.22.5 
(I could not try newer or custom kernels because I lack a working Linux system)

Distribution: "Debian", "Gentoo", "Damn Small Linux" and "Linux From Scratch"; some distributions, including "Open Solaris" (OK, not Linux) and "Suse", result in a black screen. Any MS Windows works fine. 

Hardware Environment: 
 Mainboard: MSI 915P Neo2 Platinum
 Processor: Intel P4 530
 Graphic: MSI Radeon X600XT
 (details at request)

Software Environment: 
 n/a (wanted to switch from Windows and never really worked with Linux)

Problem Description:
The following is a transcript of the screen that results from booting Linux From Scratch (x86; version 6.3 r2052). It appears immediately after the kernel image is uncompressed. Changing boot arguments did not help. It may be a bug in the mainboard, but I'm not sure. A BIOS update and changing BIOS settings didn't help at all. (I think there are some lines before this, but it's impossible to read them.)


EIP: 0060:[<c00f0488>] Not tainted VLI
EFLAGS: 00010264 (2.6.22.5 #1)
EIP is at 0xc00f0488
eax: 0000 b100  ebx: 0000 0000  ecx: 0000 2504  edx: 2580 8086
esi: 0000 0000  edi: c035 bd34  ebp: 0000 b102  esp: df65 ff10
ds: 007b  es: 007b  fs: 00d8  gs: 0000  ss: 0068
Process swapper (pid: 1,  ti=df65e000  task=df644a50  task.ti=df65e000)
Stack: b1000400 c00f0266 c00f028d b1020400 0000bd34 00002580 0100b102 8086c00f
       00540000 0246c00f c0264868 00000060 00000006 df402658 df402400 df407800
       df65ff6c 00000000 00000000 df65ff72 00000000 00000000 00000001 df65ff6c
Call Trace: 
 [<c0264868>] pcibissort+0x58/0x180
 [<c0393159>] pcibios_init+0x79/0x90
 [<c0377590>] kernel_init+0x130/0x310
 [<c0104066>] ret_from_fork+0x6/0x20
 [<c0377460>] kernel_init+0x0/0x310
 [<c0377460>] kernel_init+0x0/0x310
 [<c0105397>] kernel_thread_helper+0x7/0x10
========================
Code: 81 fe 00 ff 75 05 8a c1 ee eb 11 66 81 fe 01 ff 75 07 66 8b c1 66 ef eb 03
8b c1 ef 66 9d 66 5a 58 f8 c3 c3 c3 66 50 66 53 b7 00 <2e> 8a 83 96 04 00 00 66
5b 8a d8 66 58 c3 00 01 02 03 04 05 06
EIP [<c00f0488>] 0xc00f0488  ss: ESP  0068: df65ff10
Kernel panic - not syncing: Attempted to kill init!
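Each entry in the call trace above has the form `[<address>] symbol+0xoffset/0xsize`, so subtracting the offset from the address gives the start of the function. A minimal Python sketch of that arithmetic follows; the helper name `parse_trace_line` is made up for illustration and is not part of any kernel tool.

```python
import re

# Matches one call-trace entry such as " [<c0393159>] pcibios_init+0x79/0x90".
TRACE_LINE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{8})>\]\s+"
    r"(?P<sym>[^\s+]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_trace_line(line):
    """Split a trace entry into its parts; return None if the line does not match."""
    m = TRACE_LINE.search(line)
    if m is None:
        return None
    addr = int(m.group("addr"), 16)
    off = int(m.group("off"), 16)
    return {
        "symbol": m.group("sym"),
        "address": addr,
        "offset": off,
        "size": int(m.group("size"), 16),
        "start": addr - off,  # address of the first byte of the function
    }
```

For the `pcibios_init+0x79/0x90` entry this gives a function start of 0xc03930e0. The faulting EIP, 0xc00f0488, carries no `symbol+offset` annotation in the transcript, so the same arithmetic cannot be applied to it.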
Comment 1 movgp0 2007-10-07 14:06:07 UTC
Re: I think that "Damn Small Linux" has another problem; I need further testing with Knoppix before I can say if it is the same problem... 
Comment 2 Randy Dunlap 2007-10-07 15:52:34 UTC
Is "pcibissort" a typo for pcibios_sort?

Were there any error/warning messages before this kernel fault?

Can you boot any Linux kernel on this system?  If so, please provide output
from 'lspci -v'.
 
Comment 3 movgp0 2007-10-07 21:57:27 UTC
* yes, it's a typo ;-)
* I think so. 
* I run Linux in a VM under Windows successfully, but I don't think that helps. No other Linux kernel seems to work. If I can, I'll provide the requested data later. 
Comment 4 Randy Dunlap 2007-10-07 22:30:57 UTC
Please try to provide full & complete answers.
a.  What is the typo-ed function name supposed to be?
b.  what messages came before the kernel fault?
c.  thank you.
Comment 5 movgp0 2007-10-08 03:05:10 UTC
What is the typo-ed function name supposed to be?
a) "pcibios_sort" rather than "pcibissort"; I've double-checked the rest and couldn't find any other error. 

What messages came before the kernel fault?
b) In case of LFS: 
 [Splash Screen Image]
 This is the official LFS Live CD, Version: x86-6.3-r2052
 Press [Enter] to boot, F1 - F4 for available options
 boot: _
 Loading Linux..................................
 Loading initramfs_data_cpio.gz...............................................
 .............................................................................
 ...................DONE

The next thing that is visible is "EIP: 0060:[<c00f0488>] Not tainted VLI". There are no other statements. 

btw: I've tried it with Knoppix; it results in a black screen by default and in almost the same error in debug mode. In debug mode there is more to read, but it's too fast to write down on paper. 
Comment 6 movgp0 2007-10-08 03:45:51 UTC
Is there any way to scroll through the error page by page, as with the 'more' command? Some changes in the source of the bootstrap code might help with that. I think it's not possible to save a log to the HDD; afaik the HDD driver is not loaded at this early stage. 
Comment 7 movgp0 2007-10-09 15:51:16 UTC
I didn't manage to install any Linux, at least none with a 2.6.xx kernel. Finally I've managed to install FreeBSD (this took me some time because, as a Windows user, I'm not familiar with Unix). Because Randy asked for "lspci -v", I'm providing information from "pciconf -ls", which afaik does (almost) the same. I hope that helps. So here it is: 


hostb0@pci0:0:0:	class=0x060000 card=0x70281462 chip=0x25808086 rev=0x04 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82915G/GV/GL/P/PL/GL Grantsdale Host Bridge/DRAM Controller'
    class    = bridge
    subclass = HOST-PCI
pcib1@pci0:1:0:	class=0x060400 card=0x00000088 chip=0x25818086 rev=0x04 hdr=0x01
    vendor   = 'Intel Corporation'
    device   = '82915G/P/PL Grantsdale Host-PCI Express Graphics Bridge'
    class    = bridge
    subclass = PCI-PCI
none0@pci0:27:0:	class=0x040300 card=0x70281462 chip=0x26688086 rev=0x03 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82801FB/FR/FW/FRW Intel High DefiNition Audio Controller'
    class    = multimedia
pcib2@pci0:28:0:	class=0x060400 card=0x00000040 chip=0x26608086 rev=0x03 hdr=0x01
    vendor   = 'Intel Corporation'
    device   = '82801FB/FR/FW/FRW PCI Express Port 1'
    class    = bridge
    subclass = PCI-PCI
pcib3@pci0:28:2:	class=0x060400 card=0x00000040 chip=0x26648086 rev=0x03 hdr=0x01
    vendor   = 'Intel Corporation'
    device   = '82801FB/FR/FW/FRW PCI Express Port 3'
    class    = bridge
    subclass = PCI-PCI
uhci0@pci0:29:0:	class=0x0c0300 card=0x70281462 chip=0x26588086 rev=0x03 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82801FB/FR/FW/FRW USB UHCI Controller'
    class    = serial bus
    subclass = USB
uhci1@pci0:29:1:	class=0x0c0300 card=0x70281462 chip=0x26598086 rev=0x03 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82801FB/FR/FW/FRW USB UHCI Controller'
    class    = serial bus
    subclass = USB
uhci2@pci0:29:2:	class=0x0c0300 card=0x70281462 chip=0x265a8086 rev=0x03 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82801FB/FR/FW/FRW USB UHCI Controller'
    class    = serial bus
    subclass = USB
uhci3@pci0:29:3:	class=0x0c0300 card=0x70281462 chip=0x265b8086 rev=0x03 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82801FB/FR/FW/FRW USB UHCI Controller'
    class    = serial bus
    subclass = USB
ehci0@pci0:29:7:	class=0x0c0320 card=0x70281462 chip=0x265c8086 rev=0x03 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82801FB/FR/FW/FRW USB 2.0 EHCI Controller'
    class    = serial bus
    subclass = USB
pcib4@pci0:30:0:	class=0x060401 card=0x00000050 chip=0x244e8086 rev=0xd3 hdr=0x01
    vendor   = 'Intel Corporation'
    device   = '82801BA/CA/DB/DBL/EB/ER/FB (ICH2/3/4/4/5/5/6), 6300ESB Hub Interface to PCI Bridge'
    class    = bridge
    subclass = PCI-PCI
isab0@pci0:31:0:	class=0x060100 card=0x70281462 chip=0x26408086 rev=0x03 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82801FB/FR ICH6/ICH6R LPC Interface Bridge'
    class    = bridge
    subclass = PCI-ISA
atapci0@pci0:31:1:	class=0x01018a card=0x70281462 chip=0x266f8086 rev=0x03 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82801FB/FBM Ultra ATA Storage Controllers - 266F'
    class    = mass storage
    subclass = ATA
atapci1@pci0:31:2:	class=0x01018f card=0x70281462 chip=0x26528086 rev=0x03 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82801FR/FRW ICH6R/ICH6RW SATA Controller'
    class    = mass storage
    subclass = ATA
none1@pci0:31:3:	class=0x0c0500 card=0x70281462 chip=0x266a8086 rev=0x03 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = '82801FB/FR/FW/FRW SMBus Controller'
    class    = serial bus
    subclass = SMBus
none2@pci1:0:0:	class=0x030000 card=0x06121462 chip=0x3e501002 rev=0x00 hdr=0x00
    vendor   = 'ATI Technologies Inc'
    device   = 'Radeon X600 Series'
    class    = display
    subclass = VGA
none3@pci1:0:1:	class=0x038000 card=0x06131462 chip=0x3e701002 rev=0x00 hdr=0x00
    vendor   = 'ATI Technologies Inc'
    device   = 'Radeon X600 Series Secondary'
    class    = display
bge0@pci3:0:0:	class=0x020000 card=0x028c1462 chip=0x167714e4 rev=0x01 hdr=0x00
    vendor   = 'Broadcom Corporation'
    device   = 'BCM5750A1 NetXtreme Gigabit Ethernet PCI Express'
    class    = network
    subclass = ethernet
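In the listing above, the `chip` field packs the PCI device ID into the upper 16 bits and the vendor ID into the lower 16 bits, and `card` does the same for the subsystem IDs. A minimal Python sketch for splitting those fields follows; the function name `decode_record` and the returned key names are illustrative only.

```python
import re

# Matches the first line of one record in the listing, e.g.
# "hostb0@pci0:0:0: class=0x060000 card=0x70281462 chip=0x25808086 ..."
RECORD = re.compile(
    r"^(?P<name>\S+)@pci(?P<bus>\d+):(?P<slot>\d+):(?P<func>\d+):\s+"
    r"class=0x(?P<cls>[0-9a-fA-F]{6})\s+"
    r"card=0x(?P<card>[0-9a-fA-F]{8})\s+"
    r"chip=0x(?P<chip>[0-9a-fA-F]{8})"
)


def decode_record(line):
    """Return the IDs encoded in a record header line, or None if it does not match."""
    m = RECORD.match(line)
    if m is None:
        return None
    chip = int(m.group("chip"), 16)
    card = int(m.group("card"), 16)
    return {
        "name": m.group("name"),
        "bus": int(m.group("bus")),
        "slot": int(m.group("slot")),
        "func": int(m.group("func")),
        "vendor": chip & 0xFFFF,     # low half of chip
        "device": chip >> 16,        # high half of chip
        "subvendor": card & 0xFFFF,  # low half of card
        "subdevice": card >> 16,     # high half of card
    }
```

Applied to the first record, this yields vendor 0x8086 and device 0x2580 for the host bridge, with subsystem vendor 0x1462. Incidentally, the register dump in the description shows the same 32-bit pattern, 2580 8086, in edx.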
Comment 8 Randy Dunlap 2007-10-09 17:20:49 UTC
On Linux kernels, you could also try some of the kernel command-line boot
options, such as:  pci=nosort
or:  pci=bfsort
or:  pci=nobfsort
to see if they have any effect on the PCI device sorting problem.

I'll take a look at the PCI devices list, thanks.
Comment 9 Randy Dunlap 2007-10-09 22:37:21 UTC
Or take the PCI BIOS out of the picture completely by booting with:
  pci=nobios
since the crash looks a bit like the PCI BIOS is flaky.
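The `pci=` options suggested in comments 8 and 9 are ordinary kernel command-line arguments, and several of them can be combined in one comma-separated `pci=` argument. On a system that does boot, `/proc/cmdline` shows what was actually passed. A minimal Python sketch for checking that follows; the function names are illustrative only.

```python
def pci_options(cmdline):
    """Collect the comma-separated values of every pci= argument in a command line."""
    options = []
    for arg in cmdline.split():
        if arg.startswith("pci="):
            # "pci=nobios,nosort" contributes both "nobios" and "nosort".
            options.extend(opt for opt in arg[len("pci="):].split(",") if opt)
    return options


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read()
```

On a booted Linux system, `pci_options(read_cmdline())` would list the options in effect, for example `["nosort"]` when the kernel was started with `pci=nosort`.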
Comment 10 movgp0 2007-10-09 23:05:04 UTC
Thank you Randy! 
I'm not experiencing any problems with "pci=nosort" and "pci=nobios"; with "pci=bfsort" and "pci=nobfsort" I still get the kernel panic. You might be right that it's the PCI BIOS that is giving me a headache. 

Do you think it's better to contact the mainboard vendor, to handle this issue in the sorting procedure, or to change nothing at all because a workaround is available? 
Comment 11 Randy Dunlap 2007-10-10 08:29:34 UTC
Question:  do any earlier versions of Linux work for you on this system?
It says above:
  Most recent kernel where this bug did not occur: 2.6.22.5
Is that correct?

To answer your question, I think that you'll need to continue using
"pci=nosort" or "pci=nobios" to boot.  You can try contacting the system
board vendor and explain to them that PCI BIOS indirect calls with a
function code of PCI_FIND_DEVICE are failing to return to the caller,
but I have doubts about how responsive they would be.
Comment 12 movgp0 2007-10-10 14:21:21 UTC
Re: Most recent kernel where this bug did not occur

It is not correct, because I misread the sentence. My English is not that good, sorry. 
I have tried it with some older kernels, but they either have the same problem or are not capable of detecting all the hardware (at least with 2.4.xx kernels). I think the problem was introduced with the 2.6.0 kernel, but testing that would probably waste too many CDs and too much time. (I already have dozens of Linux CDs here for which I have no further use; it also took me weeks to learn how Linux works without ever installing it.)
=========================
Re: you'll need to continue using "pci=nosort" or "pci=nobios" to boot

It's not an issue for me because I know that now. But it might be a good thing if the kernel gets some error recovery code in future releases. It would help other Linux newbies, because ordinary people will mostly not report problems but will stick to Windows instead. 
=========================
Re: I have doubts about how responsive they would be

I think so too. Because the mainboard is probably outdated now, the vendor will not really work on it. 
Comment 13 Natalie Protasevich 2007-12-12 16:53:56 UTC
Any update on this bug? Is the workaround a sufficient solution? Sounds like nothing else can be done; is that correct, Randy? I'd say this bug can be marked resolved.
Comment 14 Randy Dunlap 2007-12-12 18:39:17 UTC
Resolved would be OK with me, but it only says CODE_FIX or PATCH_ALREADY_AVAILABLE.  Actually this is still an incomplete bug report:
we don't have the function name where the current EIP is.

movgp0: can you provide us with a System.map file for the kernel that failed here
so that we can try to see where the kernel actually faulted?  Which distribution
and version number is it?  Do you recall that information?
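A System.map file lists one symbol per line as `address type name`, so finding the function that contains an address means finding the closest symbol at or below it. A minimal Python sketch of that lookup follows. The function names are illustrative, and `PAGE_OFFSET = 0xC0000000` assumes the default 3G/1G split on 32-bit x86, which the report does not confirm.

```python
import bisect

PAGE_OFFSET = 0xC0000000  # assumption: default 3G/1G split on 32-bit x86


def load_symbols(lines):
    """Parse System.map-style lines ("c0264810 T pcibios_sort") into a sorted list."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Return (name, offset) for the closest symbol at or below addr, or None."""
    addresses = [a for a, _ in symbols]
    i = bisect.bisect_right(addresses, addr) - 1
    if i < 0:
        return None  # addr lies below every known symbol
    start, name = symbols[i]
    return name, addr - start


def in_legacy_bios_area(addr):
    """True if addr maps to physical 0xE0000-0xFFFFF under the PAGE_OFFSET assumption."""
    phys = addr - PAGE_OFFSET
    return 0xE0000 <= phys <= 0xFFFFF
```

Under that assumption, the reported EIP 0xc00f0488 corresponds to physical address 0xf0488, which lies in the legacy BIOS area rather than in kernel text. If so, a System.map lookup may return no kernel function for it, which would fit the suspicion in comment 9 that the fault is inside the PCI BIOS.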
Comment 15 movgp0 2007-12-13 03:58:15 UTC
Sorry. I do not recall. Instead I can try to test my distributions one-by-one until I get a match. But I need some time for doing so. 
