Bug 370 - Kernel will not boot against Asus P4T533-C
Summary: Kernel will not boot against Asus P4T533-C
Status: CLOSED CODE_FIX
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: i386
Hardware: i386 Linux
Importance: P2 normal
Assignee: Zwane Mwaikambo
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-02-16 13:47 UTC by Bryan W. Headley
Modified: 2003-05-29 06:38 UTC

See Also:
Kernel Version: 2.5.61 (pristine)
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
.config file from 2.5.69-ac1 (25.68 KB, text/plain)
2003-05-13 09:33 UTC, Kevin Jacobs
lspci output (757 bytes, text/plain)
2003-05-13 09:33 UTC, Kevin Jacobs
early printk (12.96 KB, patch)
2003-05-13 10:49 UTC, Zwane Mwaikambo
bare config (291 bytes, patch)
2003-05-15 14:29 UTC, Zwane Mwaikambo
bare'ish config (10.21 KB, patch)
2003-05-16 03:27 UTC, Zwane Mwaikambo
disable busmaster event checking (720 bytes, patch)
2003-05-29 04:53 UTC, Zwane Mwaikambo

Description Bryan W. Headley 2003-02-16 13:47:16 UTC
Distribution: Debian
Hardware Environment: Asus P4T533-C; 3 GHz Pentium 4; 1 GB RAM
Software Environment: Kernel 2.5.61 built without mkinitrd; boot with Lilo
Problem Description:

Nothing displayed after "Booting Linux..." (last message from Lilo)

Lilo 22.4; module-init-tools 0.9.9

Steps to reproduce:

Build-n-boot.
Comment 1 Alan 2003-03-06 13:08:27 UTC
Please attach compiler version info and .config for the kernel
Comment 2 Dave Jones 2003-04-04 13:42:50 UTC
Things to try:
1, acpi=off vga=1
2, building kernel as i386 instead of higher
Comment 3 Kevin Jacobs 2003-05-13 09:31:57 UTC
I'm having a very similar problem with 2.5.60, 2.5.65, 2.5.68, 2.5.69, and
2.5.69-ac1.  GRUB says it loads the kernel correctly, uncompresses,
signals that it is handing control to the new kernel, and then hangs 
immediately after.  Here are the relevant specs:

Motherboard:  Intel 840 Workstation board w/ Dual Intel P-III 866MHz
              (Coppermine) CPUs, 512MB RDRAM.

Linux distribution: Redhat 9 w/ latest up2dates, and Rusty's
                    module-init-tools-0.9.11a

GCC version: gcc -v
             Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/3.2.2/specs
             Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
               --infodir=/usr/share/info --enable-shared --enable-threads=posix
               --disable-checking --with-system-zlib --enable-__cxa_atexit 
               --host=i386-redhat-linux
             Thread model: posix
             gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)

.config and lspci:     See attached.

The system works just fine with 2.4 kernels (stock and Redhat).
I've tried:
  1) Compiling with cpu=generic i386
  2) acpi=off vga=1

Any suggestions would be greatly appreciated.
Comment 4 Kevin Jacobs 2003-05-13 09:33:08 UTC
Created attachment 335
.config file from 2.5.69-ac1
Comment 5 Kevin Jacobs 2003-05-13 09:33:59 UTC
Created attachment 336
lspci output
Comment 6 Bryan W. Headley 2003-05-13 10:43:04 UTC
The problem (on my box) turned out to be two-fold:

1) Until recently, SMP systems had a hard time mounting the root FS when root was
EXT2, EXT3 or XFS. (subsequently fixed)
2) The kernel is not able to load modules. The modules that caused problems were
ATKBD and serial IO. As a result, /dev/tty? and /dev/ttyS? could not be found. It
would boot, but you can't log in :-)
Comment 7 Zwane Mwaikambo 2003-05-13 10:49:41 UTC
Created attachment 341
early printk

Select the VGA output option in the early printk section of 'Kernel
Hacking'.
Comment 8 Kevin Jacobs 2003-05-14 12:13:48 UTC
Thanks Zwane for the suggestion, though still no joy.  I've also
recompiled as UP, hoping that may simplify things.  Anyhow, I'm
not afraid to get my hands dirty in some C code, so tell me where
to start sprinkling (early-)printk's and I'll narrow down where
things are blowing up.

It is also possible for me to attach a serial console, though I'd
have to dig up a NULL-modem cable from somewhere.

Anyhow -- just point me in a useful direction...
Comment 9 Kevin Jacobs 2003-05-14 12:25:02 UTC
Hmmm... I've read over the early-printk patch, and things don't look good.
The printk right after register_early_consoles() never makes it to the
screen.  Any other things to try?
Comment 10 Zwane Mwaikambo 2003-05-15 14:29:59 UTC
Created attachment 344
bare config

Could you try building a kernel with this configuration and trying to boot it?
Comment 11 Kevin Jacobs 2003-05-16 03:17:23 UTC
Zwane, it looks like you attached the wrong config file.
Please re-send and I'll try ASAP.
Comment 12 Zwane Mwaikambo 2003-05-16 03:27:40 UTC
Created attachment 350
bare'ish config

I attached the wrong config file previously
Comment 13 Kevin Jacobs 2003-05-16 03:59:20 UTC
Good news!  My system boots with the minimal config (though it can't
do much more than just boot).  I'll start re-adding things to the config
until something goes boom.  Any suggestions?
Comment 14 Zwane Mwaikambo 2003-05-16 04:46:31 UTC
Avoid anything related to power management, such as ACPI. For now, just turn on
your root fs and any IDE/SCSI controllers you may require.
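
The "re-add things until something goes boom" approach can be shortened by bisecting the options that differ between the minimal config and the failing one. The following is only a sketch: `parse_enabled`, `find_culprit` and the `boots` callback are hypothetical names, the callback stands in for the manual build-and-boot step, and the search assumes a single independent culprit option, which Kconfig dependencies may well violate in practice.

```python
from typing import Callable, List, Set


def parse_enabled(config_text: str) -> Set[str]:
    """Return the CONFIG_* symbols set to y or m in a .config text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            if value in ("y", "m"):
                enabled.add(name)
    return enabled


def find_culprit(good: Set[str], bad: Set[str],
                 boots: Callable[[Set[str]], bool]) -> str:
    """Bisect the options that are enabled only in the failing config.

    Assumption: exactly one option breaks the boot and the options can
    be toggled independently of each other.
    """
    candidates: List[str] = sorted(bad - good)
    if not candidates:
        raise ValueError("the failing config enables nothing extra")
    while len(candidates) > 1:
        middle = len(candidates) // 2
        first_half = candidates[:middle]
        if boots(good | set(first_half)):
            # First half is harmless, so the culprit is in the rest.
            candidates = candidates[middle:]
        else:
            candidates = first_half
    return candidates[0]


if __name__ == "__main__":
    good = parse_enabled("CONFIG_EXT3_FS=y\nCONFIG_BLK_DEV_IDE=y\n")
    bad = good | {"CONFIG_ACPI", "CONFIG_APM", "CONFIG_SMP"}
    # Hypothetical stand-in for "build this config and try to boot it".
    print(find_culprit(good, bad, lambda opts: "CONFIG_ACPI" not in opts))
```

Each call to `boots` corresponds to one kernel build and reboot, so the number of rebuilds grows with the logarithm of the number of differing options rather than linearly.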
Comment 15 Zwane Mwaikambo 2003-05-27 22:55:47 UTC
How is this looking in 2.5.70?
Comment 16 Bryan W. Headley 2003-05-28 19:31:52 UTC
Works much better in 2.5.69-bk17. Have not built 2.5.70 yet...
Comment 17 Kevin Jacobs 2003-05-29 04:18:54 UTC
I'm not running 2.5.70-mm1, which works with a .config based on the
one that Zwane sent (with all the things I need added).  Things look
good except that I am getting ~66000 interrupts per second due to the
ACPI:

           CPU0       CPU1
  0:   24458828   24627983    IO-APIC-edge  timer
  1:          2         12    IO-APIC-edge  i8042
  2:          0          0          XT-PIC  cascade
  8:          1          0    IO-APIC-edge  rtc
  9: 1600921506 1600747252   IO-APIC-level  acpi
 12:          4         51    IO-APIC-edge  i8042
 14:          1          1    IO-APIC-edge  ide0
 15:          2          0    IO-APIC-edge  ide1
 17:         24         25   IO-APIC-level  aic7xxx
 19:     909103     903356   IO-APIC-level  3ware Storage Controller, eth0
NMI:    1460789    1472578
LOC:   49085767   49085766
ERR:          0
MIS:          0

OProfile while copying a large tree of hardlinks (cp -Rl x y):

vma      samples  %           symbol name               
c01caf95 1217698  41.5755     acpi_os_read_port         
c0108bf0 1200771  40.9975     default_idle              
c01d633a 150704   5.14544     acpi_hw_low_level_read    
c010d8a0 54398    1.85729     do_IRQ                    
c010b240 23926    0.816897    irq_entries_start         
c0174b40 23354    0.797368    __d_lookup                
c01d019c 20739    0.708085    acpi_ev_gpe_detect        
c01cb3ef 20002    0.682922    acpi_os_acquire_lock      
c01d60bc 16298    0.556457    acpi_hw_register_read     
c01164b0 16033    0.547409    mark_offset_tsc           
c0119980 13953    0.476393    end_level_ioapic_irq      
c01764a0 12797    0.436924    find_inode_fast           
c019e460 11037    0.376833    ext3_find_entry           
c01ce95f 8994     0.307079    acpi_ev_fixed_event_detect
c019f110 8456     0.28871     add_dirent_to_buf         
c010d560 8144     0.278058    handle_IRQ_event          
c01696b0 6286     0.214621    link_path_walk            
c0111e70 5330     0.18198     timer_interrupt           
c010bac0 5297     0.180854    common_interrupt          
c01975c0 5266     0.179795    ext3_check_dir_entry      
c010b17a 4528     0.154598    restore_all               
c0108c80 4177     0.142614    cpu_idle                  
[...]

vmstat during that copy:
[...]
   procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
 0  1  0      0   4648 232224   9176    0    0   228     0 66668   124  1 45 55
 0  1  0      0   5288 231596   9192    0    0   332   412 66666   229  0 49 51
 2  1  2      0   4312 232292   9176    0    0   340  1940 65822   268  1 50 48
 0  1  0      0   5160 231440   9212    0    0   348   296 66538   581  5 53 43
 0  1  0      0   4648 232136   9128    0    0   472     0 66931   247  0 50 50
 0  1  0      0   5096 231688   9168    0    0   440   400 66684   286  0 52 48
 0  1  0      0   4456 232376   9160    0    0   448     0 66890   235  1 50 49
 0  1  0      0   4968 231796   9264    0    0   364  2128 65770   241  0 58 42
 0  1  0      0   4456 232300   9236    0    0   336    12 66989   184  1 47 52
 0  1  0      0   5608 231340   9244    0    0   276   392 66930   159  0 51 49
 0  1  0      0   5224 231728   9264    0    0   268     0 66825   142  0 49 51
 0  1  0      0   4712 232252   9284    0    0   356     0 66992   191  0 50 50
 0  1  0      0   4264 232776   9168    0    0   332  1368 66248   205  1 51 48
 0  1  0      0   5096 231848   9280    0    0   220   444 66642   165  0 51 49
 0  1  0      0   4648 232356   9248    0    0   344     0 67003   181  0 49 51
 0  1  0      0   4584 232536   9272    0    0   760   404 66309   578  1 51 48
 0  1  0      0   4072 232972   9244    0    0   296     0 66946   157  0 51 49
 0  1  0      0   5352 231876   9252    0    0    64  2944 65474   491  6 59 35
[...]
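
For reference, the per-IRQ totals across both CPUs can be pulled out of /proc/interrupts output with a few lines of Python. This is a minimal sketch: `parse_interrupts` and `busiest` are hypothetical helper names, and `SAMPLE` is an excerpt of the listing above rather than a live read of /proc/interrupts.

```python
SAMPLE = """\
           CPU0       CPU1
  0:   24458828   24627983    IO-APIC-edge  timer
  9: 1600921506 1600747252   IO-APIC-level  acpi
 19:     909103     903356   IO-APIC-level  3ware Storage Controller, eth0
ERR:          0
"""


def parse_interrupts(text):
    """Map each IRQ label to (total count over all CPUs, description)."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        # Rows such as ERR carry fewer counters than there are CPUs.
        counts = [int(f) for f in fields[:ncpus] if f.isdigit()]
        description = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals


def busiest(totals):
    """Return (label, (total, description)) for the most active IRQ."""
    return max(totals.items(), key=lambda item: item[1][0])


if __name__ == "__main__":
    label, (total, description) = busiest(parse_interrupts(SAMPLE))
    print(f"IRQ {label} ({description}): {total} interrupts")
```

On the excerpt above this singles out IRQ 9 (acpi), which is consistent with the OProfile samples landing mostly in acpi_os_read_port.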
Comment 18 Kevin Jacobs 2003-05-29 04:19:29 UTC
er, s/not/now/
Comment 19 Zwane Mwaikambo 2003-05-29 04:53:47 UTC
Created attachment 382
disable busmaster event checking

We really should open a new bug for this... But can you try this patch and then
we'll look at opening a new bug. This patch essentially disables bus master
event monitoring in ACPI
Comment 20 Kevin Jacobs 2003-05-29 06:13:43 UTC
Thanks Zwane!  Disabling ACPI fixes the problem (obviously) and so
does your patch.  Here is /proc/interrupts with your patch:

           CPU0       CPU1
  0:      86839      70698    IO-APIC-edge  timer
  1:          0         12    IO-APIC-edge  i8042
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
  9:      50012      49988   IO-APIC-level  acpi
 12:          5         50    IO-APIC-edge  i8042
 14:          0          2    IO-APIC-edge  ide0
 15:          0          2    IO-APIC-edge  ide1
 17:         24         25   IO-APIC-level  aic7xxx
 19:       1830       1862   IO-APIC-level  3ware Storage Controller, eth0
NMI:          0          0
LOC:     157392     157391
ERR:          0
MIS:          0

Let me know if there is anything else I can do to diagnose the root problem.
I wouldn't be too surprised if my motherboard has broken ACPI -- it was
one of the first RDRAM+333MHz FSB motherboards.
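
The two /proc/interrupts listings cover different uptimes, so the raw counts are not directly comparable. A rough way to compare them is to use the timer line (IRQ 0) as a clock. The sketch below assumes the timer ticks at HZ=1000 on this kernel, which is an assumption and not something stated in this report; `per_second` is a hypothetical helper name.

```python
HZ = 1000  # assumed timer frequency; not confirmed anywhere in this report


def per_second(irq_total, timer_total, hz=HZ):
    """Estimate an IRQ rate, using the timer interrupt count as a clock."""
    uptime_seconds = timer_total / hz
    return irq_total / uptime_seconds


# Counts are summed over CPU0 and CPU1 from the two listings in this report.
before = per_second(1600921506 + 1600747252, 24458828 + 24627983)
after = per_second(50012 + 49988, 86839 + 70698)

if __name__ == "__main__":
    print(f"acpi IRQ rate without the patch: ~{before:.0f}/s")
    print(f"acpi IRQ rate with the patch:    ~{after:.0f}/s")
```

Under that assumption the unpatched rate comes out at roughly 65,000 per second, in line with the ~66000 total interrupts per second that vmstat reported, and the patched rate at roughly 630 per second.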
Comment 21 Zwane Mwaikambo 2003-05-29 06:38:33 UTC
Could you please open up a new bug and add your comments starting from #17? I
will also close this bug with resolution set to INVALID, since it was a
configuration error.
