Bug 42680

Summary: uImage compressed kernel will not boot on many Kirkwood devices
Product: Platform Specific/Hardware
Reporter: purdyd_at_wisheights
Component: ARM
Assignee: linux-arm-kernel (linux-arm-kernel)
Status: NEW
Severity: blocking
CC: bugzilla.kernel.bpeb, jrnieder, rjw
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.2.x, 3.3.0-rc1
Subsystem:
Regression: No
Bisected commit-id:
Attachments: .config

Description purdyd_at_wisheights 2012-01-28 19:51:25 UTC
We've found a severe/grave problem with the 3.2 & 3.3-rc1 kernels on certain Kirkwood machines.

Problem:  A compressed uImage kernel will not boot on many Kirkwood devices (Dockstar, some PogoPlugs, others)

Package(s):  Linux Kernel 3.2.x and 3.3.0-rc1

Steps to reproduce:
Build a uImage kernel with any of the usual setups: natively on Debian Squeeze or Wheezy (with build-essential etc.), with the CodeSourcery cross-compile toolchain, on ArchLinux, or with any other kernel build setup.  Booting the resulting 3.2 or 3.3 kernel hangs completely; the serial output stops here:

====================================
## Booting kernel from Legacy Image at 00800000 ...
   Image Name:   Linux-3.3.0-rc1-kirkwood
   Image Type:   ARM Linux Kernel Image (uncompressed)
   Data Size:    1626136 Bytes = 1.6 MiB
   Load Address: 00008000
   Entry Point:  00008000
   Verifying Checksum ... OK
## Loading init Ramdisk from Legacy Image at 01100000 ...
   Image Name:   initramfs-3.3.0-rc1-kirkwood
   Image Type:   ARM Linux RAMDisk Image (gzip compressed)
   Data Size:    5778790 Bytes = 5.5 MiB
   Load Address: 00000000
   Entry Point:  00000000
   Verifying Checksum ... OK
   Loading Kernel Image ... OK
OK

Starting kernel ...

Uncompressing Linux... done, booting the kernel.
====================================
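
A note on reading the log above: the "(uncompressed)" that U-Boot prints describes only the uImage wrapper. "make uImage" wraps the self-extracting zImage without further compression, so the kernel payload is still compressed, which is why "Uncompressing Linux..." appears afterwards. Below is a minimal sketch for reading the wrapper's compression field directly. It assumes the legacy uImage header layout (magic 0x27051956 in bytes 0-3, compression type in byte 31). The function name uimage_compression is made up for illustration.

```shell
# Print the compression type recorded in a legacy uImage header.
# Usage: uimage_compression path/to/uImage
uimage_compression() {
    image=$1
    # Bytes 0-3 hold the legacy image magic number 0x27051956.
    magic=$(od -An -tx1 -N4 "$image" | tr -d ' \n')
    if [ "$magic" != "27051956" ]; then
        echo "not a legacy uImage" >&2
        return 1
    fi
    # Byte 31 holds the compression type of the wrapped payload.
    comp=$(od -An -tu1 -j31 -N1 "$image" | tr -d ' \n')
    case "$comp" in
        0) echo none ;;
        1) echo gzip ;;
        2) echo bzip2 ;;
        3) echo lzma ;;
        4) echo lzo ;;
        *) echo "unknown ($comp)" ;;
    esac
}
```

For an image built with "make uImage" this is expected to report "none", matching what U-Boot prints.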



The same behavior is seen whether we use "make uImage"  or "make-kpkg --rootcmd fakeroot --arch armel  --append-to-version=-kirkwood --revision=1.0 --initrd kernel_image"

The result is the same either way: a non-booting uImage.

Users on some forums have noted that switching the compression between gzip and lzma (in either direction) sometimes changes the result, but not always.  The behavior seems unpredictable.
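
When comparing reports like these, it helps to state which compression a given build actually used. A small sketch that reads it from the kernel .config, assuming the usual CONFIG_KERNEL_<METHOD>=y choice symbols; the function name kernel_compression is made up for illustration:

```shell
# Print the kernel compression method selected in a .config file.
# Usage: kernel_compression [path/to/.config]
kernel_compression() {
    config=${1:-.config}
    # Kconfig records the chosen method as CONFIG_KERNEL_<METHOD>=y;
    # the alternatives appear as "# CONFIG_KERNEL_<METHOD> is not set".
    sed -n \
        -e 's/^CONFIG_KERNEL_GZIP=y$/gzip/p' \
        -e 's/^CONFIG_KERNEL_LZMA=y$/lzma/p' \
        -e 's/^CONFIG_KERNEL_LZO=y$/lzo/p' \
        -e 's/^CONFIG_KERNEL_XZ=y$/xz/p' \
        "$config"
}
```

Run without arguments from the top of the kernel tree, it inspects ./.config.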

We have confirmed that an __uncompressed__ kernel will boot completely.   This is not the default for Debian armel packages or other Kirkwood installations, though, and won't be suitable as a long-term fix.


Two things to note and clarify:
1.  This is _not_ the arch/arm/asm/bug.h compile-time problem that is causing problems on ARM.
2.  3.1.10 works just fine on Kirkwood; no problems there.



Here are some links that show a few of the discussions that are going on regarding this:

• Re: Linux kernels 3.2 & 3.3-rc1 are broken!  http://forum.doozan.com/read.php?2,6550,6868#msg-6868
• new kernel 3.2 does not boot on the dockstar  http://archlinuxarm.org/forum/viewtopic.php?f=18&t=2314
Comment 1 Jonathan Nieder 2012-02-06 17:14:59 UTC
Cc-ing Rafael since this is a regression (and so should block bug 42566 to get on the list).

purdyd, can you bisect? It works somewhat like this (feel free to tweak for cross-compilation and building on a different machine from where the kernel runs as appropriate):

 0. Prerequisites:

apt-get install git build-essential

 1. Grab the kernel, with history:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux

    Or, if you already have a git checkout of the kernel, update it:

cd linux
git fetch origin
git checkout origin/master

 2. Configure:

cp /boot/config-$(uname -r) .config; # current configuration
make localmodconfig; # minimal configuration
make nconfig; # tweak configuration

 3. Test a known-broken version:

make deb-pkg; # optionally with -j<num> for parallel build
dpkg -i ../<name of package>
... make sure the kernel is flashed, reboot, test the uncompressed and compressed versions ...

     Hopefully it reproduces the problem.  So:

 4. Test a known-good version:

git checkout v3.1
make silentoldconfig; # reuse configuration
make deb-pkg; # maybe with -j<num>
dpkg -i ../<name of package>
... test as usual ...

      Hopefully it works fine.  So let git know the result:

git bisect start
git bisect bad origin/master
git bisect good v3.1

 5. A version halfway between is automatically checked out to test:

make silentoldconfig
make deb-pkg; # maybe with -j<num>
dpkg -i ../<name of package>
... test ...
git bisect bad; # if it reproduces the problem
git bisect good; # if compressed kernels work fine
git bisect skip; # if some other problem makes it hard to test

 6. Repeat step 5 until bored. Eventually it will spit out the "first bad commit", or if you get bored before that, you can run "git bisect log" to get a summary of the tests you have run, which is almost as good. If the gitk package is installed, you can run "git bisect visualize" at any step to watch the regression range narrowing.
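
To get a feel for how long "until bored" might be: bisection halves the candidate range on each round, so the number of build-and-boot rounds is roughly the base-2 logarithm of the number of commits in the range. A rough sketch of that estimate follows; the function name bisect_rounds is made up for illustration, and git's own "roughly N steps" message may differ slightly because the history is not linear.

```shell
# Estimate how many bisection rounds a range of commits needs,
# i.e. the base-2 logarithm of the commit count, rounded up.
# Usage: bisect_rounds NUMBER_OF_COMMITS
bisect_rounds() {
    commits=$1
    rounds=0
    while [ "$commits" -gt 1 ]; do
        # Each round leaves about half of the candidates.
        commits=$(( (commits + 1) / 2 ))
        rounds=$(( rounds + 1 ))
    done
    echo "$rounds"
}
```

Inside the kernel checkout, the commit count for the range used above can be obtained with "git rev-list --count v3.1..origin/master" and passed to the function.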
Comment 2 Jonathan Nieder 2012-02-06 17:27:20 UTC
From Ian Campbell, at <http://bugs.debian.org/658759>:

> I suspect this is due to the lack of this u-boot patch:
> http://lists.denx.de/pipermail/u-boot/2012-February/117020.html
>
> I found that without this my 3.2 dreamplug kernel would not boot (with
> the 2011.12-2 package from debian). It's related to
> CONFIG_ARM_PATCH_PHYS_VIRT.

That would point to c1becedc8871 (ARM: enable ARM_PATCH_PHYS_VIRT by default, v3.2-rc1~189^2~1^6~2) as the first bad commit. One can check if this is the cause by disabling ARM_PATCH_PHYS_VIRT to see if that helps.
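
Before and after such a test it is worth confirming what the build configuration really contains. A minimal sketch that reports the state of one symbol in a .config; the function name config_state is made up for illustration:

```shell
# Print "enabled" or "disabled" for one Kconfig symbol in a .config file.
# Usage: config_state SYMBOL [path/to/.config]
#   e.g. config_state ARM_PATCH_PHYS_VIRT
config_state() {
    symbol=$1
    config=${2:-.config}
    # An enabled boolean option is recorded as CONFIG_<SYMBOL>=y.
    if grep -q "^CONFIG_${symbol}=y\$" "$config"; then
        echo enabled
    else
        echo disabled
    fi
}
```

One way to switch the option off is "scripts/config --disable ARM_PATCH_PHYS_VIRT" in the kernel tree followed by "make oldconfig"; the configuration may then ask for a CONFIG_PHYS_OFFSET value.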

Is there anything the kernel could do to continue to work with old (well, current today ;-)) versions of u-boot, too?
Comment 3 Jonathan Nieder 2012-02-07 20:58:41 UTC
From Nico Pitre, at [1]:

> You really do want to have uboot patched.  Who knows what other latent 
> issues are there that you don't know about.

I wonder if the kernel should read the extra features register at some
appropriate moment and quietly disable L2 or panic with a hint that the
bootloader has screwed up.  This is very early in the boot sequence so
it might be tricky.

Hints for the novice:
 - enabling/disabling L2: arch/arm/mm/cache-feroceon-l2.c
 - booting a compressed kernel: arch/arm/boot/compressed/head.S

[1] http://thread.gmane.org/gmane.linux.ports.arm.kernel/127951/focus=151172
Comment 4 Christoph Biedl 2012-09-05 05:33:11 UTC
I'm attaching my trouble here although I'm not sure whether it's the
same thing.

Symptom: Dockstar does not boot since the upgrade from 3.0-longterm to
3.4-longterm. No message after "Uncompressing Linux... done, booting
the kernel.", not even on the serial console.

Bisecting led to v3.0-rc6-6-g3835d69:

3835d69a6c7048a28d0aea3cb8403d5e83a0f867 is the first bad commit
commit 3835d69a6c7048a28d0aea3cb8403d5e83a0f867
Author: Russell King <rmk+kernel@arm.linux.org.uk>
Date:   Wed Jul 6 10:39:34 2011 +0100

    ARM: vmlinux.lds: move init sections between text and data sections

That is in contradiction to "3.1.10 works just fine on Kirkwood".

Using an uncompressed image ("make Image") did not help either.


More shotgun debugging (on 3.6-rc4):

Disabling CONFIG_ARM_UNWIND - to no avail.

Disabling CONFIG_CACHE_FEROCEON_L2 led to

(...)
  CC      init/version.o
  LD      init/built-in.o
arch/arm/mach-kirkwood/built-in.o: In function `kirkwood_l2_init':
cpuidle.c:(.init.text+0x1d4): undefined reference to `feroceon_l2_init'
make: *** [vmlinux] Error 1

And I'd happily disable ARM_PATCH_PHYS_VIRT, at least for a test, but I
have no idea what to enter for CONFIG_PHYS_OFFSET.


So, I'm out of ideas at the moment. Do you have some more?

The .config used to build at the guilty commit is attached.
Comment 5 Christoph Biedl 2012-09-05 05:34:16 UTC
Created attachment 79311
.config
Comment 6 Jonathan Nieder 2012-09-05 07:41:50 UTC
Hi Christoph,

(In reply to comment #4)
> I'm attaching my trouble here although I'm not sure whether it's the
> same thing.

Yes, it isn't. Could you file a separate bug (or even better, write to linux-arm-kernel@ directly and file a bug with a link to a mailing list archive with your message)?

Thanks much,
Jonathan
Comment 7 Jonathan Nieder 2012-09-05 07:43:59 UTC
(In reply to comment #6)
> (In reply to comment #4)
> > I'm attaching my trouble here although I'm not sure whether it's the
> > same thing.
> 
> Yes, it isn't. Could you file a separate bug (or even better, write to
> linux-arm-kernel@ directly and file a bug with [...]

Sorry to make things complicated.  Simpler, if you prefer: a message to linux-arm-kernel@lists.infradead.org, cc-ing me (jrnieder@gmail.com).
Comment 8 Jonathan Nieder 2012-09-06 19:24:10 UTC
*** Bug 47071 has been marked as a duplicate of this bug. ***