Bug 9229

Summary: with CONFIG_NO_HZ and/or CONFIG_HPET_TIMER set kernel 2.6.23 doesn't boot
Product: Platform Specific/Hardware
Reporter: Sergey Smirnov (svs1957)
Component: ARM
Assignee: Russell King (rmk)
Status: CLOSED PATCH_ALREADY_AVAILABLE
Severity: normal
CC: john.stultz, protasnb, tglx
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.23
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: PXA one shot timer fix

Description Sergey Smirnov 2007-10-26 07:02:51 UTC
Most recent kernel where this bug did not occur: -
Distribution: Debian Armel
Hardware Environment: Sharp Zaurus 750 (pxa255)
Software Environment: Debian Armel
Problem Description:
If I set either CONFIG_NO_HZ or CONFIG_HPET_TIMER,
kernel 2.6.23 doesn't boot. Nothing appears on the screen.
If I unset both, the kernel boots normally.
Steps to reproduce:
Set either CONFIG_NO_HZ or CONFIG_HPET_TIMER and boot on a pxa255 box.
Comment 1 Natalie Protasevich 2007-11-12 19:24:10 UTC
Sergey,
Can you attach your dmesg from a good boot, and maybe /proc/interrupts?

Thanks.
Comment 2 Thomas Gleixner 2007-11-13 06:24:34 UTC
CONFIG_HPET_TIMER on ARM ?? There is no such thing.

You mean CONFIG_HIGH_RES_TIMERS perhaps ?

    tglx
Comment 3 Sergey Smirnov 2007-11-13 07:08:13 UTC
Yes
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

interrupts
=================================
           CPU0
  9:          0      GPIO-l  AC Input Detect
 11:       1069          SC  pxa2xx_udc
 18:         71          SC  pxa_i2c-i2c.0
 23:      29864          SC  pxa2xx-mci
 24:          0          SC  SSP
 25:          0          SC  DMA
 26:      19892          SC  ost0
 69:          0        GPIO  ts
 73:          0        GPIO  MMC card detect
 75:          0        GPIO  Battery Cover
 80:          0        GPIO  CO
122:          0        GPIO  corgikbd
123:          0        GPIO  corgikbd
124:          0        GPIO  corgikbd
125:          0        GPIO  corgikbd
126:          0        GPIO  corgikbd
127:          0        GPIO  corgikbd
128:          0        GPIO  corgikbd
129:          0        GPIO  corgikbd
Err:          0
===========================
dmesg:
==========================
Linux version 2.6.23-corgi (root@ssmirnov) (gcc version 4.1.1) #13 PREEMPT Tue Nov 13 17:54:42 MSK 2007
CPU: XScale-PXA255 [69052d06] revision 6 (ARMv5TE), cr=0000397f
Machine: SHARP Shepherd
Ignoring unrecognised tag 0x00000000
Ignoring unrecognised tag 0x00000000
Ignoring unrecognised tag 0x00000000
Ignoring unrecognised tag 0x00000000
Memory policy: ECC disabled, Data cache writeback
On node 0 totalpages: 16384
  DMA zone: 128 pages used for memmap
  DMA zone: 0 pages reserved
  DMA zone: 16256 pages, LIFO batch:3
  Normal zone: 0 pages used for memmap
  Movable zone: 0 pages used for memmap
Memory clock: 99.53MHz (*27)
Run Mode clock: 199.07MHz (*2)
Turbo Mode clock: 398.13MHz (*2.0, active)
CPU0: D VIVT undefined 5 cache
CPU0: I cache: 32768 bytes, associativity 32, 32 byte lines, 32 sets
CPU0: D cache: 32768 bytes, associativity 32, 32 byte lines, 32 sets
Built 1 zonelists in Zone order.  Total pages: 16256
Kernel command line: console=ttyS0,115200 console=tty1 noinitrd root=/dev/mmcblk0p1 rootfstype=ext3 rootdelay=5 mtdparts=sharpsl-nand:10240k@0k(smf),55296k@10240k(root),-(home)
PID hash table entries: 256 (order: 8, 1024 bytes)
Console: colour dummy device 80x30
console [tty1] enabled
console [ttyS0] enabled
Dentry cache hash table entries: 8192 (order: 3, 32768 bytes)
Inode-cache hash table entries: 4096 (order: 2, 16384 bytes)
Memory: 64MB = 64MB total
Memory: 62208KB available (2288K code, 306K data, 76K init)
Calibrating delay loop... 397.31 BogoMIPS (lpj=1986560)
Mount-cache hash table entries: 512
CPU: Testing write buffer coherency: ok
NET: Registered protocol family 16
Sharp Scoop Device found at 0x10800000 -> 0xc4800000
NET: Registered protocol family 2
Time: oscr0 clocksource has been installed.
IP route cache hash table entries: 1024 (order: 0, 4096 bytes)
TCP established hash table entries: 2048 (order: 2, 16384 bytes)
TCP bind hash table entries: 2048 (order: 1, 8192 bytes)
TCP: Hash tables configured (established 2048 bind 2048)
TCP reno registered
pxa25x: CPU frequency change support initialized (powersave tables)
NetWinder Floating Point Emulator V0.97 (double precision)
JFFS2 version 2.2. (NAND) (SUMMARY)  © 2001-2006 Red Hat, Inc.
io scheduler noop registered
io scheduler anticipatory registered (default)
io scheduler deadline registered
io scheduler cfq registered
Corgi Backlight Driver Initialized.
Found w100 at 0x08000000.
Console: switching to colour frame buffer device 80x30
fb0: w100fb frame buffer device
pxa2xx-uart.0: ttyS0 at MMIO 0x40100000 (irq = 22) is a FFUART
pxa2xx-uart.1: ttyS1 at MMIO 0x40200000 (irq = 21) is a BTUART
pxa2xx-uart.2: ttyS2 at MMIO 0x40700000 (irq = 20) is a STUART
pxa2xx-uart.3: ttyS3 at MMIO 0x41600000 (irq = 7) is a HWUART
loop: module loaded
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 50MHz system bus speed for PIO modes; override with idebus=xx
Sharp SL series flash device: 800000 at 0
Using static partision definition
Creating 1 MTD partitions on "sharpsl-flash":
0x00120000-0x007f0000 : "Boot PROM Filesystem"
NAND device: Manufacturer ID: 0x98, Chip ID: 0x76 (Toshiba NAND 64MiB 3,3V 8-bit)
Scanning device for bad blocks
3 cmdlinepart partitions found on MTD device sharpsl-nand
Creating 3 MTD partitions on "sharpsl-nand":
0x00000000-0x00a00000 : "smf"
0x00a00000-0x04000000 : "root"
0x04000000-0x04000000 : "home"
mtd: partition "home" is out of reach -- disabled
pxa2xx_udc: version 30-June-2007
input: Corgi Keyboard as /class/input/input0
input: Corgi Touchscreen as /class/input/input1
power.c: Adding power management to input layer
sa1100-rtc sa1100-rtc: rtc core: registered sa1100-rtc as rtc0
I2C: i2c-0: PXA I2C adapter
SA1100/PXA2xx Watchdog Timer: timer margin 60 sec
Registered led device: corgi:amber
Registered led device: corgi:green
TCP cubic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
XScale DSP coprocessor detected.
sa1100-rtc sa1100-rtc: setting the system clock to 1970-01-01 00:00:07 (7)
Waiting 5sec before mounting root device...
mmc0: new SD card at address b368
mmcblk0: mmc0:b368       3921920KiB
 mmcblk0: p1 p2 p3
Not charging: temperature out of limits.
sharpsl-pm sharpsl-pm: Charging Error!
kjournald starting.  Commit interval 5 seconds
EXT3 FS on mmcblk0p1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem).
Freeing init memory: 76K
mice: PS/2 mouse device common for all mice
Adding 130688k swap on /dev/mmcblk0p2.  Priority:-1 extents:1 across:130688k
EXT3 FS on mmcblk0p1, internal journal
ether gadget: using random self ethernet address
ether gadget: using random host ethernet address
usb0: Ethernet Gadget, version: May Day 2005
usb0: using pxa2xx_udc, OUT ep2out-bulk IN ep1in-bulk STATUS ep6in-bulk
usb0: MAC e6:33:86:87:ad:92
usb0: HOST MAC e6:8f:c3:da:05:20
usb0: RNDIS ready
usb0: full speed config #1: 100 mA, Ethernet Gadget, using CDC Ethernet Subset
usb0: full speed config #1: 100 mA, Ethernet Gadget, using CDC Ethernet Subset
ASoC version 0.13.1
wm8731: WM8731 Audio Codec 0.13
asoc: WM8731 <-> pxa2xx-i2s mapping ok
kjournald starting.  Commit interval 5 seconds
EXT3 FS on mmcblk0p3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Bluetooth: Core ver 2.11
NET: Registered protocol family 31
Bluetooth: HCI device and connection manager initialized
Bluetooth: HCI socket layer initialized
Bluetooth: L2CAP ver 2.8
Bluetooth: L2CAP socket layer initialized
Bluetooth: RFCOMM socket layer initialized
Bluetooth: RFCOMM TTY layer initialized
Bluetooth: RFCOMM ver 1.8
w100fb: Using fast system clock (if possible)
===============
I get too many wakeups on 2.6.23 (http://linuxpowertop.org/)
Output from powertop:

Top causes for wakeups:
  88,5% (100,2)       <interrupt> : ost0
Comment 4 Sergey Smirnov 2007-11-13 07:30:49 UTC
This output is from kernel 2.6.23 with CONFIG_NO_HZ and CONFIG_HIGH_RES_TIMERS unset.
If I set either of them, the kernel hangs on booting.
Comment 5 Russell King 2007-11-13 10:55:09 UTC
First point, Sergey, please don't assign bug reports to yourself unless you're
planning to resolve them yourself.

Secondly, we've recently found a bug with the PXA one shot timer code which
produces the exact symptoms you're describing.  It's fixed in the latest kernel.
However, I'll attach the fix to this report.  Please confirm whether this patch
fixes the bug.
Comment 6 Russell King 2007-11-13 10:57:10 UTC
Created attachment 13532
PXA one shot timer fix

From git commit 91bc51d8a10b00d8233dd5b6f07d7eb40828b87d

    [ARM] pxa: fix one-shot timer mode

    One-shot timer mode on PXA has various bugs which prevent kernels
    build with NO_HZ enabled booting.  They end up spinning on a
    permanently asserted timer interrupt because we don't properly
    clear it down - clearing the OIER bit does not stop the pending
    interrupt status.  Fix this in the set_mode handler as well.

    Moreover, the code which sets the next expiry point may race with
    the hardware, and we might not set the match register sufficiently
    in the future.  If we encounter that situation, return -ETIME so
    the generic time code retries.
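
An illustrative user-space sketch of the two points in the commit message above. This is not the attached patch: the registers are plain variables standing in for the memory-mapped OS timer registers, and `read_oscr`, `ack_ossr`, `sim_ticks_per_read`, the `sketch_*` function names and the `MIN_OSCR_DELTA` value are assumptions made for the example.

```c
#include <errno.h>
#include <stdint.h>

#ifndef ETIME
#define ETIME 62 /* fallback for platforms whose errno.h lacks ETIME */
#endif

/* Plain variables standing in for the memory-mapped OS timer registers. */
static uint32_t OSCR;  /* free-running counter */
static uint32_t OSMR0; /* match register 0 */
static uint32_t OIER;  /* interrupt enable bits */
static uint32_t OSSR;  /* pending match status bits */

#define OIER_E0 (1u << 0)
#define OSSR_M0 (1u << 0)
#define MIN_OSCR_DELTA 16 /* assumed safety margin, in counter ticks */

/* Hypothetical knob: counter ticks that elapse on every counter read. */
static uint32_t sim_ticks_per_read = 1;

/* Return the current counter value, then let simulated time advance. */
static uint32_t read_oscr(void)
{
    uint32_t now = OSCR;
    OSCR += sim_ticks_per_read;
    return now;
}

/* Model of acknowledging a status bit (the simulation simply clears it). */
static void ack_ossr(uint32_t mask)
{
    OSSR &= ~mask;
}

/* Stopping the timer: masking in OIER alone leaves the match status
 * pending, so the pending status is cleared as well. */
static void sketch_timer_shutdown(void)
{
    OIER &= ~OIER_E0;
    ack_ossr(OSSR_M0);
}

/* Program the next expiry. If the counter has already moved too close to
 * (or past) the match value by the time it is written, report -ETIME so
 * the caller can retry with a larger delta. */
static int sketch_set_next_event(uint32_t delta)
{
    uint32_t next, now;

    OIER |= OIER_E0;
    next = read_oscr() + delta;
    OSMR0 = next;
    now = read_oscr();

    /* Signed difference keeps the comparison correct across counter wrap. */
    return (int32_t)(next - now) <= MIN_OSCR_DELTA ? -ETIME : 0;
}
```

The signed subtraction is the important design choice here: comparing `next` and `now` directly would give the wrong answer when the 32-bit counter wraps between the two reads.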
Comment 7 Sergey Smirnov 2007-11-13 23:38:44 UTC
Russell,
Thank you for the patch. It helps. But a new problem has occurred:
the PDA doesn't resume from suspend.
--
Sergey
Comment 8 Sergey Smirnov 2007-11-14 06:27:30 UTC
If I unset NO_TZ, suspend/resume works.
If I set it, suspend/resume doesn't work.
Comment 9 Sergey Smirnov 2007-11-14 06:28:20 UTC
Sorry, I meant NO_HZ.
Comment 10 Russell King 2007-11-14 11:49:44 UTC
Great news.  Bug #9275 might describe your current problem.

As the original bug report stands, it's been solved, so I think this report should
be closed.  A new report (or maybe monitoring 9275) would be a good idea?