Bug 15873 - Panic: Compress_read returned -22 on some resume attempts
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Power Management
Classification: Unclassified
Component: Hibernation/Suspend
Hardware: All Linux
Importance: P1 normal
Assignee: Nigel Cunningham
URL: http://www.tuxonice.net:80/
Keywords:
Depends on:
Blocks:
 
Reported: 2010-04-29 07:33 UTC by Martin Steigerwald
Modified: 2012-01-18 08:29 UTC

See Also:
Kernel Version: 2.6.33.2-tp42-toi-3.1-lowmem-free-991-992-04964-gf00c7ec-dirty
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
First of the mentioned patches from Nigel, version 2 of it (3.64 KB, patch)
2010-04-29 07:35 UTC, Martin Steigerwald
Second of the mentioned patches from Nigel (1.31 KB, patch)
2010-04-29 07:39 UTC, Martin Steigerwald
tuxonice config for hibernate script (1.34 KB, text/plain)
2010-04-29 07:49 UTC, Martin Steigerwald
happened again with 2.6.34.1 and tuxonice 3.1.1.1, photo of backtrace (304.68 KB, image/jpeg)
2010-07-12 19:10 UTC, Martin Steigerwald
error still occurs with 2.6.36-rc3 + TOI 3.2-rc1, photo of backtrace (69.38 KB, image/jpeg)
2010-09-09 16:17 UTC, Martin Steigerwald

Description Martin Steigerwald 2010-04-29 07:33:34 UTC
Beware, this could well be a TuxOnIce issue. I am posting here because of a comment by Raphael in bug #15685 (https://bugzilla.kernel.org/show_bug.cgi?id=15685#c24). Please advise whether I should continue doing so. I can report it in the TuxOnIce bug tracker as well. I will also write a note to the tuxonice-devel mailing list. I added Nigel to CC. Please reassign the bug to Nigel as you see fit. Maybe there should be a TuxOnIce component in Power Management to track those bugs for now?

Last kernel version known to work: 2.6.32.8-tp42-toi-3.0.99.49

In some rare cases I get the following on resume:

Compress_read returned -22.
Kernel panic - not syncing: Read chunk returned (-22).
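
The -22 in both messages looks like a negated errno value, following the usual kernel convention for error returns. Under that reading, 22 is EINVAL ("Invalid argument"). A minimal sketch for decoding such a return code, assuming it runs on Linux; the helper name `decode_kernel_rc` is hypothetical and not part of TuxOnIce:

```python
import errno
import os


def decode_kernel_rc(rc: int) -> str:
    """Translate a negative kernel-style return code into an errno name."""
    if rc >= 0:
        return "success"
    code = -rc  # kernel code returns -ERRNO, so flip the sign
    name = errno.errorcode.get(code, "unknown")
    return f"{name} ({os.strerror(code)})"


print(decode_kernel_rc(-22))
```

This only names the error; it says nothing about which argument the read path rejected.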

I attach a screenshot.

When it happens, it happens consistently: I can try to resume again and it fails again. I have to press SPACE to remove the image and do a regular boot.

martin@shambhala:~> cat /sys/power/tuxonice/debug_info 
TuxOnIce debugging info:
- TuxOnIce core  : 3.1
- Kernel Version : 2.6.33.2-tp42-toi-3.1-lowmem-free-991-992-04964-gf00c7ec-dirty
- Compiler vers. : 4.4
- Attempt number : 0
- Parameters     : 0 667648 0 0 0 0
- Overall expected compression percentage: 0.
- Checksum method is 'md4'.
  0 pages resaved in atomic copy.
- Compressor is 'lzo'.
- Block I/O active.
- Max outstanding reads 1. Max writes 1.
  Memory_needed: 1024 x (4096 + 200 + 76) = 4476928 bytes.
  Free mem throttle point reached 0.
- Swap Allocator enabled.
  Swap available for image: 732955 pages.
- File Allocator active.
  Storage available for image: 0 pages.
- No I/O speed stats available.
- Extra pages    : 0 used/500.
- Result         : No hibernation attempts so far.
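
The "Memory_needed" line above can be checked by hand. A small sketch that reproduces the arithmetic with the values copied from the output; the variable names are my own, and the report does not say what the 200 and 76 byte terms stand for:

```python
# Values copied from the "Memory_needed" line of debug_info.
BUFFER_COUNT = 1024
PER_BUFFER_BYTES = (4096, 200, 76)  # meaning of 200 and 76 is not stated in the report

memory_needed = BUFFER_COUNT * sum(PER_BUFFER_BYTES)
print(memory_needed)
```

The result agrees with the 4476928 bytes reported by TuxOnIce.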

Some information on memory layout (after this new boot):

martin@shambhala:~> cat /proc/meminfo
MemTotal:        2072596 kB
MemFree:           67200 kB
Buffers:          374968 kB
Cached:           601312 kB
SwapCached:            0 kB
Active:           927984 kB
Inactive:         712928 kB
Active(anon):     613460 kB
Inactive(anon):    52300 kB
Active(file):     314524 kB
Inactive(file):   660628 kB
Unevictable:           4 kB
Mlocked:               4 kB
HighTotal:       1187144 kB
HighFree:          10540 kB
LowTotal:         885452 kB
LowFree:           56660 kB
SwapTotal:       2931820 kB
SwapFree:        2931820 kB
Dirty:               128 kB
Writeback:             0 kB
AnonPages:        664652 kB
Mapped:           141896 kB
Shmem:              1128 kB
Slab:             142728 kB
SReclaimable:     116476 kB
SUnreclaim:        26252 kB
KernelStack:        2808 kB
PageTables:         7984 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     3968116 kB
Committed_AS:    1744488 kB
VmallocTotal:     122880 kB
VmallocUsed:       16212 kB
VmallocChunk:     100196 kB
DirectMap4k:      294904 kB
DirectMap4M:      614400 kB
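
Since the patches mentioned below concern low memory, a few figures from this output are worth cross-checking. The sketch below parses an excerpt of the /proc/meminfo text above and compares it with the "Swap available for image" value from debug_info. It assumes 4 kB pages (the 4096 in the Memory_needed line suggests this); `parse_meminfo` is a hypothetical helper written for this example:

```python
MEMINFO_SAMPLE = """\
MemTotal:        2072596 kB
HighTotal:       1187144 kB
LowTotal:         885452 kB
LowFree:           56660 kB
SwapTotal:       2931820 kB
"""


def parse_meminfo(text: str) -> dict:
    """Return a {field: value in kB} mapping from /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields


info = parse_meminfo(MEMINFO_SAMPLE)

PAGE_KB = 4                    # assumption: 4 kB pages
swap_pages_for_image = 732955  # from debug_info: "Swap available for image"

low_plus_high = info["LowTotal"] + info["HighTotal"]
swap_kb_for_image = swap_pages_for_image * PAGE_KB
low_free_percent = 100 * info["LowFree"] / info["LowTotal"]

print(low_plus_high == info["MemTotal"])       # low + high memory add up to MemTotal
print(swap_kb_for_image == info["SwapTotal"])  # the whole swap is offered for the image
print(round(low_free_percent, 1))              # share of low memory that is free
```

Both comparisons hold for the numbers in this report, and only a small share of low memory was free at the time of the snapshot of /proc/meminfo.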

That kernel contains two patches by Nigel to improve handling of lowmem pages. I have no idea how they relate to this regression. I will attach them as well. Apart from these patches, the tree is:

git://git.kernel.org/pub/scm/linux/kernel/git/nigelc/tuxonice-2.6.33.git

as of f00c7ecd068a14c9bd2dd1f237aa9a2e6db0c48f.

This is happening on a ThinkPad T42:

martin@shambhala:~> lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation 82855PM Processor to I/O Controller [8086:3340] (rev 03)
00:01.0 PCI bridge [0604]: Intel Corporation 82855PM Processor to AGP Controller [8086:3341] (rev 03)
00:1d.0 USB Controller [0c03]: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 [8086:24c2] (rev 01)
00:1d.1 USB Controller [0c03]: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 [8086:24c4] (rev 01)
00:1d.2 USB Controller [0c03]: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 [8086:24c7] (rev 01)
00:1d.7 USB Controller [0c03]: Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller [8086:24cd] (rev 01)
00:1e.0 PCI bridge [0604]: Intel Corporation 82801 Mobile PCI Bridge [8086:2448] (rev 81)
00:1f.0 ISA bridge [0601]: Intel Corporation 82801DBM (ICH4-M) LPC Interface Bridge [8086:24cc] (rev 01)
00:1f.1 IDE interface [0101]: Intel Corporation 82801DBM (ICH4-M) IDE Controller [8086:24ca] (rev 01)
00:1f.3 SMBus [0c05]: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller [8086:24c3] (rev 01)
00:1f.5 Multimedia audio controller [0401]: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller [8086:24c5] (rev 01)
00:1f.6 Modem [0703]: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Modem Controller [8086:24c6] (rev 01)
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc RV350 [Mobility Radeon 9600 M10] [1002:4e50]
02:00.0 CardBus bridge [0607]: Texas Instruments PCI4520 PC card Cardbus Controller [104c:ac46] (rev 01)
02:00.1 CardBus bridge [0607]: Texas Instruments PCI4520 PC card Cardbus Controller [104c:ac46] (rev 01)
02:01.0 Ethernet controller [0200]: Intel Corporation 82540EP Gigabit Ethernet Controller (Mobile) [8086:101e] (rev 03)
02:02.0 Network controller [0280]: Intel Corporation PRO/Wireless 2200BG [Calexico2] Network Connection [8086:4220] (rev 05)
Comment 1 Martin Steigerwald 2010-04-29 07:35:27 UTC
Created attachment 26178
First of the mentioned patches from Nigel, version 2 of it
Comment 2 Martin Steigerwald 2010-04-29 07:39:50 UTC
Created attachment 26179
Second of the mentioned patches from Nigel

The reason for these patches is Radeon KMS-induced problems with freeing enough low-memory pages:
http://lists.tuxonice.net/pipermail/tuxonice-devel/2010-April/006075.html
Comment 3 Martin Steigerwald 2010-04-29 07:49:50 UTC
Created attachment 26180
tuxonice config for hibernate script

I am using version 1.99-1.1 of the Debian hibernate package for squeeze/sid, without the text user interface and with the no_console_suspend kernel parameter, to capture any kernel messages during snapshot cycles.
Comment 4 Nigel Cunningham 2010-04-29 07:53:07 UTC
It's definitely a TuxOnIce issue - a known one. I just haven't managed to find the time to put into finding the cause (sorry!)
Comment 5 Martin Steigerwald 2010-07-09 11:13:16 UTC
Now I had this on my ThinkPad T23 with:

martin@deepdance:~> cat /proc/version
Linux version 2.6.34.1-tp23-toi-3.1.1.1-04990-g3a7d1f4 (martin@deepdance) (gcc version 4.4.4 (Debian 4.4.4-5) ) #2 PREEMPT Tue Jul 6 20:27:13 CEST 2010

I didn't see this yet on my ThinkPad T42, and with 2.6.33 I never saw it on my T23 as far as I remember.
Comment 6 Martin Steigerwald 2010-07-09 11:15:59 UTC
Hmm, I should add that I tried two more resumes of the same image, and each time the machine just rebooted after having read 150 MB of the caches.
Comment 7 Martin Steigerwald 2010-07-12 19:10:24 UTC
Created attachment 27078
happened again with 2.6.34.1 and tuxonice 3.1.1.1, photo of backtrace

It happened again with that kernel. This time I have a photo of the backtrace. Hope it helps. Thanks.
Comment 8 Martin Steigerwald 2010-07-17 08:13:31 UTC
After 3-4 attempts and about 7 days of uptime it happened again on my T23. It definitely happens more often with 2.6.34 than with 2.6.33.

I now switched compressor to LZF. Maybe that helps.
Comment 9 Martin Steigerwald 2010-07-29 11:49:37 UTC
I switched compression on my ThinkPad T23 from LZO to LZF. Since then I have not seen the error again, but with only 5 attempts so far I am not sure whether switching to LZF "fixed" it:

deepdance:~> cat /sys/power/tuxonice/debug_info 
TuxOnIce debugging info:
- TuxOnIce core  : 3.1.1.1
- Kernel Version : 2.6.34.1-tp23-toi-3.1.1.1-04990-g3a7d1f4
- Compiler vers. : 4.4
- Attempt number : 5
- Parameters     : 0 667656 0 1 0 0
- Overall expected compression percentage: 0.
- Checksum method is 'md4'.
  0 pages resaved in atomic copy.
- Compressor is 'lzf'.
  Compressed 776593408 bytes into 359897499 (53 percent compression).
- Block I/O active.
- Max outstanding reads 714. Max writes 5.
  Memory_needed: 1024 x (4096 + 200 + 76) = 4476928 bytes.
  Free mem throttle point reached 983.
- Swap Allocator enabled.
  Swap available for image: 229016 pages.
- File Allocator active.
  Storage available for image: 0 pages.
- I/O speed: Write 28 MB/s, Read 33 MB/s.
- Extra pages    : 26 used/500.
- Result         : Succeeded.
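
The compression figures in this output are consistent with each other. A short sketch that reproduces the reported percentage from the two byte counts; it assumes the percentage is the saved share rounded down and that pages are 4096 bytes, neither of which is confirmed by the TuxOnIce source here:

```python
# Byte counts copied from the "Compressed ... into ..." line of debug_info.
uncompressed = 776593408
compressed = 359897499
PAGE_SIZE = 4096  # assumption: 4 kB pages, as in the Memory_needed line

# Share of the image saved by compression, rounded down (assumed rounding).
saved_percent = (uncompressed - compressed) * 100 // uncompressed

# Number of pages in the uncompressed image and the overall ratio.
image_pages = uncompressed // PAGE_SIZE
ratio = uncompressed / compressed

print(saved_percent)
print(image_pages, uncompressed % PAGE_SIZE)
print(round(ratio, 2))
```

With these inputs the saved share comes out at 53 percent, matching the report, and the uncompressed size is a whole number of 4096-byte pages.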

I will have this one running for at least 10 or 15 attempts, let's see how it goes.
Comment 10 Martin Steigerwald 2010-07-29 11:52:06 UTC
Pedro has some more information on this bug:

Date  Thu, 29 Jul 2010 02:30:37 +0100
Subject  kcryptd oops when resuming with TuxOnIce with KDB oops afterwards
From  Pedro Ribeiro

I hit a bug when resuming with TuxOnIce. At the middle of a resume, it
says Compress Read -22 and locks up. I caught the stack trace with kdb
and took photos of that.
I'm running 2.6.35-rc6 on a Lenovo T400. I have an encrypted LUKS
partition (aes-cbc-essiv-128) which contains an LVM2 with my root,
swap and home partitions inside.

http://lkml.org/lkml/2010/7/28/478
Comment 11 Martin Steigerwald 2010-08-18 15:25:38 UTC
Today I had this with a Dell Dimension 5100 workstation. I switched this one from lzo to lzf compression as well, as I didn't have another failure on my ThinkPad T23 after the switch.

So it might really be something related to lzo compression in combination with TuxOnIce.
Comment 12 Nigel Cunningham 2010-08-19 04:06:41 UTC
I found the cause yesterday - it was a locking issue in TuxOnIce. I'm not sure why it's triggered more easily with LZO, but LZO isn't the cause. The fix hasn't yet been committed to my git trees, but will be soon.
Comment 13 Nigel Cunningham 2010-09-06 06:10:36 UTC
This should be fixed with the current release (3.2-rc2).
Comment 14 Martin Steigerwald 2010-09-09 16:17:33 UTC
Created attachment 29472
error still occurs with 2.6.36-rc3 + TOI 3.2-rc1, photo of backtrace

Hi Nigel, unfortunately this still happens with 2.6.36-rc3-tp42-toi-3.2-rc1-vmembase-0-05032-g60140c1-dirty.

In the interest of having a stable kernel I will switch to LZF, as so far it has not happened to me with LZF. If you have any idea what debug information I should gather, please tell me. I am in a hurry, so this time I have not taken the time for additional steps.
Comment 15 Zhang Rui 2012-01-18 02:03:35 UTC
It's great that kernel bugzilla is back.

Can you please verify whether the problem still exists in the latest upstream kernel?
Comment 16 Martin Steigerwald 2012-01-18 08:29:43 UTC
Thanks for the reminder, Zhang. This is so old and not even related to upstream code, so I am just closing it. I have not used TuxOnIce lately. I think I would like to use it again, and I can take another look then.
