Bug 10175

Summary: Can't resume kernel after a "suspend to ram".
Product: IO/Storage Reporter: Ilya (dik_again)
Component: Serial ATAAssignee: Tejun Heo (htejun)
Status: CLOSED OBSOLETE    
Severity: blocking CC: alan
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.22.9 Subsystem:
Regression: No Bisected commit-id:
Attachments: kernel boot log

Description Ilya 2008-03-05 11:46:06 UTC
Latest working kernel version:
Earliest failing kernel version: 2.6.22.9
Distribution:
Hardware Environment:VT8237, A7V600 Motherboard, DVDRW drive Samsung SH-S203B
Software Environment:
Problem Description: kernel don't go back after suspend to ram mode.

Steps to reproduce: go to "suspend to ram" and then go back.
try to mount the drive with no disk. mount lock's.

ata2.00: qc timeout (cmd 0xa0)
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: cmd a0/00:00:00:00:20/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
         res 51/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x5 (timeout)
ata2: port is slow to respond, please be patient (Status 0xd0)
ata2: soft resetting port
ata2.00: qc timeout (cmd 0xa1)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata2.00: revalidation failed (errno=-5)
ata2: failed to recover some devices, retrying in 5 secs
ata2: port is slow to respond, please be patient (Status 0x80)
ata2: soft resetting port
ata2.00: failed to IDENTIFY (device reports invalid type, err_mask=0x0)
ata2.00: revalidation failed (errno=-22)
ata2.00: limiting speed to UDMA/100:PIO3
ata2: failed to recover some devices, retrying in 5 secs
ata2: soft resetting port
ata2.00: configured for UDMA/100
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata2.00: cmd a0/00:00:00:00:20/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
         res 40/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
ata2: port is slow to respond, please be patient (Status 0xd0)
ata2: soft resetting port

But if there were a dvd disk kernel can't start after a "suspend to ram".


tail /var/log/kernel/errors:

Mar  5 10:22:13 localhost kernel: pci_set_power_state(): 0000:00:00.0: state=3, current state=5
Mar  5 10:22:13 localhost kernel: pnp: Failed to activate device 00:0c.
Mar  5 10:22:13 localhost kernel: pnp: Failed to activate device 00:0d.
Mar  5 10:30:38 localhost kernel: FAT: utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sen
sitive!
Mar  5 10:30:39 localhost kernel: codec_read: codec 0 is not valid [0xfe0000]
Mar  5 10:30:39 localhost last message repeated 3 times

lspci

00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge (rev 80)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
00:09.0 Ethernet controller: 3Com Corporation 3c940 10/100/1000Base-T [Marvell] (rev 12)
00:0a.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
00:0b.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 07)
00:0b.1 Input device controller: Creative Labs SB Live! Game Port (rev 07)
00:0e.0 Multimedia video controller: Brooktree Corporation Bt878 Video Capture (rev 11)
00:0e.1 Multimedia controller: Brooktree Corporation Bt878 Audio Capture (rev 11)
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 60)
00:11.6 Communication controller: VIA Technologies, Inc. AC'97 Modem Controller (rev 80)
01:00.0 VGA compatible controller: nVidia Corporation NV25 [GeForce4 Ti 4200] (rev a3)
Comment 1 Tejun Heo 2008-03-05 19:48:35 UTC
Can you please post the full result of 'dmesg' after failure and the result of 'lspci -nn'?
Comment 2 Ilya 2008-03-07 06:33:42 UTC
(In reply to comment #1)
> Can you please post the full result of 'dmesg' after failure and the result
> of
> 'lspci -nn

00:00.0 Host bridge [0600]: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge [1106:3189] (rev 80)
00:01.0 PCI bridge [0604]: VIA Technologies, Inc. VT8237 PCI Bridge [1106:b198]
00:09.0 Ethernet controller [0200]: 3Com Corporation 3c940 10/100/1000Base-T [Marvell] [10b7:1700] (rev 12)
00:0a.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ [10ec:8139] (rev 10)
00:0b.0 Multimedia audio controller [0401]: Creative Labs SB Live! EMU10k1 [1102:0002] (rev 07)
00:0b.1 Input device controller [0980]: Creative Labs SB Live! Game Port [1102:7002] (rev 07)
00:0e.0 Multimedia video controller [0400]: Brooktree Corporation Bt878 Video Capture [109e:036e] (rev 11)
00:0e.1 Multimedia controller [0480]: Brooktree Corporation Bt878 Audio Capture [109e:0878] (rev 11)
00:0f.0 RAID bus controller [0104]: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller [1106:3149] (rev 80)
00:0f.1 IDE interface [0101]: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE [1106:0571] (rev 06)
00:10.0 USB Controller [0c03]: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller [1106:3038] (rev 81)
00:10.1 USB Controller [0c03]: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller [1106:3038] (rev 81)
00:10.2 USB Controller [0c03]: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller [1106:3038] (rev 81)
00:10.3 USB Controller [0c03]: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller [1106:3038] (rev 81)
00:10.4 USB Controller [0c03]: VIA Technologies, Inc. USB 2.0 [1106:3104] (rev 86)
00:11.0 ISA bridge [0601]: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South] [1106:3227]
00:11.5 Multimedia audio controller [0401]: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller [1106:3059] (rev 60)
00:11.6 Communication controller [0780]: VIA Technologies, Inc. AC'97 Modem Controller [1106:3068] (rev 80)
01:00.0 VGA compatible controller [0300]: nVidia Corporation NV25 [GeForce4 Ti 4200] [10de:0253] (rev a3)

And that was the full result of dmesg ("bug"'s lines). 

But I thinked, that it's easy the bug to reproduce..... It is not so easy.
Today reads the drive the disk after suspend too.
I try to reproduce the bug with disk in the drive before suspend.

And that comes periodically in dmesg (sr0 is that drive):

Buffer I/O error on device sr0, logical block 0
Buffer I/O error on device sr0, logical block 1
Buffer I/O error on device sr0, logical block 2
Buffer I/O error on device sr0, logical block 3
Buffer I/O error on device sr0, logical block 0
Buffer I/O error on device sr0, logical block 8
Buffer I/O error on device sr0, logical block 8
Buffer I/O error on device sr0, logical block 16
Buffer I/O error on device sr0, logical block 16
Comment 3 Tejun Heo 2008-03-08 21:36:56 UTC
Hmmm... that was all of dmesg, weird.  Can you please post /var/log/boot.msg then?  I wanna see kernel boot messages.
Comment 4 Ilya 2008-03-14 00:25:02 UTC
Created attachment 15260 [details]
kernel boot log
Comment 5 Tejun Heo 2008-03-17 23:51:01 UTC
Can you please do the following.

* suspend and resume (sr0 becomes dead)
* disconnect power from the drive
* reconnect power to the drive
* "echo - - - > /sys/class/scsi_host/host1/scan"
* kernel will go and try to probe the port again, wait for it to finish and post the resulting kernel log here.
Comment 6 Ilya 2008-03-22 02:39:05 UTC
(In reply to comment #5)
> Can you please do the following.
> 
> * suspend and resume (sr0 becomes dead)

done.
But it becomes dead, only by a long pause between suspend and resume, but I'm not sure. I tryed it 3 times and was all ok. But then I have suspend for a night. And now, when I resumed the PC, sr0 became dead.

> * disconnect power from the drive
> * reconnect power to the drive
> * "echo - - - > /sys/class/scsi_host/host1/scan"
> * kernel will go and try to probe the port again, wait for it to finish and
> post the resulting kernel log here.
> 

After that it works ok. hier is the log. (Buffer I/O error....... hier have I tryed to mount a drive).

Buffer I/O error on device sr0, logical block 1
Buffer I/O error on device sr0, logical block 2
Buffer I/O error on device sr0, logical block 3
Buffer I/O error on device sr0, logical block 0
Buffer I/O error on device sr0, logical block 8
Buffer I/O error on device sr0, logical block 8
Buffer I/O error on device sr0, logical block 16
Buffer I/O error on device sr0, logical block 16
FAT: utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sensitive!
FAT: unable to read boot sector
Buffer I/O error on device sr0, logical block 0
Buffer I/O error on device sr0, logical block 1
Buffer I/O error on device sr0, logical block 2
Buffer I/O error on device sr0, logical block 3
FAT: utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sensitive!
FAT: unable to read boot sector
printk: 5 messages suppressed.
Buffer I/O error on device sr0, logical block 0
Buffer I/O error on device sr0, logical block 1
Buffer I/O error on device sr0, logical block 2
Buffer I/O error on device sr0, logical block 3
Buffer I/O error on device sr0, logical block 0
Buffer I/O error on device sr0, logical block 8
Buffer I/O error on device sr0, logical block 8
Buffer I/O error on device sr0, logical block 16
Buffer I/O error on device sr0, logical block 16
FAT: utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sensitive!
FAT: unable to read boot sector
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd ca/00:08:9e:83:be/00:00:00:00:00/e1 tag 0 cdb 0x0 data 4096 out
         res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1: soft resetting port
ata1.00: configured for UDMA/133
ata1: EH complete
sd 0:0:0:0: [sda] 156312576 512-byte hardware sectors (80032 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata2: soft resetting port
ata2.00: ATAPI: TSSTcorp CDDVDW SH-S203B, SB00, max UDMA/100
ata2.00: configured for UDMA/100
ata2: EH complete
ata2.00: detaching (SCSI 1:0:0:0)
scsi 1:0:0:0: CD-ROM            TSSTcorp CDDVDW SH-S203B  SB00 PQ: 0 ANSI: 5
sr0: scsi3-mmc drive: 48x/48x writer dvd-ram cd/rw xa/form2 cdda tray
sr 1:0:0:0: Attached scsi CD-ROM sr0
sr 1:0:0:0: Attached scsi generic sg1 type 5
Comment 7 Tejun Heo 2008-03-23 00:28:48 UTC
That means that the controller is alive and kicking.  It's probably that the drive requires stronger persuasion than SRST but SCR access on vt6420 is rather unreliable and it hardresets only on the first try. :-(

Hmmmmm......
Comment 8 Ilya 2008-03-23 02:13:52 UTC
(In reply to comment #7)
> That means that the controller is alive and kicking.  It's probably that the

Of course the controller is alive. My harddrive lives on that. There is only one controller on motherboard, or not?

> drive requires stronger persuasion than SRST but SCR access on vt6420 is
> rather
> unreliable and it hardresets only on the first try. :-(
> 
> Hmmmmm......
> 

The suspend mode is S3 (suspend to ram, with +5V online). Can this be a hardware problem? (when the drive have +12V offline, and +5V online it goes crazy :))
Comment 9 Tejun Heo 2008-03-23 02:24:51 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > That means that the controller is alive and kicking.  It's probably that
> the
> 
> Of course the controller is alive. My harddrive lives on that. There is only
> one controller on motherboard, or not?

Yeah, there's only one controller.  I meant that the port was working properly.

> > drive requires stronger persuasion than SRST but SCR access on vt6420 is
> rather
> > unreliable and it hardresets only on the first try. :-(
> > 
> > Hmmmmm......
> > 
> 
> The suspend mode is S3 (suspend to ram, with +5V online). Can this be a
> hardware problem? (when the drive have +12V offline, and +5V online it goes
> crazy :))

Yeah, it seems like a hardware problem, at least partially.  Your cdrom drive can't wake up from S3 only with the current reset sequence.  I think it might work properly if the driver can perform hardreset properly but I've seen optical drives which lock up after certain things (entering SATA link power saving mode) and require power removal and re-apply.

It's a bit difficult to work on vt6420 reset sequence because it covers large number of controllers of different generations and some of them tend to be quite cranky.  Plus, I don't have access to docs.  :-(