Bug 11735

Summary: delkin_cb fails after resume
Product: IO/Storage Reporter: bumble.bee
Component: IDEAssignee: Bartlomiej Zolnierkiewicz (bzolnier)
Status: CLOSED CODE_FIX    
Severity: normal CC: alan
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.27-rc9 Subsystem:
Regression: --- Bisected commit-id:
Attachments: add PM support to delkin_cb
add PM support to delkin_cb (with minor fix)

Description bumble.bee 2008-10-10 13:45:55 UTC
Distribution:
gentoo
Hardware Environment:
x86 (Dell laptop, yenta cardbus), "02:00.0 Mass storage controller [0180]: Workbit Corporation Device [1145:f021] (rev 01)"
Problem Description:
The delkin_cb driver works with ~3 mb/s performance, top shows the CPU in 98% wait state with heavy I/O. If you suspend to ram or disk, the system comes up but any further I/O to a CF card in the adapter will not complete, top shows ~100% wait state but commands to read from the card do not complete. pata_ninja32 not tested because it performs much worse than delkin_cb (~1 mb/s).
Steps to reproduce: 
1. boot, mount a CF card through the adapter (or have root on the CF card, as I do)
2. echo mem > /sys/power/state
3. resume from sleep
4. do I/O from the card, say "ls /mnt/card"
5. <crickets chirp> ... top shows 100% wait state
Comment 1 Alan 2008-10-10 13:52:46 UTC
Well delkin_cb doesn't really support suspend/resume so no suprise whatsoever. pata_ninja32 does so it might be more useful to investigate why the delkin driver is slower for you. < 1MB/sec sounds like the Delkin is selecting PIO0 not PIO4
Comment 2 bumble.bee 2008-10-10 17:20:51 UTC
Hi Alan, I'd love to do just that. Maybe if this thing can perform in PIO mode it won't matter that there's no DMA.
I've built a kernel and initrd for both delkin_cb and pata_ninja32.

# hdparm -t /dev/hde
/dev/hde:
 Timing buffered disk reads:   10 MB in  3.35 seconds =   2.98 MB/sec
# hdparm -t /dev/sdb
/dev/sdb:
 Timing buffered disk reads:    6 MB in  3.27 seconds =   1.84 MB/sec

The dmesg seems to indicate PIO4:
pccard: CardBus card inserted into slot 0
PCI: 0000:02:00.0 reg 10 io port: [0, 1f]
PCI: 0000:02:00.0 reg 14 32bit mmio: [0, fff]
pata_ninja32 0000:02:00.0: enabling device (0000 -> 0003)
pata_ninja32 0000:02:00.0: PCI INT A -> Link[LNKD] -> GSI 11 (level, low) -> IRQ 11
pata_ninja32 0000:02:00.0: setting latency timer to 64
scsi2 : pata_ninja32
ata3: PATA max PIO4 irq 11
ata3.00: ATA-0: TRANSCEND, 20070831, max UDMA/66
ata3.00: 31326208 sectors, multi 0: LBA 
ata3.00: configured for PIO4
ata3.00: configured for PIO4
ata3: EH complete
scsi 2:0:0:0: Direct-Access     ATA      TRANSCEND        2007 PQ: 0 ANSI: 5

I'm afraid pata_ninja32 doesn't do much better with suspend.

After resuming, top shows me kacpi_notify using about 50% cpu, kacpid using ~42%, scsi_eh_1 about 0.3% and the rest is in "wa" state. Reads from the disk still don't return.
Comment 3 Alan 2008-10-10 17:30:05 UTC
What is logged in "dmesg" when this is occuring ?

The kacpi usage also seems bizarrely high but I assume you don't see that with then card removed when resuming ?
Comment 4 bumble.bee 2008-10-10 19:01:37 UTC
To clarify, in addition to the CF card this laptop has a "regular" HDD. I had switched to an all-libata setup for both the HDD and the ninja32, which is when the kacpid cpu hog shows up.
I've now also built a kernel with the old ata driver for the HDD and only the ninja32 in libata. That shows the same behavior as the delkin_cb driver (100% wait state on resume). With that kernel, I've managed to capture this in the syslog after resume:

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ala1.00: cmd 30/00:30:90:12:ea/00:00:00:00:00/e0 tag 0 pio 24576 out
res 40/00:00:00:00:00/00:00:00:00:00/e0 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: soft resetting link
ata1.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
ata1.00: revalidation failed (errno=-5)
ata1: soft resetting link
ata1.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
ata1.00: revalidation failed (errno=-5)
ata1: soft resetting link
ata1.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
ata1.00: revalidation failed (errno=-5)
ata1.00: disabled
ata1.00: EH complete
sd 0:0:0:0: [sda] Result: hostbyte=0x04 driverbyte=0x00
end_request: I/O error, dev sda, sector 15340176
sd 0:0:0:0: [sda] Result: hostbyte=0x04 driverbyte=0x00
end_request: I/O error, dev sda, sector 18719176
Buffer I/O error on device dm-1, logical block 2323832
lost page write due to I/O error on dm-1

It's dm-1 because the filesystem on the card is on an encrypted block device. After this error the system goes back to 100% idle, and if I try to access the filesystem I get I/O errors from ls.

To answer your question, I've not had problems with resuming on this machine before (with only the HDD, using the old IDE driver). I can't really test resume without the card right now, since it holds the root filesystem, but I could dd it back to the HDD if that's necessary.
Comment 5 Alan 2008-10-15 03:29:39 UTC
Ok that is hopeful. It is getting stuck trying to transfer data and then fails to recover. I've got a general test patch about to appear for cleaning up those cases.
Comment 6 Bartlomiej Zolnierkiewicz 2008-10-15 14:11:25 UTC
Created attachment 18330 [details]
add PM support to delkin_cb

Please try this patch against current Linus' tree (should also apply 2.6.28-rc1 once it is released).
Comment 7 Bartlomiej Zolnierkiewicz 2008-10-15 14:13:51 UTC
Created attachment 18331 [details]
add PM support to delkin_cb (with minor fix)
Comment 8 bumble.bee 2008-10-15 17:36:25 UTC
Holy smokes, I had no idea you guys served a la carte! I remeber seeing a mail from Mark Lord saying that he had no further interest in delkin_cb, thanks for maintaining it Bart.
I applied your patch to Linus' git, it seems to resume just fine. It doesn't hang in a wait state, and I/O that I start before resuming completes after it.

Seeing this in dmesg:
Delkin-ASKA-Workbit Cardbus IDE 0000:02:00.0: PCI INT A disabled
ACPI handle has no context!

Then, later:
Delkin-ASKA-Workbit Cardbus IDE 0000:02:00.0: enabling device (0000 -> 0003)
Delkin-ASKA-Workbit Cardbus IDE 0000:02:00.0: PCI INT A -> Link[LNKD] -> GSI 11 (level, low) -> IRQ 11
Delkin-ASKA-Workbit Cardbus IDE 0000:02:00.0: restoring config space at offset 0xf (was 0x4040100, writing 0x404010b)
Delkin-ASKA-Workbit Cardbus IDE 0000:02:00.0: restoring config space at offset 0x5 (was 0x0, writing 0x90000000)
Delkin-ASKA-Workbit Cardbus IDE 0000:02:00.0: restoring config space at offset 0x4 (was 0x1, writing 0xe001)
Delkin-ASKA-Workbit Cardbus IDE 0000:02:00.0: setting latency timer to 64

I'm awaiting Alan's general patch to see if it will help with libata.
It'd be great if it could be made to perform at >PIO0 speeds (but surprisingly even as-is it's not that bad as a system drive).

Thanks!
Comment 9 Alan 2008-10-16 06:14:26 UTC
Nice.. and I think that explains why the standard PM recovery in libata isn't enough so I'll just go fix up pata_ninja32 to match
Comment 10 Bartlomiej Zolnierkiewicz 2008-10-29 11:19:12 UTC
I'm closing this bug since fixes are Linus' tree now:

commit 8c061a40c293660793dab83b75223e6eaaa04f8b (delkin_cb)

commit e7c0d217cdaa837d30bc265eddac4d176969fd68 (pata_ninja32)