Bug 8627

Summary: randomly a stuttering in the music (caused by HSM violation?)
Product: IO/Storage
Reporter: Bjoern Olausson (lkmlist)
Component: Serial ATA
Assignee: Tejun Heo (htejun)
Status: CLOSED CODE_FIX
Severity: normal
CC: albertcc, htejun, zackki13597
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.21.5
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: Patch for not stopping DMA if the device is busy
Patch for not stopping DMA if the device is busy
NCQ-blacklist.patch
Patch to limit ATAPI DMA to R/W only for Plextor PX-130A
libata-core.c_bug_8627.diff
proper libata-core.c_bug_8627.diff
my libata.h

Description Bjoern Olausson 2007-06-14 02:50:31 UTC
Most recent kernel where this bug did not occur: 
Distribution: Gentoo
Hardware Environment:
00:00.0 Host bridge: Intel Corporation 82975X Memory Controller Hub (rev c0)
00:01.0 PCI bridge: Intel Corporation 82975X PCI Express Root Port (rev c0)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 01)
00:1c.5 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 6 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
00:1f.2 SATA controller: Intel Corporation 82801GR/GH (ICH7 Family) SATA AHCI Controller (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:00.0 Multimedia audio controller: Creative Labs SB X-Fi
01:01.0 Multimedia video controller: Brooktree Corporation Bt878 Video Capture (rev 11)
01:01.1 Multimedia controller: Brooktree Corporation Bt878 Audio Capture (rev 11)
01:02.0 Ethernet controller: Atheros Communications, Inc. AR5212 802.11abg NIC (rev 01)
01:03.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
02:00.0 SATA controller: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 02)
02:00.1 IDE interface: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 02)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 20)
05:00.0 VGA compatible controller: nVidia Corporation G70 [GeForce 7600 GT] (rev a1)

Software Environment:
Portage 2.1.2.7 (default-linux/amd64/2006.1, gcc-4.1.2, glibc-2.5-r3, 2.6.21.5 x86_64)
=================================================================
System uname: 2.6.21.5 x86_64 Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz
Gentoo Base System release 1.12.9
Timestamp of tree: Wed, 13 Jun 2007 12:20:01 +0000
distcc 2.18.3 x86_64-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
ccache version 2.4 [enabled]
dev-java/java-config: 1.3.7, 2.0.32
dev-lang/python:     2.4.4-r4
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache:     2.4-r7
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.61
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils:  2.16.1-r3
sys-devel/gcc-config: 1.3.16
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.17-r2
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=nocona -O2 -pipe -fomit-frame-pointer"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/X11/xkb /usr/share/config"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c"
CXXFLAGS="-march=nocona -O2 -pipe -fomit-frame-pointer"
DISTDIR="/usr/portage/distfiles"
FEATURES="ccache distlocks metadata-transfer parallel-fetch sandbox sfperms strict"
GENTOO_MIRRORS="ftp://ftp.belnet.be/mirror/rsync.gentoo.org/gentoo/ ftp://ftp.easynet.nl/mirror/gentoo/ http://distfiles.gentoo.org http://www.ibiblio.org/pub/Linux/distributions/gentoo"
LANG="de_DE.utf8"
LC_ALL="de_DE.utf8"
LINGUAS="de sv"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/portage/local/layman/xeffects /usr/portage/local/layman/sunrise /usr/local/portage"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="X a52 aac aalib acpi additions aiglx alsa amd64 ares asf bash-completion berkdb bitmap-fonts bittorrent bluetooth bzip2 cairo cdparanoia cli connectionstatus cpudetection cracklib crypt css cups curl daap dbus dga divx4linux dri dts dv dvd dvdr dvdread edl emovix encode exif fam fbcon ffmpeg flac foomaticdb fortran gdbm gif gimp glitz gnutls gpm gtk gtk2 hal highlight history iconv imagemagick isdnlog java jpeg jpeg2k kde kqemu libg++ lirc live lm_sensors logitech-mouse lzo mad madwifi matroska metalink midi mjpeg modplug mp3 mpeg mplayer mudflap musepack musicbrainz ncurses network nfs nls nptl nptlonly nsplugin nvidia ogg openal opengl openmp pam pam_console pcre pda pdf perl png ppds pppd python qt3 qt3support qt4 quicktime readline reflection rtc samba scanner sdk sdl sensord session sndfile spell spl ssl svg tcltk tcpd theora tiff tk transcode transparency truetype truetype-fonts type1-fonts unicode usb userlocales utempter v4l v4l2 vcd vditool vidcap vorbis wxwindows x264 xine xinerama xml xorg xvid yahoo zlib" ALSA_CARDS="hda-intel intel8x0" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="de sv" LIRC_DEVICES="asusdh" USERLAND="GNU" VIDEO_CARDS="nvidia vesa fbdev nv"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LDFLAGS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

Problem Description:
Every time I get this "HSM violation", the PC stutters for about a second.
While listening to music I noticed random stutters, so I checked dmesg every
time I heard one. And yes, every time the music stutters I get this:

ata9.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata9.01: (BMDMA stat 0x4)
ata9.01: cmd a0/01:00:00:00:00/00:00:00:00:00/b0 tag 0 cdb 0x4a data 8 in
         res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)
ata9: soft resetting port
ata9.00: configured for UDMA/33
ata9.01: configured for UDMA/100
ata9: EH complete

Steps to reproduce:
Listen to music and just wait....
Comment 1 Albert Lee 2007-06-14 03:15:32 UTC
This is a minor problem found when working on bug 8259.

More description of the problem:
The problem is related to two Plextor drives connected to the PATA port of the JMicron controller. The port is driven by pata_jmicron. Sometimes the drive triggers an HSM violation, which causes noticeable jitter during music playback.

Bjoern has collected the detailed libata trace. It seems the HSM violation only happens on the slave drive (Px-130a) when doing the 0x4a (GET_EVENT_STATUS_NOTIFICATION) command.
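For reference, the CDB dumped in the trace (4a 01 00 00 10 00 00 00 08) can be decoded by hand. The sketch below assumes the MMC layout of GET EVENT STATUS NOTIFICATION (byte 1 bit 0 = Polled, byte 4 = notification class request, bytes 7-8 = big-endian allocation length); the struct and function names are illustrative only and not part of libata. Decoded this way, the command is a polled request for media-class events (0x10, bit 4) with an 8-byte allocation length, which matches the "data 8 in" in the error line.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MMC opcode for GET EVENT STATUS NOTIFICATION (name as in linux/cdrom.h). */
#define GPCMD_GET_EVENT_STATUS_NOTIFICATION 0x4a

/* Notification class request bit for media events (CDB byte 4, bit 4). */
#define GESN_CLASS_MEDIA (1u << 4)

/* Illustrative container for the decoded request fields. */
struct gesn_request {
    bool polled;        /* byte 1, bit 0: polled (synchronous) request */
    uint8_t class_mask; /* byte 4: requested notification classes      */
    uint16_t alloc_len; /* bytes 7-8: allocation length, big-endian    */
};

/* Decode a GET EVENT STATUS NOTIFICATION CDB.
 * Returns false if the buffer is too short or the opcode does not match. */
bool decode_gesn(const uint8_t *cdb, size_t len, struct gesn_request *out)
{
    if (len < 9 || cdb[0] != GPCMD_GET_EVENT_STATUS_NOTIFICATION)
        return false;
    out->polled = cdb[1] & 0x01;
    out->class_mask = cdb[4];
    out->alloc_len = (uint16_t)((cdb[7] << 8) | cdb[8]);
    return true;
}
```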

The HSM violation seems to be caused by the drive raising an interrupt while the ATAPI DMA is still ongoing.
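The host_stat values printed by ata_host_intr in the two traces are bus-master IDE status bytes. A minimal sketch for reading them, assuming the standard SFF-8038i bit layout (bit 0 = DMA active, bit 1 = error, bit 2 = interrupt; libata calls these ATA_DMA_ACTIVE, ATA_DMA_ERR and ATA_DMA_INTR); the helper name is hypothetical. In the failing trace host_stat is 0x5 (ACTIVE and INTR both set), i.e. the interrupt arrives while the DMA engine still reports the transfer as running. The normal trace shows 0x1 (still active, no interrupt yet) and then 0x4 (interrupt, transfer finished).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bus-master IDE status register bits (SFF-8038i). */
#define BMDMA_ACTIVE 0x01 /* DMA engine still transferring   */
#define BMDMA_ERR    0x02 /* DMA error                       */
#define BMDMA_INTR   0x04 /* device has asserted INTRQ       */

/* Hypothetical helper: true when the device raised its interrupt while
 * the bus-master engine still reports the transfer as active, which is
 * the host_stat 0x5 pattern seen in the failing trace. */
bool irq_while_dma_active(uint8_t host_stat)
{
    return (host_stat & BMDMA_INTR) && (host_stat & BMDMA_ACTIVE);
}
```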

Currently Bjoern is helping to narrow down the problem by removing the medium from the master drive (px-708a) and placing it in the slave drive (px-130a).

Hopefully we can tell whether this is a problem with the JMicron controller or with the Plextor px-130a drive...


======================
(transaction with HSM violation)

Jun  8 10:53:09 freax ata_scsi_dump_cdb: CDB (9:0,1,0) 4a 01 00 00 10 00 00 00 08
Jun  8 10:53:09 freax ata_scsi_translate: ENTER
Jun  8 10:53:09 freax ata_sg_setup: ENTER, ata9
Jun  8 10:53:09 freax ata_sg_setup: 1 sg elements mapped
Jun  8 10:53:09 freax ata_fill_sg: PRD[0] = (0x54B8A000, 0x8)
Jun  8 10:53:09 freax ata9: ata_dev_select: ENTER, device 1, wait 1
Jun  8 10:53:09 freax ata_tf_load: feat 0x1 nsect 0x0 lba 0x0 0x0 0x0
Jun  8 10:53:09 freax ata_tf_load: device 0xB0
Jun  8 10:53:09 freax ata_exec_command: ata9: cmd 0xA0
Jun  8 10:53:09 freax ata_scsi_translate: EXIT
Jun  8 10:53:09 freax ata_host_intr: ata9: protocol 7 task_state 1
Jun  8 10:53:09 freax ahci_interrupt: ENTER
Jun  8 10:53:09 freax ata_hsm_move: ata9: protocol 7 task_state 1 (dev_stat 0x58)
Jun  8 10:53:09 freax atapi_send_cdb: send cdb
Jun  8 10:53:09 freax ahci_interrupt: ENTER
Jun  8 10:53:09 freax ata_host_intr: ata9: protocol 7 task_state 3
Jun  8 10:53:09 freax ata_host_intr: ata9: host_stat 0x5
Jun  8 10:53:09 freax ahci_interrupt: ENTER
Jun  8 10:53:09 freax ahci_interrupt: ENTER
Jun  8 10:53:09 freax ata_host_intr: ata9: protocol 7 task_state 3
Jun  8 10:53:09 freax ata_host_intr: ata9: host_stat 0x4
Jun  8 10:53:09 freax ata_hsm_move: ata9: protocol 7 task_state 3 (dev_stat 0x0)
Jun  8 10:53:09 freax ata_hsm_move: ata9: protocol 7 task_state 4 (dev_stat 0x0)
Jun  8 10:53:09 freax ata_scsi_timed_out: ENTER
Jun  8 10:53:09 freax ata_scsi_timed_out: EXIT, ret=0
Jun  8 10:53:09 freax ata_scsi_error: ENTER
Jun  8 10:53:09 freax ata_port_flush_task: ENTER
Jun  8 10:53:09 freax ahci_interrupt: ENTER
Jun  8 10:53:09 freax ata_port_flush_task: flush #1
Jun  8 10:53:09 freax ata9: ata_port_flush_task: flush #2
Jun  8 10:53:09 freax ata9: ata_port_flush_task: EXIT
Jun  8 10:53:09 freax ata_eh_autopsy: ENTER
Jun  8 10:53:09 freax ata_eh_autopsy: EXIT
Jun  8 10:53:09 freax ata9.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
Jun  8 10:53:09 freax ata9.01: (BMDMA stat 0x4)
Jun  8 10:53:09 freax ata9.01: cmd a0/01:00:00:00:00/00:00:00:00:00/b0 tag 0 cdb 0x4a data 8 in
Jun  8 10:53:09 freax res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)
Jun  8 10:53:09 freax ata_eh_recover: ENTER
Jun  8 10:53:09 freax ata_eh_prep_resume: ENTER
Jun  8 10:53:09 freax ata_eh_prep_resume: EXIT
Jun  8 10:53:09 freax __ata_port_freeze: ata9 port frozen

====================
(normal transaction)

Jun  8 10:53:12 freax ata_scsi_dump_cdb: CDB (9:0,1,0) 4a 01 00 00 10 00 00 00 08
Jun  8 10:53:12 freax ata_scsi_translate: ENTER
Jun  8 10:53:12 freax ata_sg_setup: ENTER, ata9
Jun  8 10:53:12 freax ata_sg_setup: 1 sg elements mapped
Jun  8 10:53:12 freax ata_fill_sg: PRD[0] = (0x5430C000, 0x8)
Jun  8 10:53:12 freax ata9: ata_dev_select: ENTER, device 1, wait 1
Jun  8 10:53:12 freax ata_tf_load: feat 0x1 nsect 0x0 lba 0x0 0x0 0x0
Jun  8 10:53:12 freax ata_tf_load: device 0xB0
Jun  8 10:53:12 freax ata_exec_command: ata9: cmd 0xA0
Jun  8 10:53:12 freax ata_scsi_translate: EXIT
Jun  8 10:53:12 freax ata_hsm_move: ata9: protocol 7 task_state 1 (dev_stat 0x58)
Jun  8 10:53:12 freax atapi_send_cdb: send cdb
Jun  8 10:53:12 freax ata_host_intr: ata9: protocol 7 task_state 3
Jun  8 10:53:12 freax ata_host_intr: ata9: host_stat 0x1
Jun  8 10:53:12 freax ahci_interrupt: ENTER
Jun  8 10:53:12 freax ahci_interrupt: ENTER
Jun  8 10:53:12 freax ata_host_intr: ata9: protocol 7 task_state 3
Jun  8 10:53:12 freax ata_host_intr: ata9: host_stat 0x1
Jun  8 10:53:12 freax ata_host_intr: ata9: protocol 7 task_state 3
Jun  8 10:53:12 freax ata_host_intr: ata9: host_stat 0x4
Jun  8 10:53:12 freax ata_hsm_move: ata9: protocol 7 task_state 3 (dev_stat 0x50)
Jun  8 10:53:12 freax ata_hsm_move: ata9: dev 1 command complete, drv_stat 0x50
Jun  8 10:53:12 freax ata_sg_clean: unmapping 1 sg elements
Jun  8 10:53:12 freax atapi_qc_complete: ENTER, err_mask 0x0
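The dev_stat values differ between the two traces as well: both show 0x58 when the CDB is sent, but the normal transaction completes with 0x50 while the failing one reads 0x0. A small decoder for comparing them, assuming the classic ATA status register bit names (BSY, DRDY, DF, DSC, DRQ, CORR, IDX, ERR; newer ATA revisions mark some of these obsolete or command-specific); the table and function names are illustrative. 0x58 decodes to DRDY DSC DRQ (device ready to accept the packet), 0x50 to DRDY DSC (command complete, no error bit), and 0x0 to no bits at all, not even DRDY.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Classic ATA Status register bits, most significant first. */
static const struct {
    uint8_t bit;
    const char *name;
} ata_status_bits[] = {
    { 0x80, "BSY" }, { 0x40, "DRDY" }, { 0x20, "DF" },  { 0x10, "DSC" },
    { 0x08, "DRQ" }, { 0x04, "CORR" }, { 0x02, "IDX" }, { 0x01, "ERR" },
};

/* Write the names of the set bits, space separated, into buf.
 * An all-zero status produces an empty string. Output is truncated
 * safely if buf is too small. */
void ata_status_str(uint8_t stat, char *buf, size_t len)
{
    size_t used = 0;

    if (len == 0)
        return;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof ata_status_bits / sizeof ata_status_bits[0]; i++) {
        if (!(stat & ata_status_bits[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? " " : "", ata_status_bits[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* out of space: keep what fits */
        used += (size_t)n;
    }
}
```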
Comment 2 Bartlomiej Zolnierkiewicz 2007-06-14 03:18:51 UTC
This is a libata issue.
Comment 3 Albert Lee 2007-06-14 03:41:19 UTC
Hi Bjoern,

After removing the medium from the px-708a and placing a medium into the px-130a, does the stuttering still happen?
Comment 4 Bjoern Olausson 2007-06-14 04:08:01 UTC
I have a log file with debugging enabled. While removing and adding media to the devices, I echoed comments into the log file.

Use grep to find them:
grep -n "<----" messages_-_2007-06-14_12.29.59_-_with_debug.log

http://olausson.de/temp/messages_-_2007-06-14_12.29.59_-_with_debug.log.bz2

size     packed:   1872533 bytes
size   unpacked: 437679017 bytes
md5sum   packed: 0473581df658100a7a479619af206fd1
md5sum unpacked: d8f701c6a94cbed6b39a8fbf52451306

There was stuttering, but with debugging enabled it was hard to find. I tried to echo a "<---- Stuttering occured above ---->" marker as soon as I noticed a stutter. So now I'll try without debugging... for me that output is easier to grep and to match to the stuttering.

Right now I am running without any of these devices attached and compiling a kernel without debugging. No stuttering so far. But maybe you pros will find something interesting in that log.


>After removing the medium from the px-708a and placing a medium into the px-130a,
>does the stuttering still happen?

I'll answer this once I have booted the kernel without debugging.

Thanks for your help
Bjoern
Comment 5 Bjoern Olausson 2007-06-14 04:26:14 UTC
Now here's the log without the two drives.

grep -n "<----" messages_-_2007-06-14_13.07.23_-_with_debub_without_drives.log
to see comments

The PX-760A is not attached to the JMicron controller. IMHO it is attached to the Intel ICH7 PATA port.

http://olausson.de/temp/messages_-_2007-06-14_13.07.23_-_with_debub_without_drives.log.bz2

size     packed:    922667 bytes
size   unpacked: 244104001 bytes
md5sum   packed: 1e867109840acdb914b1fff091197e66
md5sum unpacked: 8cf8a2672556cc7e45e4c8cc4fb2b0e6

Now running the kernel without debugging and waiting for something to happen (first without any media in the drives, then I'll insert media into the 130A and see what happens, and after that into the 708A).

regards
Bjoern
Comment 6 Bjoern Olausson 2007-06-14 05:49:44 UTC
Things are getting weirder and weirder.

Every time I compile the kernel with debugging and then recompile it without debugging, the noticeable occurrences of the bug vanish.

The only thing I noticed was while burning a DVD ISO image:

<---- Starting to burn an other DVD iso with PX760A ---->
Jun 14 14:27:36 freax cdrom: This disc doesn't have any tracks I recognize!
Jun 14 14:30:01 freax cron[19764]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Jun 14 14:35:54 freax wpa_cli: interface ath0 DISCONNECTED
Jun 14 14:35:54 freax wpa_cli: interface ath0 CONNECTED
Jun 14 14:35:59 freax ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jun 14 14:35:59 freax ata7.00: cmd a0/01:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0xad data 4 in
Jun 14 14:35:59 freax res 40/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Jun 14 14:36:06 freax ata7: port is slow to respond, please be patient (Status 0xd8)
Jun 14 14:36:11 freax ata7: soft resetting port
Jun 14 14:36:12 freax ata7.00: configured for UDMA/66
Jun 14 14:36:12 freax ata7: EH complete
<---- DVD burning is finished ---->

That's it.

No more stuttering...
But I'll keep listening to music and watching dmesg for HSM violations...

It's crazy.

regards
Bjoern
Comment 7 Bjoern Olausson 2007-06-14 05:51:19 UTC
I should mention that currently it makes no difference whether a medium is loaded or not, independent of the drive.
Comment 8 Bjoern Olausson 2007-06-14 05:59:30 UTC
Okay, speak of the devil... it occurs:

ata9.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata9.01: (BMDMA stat 0x4)
ata9.01: cmd a0/01:00:00:00:00/00:00:00:00:00/b0 tag 0 cdb 0x4a data 8 in
         res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)
ata9: soft resetting port
ata9.00: configured for UDMA/33
ata9.01: configured for UDMA/100
ata9: EH complete

while having a medium in the PX-130A:
/dev/sr2 on /media/VirtualBox-WinXP type udf (ro,noexec,nosuid,nodev,uid=1001,gid=100,umask=000,iocharset=utf8)

but I didn't notice any stuttering... maybe I was too focused on learning enzyme kinetics...

regards
Bjoern
Comment 9 Bjoern Olausson 2007-06-14 06:16:19 UTC
I removed the px-708A from the bus and rebooted; we'll see what happens.

One thing I noticed in dmesg on line 343:

ATA: abnormal status 0x7F on port 0x0000000000010177

Anything to be concerned about?

full dmesg output here:
http://olausson.de/temp/dmsg-without_px708A

Waiting for an HSM violation now ;-)

Thanks
Bjoern
Comment 10 Bjoern Olausson 2007-06-14 08:11:04 UTC
I waited a long time for an HSM violation.
None occurred, so I decided to insert a DVD into the px-130A.

After some time an HSM violation occurred.

But it no longer targets the DVD drive...
If I am not mistaken, it targets my second hard drive (ata1.00).
But at the end of the error message it says something about sda... confusing.

root@freax $ cat /sys/bus/scsi/devices/1\:0\:0\:0/model
External Disk 0

This disk is a WDC WD740ADFD-00 connected to an EZ-Raid chip (not in RAID mode), which IMHO is bridged to one port of the Intel ICH7. You'd better have a look at the Asus P5W-DH Deluxe specs...

Another one is connected directly to the Intel controller (my root, boot and swap are on this one). The other disk is sent to sleep with "hdparm -s 1 -S 120" on boot.

root@freax $ cat /sys/bus/scsi/devices/0\:0\:0\:0/model
WDC WD740ADFD-00


Jun 14 16:36:59 freax UDF-fs: Partition marked readonly; forcing readonly mount
Jun 14 16:36:59 freax UDF-fs INFO UDF 0.9.8.1 (2004/29/09) Mounting volume 'Road_to_Guantanamo.TNPG', timestamp 2007/05/06 1
Jun 14 16:40:01 freax cron[6802]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Jun 14 16:50:01 freax cron[10944]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Jun 14 16:50:21 freax ata1.00: exception Emask 0x2 SAct 0xf800009 SErr 0x0 action 0x2 frozen
Jun 14 16:50:21 freax ata1.00: (spurious completions during NCQ issue=0x0 SAct=0xf800009 FIS=004040a1:00400000)
Jun 14 16:50:21 freax ata1.00: cmd 61/08:00:80:20:57/00:00:08:00:00/40 tag 0 cdb 0x0 data 4096 out
Jun 14 16:50:21 freax res 40/00:00:80:20:57/00:00:08:00:00/40 Emask 0x2 (HSM violation)
Jun 14 16:50:21 freax ata1.00: cmd 61/30:18:80:60:1f/00:00:00:00:00/40 tag 3 cdb 0x0 data 24576 out
Jun 14 16:50:21 freax res 40/00:00:80:20:57/00:00:08:00:00/40 Emask 0x2 (HSM violation)
Jun 14 16:50:21 freax ata1.00: cmd 61/08:b8:10:65:53/00:00:08:00:00/40 tag 23 cdb 0x0 data 4096 out
Jun 14 16:50:21 freax res 40/00:00:80:20:57/00:00:08:00:00/40 Emask 0x2 (HSM violation)
Jun 14 16:50:21 freax ata1.00: cmd 61/10:c0:38:65:53/00:00:08:00:00/40 tag 24 cdb 0x0 data 8192 out
Jun 14 16:50:21 freax res 40/00:00:80:20:57/00:00:08:00:00/40 Emask 0x2 (HSM violation)
Jun 14 16:50:21 freax ata1.00: cmd 61/18:c8:60:e3:53/00:00:08:00:00/40 tag 25 cdb 0x0 data 12288 out
Jun 14 16:50:21 freax res 40/00:00:80:20:57/00:00:08:00:00/40 Emask 0x2 (HSM violation)
Jun 14 16:50:21 freax ata1.00: cmd 61/08:d0:68:23:56/00:00:08:00:00/40 tag 26 cdb 0x0 data 4096 out
Jun 14 16:50:21 freax res 40/00:00:80:20:57/00:00:08:00:00/40 Emask 0x2 (HSM violation)
Jun 14 16:50:21 freax ata1.00: cmd 61/08:d8:40:12:57/00:00:08:00:00/40 tag 27 cdb 0x0 data 4096 out
Jun 14 16:50:21 freax res 40/00:00:80:20:57/00:00:08:00:00/40 Emask 0x2 (HSM violation)
Jun 14 16:50:22 freax ata1: soft resetting port
Jun 14 16:50:22 freax ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jun 14 16:50:22 freax ata1.00: configured for UDMA/133
Jun 14 16:50:22 freax ata1: EH complete
Jun 14 16:50:22 freax SCSI device sda: 145226112 512-byte hdwr sectors (74356 MB)
Jun 14 16:50:22 freax sda: Write Protect is off
Jun 14 16:50:22 freax sda: Mode Sense: 00 3a 00 00
Jun 14 16:50:22 freax SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA


regards
blubbi
Comment 11 Bjoern Olausson 2007-06-14 10:43:01 UTC
Here is another HSM violation (still with a medium in the px-130A):



ata1.00: exception Emask 0x2 SAct 0x6 SErr 0x0 action 0x2 frozen
ata1.00: (spurious completions during NCQ issue=0x0 SAct=0x6 FIS=004040a1:00000001)
ata1.00: cmd 61/08:08:d0:60:e3/00:00:07:00:00/40 tag 1 cdb 0x0 data 4096 out
         res 40/00:10:e8:64:ff/00:00:07:00:00/40 Emask 0x2 (HSM violation)
ata1.00: cmd 61/08:10:e8:64:ff/00:00:07:00:00/40 tag 2 cdb 0x0 data 4096 out
         res 40/00:10:e8:64:ff/00:00:07:00:00/40 Emask 0x2 (HSM violation)
ata1: soft resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete
SCSI device sda: 145226112 512-byte hdwr sectors (74356 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Comment 12 Bjoern Olausson 2007-06-14 10:44:45 UTC
Sorry, the last violation above was with a medium in the PX-760A (not attached to the JMicron).

Now I'll remove the medium and wait for another violation.
Comment 13 Albert Lee 2007-06-14 22:51:52 UTC
A summary of the IDE/SATA ports on the machine:
- ata1 to ata4: ICH7R in AHCI mode (1f.2, irq 1275)
  WD740A drive is connected to ata1, and ata2 is bridged to the EZ-Raid chip.
- ata5 and ata6: JMicron JMB363? (irq 17)
  No drives connected
- ata7 and ata8: Intel ICH7 in legacy port address (irq 14/15)
  Plextor PX-760a is connected to ata7
- ata9: JMicron IDE (irq 16)
  Plextor px-708a as master and px-130a as slave.

 
Comment 14 Albert Lee 2007-06-14 22:53:04 UTC
For the following HSM violation:

ata1.00: cmd 61/08:10:e8:64:ff/00:00:07:00:00/40 tag 2 cdb 0x0 data 4096 out
         res 40/00:10:e8:64:ff/00:00:07:00:00/40 Emask 0x2 (HSM violation)

It is related to the WD drive: cmd 61 is FPDMA_WRITE. Maybe Tejun knows more about it...
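The fields of these "cmd" lines can be pulled apart mechanically. This sketch assumes libata prints the taskfile as command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, which is consistent with the tag numbers and data sizes in the log; the struct and helper names are hypothetical. For WRITE FPDMA QUEUED (0x61) the sector count sits in the feature registers and the NCQ tag in bits 7:3 of the sector count register, so 61/08:10:e8:64:ff/00:00:07:00:00/40 is 8 sectors (4096 bytes) at LBA 0x07ff64e8 with tag 2.

```c
#include <stdint.h>
#include <stdio.h>

/* Taskfile fields in the order libata prints them in "cmd" lines. */
struct tf {
    unsigned command, feature, nsect, lbal, lbam, lbah;
    unsigned hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah, device;
};

/* Parse "cc/ff:nn:ll:mm:hh/ff:nn:ll:mm:hh/dd". Returns 0 on success. */
int parse_tf(const char *s, struct tf *t)
{
    int n = sscanf(s, "%x/%x:%x:%x:%x:%x/%x:%x:%x:%x:%x/%x",
                   &t->command, &t->feature, &t->nsect,
                   &t->lbal, &t->lbam, &t->lbah,
                   &t->hob_feature, &t->hob_nsect,
                   &t->hob_lbal, &t->hob_lbam, &t->hob_lbah, &t->device);
    return n == 12 ? 0 : -1;
}

/* FPDMA QUEUED: sector count lives in feature (low) and hob_feature (high). */
unsigned fpdma_sectors(const struct tf *t)
{
    return (t->hob_feature << 8) | t->feature;
}

/* FPDMA QUEUED: NCQ tag lives in bits 7:3 of the sector count register. */
unsigned fpdma_tag(const struct tf *t)
{
    return (t->nsect >> 3) & 0x1f;
}

/* Assemble the 48-bit LBA from the six LBA registers. */
uint64_t tf_lba48(const struct tf *t)
{
    return (uint64_t)t->lbal | (uint64_t)t->lbam << 8 |
           (uint64_t)t->lbah << 16 | (uint64_t)t->hob_lbal << 24 |
           (uint64_t)t->hob_lbam << 32 | (uint64_t)t->hob_lbah << 40;
}
```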
Comment 15 Tejun Heo 2007-06-14 22:56:10 UTC
Yeah, that's because of the faulty NCQ implementation in the WD740ADFD.  Please post the result of 'hdparm -I /dev/sda'.  I'll add it to the blacklist.  This is a separate problem from the ATAPI HSM violation tho.
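The SAct value in the "spurious completions during NCQ" lines is the SATA Active register: one bit per outstanding NCQ tag. A minimal sketch of expanding it (the function name is hypothetical): SAct 0xf800009 yields tags 0, 3 and 23 to 27, exactly the seven commands listed in that report, SAct 0x6 yields tags 1 and 2, and the later SAct 0x1fe00 yields tags 9 to 16.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand an SAct bitmap (bit n set = NCQ tag n outstanding, 32 tags)
 * into an ascending list of tags. Writes at most max entries and
 * returns the number written. */
size_t sact_tags(uint32_t sact, unsigned *tags, size_t max)
{
    size_t n = 0;

    for (unsigned tag = 0; tag < 32 && n < max; tag++)
        if (sact & (UINT32_C(1) << tag))
            tags[n++] = tag;
    return n;
}
```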
Comment 16 Albert Lee 2007-06-14 23:02:07 UTC
<---- Starting to burn an other DVD iso with PX760A ---->
Jun 14 14:27:36 freax cdrom: This disc doesn't have any tracks I recognize!
Jun 14 14:30:01 freax cron[19764]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Jun 14 14:35:54 freax wpa_cli: interface ath0 DISCONNECTED
Jun 14 14:35:54 freax wpa_cli: interface ath0 CONNECTED
Jun 14 14:35:59 freax ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jun 14 14:35:59 freax ata7.00: cmd a0/01:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0xad data 4 in
Jun 14 14:35:59 freax res 40/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Jun 14 14:36:06 freax ata7: port is slow to respond, please be patient (Status 0xd8)
Jun 14 14:36:11 freax ata7: soft resetting port
Jun 14 14:36:12 freax ata7.00: configured for UDMA/66
Jun 14 14:36:12 freax ata7: EH complete
<---- DVD burning is finished ---->


==========================

cdb 0xad is READ_DVD_STRUCTURE. I guess we can ignore this timeout during DVD burning at this moment.
Comment 17 Albert Lee 2007-06-14 23:07:24 UTC
Okay, speak of the devil... it occurs:

ata9.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata9.01: (BMDMA stat 0x4)
ata9.01: cmd a0/01:00:00:00:00/00:00:00:00:00/b0 tag 0 cdb 0x4a data 8 in
         res 00/00:00:00:00:00/00:00:00:00:00/00 Emask 0x2 (HSM violation)
ata9: soft resetting port
ata9.00: configured for UDMA/33
ata9.01: configured for UDMA/100
ata9: EH complete

while having a medium in the PX-130A:
/dev/sr2 on /media/VirtualBox-WinXP type udf (ro,noexec,nosuid,nodev,uid=1001,gid=100,umask=000,iocharset=utf8)

but I didn't notice any stuttering... maybe I was too focused on learning enzyme kinetics...

=================================

Again, this HSM violation only occurs with the px-130a drive and cdb 4a (GET_EVENT_STATUS_NOTIFICATION). I guess this is a problem with the px-130a when doing ATAPI DMA for this specific command. Maybe we can limit this drive to ATAPI_DMA_RW_ONLY... But GET_EVENT_STATUS_NOTIFICATION mostly works OK; the HSM violation is rare and EH recovers from it nicely. So I am wondering whether limiting it to ATAPI_DMA_RW_ONLY is necessary...
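To make the ATAPI_DMA_RW_ONLY idea concrete: with such a limit, only plain READ/WRITE packet commands would use DMA, and everything else, including GET_EVENT_STATUS_NOTIFICATION (0x4a) and READ_DVD_STRUCTURE (0xad), would fall back to PIO. This is a minimal sketch, not libata's implementation; the quirk flag and function names are hypothetical, the opcode names follow linux/cdrom.h, and treating only READ/WRITE(10) and READ/WRITE(12) as "R/W" is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* MMC opcodes (names as in linux/cdrom.h). */
#define GPCMD_READ_10                       0x28
#define GPCMD_WRITE_10                      0x2a
#define GPCMD_GET_EVENT_STATUS_NOTIFICATION 0x4a
#define GPCMD_READ_12                       0xa8
#define GPCMD_WRITE_12                      0xaa
#define GPCMD_READ_DVD_STRUCTURE            0xad

/* Hypothetical per-device quirk: use DMA only for plain R/W commands. */
#define QUIRK_ATAPI_DMA_RW_ONLY 0x1u

/* Sketch's definition of a plain read/write packet command. */
static bool is_rw_opcode(uint8_t op)
{
    switch (op) {
    case GPCMD_READ_10:
    case GPCMD_WRITE_10:
    case GPCMD_READ_12:
    case GPCMD_WRITE_12:
        return true;
    default:
        return false;
    }
}

/* Decide whether a packet command may use DMA on a device with the
 * given quirks; false means the command would be issued in PIO mode. */
bool atapi_may_use_dma(uint8_t opcode, unsigned quirks)
{
    if (quirks & QUIRK_ATAPI_DMA_RW_ONLY)
        return is_rw_opcode(opcode);
    return true;
}
```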

For the NCQ HSM violation, maybe Tejun has a better idea.
Comment 18 Bjoern Olausson 2007-06-15 01:57:57 UTC
>Yeah, that's because of the faulty NCQ implementation in the WD740ADFD.  Please post
>the result of 'hdparm -I /dev/sda'.  I'll add it to blacklist.  This is a
>separate problem from the ATAPI HSM violation tho.

hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
        Model Number:       WDC WD740ADFD-00NLR1
        Serial Number:      WD-WMANS1333464
        Firmware Revision:  20.07P20
Standards:
        Used: ATA/ATAPI-7 published, ANSI INCITS 397-2005
        Supported: 7 6 5 4
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:   16514064
        LBA    user addressable sectors:  145226112
        LBA48  user addressable sectors:  145226112
        device size with M = 1024*1024:       70911 MBytes
        device size with M = 1000*1000:       74355 MBytes (74 GB)
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Standard, with device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 16
        Recommended acoustic management value: 128, current value: 254
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    NOP cmd
           *    DOWNLOAD_MICROCODE
                Power-Up In Standby feature set
           *    SET_FEATURES required to spinup after power up
                SET_MAX security extension
           *    Automatic Acoustic Management feature set
           *    48-bit Address feature set
           *    Device Configuration Overlay feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
           *    General Purpose Logging feature set
           *    SATA-I signaling speed (1.5Gb/s)
           *    Native Command Queueing (NCQ)
           *    Host-initiated interface power management
           *    Phy event counters
                DMA Setup Auto-Activate optimization
           *    Software settings preservation
           *    SMART Command Transport (SCT) feature set
           *    SCT Long Sector Access (AC1)
           *    SCT LBA Segment Access (AC2)
           *    SCT Error Recovery Control (AC3)
           *    SCT Features Control (AC4)
           *    SCT Data Tables (AC5)
                unknown 206[12]
Security:
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
                frozen
        not     expired: security count
        not     supported: enhanced erase
Checksum: correct

Is this a big drawback for the drive's speed?
Shouldn't WD be notified about the bad implementation so they can release new firmware for the drive (if possible)?

regards
Bjoern
Comment 19 Bjoern Olausson 2007-06-15 02:02:03 UTC
Anything more I can do to help you?

Thanks a lot
Bjoern
Comment 20 Albert Lee 2007-06-15 02:13:08 UTC
Created attachment 11760 [details]
Patch for not stopping DMA if the device is busy

After checking the trace, maybe we should not stop DMA if the device is still busy.

Hi Bjoern,

Could you please try the attached patch and see if the "ata9.01 HSM violation" still occurs, thanks.
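The patch is attached rather than inlined, so what follows is only a hypothetical sketch of the idea described above, not the patch's code: when the bus-master status shows an interrupt, check the device's alternate status first (reading it does not clear a pending interrupt) and leave the DMA engine running if BSY is still set. The struct, enum and function names are made up for illustration, and the register values in the usage are illustrative rather than taken from the trace.

```c
#include <stdint.h>

#define ATA_STAT_BSY 0x80 /* device busy                      */
#define BMDMA_INTR   0x04 /* bus master: INTRQ seen by engine */

/* Hypothetical snapshot of the registers an interrupt handler reads. */
struct irq_snapshot {
    uint8_t bmdma_status; /* bus-master status (host_stat in the trace)   */
    uint8_t altstatus;    /* ATA alternate status; reading it has no side
                             effect on the pending interrupt              */
};

enum irq_action { IRQ_IGNORE, IRQ_COMPLETE_DMA };

/* Stop DMA and complete the command only when the bus master flags an
 * interrupt AND the device is no longer busy. A busy device means the
 * interrupt came too early (or is not ours), so DMA is left running. */
enum irq_action bmdma_irq_decision(const struct irq_snapshot *s)
{
    if (!(s->bmdma_status & BMDMA_INTR))
        return IRQ_IGNORE;
    if (s->altstatus & ATA_STAT_BSY)
        return IRQ_IGNORE;
    return IRQ_COMPLETE_DMA;
}
```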
Comment 21 Albert Lee 2007-06-15 02:31:28 UTC
Created attachment 11761 [details]
Patch for not stopping DMA if the device is busy

Hi Bjoern,

Sorry, please ignore my previous patch and use this instead.
Could you please try the attached revised patch and see if the "ata9.01 HSM violation" still occurs, thanks.
Comment 22 Bjoern Olausson 2007-06-15 02:39:43 UTC
Just booted with the revised patch...

I'll undo the first patch and try the new one.

Thanks
Bjoern
Comment 23 Bjoern Olausson 2007-06-15 05:24:55 UTC
So far no "ata9.01 HSM violation" has occurred. It seems your 02_jmicron_irq.diff patch did it.

So all these errors are the result of buggy firmware implementations in the drives (WD and Plextor), and you guys now have to work around them, am I right?
Or did I get something wrong?

Thanks for the help
Bjoern
Comment 24 Bjoern Olausson 2007-06-15 05:47:17 UTC
By the way, is there a way to upgrade the firmware of the hard drives?

Thanks
Bjoern
Comment 25 Bjoern Olausson 2007-06-15 07:45:04 UTC
Still no "ata9.01 HSM violation", but

ata1.00: exception Emask 0x2 SAct 0x1fe00 SErr 0x0 action 0x2 frozen
ata1.00: (spurious completions during NCQ issue=0x0 SAct=0x1fe00 FIS=004040a1:00040000)
ata1.00: cmd 61/18:48:d0:4e:6d/00:00:05:00:00/40 tag 9 cdb 0x0 data 12288 out
         res 40/00:90:28:a6:6c/00:00:05:00:00/40 Emask 0x2 (HSM violation)
ata1.00: cmd 61/10:50:f0:4e:6d/00:00:05:00:00/40 tag 10 cdb 0x0 data 8192 out
         res 40/00:90:28:a6:6c/00:00:05:00:00/40 Emask 0x2 (HSM violation)
ata1.00: cmd 61/08:58:48:9c:6d/00:00:05:00:00/40 tag 11 cdb 0x0 data 4096 out
         res 40/00:90:28:a6:6c/00:00:05:00:00/40 Emask 0x2 (HSM violation)
ata1.00: cmd 61/08:60:b0:9c:6d/00:00:05:00:00/40 tag 12 cdb 0x0 data 4096 out
         res 40/00:90:28:a6:6c/00:00:05:00:00/40 Emask 0x2 (HSM violation)
ata1.00: cmd 61/28:68:90:9d:6d/00:00:05:00:00/40 tag 13 cdb 0x0 data 20480 out
         res 40/00:90:28:a6:6c/00:00:05:00:00/40 Emask 0x2 (HSM violation)
ata1.00: cmd 61/08:70:50:a1:6d/00:00:05:00:00/40 tag 14 cdb 0x0 data 4096 out
         res 40/00:90:28:a6:6c/00:00:05:00:00/40 Emask 0x2 (HSM violation)
ata1.00: cmd 61/08:78:a8:a1:6d/00:00:05:00:00/40 tag 15 cdb 0x0 data 4096 out
         res 40/00:90:28:a6:6c/00:00:05:00:00/40 Emask 0x2 (HSM violation)
ata1.00: cmd 61/08:80:b0:a1:6d/00:00:05:00:00/40 tag 16 cdb 0x0 data 4096 out
         res 40/00:90:28:a6:6c/00:00:05:00:00/40 Emask 0x2 (HSM violation)
ata1: soft resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete
SCSI device sda: 145226112 512-byte hdwr sectors (74356 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Is there a way to turn off NCQ?

regards
blubbi
Comment 26 Tejun Heo 2007-06-17 22:35:42 UTC
Created attachment 11774 [details]
NCQ-blacklist.patch

NCQ will be turned off automatically after a few such incidents and here's a patch to blacklist NCQ for the drive.  I'll submit the patch upstream soon.  Thanks.
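Conceptually, an NCQ blacklist is a table lookup on the IDENTIFY model string. A minimal sketch, assuming prefix matching and using the model string from the hdparm output in comment 18; the struct, flag and function names are hypothetical, and libata keeps its own blacklist with its own structure and matching rules, which may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define HORKAGE_NONCQ 0x1ul /* hypothetical flag: never use NCQ */

struct blacklist_entry {
    const char *model_prefix; /* compared against the start of the model */
    unsigned long horkage;    /* quirk flags applied on a match          */
};

static const struct blacklist_entry blacklist[] = {
    { "WDC WD740ADFD-00", HORKAGE_NONCQ },
};

/* Return the quirk flags for a model string, or 0 if it is not listed. */
unsigned long lookup_horkage(const char *model)
{
    for (size_t i = 0; i < sizeof blacklist / sizeof blacklist[0]; i++) {
        const char *p = blacklist[i].model_prefix;
        if (strncmp(model, p, strlen(p)) == 0)
            return blacklist[i].horkage;
    }
    return 0;
}

/* NCQ is allowed unless the drive is blacklisted for it. */
bool ncq_allowed(const char *model)
{
    return !(lookup_horkage(model) & HORKAGE_NONCQ);
}
```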
Comment 27 Bjoern Olausson 2007-06-18 12:39:22 UTC
That did the trick.
No more "spurious completions during NCQ" issues in dmesg.

Thanks a lot.

Do you know if the root cause will be fixed? Disabling NCQ is just a workaround, IMHO.

regards
blubbi
Comment 28 Albert Lee 2007-06-18 16:50:43 UTC
> So far no "ata9.01 HSM violation" has occured.

Hi Bjoern,

Before submitting the workaround patch for the JMicron, I would like to make sure whether the problem is specific to JMicron or might affect other controllers. Could you please disconnect the px-130a from the JMicron IDE port and reconnect it to the Intel IDE port (that is, connect the px-130a to the same port as PX-760a). 

Please see if the px-130a causes any "HSM violation" with the Intel port, both with/without medium in the px-130a drive. Thanks.
Comment 29 Bjoern Olausson 2007-06-18 17:14:45 UTC
I'll test it and post my results.

Should I test it with the patched kernel or unpatched?
Comment 30 Albert Lee 2007-06-18 17:45:46 UTC
> Should I test it with the patched kernel or unpatched?

Both are OK, but an unpatched kernel is preferred.
Comment 31 Tejun Heo 2007-06-18 20:38:48 UTC
Regarding spurious completion: The firmware is faulty and violates the NCQ protocol, so there's not much more the driver can do than not use NCQ.  You can scream at the vendor for a firmware upgrade, though.  :-)
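For context, NCQ completions are tracked per tag, so a completion for a tag the host never issued can be detected with simple bitmask bookkeeping. Below is a simplified, hypothetical model of that bookkeeping. It is not the actual ahci/libata check, which also inspects the FIS contents. `process_completion` and `struct ncq_result` are made-up names.

```c
#include <stdint.h>

/* Outcome of one completion notification (hypothetical model). */
struct ncq_result {
    uint32_t completed;  /* tags that were outstanding and are now done */
    uint32_t spurious;   /* tags reported done that were never outstanding */
};

/*
 * outstanding: bitmap of tags the host has issued and not yet completed.
 * done:        bitmap of tags the device reports as completed.
 */
static struct ncq_result process_completion(uint32_t *outstanding, uint32_t done)
{
    struct ncq_result r;

    r.spurious = done & ~*outstanding;   /* protocol violation by the device */
    r.completed = done & *outstanding;   /* legitimate completions */
    *outstanding &= ~r.completed;        /* retire the finished tags */
    return r;
}
```

Any nonzero `spurious` value means the firmware reported something the protocol does not allow. The driver can only log it and, if it keeps happening, stop using NCQ.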
Comment 32 Bjoern Olausson 2007-06-19 03:09:52 UTC
Tejun Heo:

If you tell me what evidence I should throw at WD, I'll do it.
Do you think this thread is evidence enough?

regards
blubbi
Comment 33 Bjoern Olausson 2007-06-19 03:57:53 UTC
Albert Lee:
Here is the requested information. I disconnected the PX-760a and connected the px-130a to its port, and the HSM violation is back.

This was done with the patched Kernel.

ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata7.00: (BMDMA stat 0x24)
ata7.00: cmd a0/01:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x4a data 8 in
         res 7f/7f:7f:7f:7f:7f/00:00:00:00:00/7f Emask 0x2 (HSM violation)
ata7: soft resetting port
ATA: abnormal status 0x7F on port 0x00000000000101f7
ata7.00: configured for UDMA/33
ata7: EH complete
ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata7.00: (BMDMA stat 0x24)
ata7.00: cmd a0/01:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x4a data 8 in
         res 7f/7f:7f:7f:7f:7f/00:00:00:00:00/7f Emask 0x2 (HSM violation)
ata7: soft resetting port
ATA: abnormal status 0x7F on port 0x00000000000101f7
ata7.00: configured for UDMA/33
ata7: EH complete
Comment 34 Tejun Heo 2007-06-19 23:00:08 UTC
(In reply to comment #32)
> If you tell me what evidence I should throw at WD, I'll do it.
> Do you think this thread is evidence enough?

I think quoting the error message and telling them that their drives are blacklisted for NCQ should do the trick.
Comment 35 Albert Lee 2007-06-20 01:33:40 UTC
Created attachment 11810 [details]
Patch to limit ATAPI DMA to R/W only for Plextor PX-130A"

Hi Bjoern,

Hmm, the problem also happens with the Intel controller. Maybe we should blacklist the Plextor drive instead of working around the problem on the controller side.

Could you please keep the px-130a drive connected to the Intel ICH7 port and try the attached "Limit ATAPI DMA to R/W only for Plextor PX-130A" patch, preferably with ATA_DEBUG/ATA_DEBUG turned off.

Please check if the px-130a ever causes any "HSM violation" with the new patch. Thanks.
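The idea behind an "ATAPI DMA for R/W only" horkage is that, for the affected drive, only plain read/write commands use DMA and everything else falls back to PIO. The failing command in the log above has `cdb 0x4a` (GET EVENT STATUS NOTIFICATION, an 8-byte status read), which is not a read/write command. Below is a minimal sketch of such a filter. The set of opcodes treated as "R/W" is an assumption for illustration and may differ from the attached patch. `is_rw_opcode` and `atapi_use_pio` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical set of MMC read/write opcodes allowed to use DMA. */
static bool is_rw_opcode(uint8_t opcode)
{
    switch (opcode) {
    case 0x28: /* READ(10) */
    case 0x2a: /* WRITE(10) */
    case 0xa8: /* READ(12) */
    case 0xaa: /* WRITE(12) */
    case 0xbe: /* READ CD */
        return true;
    default:
        return false;
    }
}

/* Returns true if the command should be issued with PIO instead of DMA. */
static bool atapi_use_pio(uint8_t opcode, bool dma_rw_only)
{
    return dma_rw_only && !is_rw_opcode(opcode);
}
```

Drives without the horkage flag are unaffected, which keeps the workaround limited to the blacklisted model.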
Comment 36 Bjoern Olausson 2007-06-20 02:52:42 UTC
Albert Lee: Okay, I'll check it.

Tejun Heo: Request to WD is out. Let's see what happens.

Thanks
Bjoern
Comment 37 Bjoern Olausson 2007-06-20 04:43:39 UTC
Created attachment 11816 [details]
libata-core.c_bug_8627.diff

Hmm, I tried to combine the two patches "Limit ATAPI DMA to R/W only for Plextor PX-130A" and "NCQ-blacklist.patch"

but I get an error during compilation:
  CC      drivers/ata/libata-core.o
drivers/ata/libata-core.c: In function 'ata_dev_configure':
drivers/ata/libata-core.c:1788: warning: comparison of distinct pointer types lacks a cast
drivers/ata/libata-core.c: At top level:
drivers/ata/libata-core.c:3390: error: expected '}' before 'ata_device_blacklist'
make[2]: *** [drivers/ata/libata-core.o] Error 1
make[1]: *** [drivers/ata] Error 2
make: *** [drivers] Error 2

So here's the diff I created to patch the latest 2.6.21.5 kernel.
Where's the problem? Where's my mistake?

--- libata-core.c_org   2007-06-20 12:56:45.000000000 +0200
+++ libata-core.c_patched       2007-06-20 13:37:31.000000000 +0200
@@ -3362,6 +3362,7 @@ static const struct ata_blacklist_entry
        /* Weird ATAPI devices */
        { "TORiSAN DVD-ROM DRD-N216", NULL,     ATA_HORKAGE_MAX_SEC_128 |
                                                ATA_HORKAGE_DMA_RW_ONLY },
+       { "PLEXTOR DVD-ROM PX-130A", NULL,      ATA_HORKAGE_DMA_RW_ONLY },

        /* Devices we expect to fail diagnostics */

@@ -3379,11 +3380,14 @@ static const struct ata_blacklist_entry
        { "HTS541060G9SA00",    "MB3OC60D",     ATA_HORKAGE_NONCQ, },
        { "HTS541080G9SA00",    "MB4OC60D",     ATA_HORKAGE_NONCQ, },
        { "HTS541010G9SA00",    "MBZOC60D",     ATA_HORKAGE_NONCQ, },
-
+       /* Drives which do spurious command completion */
+       { "HTS541612J9SA00",    "SBDIC7JP",     ATA_HORKAGE_NONCQ, },
+       { "WDC WD740ADFD-00NLR1", NULL,         ATA_HORKAGE_NONCQ, },
+
        /* Devices with NCQ limits */

        /* End Marker */
-       { }
+       { }ata_device_blacklist
 };

 unsigned long ata_device_blacklisted(const struct ata_device *dev)
Comment 38 Bjoern Olausson 2007-06-20 04:51:43 UTC
Created attachment 11817 [details]
proper libata-core.c_bug_8627.diff

Sorry, found the mistake:
+       { }ata_device_blacklist

Attached is the corrected version.

Regards
blubbi
Comment 39 Bjoern Olausson 2007-06-20 04:54:13 UTC
But I still get:

drivers/ata/libata-core.c: In function 'ata_dev_configure':
drivers/ata/libata-core.c:1788: warning: comparison of distinct pointer types lacks a cast

Something to worry about?
Comment 40 Bjoern Olausson 2007-06-20 10:07:14 UTC
Albert Lee:
Okay, no more "HSM violation" with your latest patch,
neither on the Intel nor on the JMicron controller.

Thanks
Bjoern
Comment 41 Albert Lee 2007-06-20 23:38:15 UTC
Hi Bjoern,

Tejun has a new patch for ATAPI DMA. Could you please drop my previous patches, apply Tejun's new patch, and check whether the px-130a ever times out?
(Please test with px-130a connected to ich7 and jmicron.)

The new patch to test:
https://bugzilla.novell.com/attachment.cgi?id=147389
(The original bug: https://bugzilla.novell.com/show_bug.cgi?id=229260)

Thanks for your help/patience.
Comment 42 Bjoern Olausson 2007-06-21 03:17:36 UTC
No problem.

But I guess I'll still have to apply the NCQ blacklist patch.

I have to thank you for your help.

Best regards
Bjoern
Comment 43 Bjoern Olausson 2007-06-21 03:44:36 UTC
I can't apply the last part of the patch:

diff --git a/include/linux/libata.h b/include/linux/libata.h
index 745c4f9..e9659ff 100644
--- a/include/linux/libata.h
+++ libata.h
@@ -298,7 +298,6 @@ enum {
 	ATA_HORKAGE_NODMA	= (1 << 1),	/* DMA problems */
 	ATA_HORKAGE_NONCQ	= (1 << 2),	/* Don't use NCQ */
 	ATA_HORKAGE_MAX_SEC_128	= (1 << 3),	/* Limit max sects to 128 */
-	ATA_HORKAGE_DMA_RW_ONLY	= (1 << 4),	/* ATAPI DMA for RW only */
 };
 
 enum hsm_task_states {

because these lines do not appear in the vanilla 2.6.21.5 sources.

regards
blubbi
Comment 44 Bjoern Olausson 2007-06-21 03:46:28 UTC
Since the line

ATA_HORKAGE_DMA_RW_ONLY = (1 << 4),     /* ATAPI DMA for RW only */

is simply removed, I am wondering whether I need to apply this part of the patch at all.
But do I need the other three lines above?

thanks
Bjoern
Comment 45 Albert Lee 2007-06-21 06:44:37 UTC
Yes, Tejun's patch is against 2.6.22-rc5. If convenient, please test it with 2.6.22-rc5; otherwise, maybe apply the following part of the patch manually.

 /**
  *	ata_check_atapi_dma - Check whether ATAPI DMA can be supported
  *	@qc: Metadata associated with taskfile to check
@@ -4124,33 +4120,19 @@ static void ata_fill_sg(struct ata_queued_cmd *qc)
 int ata_check_atapi_dma(struct ata_queued_cmd *qc)
 {
 	struct ata_port *ap = qc->ap;
-	int rc = 0; /* Assume ATAPI DMA is OK by default */

+
+	/* Don't allow DMA if it isn't multiple of 16 bytes.  Quite a
+	 * few ATAPI devices choke on such DMA requests.
+	 */
+	if (unlikely(qc->nbytes & 15))
+		return 1;
 
 	if (ap->ops->check_atapi_dma)
-		rc = ap->ops->check_atapi_dma(qc);
+		return ap->ops->check_atapi_dma(qc);
 
-	return rc;
+	return 0;
 }
+
 /**
  *	ata_qc_prep - Prepare taskfile for submission
  *	@qc: Metadata associated with taskfile to be prepared
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index c228df2..4ddf00c 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -2384,11 +2384,6 @@ static unsigned int atapi_xlat(struct ata_queued_cmd *qc)
 	int using_pio = (dev->flags & ATA_DFLAG_PIO);
 	int nodata = (scmd->sc_data_direction == DMA_NONE);
 
-	if (!using_pio)
-		/* Check whether ATAPI DMA is safe */
-		if (ata_check_atapi_dma(qc))
-			using_pio = 1;
-
 	memset(qc->cdb, 0, dev->cdb_len);
 	memcpy(qc->cdb, scmd->cmnd, scmd->cmd_len);
 
@@ -2401,19 +2396,22 @@ static unsigned int atapi_xlat(struct ata_queued_cmd *qc)
 	}
 
 	qc->tf.command = ATA_CMD_PACKET;
+	qc->nbytes = scmd->request_bufflen;
+
+	/* check whether ATAPI DMA is safe */
+	if (!using_pio && ata_check_atapi_dma(qc))
+		using_pio = 1;
 
-	/* no data, or PIO data xfer */
 	if (using_pio || nodata) {
+		/* no data, or PIO data xfer */
 		if (nodata)
 			qc->tf.protocol = ATA_PROT_ATAPI_NODATA;
 		else
 			qc->tf.protocol = ATA_PROT_ATAPI;
 		qc->tf.lbam = (8 * 1024) & 0xff;
 		qc->tf.lbah = (8 * 1024) >> 8;
-	}
-
-	/* DMA data xfer */
-	else {
+	} else {
+		/* DMA data xfer */
 		qc->tf.protocol = ATA_PROT_ATAPI_DMA;
 		qc->tf.feature |= ATAPI_PKT_DMA;
 
@@ -2422,8 +2420,6 @@ static unsigned int atapi_xlat(struct ata_queued_cmd *qc)
 			qc->tf.feature |= ATAPI_DMADIR;
 	}
 
-	qc->nbytes = scmd->request_bufflen;
-
 	return 0;
 }
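The core of this patch is the `qc->nbytes & 15` test in `ata_check_atapi_dma()`. Any ATAPI transfer whose length is not a multiple of 16 bytes falls back to PIO before the low-level driver's own `check_atapi_dma` hook is consulted. Below is a minimal standalone sketch of that decision order. The queued command is reduced to a plain byte count, and the hook is modelled as a function pointer. `check_hook_t` and `hook_force_pio` are hypothetical names, not libata API.

```c
#include <stddef.h>

/* Hypothetical stand-in for ap->ops->check_atapi_dma; NULL means no hook. */
typedef int (*check_hook_t)(size_t nbytes);

/* Example low-level driver hook that vetoes ATAPI DMA unconditionally. */
static int hook_force_pio(size_t nbytes)
{
    (void)nbytes;
    return 1;
}

/* Returns nonzero if the transfer must use PIO, 0 if DMA is OK. */
static int check_atapi_dma(size_t nbytes, check_hook_t hook)
{
    /* Many ATAPI devices choke on DMA that isn't a multiple of 16 bytes. */
    if (nbytes & 15)
        return 1;
    if (hook)
        return hook(nbytes);
    return 0;
}
```

For example, the 8-byte `cdb 0x4a` transfer from the HSM violation log in comment 33 gives `8 & 15 == 8`, so under this rule it goes through PIO. The check reads the transfer length, which is why the patch moves the `qc->nbytes = scmd->request_bufflen;` assignment in `atapi_xlat()` ahead of the call.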
Comment 46 Bjoern Olausson 2007-06-21 07:07:43 UTC
Created attachment 11845 [details]
my libata.h

Still no go using 2.6.22-rc5

the libata.h file seems to differ completely from the one Tejun is using:

libata.h.rej

***************
*** 298,304 ****
        ATA_HORKAGE_NODMA       = (1 << 1),     /* DMA problems */
        ATA_HORKAGE_NONCQ       = (1 << 2),     /* Don't use NCQ */
        ATA_HORKAGE_MAX_SEC_128 = (1 << 3),     /* Limit max sects to 128 */
-       ATA_HORKAGE_DMA_RW_ONLY = (1 << 4),     /* ATAPI DMA for RW only */
  };

  enum hsm_task_states {
--- 298,303 ----
        ATA_HORKAGE_NODMA       = (1 << 1),     /* DMA problems */
        ATA_HORKAGE_NONCQ       = (1 << 2),     /* Don't use NCQ */
        ATA_HORKAGE_MAX_SEC_128 = (1 << 3),     /* Limit max sects to 128 */
  };

  enum hsm_task_states {
Comment 47 Bjoern Olausson 2007-06-21 07:10:04 UTC
STOP... my fault... I tried to patch the wrong libata.h in /driver/ata/

Sorry

Bjoern
Comment 48 Bjoern Olausson 2007-06-21 07:34:56 UTC
Okay, I applied it against 2.6.21.5 too.

root@freax $ patch -p0 < bug-229260_update-ata_check_atapi_dma.patch
patching file ../linux-2.6.21.5/drivers/ata/libata-core.c
Hunk #1 succeeded at 1787 with fuzz 2 (offset -259 lines).
Hunk #2 succeeded at 3356 with fuzz 1 (offset -420 lines).
Hunk #3 succeeded at 3670 (offset -432 lines).
Hunk #4 succeeded at 3688 (offset -432 lines).
patching file ../linux-2.6.21.5/drivers/ata/libata-scsi.c
Hunk #1 succeeded at 2445 (offset 61 lines).
Hunk #2 succeeded at 2457 (offset 61 lines).
Hunk #3 succeeded at 2481 (offset 61 lines).
patching file ../linux-2.6.21.5/include/linux/libata.h
Hunk #1 succeeded at 312 (offset 14 lines).

madwifi-ng does not work for me in 2.6.22, so I'd rather test with 2.6.21 until the WLAN drivers are included in the kernel and I don't have to hassle with madwifi-ng.

regards
Bjoern
Comment 49 Bjoern Olausson 2007-06-21 11:45:51 UTC
So far no errors on the jmicron controller.

Now I'll test the Intel controller.

regards
Bjoern
Comment 50 DE 2007-06-24 02:58:08 UTC
Hi Bjoern,

There is one way to upgrade the firmware of hard drives, but the manufacturer will release newer firmware ONLY if the original firmware is f**ked up!

The ATA spec itself describes a firmware update mechanism: the DOWNLOAD MICROCODE command. The microcode is not in the serial flash ROM, which holds just a small boot code. The firmware that affects you actually resides on the magnetic media of the disk, in an area inaccessible to end users (part of the maintenance tracks?). Updating it is like overwriting a file on your disk. What is more, according to ATA the firmware can be up to 33,553,920 bytes, which is just under 32 MB. That is a huge space for fixing any error, provided the controller logic is not completely broken and the controller processor has enough processing power to handle a slightly increased overhead. But if the disk becomes faster and more responsive, you will not buy their latest and greatest product. Well, they do not want that...
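The 33,553,920-byte figure works out exactly to the largest 16-bit block count (65535) times a 512-byte block. Below is a short worked check, assuming that this is where the limit comes from. `max_microcode_bytes` is a hypothetical helper.

```c
#include <stdint.h>

/* Assumption: the limit is a 16-bit count of 512-byte blocks. */
static uint64_t max_microcode_bytes(void)
{
    const uint64_t max_blocks = 0xFFFF;  /* largest 16-bit block count */
    const uint64_t block_size = 512;     /* bytes per block */
    return max_blocks * block_size;      /* 65535 * 512 = 33,553,920 */
}
```

This is 512 bytes short of 32 MiB (33,554,432 bytes).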

Note that the ATA spec states that if a device receives a firmware modification, all error log data shall be discarded and the device error count for the life of the device shall be reset to zero. So if an update is provided, save a copy of those logs before applying it: errors sometimes occur at specific LBAs, and it is in your interest to keep those LBAs in mind.

In the past IBM, SEAGATE and WD have released firmware updates for some drives. The most famous were the IBM Deathstar firmware updates. So far WD does not supply such firmware for your drive... Their current firmware updates are hidden on their site and distributed through their knowledge base articles, where the firmware and its proprietary programmer come as DOS executables... Firmware updates have been released for some WD drives, such as the first WD Scorpios, which had power management problems, some 2500KS SATA disks with the wrong buffer size (8MB instead of 16MB!!) and, of course, the YS series regarding RAID issues (WD has released firmware for RAID problems before, as the huge recovery time made many PATA drives fall out of RAID arrays). As you can see, they release updates only if there is a major issue that would otherwise force WD to replace its drives. Over the last couple of years WD has gone from bad to worse, as they have started making frequent firmware errors!

For my part, I want to express my sympathy for your crappy hardware! I had both a bad Plextor DVD drive and a bad Raptor... Both were returned, though. The replacement Plextor DVD drive developed the same problems as the original after a while, and it went into the garbage. I also had a WD360GD-00FNA0, and I still have a WD360GD-00FLCO (bought a couple of years after the first). After buying them, I saw that the first one was limited to UDMA 5 transfers in Linux: I got "applying bridge limits" because of the Marvell 88i8030-TBC SATA bridge chip. It also could not take SATA latch cables, because its SATA data connector lacks the required rails, so only normal cables fit... In contrast, the second one (FLC0) runs at UDMA 6 with the updated Marvell 88i8030-TBC1 bridge and has the rails for latch cables.

Both Raptors were bought for SATA I 150 MB/s support, but the first only supported legacy PATA UDMA 5 at 100 MB/s. Raptors were supposed to kill SCSI, but with UDMA 5 on the first 360GD and no NCQ on the 740ADFD, the only thing they killed was our wallets. Luckily, this time, unlike with the DVD drive, I was able to perform the last remaining act of dignity: I "killed" the first Raptor under warranty and got a SEAGATE replacement! To be honest, I also got two SATA II 160GB WDs, but one of them worked in UDMA 5! Yes, a 2007 native (no bridge) SATA II drive (WD1600JS-60NCB1) only gave UDMA 5. So it was replaced with a SEAGATE too...

Despite the replacement, I got very angry because WD had fooled me for some time (a UDMA 6 $$$ price for a UDMA 5 product), and I have named the WD360GD-00FNA0 and WD740ADFD-00NLR1 "Traptors", because they were built to trap computer users.

I was seriously thinking about buying a 740ADFD disk in the past, but for some reason I did not. So no WD again for me! I guess that is what you should do if they do not release a firmware update that solves this problem. WD should have already released firmware for such an expensive drive with four features (three now!): 10000 RPM, 16MB buffer, NCQ and RAFF. If they do release a fix (I would not count on it, though), remember to blacklist only WDC WD740ADFD-00NLR1 with firmware 20.07P20.
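Blacklisting only one firmware revision works because each blacklist entry pairs a model string with an optional firmware revision. The NCQ patch in comment 37 uses `NULL` for the revision, which matches every firmware. Below is a minimal standalone sketch of that lookup. It is modelled on the `ata_device_blacklist` table but is not the kernel code. `struct blacklist_entry`, `lookup_horkage` and `HORKAGE_NONCQ` are hypothetical names, and the entry pins the 20.07P20 revision named above. The table ends with a zeroed sentinel, like the `/* End Marker */` entry in libata-core.c.

```c
#include <stddef.h>
#include <string.h>

#define HORKAGE_NONCQ (1u << 2)  /* modelled on ATA_HORKAGE_NONCQ */

struct blacklist_entry {
    const char *model_num;
    const char *model_rev;       /* NULL matches any firmware revision */
    unsigned long horkage;
};

static const struct blacklist_entry blacklist[] = {
    /* Pin the workaround to the known-bad firmware only. */
    { "WDC WD740ADFD-00NLR1", "20.07P20", HORKAGE_NONCQ },
    { NULL, NULL, 0 }            /* end marker */
};

/* Return the horkage flags for a model/revision pair, or 0 if not listed. */
static unsigned long lookup_horkage(const char *model, const char *rev)
{
    const struct blacklist_entry *ad;

    for (ad = blacklist; ad->model_num != NULL; ad++) {
        if (strcmp(ad->model_num, model) != 0)
            continue;
        if (ad->model_rev == NULL || strcmp(ad->model_rev, rev) == 0)
            return ad->horkage;
    }
    return 0;
}
```

With the revision pinned, a drive reporting any other firmware string (for example, a fixed release) would get NCQ back automatically.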

Finally, your hard disk serial number is of no use to anyone (no one needs the full S/N as proof that you actually own the drive; the firmware revision is already enough). Next time, do not submit personal data such as full serial numbers or full MAC addresses in bug reports; mask them instead, like this: WD-WMANS1******.
Comment 51 Bjoern Olausson 2007-06-25 23:15:04 UTC
Thanks for this reply. I appreciate your tips and your explanations.

So far WD has not answered my request... but I'll keep nagging them.

Thanks a lot.

best regards
Bjoern
Comment 52 Albert Lee 2007-06-25 23:58:09 UTC
> Now I'll test the Intel controller.

Hi Bjoern,

Any news with the intel controller?
Comment 53 Bjoern Olausson 2007-06-26 01:44:14 UTC
Sorry, I forgot to post the results:

none

Everything works great... intel and jmicron.
No more HSM violations in dmesg

Just the one after burning a DVD.
Comment #16
( http://bugzilla.kernel.org/show_bug.cgi?id=8627#c16 )

Sorry for the late answer.

Regards & thanks for your help!
Bjoern
Comment 54 Albert Lee 2007-07-02 01:43:31 UTC
Hi Tejun,

Since both your NCQ-blacklist patch and your patch limiting ATAPI DMA to multiples of 16 bytes have been accepted, maybe we can close this bug...
Comment 55 Bjoern Olausson 2007-07-02 03:02:36 UTC
I would agree ;-)

Thanks for the help guys!

regards
Bjoern Olausson
Comment 56 Tejun Heo 2007-07-02 03:12:45 UTC
Thanks a lot for driving this, Albert.  Closing.