Bug 14981 - Kernel 2.6.32 defeats external FireWire disk power management
Summary: Kernel 2.6.32 defeats external FireWire disk power management
Status: CLOSED OBSOLETE
Alias: None
Product: Drivers
Classification: Unclassified
Component: IEEE1394
Hardware: All Linux
Importance: P1 high
Assignee: drivers_ieee1394
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-01-03 17:11 UTC by Timothy Miller
Modified: 2012-05-14 15:32 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.32
Subsystem:
Regression: No
Bisected commit-id:


Attachments
/var/log/messages portion pertaining to firewire (47.64 KB, text/plain)
2010-01-05 02:55 UTC, Timothy Miller

Description Timothy Miller 2010-01-03 17:11:13 UTC
I have a 500GB Western Digital MyBook external Firewire hard drive.  This drive's on-board controller has a power management feature that will spin down the disk if it is idle for 10 minutes and spin it up again on demand.  

- If I connect this drive via Firewire to either a Mac or a Windows PC (and mount it), and I don't access files on that volume for 10 minutes, the drive will spin down appropriately.  Note that I can tell that on the Mac, the volume is NOT being unmounted, yet it still spins down.  Thus, the problem is not with the drive but something the host is doing.
- If I connect it to my Gentoo Linux (2.6.32-gentoo-r1) box and DON'T mount the volume, it will spin down after 10 minutes.  I'm guessing that at this point, nothing has the device open; it's merely recognized.
- If I mount it under Linux but do not otherwise access any files on the volume (no reads or writes besides what's necessary to mount it), it NEVER spins down.  From this, I infer that either something firewire-related or filesystem-related is at fault.
- If I connect it to the Linux box via USB instead, mount it, access it, and then leave it alone for 10 minutes, the drive will spin down properly.  Thus, the problem isn't with ext3.

Other info:
- I'm running Linux kernel 2.6.32-gentoo-r1, Gentoo genkernel.
- No desktop environment is running that could access the disk behind my back.
- While there are processes that access the internal disks, nothing is pointed at the external drive, so nothing should access it.  And things like updatedb only run once per day on cron.
- The format is ext3 with ordered journaling, but since I'm not writing to the disk, the journal manager should have nothing to do. 
- Besides, the drive has an activity light that blinks when it's being accessed even the tiniest bit, and I never see it blink.
- There appears to be no way via Firewire to affect the drive's power management.  According to my research, this is all in the drive's internal Firewire-to-SATA controller.  No host involvement is required (or allowed) to cause the disk to spin down.

I have run "lm-profiler", which spends 600 seconds looking for disk access to any drive.  It found only two processes with disk activity during that time.  The first is "md3_raid1", which only applies to the internal RAID1 array.  The other is "flush-9:3".  This appears to be a kernel thread, but I cannot find out what it does.  This is what I find when I grep the process table:

root     18222     2  0 Dec30 ?        00:00:30 [flush-9:3]

I've used lsof and fuser to look to see if any process has either the device node or the mounted volume.  I find nothing.
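
Independently of lsof and fuser, the claim that nothing reads or writes the drive can be cross-checked against the kernel's own I/O counters for the device. The following is only a minimal sketch: it assumes the usual /proc/diskstats layout (third field is the device name, fourth is reads completed, eighth is writes completed), and the function name io_counts and the device name sdc are illustrative, not taken from any tool mentioned here.

```shell
#!/bin/sh
# io_counts DEVICE [STATS_FILE]
# Print "<reads completed> <writes completed>" for DEVICE.
# STATS_FILE defaults to /proc/diskstats; the optional parameter only exists
# so that the function can be tried on a saved copy of that file.
# The function name is hypothetical.
io_counts() {
    awk -v dev="$1" '$3 == dev { print $4, $8 }' "${2:-/proc/diskstats}"
}

# Example (not run automatically): sample twice, ten minutes apart.
#   before=$(io_counts sdc); sleep 600; after=$(io_counts sdc)
#   [ "$before" = "$after" ] && echo "no I/O reached sdc"
```

If both samples are identical, no read or write request was completed for the device in between, which would point away from filesystem or userland activity.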

I think I've ruled out any user process being responsible for this.  Nothing is doing any reads or writes to the drive.  Additionally, since the drive works fine under Windows and MacOS, it's not the drive.  And since it works fine using USB, it's not the filesystem layer.  So the only thing I can conceive of is that the kernel firewire driver is performing some OTHER (non read/write) access to the drive that is keeping the controller from powering down the disk.  (Or it's doing nothing but continually rereading some small bit of data cached in the controller, although I'd expect that to make the activity light blink.)

Linux appears to have a bug that prevents Firewire drives (or this one anyhow) from performing power management.  This is a major problem that has the potential to shorten the lifespan of the drive.  I'm using this external drive for periodic backups, so I expect it to only be active during the backup process, which is why I got an external drive with power management.  Unfortunately, as far as I can tell, the Linux kernel is doing something actively to defeat it.

This bug report is basically a repeat of what I posted here:
http://bugs.gentoo.org/show_bug.cgi?id=299168
Comment 1 Timothy Miller 2010-01-03 18:27:40 UTC
BTW, I just want to make it clear that I've done a lot of research on this.  Lots of people have reported problems with external drives not spinning down.  But most of these seem to be cases where the external drive contains the root partition.  In that case, it's important to make sure of certain things like setting noatime, configuring cron, and a variety of other things that prevent user (daemon) processes from accessing the disk.  In my case, the drive is just an external drive with data files on it, mounted on a mount point of my choosing.  There's literally no system activity to the disk, demonstrated by the fact that it spins down fine when connected via USB (and it gets the same device node and mount point when I use USB, so the conditions are the same).
Comment 2 Anonymous Emailer 2010-01-04 09:04:26 UTC
Reply-To: stefanr@s5r6.in-berlin.de

bugzilla-daemon@bugzilla.kernel.org wrote:
[...]
> since it works fine
> using USB, it's not the filesystem layer.  So the only thing I can conceive
> of
> is that the kernel firewire driver is performing some OTHER (non read/write)
> access to the drive that is keeping the controller from powering down the
> disk.
[...]
> Unfortunately, as far as I can tell, the Linux kernel is doing something
> actively to defeat it.

Preliminary reply before I look further into it:

No, the kernel most definitely does not do anything like that.  Not the
SCSI subsystem, and less so the FireWire subsystem.  The latter is only
a fairly thin layer that encapsulates SCSI requests into FireWire
packets on command of the SCSI subsystem.  There is some asynchronism at
the kernel's filesystem layer (write caching and cache flushing,
read-ahead...), but this layer on the other hand is, for all practical
purposes, unaware of the particular hardware buses, transport protocols,
even command sets that are used to access a disk.

At most, the kernel is doing something "passively", i.e. is doing
something differently from Windows (AFAIK current OS X versions spin
FireWire disks down _actively_, so this is no 1:1 comparison), and is
presenting the disk to userspace differently when attached at USB in
contrast to FireWire.

I.e. the issue is caused either by userland or by firmware.  We need to
find out whether the kernel

[...]
> Lots of people have reported problems with external drives not spinning down. 

Ironically, a number of other people have reported problems with some
FireWire disks that do implement their own auto-spin-down scheme because
their firmware is buggy WRT spin up at next access.  My impression is
that a majority of firmware programmers who implement this get it _wrong_.

I do not have a WD FireWire disk enclosure myself.  I will go through my
stash of FireWire disks that I remember to spin down themselves (two
based on the rather rare Texas Instruments StorageLynx bridge chip,
perhaps one or two others based on other FireWire-to-IDE bridge chips
which I use rarely nowadays).  Perhaps there was some kind of regression
in newer kernel/userspace which I missed.  (I use Gentoo with vanilla
kernel.)

[...]
> There's literally no system activity to the disk, demonstrated by
> the fact that it spins down fine when connected via USB (and it gets the same
> device node and mount point when I use USB, so the conditions are the same).

The conditions are not 100% the same:
  - On the Linux host, sysfs attributes and consequently information
    provided by hal to userspace differ between USB disks and FireWire
    disks.
  - On the bridge in the disk enclosure, the USB part and the FireWire
    part are run by entirely different firmwares.

Timothy,
what is the exact model number from WD?  (If you reply by mail, include
"Cc: linux1394-devel@lists.sourceforge.net" please.)

Also, one test which you can do on your side:  Check whether
auto-spin-down behaviour differs between old FireWire kernel drivers and
new ones, i.e. the modules
    ohci1394 + ieee1394 + sbp2 ( = old 1394 stack)
versus
    firewire-ohci + firewire-core + firewire-sbp2 ( = new 1394 stack).
You can see the old and new stack as separate options in the kernel
configurator's IEEE 1394 menu.  You can build and install both stacks
and explicitly load the desired one with
	# modprobe -r ohci1394 firewire-ohci
	# modprobe ohci1394
or
	# modprobe -r ohci1394 firewire-ohci
	# modprobe firewire-ohci
respectively.  You can also prevent automatic module loading of one or
both stacks, as hinted in the configuration help texts or described
here:  http://ieee1394.wiki.kernel.org/index.php/Juju_Migration
Or install only one of the two stacks at a time if you are unsure about
a dual stack installation.

Furthermore, when you have the new drivers running, please
# echo 3 > /sys/module/firewire_ohci/parameters/debug
before an idle period and watch /var/log/messages or dmesg.  That way
you can check whether there is any FireWire traffic at all at that time.
# echo 0 > /sys/module/firewire_ohci/parameters/debug
disables this logging when you don't need it anymore.  (The old 1394
drivers do not have such a logging facility that can be switched on and
off at runtime.)
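
The debug-logging check described above can be wrapped in a small helper. This is only a sketch under stated assumptions: the function names are made up for illustration, the log file location depends on the local syslog configuration, and the grep pattern assumes that each logged asynchronous packet appears as a line containing "firewire_ohci: AT" or "firewire_ohci: AR".

```shell
#!/bin/sh
# Path of the runtime debug parameter of the new stack (from the text above).
# It can be overridden through the environment, e.g. for a dry run.
FW_DEBUG_PARAM=${FW_DEBUG_PARAM:-/sys/module/firewire_ohci/parameters/debug}

# set_fw_debug VALUE
# Write VALUE to the debug parameter (3 to enable as suggested above, 0 to
# disable). Hypothetical helper name; needs root for the real sysfs file.
set_fw_debug() {
    echo "$1" > "$FW_DEBUG_PARAM"
}

# count_fw_packets LOGFILE
# Count log lines that record an asynchronous transmit (AT) or receive (AR)
# packet. Hypothetical helper name.
count_fw_packets() {
    grep -c 'firewire_ohci: A[TR] ' "$1"
}

# Example (as root, not run automatically):
#   set_fw_debug 3
#   before=$(count_fw_packets /var/log/messages)
#   sleep 900
#   after=$(count_fw_packets /var/log/messages)
#   set_fw_debug 0
#   echo "packets logged while idle: $((after - before))"
```

A difference of zero over the idle period would mean that no FireWire packets were exchanged with the enclosure at all; the comparison is only meaningful if the log file is not rotated in between.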
Comment 3 Timothy Miller 2010-01-04 13:56:45 UTC
Thanks for the reply.  I will try the older driver stack as you requested.  

BTW, while it's true that MacOS likes to be aggressive about power management, short of knowing some protocol special to the chipset that WD uses, there is no way to affect the timeout interval or instruct the drive to power down immediately.  I get the impression that people are bothered by the fact that this drive doesn't expose these interfaces and that you have to rely on the drive's firmware to do the right thing.  For instance, some people might like a shorter interval.  (There are some Windows tools for SOME WD enclosures that can adjust the timeout.)  So while there are lots of things we can do to _prevent_ spindown, there is no open interface we can use to FORCE it to spin down.  Either that, or lots of web pages I read are wrong (yes, I believe everything I read on the interwebs).

I'm sure it's entirely possible that we've hit some kind of bug in the firmware that rears its head in FireWire mode.  Why it doesn't happen with MacOS or Windows is a mystery we might want to solve.  (WD MyBooks are fairly popular, and we wouldn't want its users to find that Linux falls behind the other OS's.)  One ordering of legal ops is fine; another set puts the drive in a funny state.  It's interesting that this doesn't become a problem until the volume is mounted.

If we can't solve this, I'll just use the drive in USB mode.  It's faster in FireWire mode, but I'm just going to use it for backups that happen at like 4AM.  I just thought we might want to poke and prod at this a bit to see if we can solve it for everyone.

Sorry for the ramble.  :)
Comment 4 Anonymous Emailer 2010-01-04 14:20:33 UTC
Reply-To: stefanr@s5r6.in-berlin.de

(Quoting in full for linux1394-devel)

bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=14981
> 
> 
> --- Comment #3 from Timothy Miller <theosib@gmail.com>  2010-01-04 13:56:45
> ---
> Thanks for the reply.  I will try the older driver stack as you requested.  
> 
> BTW, while it's true that MacOS likes to be aggressive about power
> management,

I guess OS X sends START STOP UNIT commands to stop the motor after idle
periods.  Anyhow, it stops one of my FireWire/USB disks which never
stops on Linux (FireWire/USB) or Windows 2000 (USB).

> short of knowing some protocol special to the chipset that WD uses, there is
> no
> way to affect the timeout interval or instruct the drive to power down
> immediately.  I get the impression that people are bothered by the fact that
> this drive doesn't expose these interfaces and that you have to rely on the
> drive's firmware to do the right thing.  For instance, some people might like
> a
> shorter interval.  (There are some Windows tools for SOME WD enclosures that
> can adjust the timeout.)  So while there are lots of things we can do to
> _prevent_ spindown, there is no open interface we can use to FORCE it to spin
> down.  Either that, or lots of web pages I read are wrong (yes, I believe
> everything I read on the interwebs).

Indeed; auto-spin-down in disk enclosures¹ which implement this can only
be configured by overwriting firmware parameters, which (if at all) is
only possible with detailed knowledge of the firmware.

¹) FireWire, probably USB too, except if ATA pass-through is implemented
such that the drive itself can be directly configured like an IDE disk.
I am not aware of FireWire-to-IDE or FireWire-to-SATA bridges which
implement ATA pass-through.

> I'm sure it's entirely possible that we've hit some kind of bug in the
> firmware
> that rears its head in FireWire mode.  Why it doesn't happen with MacOS or
> Windows is a mystery we might want to solve.  (WD MyBooks are fairly popular,
> and we wouldn't want its users to find that Linux falls behind the other OS's.) 
> One

Alas there is a range of MyBook models and firmware revisions.  We can
support firmware quirks only to the extent that those who own MyBooks
can find experimentally, unless we approach WD directly and get a good
contact at them.

> ordering of legal ops is fine; another set puts the drive in a funny state. 
> It's interesting that this doesn't become a problem until the volume is
> mounted.
> 
> If we can't solve this, I'll just use the drive in USB mode.

Another possible workaround is the scsi-idle script or a similar
daemon-like solution which is based on polling the device stats from
procfs or sysfs and sending a START STOP UNIT command if an idle period
was detected.  Effectively, this implements what is presumably built in
into OS X.

> It's faster in
> FireWire mode, but I'm just going to use it for backups that happen at like
> 4AM.  I just thought we might want to poke and prod at this a bit to see if
> we
> can solve it for everyone.
> 
> Sorry for the ramble.  :)

BTW, I tested one of the disks of which I was sure that it spun down
itself in the past (TI StorageLynx based MomoBay CX-1) on Gentoo with
current HAL daemon, LXDE desktop, and KDE 4.3.3's Dolphin file manager
running in an LXDE session (disk mounted via Dolphin, keeping a tab with
this disk open in Dolphin all the time) on Linux 2.6.32.2.  This still
works like it always did, i.e. spins down the disk.

Please remember to check with firewire-ohci's debug logging turned on to
investigate whether there is FireWire traffic when the disk is supposed
to be idle.  (That's more reliable than having to watch the enclosure's
activity LED all the time.)
Comment 5 Stefan Richter 2010-01-04 14:33:33 UTC
Please also remember to provide the exact model name of the MyBook or better yet WD's model number.
Comment 6 Anonymous Emailer 2010-01-04 17:57:48 UTC
Reply-To: billfink@mindspring.com

Not as a fix, but a possible workaround:

I have the following in my /etc/rc.local file to cause my external
Firewire disk to spin down on inactivity:

/usr/local/sbin/autospindown sdb 3600 300 &

This checks every 5 minutes to see if the disk has been inactive
for 1 hour, and if so spins the disk down.  The autospindown
script requires the sdparm utility (which is part of the sdparm
RPM on FC11).

Here is the autospindown shell script (I don't remember where
I got it from):

--------------------------------------------------------------------------------
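#!/bin/sh
# Usage: autospindown DISK INTERVAL DELTA
#   DISK      kernel device name without /dev/, e.g. sdb
#   INTERVAL  idle time in seconds after which the disk is stopped
#   DELTA     polling period in seconds

if [ $# -ne 3 ]; then
    echo "usage: $0 disk interval delta" >&2
    exit 1
fi

disk=$1
interval=$2
delta=$3

# The device's line in /proc/diskstats changes whenever I/O is accounted.
state=$(grep -w "$disk" /proc/diskstats)
count=$interval
up=1

while true; do
    sleep "$delta"
    count=$((count - delta))
    newstate=$(grep -w "$disk" /proc/diskstats)
    if [ "$state" = "$newstate" ]; then
        # No I/O since the last poll; stop the disk once the idle time is up.
        if [ "$count" -le 0 ]; then
            count=$interval
            if [ "$up" = 1 ]; then
                #echo "spin-down  $(date)"
                sync
                # Re-read the counters so that I/O caused by sync is not
                # mistaken for new activity at the next poll.
                state=$(grep -w "$disk" /proc/diskstats)
                sdparm --command=stop -f "/dev/$disk" > /dev/null 2>&1
                up=0
            fi
        fi
    else
        # I/O was seen: restart the idle countdown.
        #echo "drive is up  $(date)"
        count=$interval
        state=$newstate
        up=1
    fi
done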
--------------------------------------------------------------------------------

I hope this helps.

						-Bill



Comment 7 Anonymous Emailer 2010-01-04 20:01:47 UTC
Reply-To: stefanr@s5r6.in-berlin.de

Bill Fink wrote:
> Not as a fix, but a possible workaround:
> 
> I have the following in my /etc/rc.local file to cause my external
> Firewire disk to spin down on inactivity:
> 
> /usr/local/sbin/autospindown sdb 3600 300 &
[...]
> (I don't remember where I got it from)

A variant of this script is for example listed here:
http://www.nslu2-linux.org/wiki/FAQ/SpinDownUSBHarddisks

[...]
>               sdparm --command=stop -f /dev/$disk > /dev/null 2>&1

Some firmwares expect a variation of the stop command (more precisely,
of the Start Stop Unit command with start bit off):

		sg_start --stop --pc=3 /dev/$disk > /dev/null 2>&1

Otherwise they don't stop (Prolific PL-3507) or worse, hang later (TI
StorageLynx).  On many distributions, sg_start is part of a package
called sg3_utils.  --pc=2 may work for spin-down too.  The default would
be --pc=0 which is accepted by spec-compliant firmwares.

------------------

Timothy,

there is one more thing which you should check and report please:
$ cat /sys/class/block/${the_proper_device_name}/removable

Do this
  a) when the disk is attached at FireWire and
  b) when it is attached at USB.

(The "removable" flag is populated according to a flag in a SCSI
device's Inquiry response.  Its meaning is not whether the drive can be
disconnected, but whether the medium can be pulled out of the drive. If
--- and only if --- this flag is on, the sd driver module issues a
Prevent Allow Medium Removal command with a "prohibit medium removal"
bit to the disk whenever the disk device is opened, e.g. by mount.)
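
The check requested above can be scripted so that it is run the same way for both attachments. This is a minimal sketch: the function names are hypothetical, and the optional second argument (an alternative sysfs root) is an assumption added only so the helper can be exercised without the real device.

```shell
#!/bin/sh
# removable_flag DEVICE [SYSFS_ROOT]
# Print the "removable" attribute of a block device (normally 0 or 1).
# SYSFS_ROOT defaults to /sys. Hypothetical helper name.
removable_flag() {
    cat "${2:-/sys}/class/block/$1/removable"
}

# describe_removable DEVICE [SYSFS_ROOT]
# Translate the flag into the consequence described in the text above.
describe_removable() {
    case "$(removable_flag "$@")" in
        1) echo "$1: removable medium, sd sends Prevent Allow Medium Removal on open" ;;
        0) echo "$1: fixed medium, no Prevent Allow Medium Removal on open" ;;
        *) echo "$1: unexpected or unreadable removable attribute" ;;
    esac
}

# Example (not run automatically), once per attachment type:
#   describe_removable sdc
```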
Comment 8 Timothy Miller 2010-01-05 02:54:30 UTC
I apologize in advance if I leave anything out.  Don't hesitate to ask.

The model name is "WD5000H1CS-00"

I did:
# echo 3 > /sys/module/firewire_ohci/parameters/debug
Then I connected the drive and mounted the volume.  Then I set a timer.  Lo and behold, after about 11 minutes 45 seconds, it spun down.  It never did that before when connected to the Linux box.  The last time I connected it by FireWire, I came back 2 hours later, and it was still spinning.  (And it was hot, so it hadn't just spun up.)  I tested this a few times, so I don't know what's changed.  I guess I would say that whatever is causing the trouble isn't 100% consistent.  I'll add the relevant parts of the log as an attachment.  I'll try a few more times later to see if I can get it to spin down or not.  But that script that was kindly provided may be my work-around to this inconsistency.

If I get it to not spin down again with the newer stack, then I'll try the old stack.


When connected by FireWire:
# cat /sys/class/block/sdc/removable
0 

When connected by USB:
# cat /sys/class/block/sdc/removable
0


I did:
# sdparm --command=stop -f /dev/sdc

The first time, the drive just made a funny noise but kept spinning.  So I tried it again.  It spun down.  It would appear that some of my research is wrong.  This drive DOES support spinning down via FireWire command.

I tried:
# sg_start --stop --pc=3 /dev/sdc

I got the same results.  Funny noise the first time, success the second.
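
Since the stop command only took effect on the second try here, a workaround script could simply issue it more than once. The following is a rough sketch, not a tested fix: the function name stop_with_retry, the default of two attempts, and the five-second pause are all assumptions, and the script has no way to verify that the disk actually stopped.

```shell
#!/bin/sh
# Command used to request a stop; the default is the sdparm invocation used
# above. "sg_start --stop --pc=3" could be substituted via the environment.
STOP_CMD=${STOP_CMD:-"sdparm --command=stop -f"}

# stop_with_retry DEVICE [ATTEMPTS] [PAUSE_SECONDS]
# Send the stop request ATTEMPTS times (default 2), waiting PAUSE_SECONDS
# (default 5) between requests. Hypothetical helper name.
stop_with_retry() {
    dev=$1
    attempts=${2:-2}
    pause=${3:-5}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # STOP_CMD is deliberately unquoted so that its options are split.
        $STOP_CMD "$dev" > /dev/null 2>&1
        if [ "$i" -lt "$attempts" ]; then
            sleep "$pause"
        fi
        i=$((i + 1))
    done
}

# Example (as root, not run automatically):
#   stop_with_retry /dev/sdc
```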
Comment 9 Timothy Miller 2010-01-05 02:55:13 UTC
Created attachment 24437 [details]
/var/log/messages portion pertaining to firewire
Comment 10 Timothy Miller 2010-01-05 15:00:28 UTC
Ok.  I was kinda baffled as to why I had found that the drive would spin down after 11.75 minutes, which was inconsistent with what I'd seen earlier.  Maybe something had changed?  But then I looked at it this morning, after no activity overnight, and the disk was spinning (and accessible, so no funny hanging or anything).  Just in case it had just spun up, I set a timer.  15 minutes later, it was still spinning, but I could find no activity.

So what I've done is manually spun it down (sdparm) and turned on debugging.  I'm not going to access the drive (intentionally), and I'll see if anything happens between now and like tomorrow.

I'm starting to feel that this isn't a Linux problem at all but really screwy firmware, and that script posted here is the solution.

Also, I had wondered why I could get it to spin down after 10 minutes initially when I wasn't able to do that before.  I had never before used this drive in USB mode.  One of you guys mentioned that the firmwares for USB and FireWire are separate.  So maybe the USB firmware is better, and when I used it for the first time, it programmed something nonvolatile into the drive that persisted when I later connected it by FireWire.  That would account for the spindown, so now we just have to figure out the cause of the spurious spinup.
Comment 11 Anonymous Emailer 2010-01-05 19:32:09 UTC
Reply-To: stefanr@s5r6.in-berlin.de

(excuse the full quote, it's for the linux1394-devel mailing list)

bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=14981
> 
> 
> 
> 
> 
> --- Comment #8 from Timothy Miller <theosib@gmail.com>  2010-01-05 02:54:30
> ---
> I apologize in advance if I leave anything out.  Don't hesitate to ask.
> 
> The model name is "WD5000H1CS-00"

( = My Book Home Edition, USB 2.0 / FireWire 400 / eSATA,
http://support.wdc.com/product/install.asp?groupid=111)

> I did:
> # echo 3 > /sys/module/firewire_ohci/parameters/debug
> Then connected the drive and mounted the volume.  Then I set a timer.  Lo and
> behold after about 11 minutes 45 seconds, it spun down.  It never did that
> before connected to the Linux box.  The last time I connected it by FireWire,
> I
> came back 2 hours later, and it was still spinning.  (And it was hot, so it
> hadn't just spun up.)  I tested this a few times, so I don't know what's
> changed.  I guess I would say that whatever is causing the trouble isn't 100%
> consistent.  I'll add the relevant parts of the log as an attachment.  I'll
> try
> a few more times later to see if I can get it to spin down or not.  But that
> script that was kindly provided may be my work-around to this inconsistency.
> 
> If I get it to not spin down again with the newer stack, then I'll try the
> old
> stack.
> 
> 
> When connected by FireWire:
> # cat /sys/class/block/sdc/removable
> 0 
> 
> When connected by USB:
> # cat /sys/class/block/sdc/removable
> 0

OK.  These are the proper values, so the issue lies somewhere else.

> I did:
> # sdparm --command=stop -f /dev/sdc
> 
> The first time, the drive just made a funny noise but kept spinning.  So I
> tried it again.  It spun down.  It would appear that some of my research is
> wrong.  This drive DOES support spinning down via FireWire command.
> 
> I tried:
> # sg_start --stop --pc=3 /dev/sdc
> 
> I got the same results.  Funny noise the first time, success the second.
> 

This is a hint at buggy firmware.  These commands should stop the disk
at their first run, not at the second.  Apparently the firmware
programmers never tested that themselves, or they hacked something up
which just happens to work with a different HDD mechanism but not the
particular HDD which is built into your enclosure.  Whether this is
connected with the actual bug at hand (auto spin-down not working) is an
open question though.

> 
> --- Comment #9 from Timothy Miller <theosib@gmail.com>  2010-01-05 02:55:13
> ---
> Created an attachment (id=24437)
>  --> (http://bugzilla.kernel.org/attachment.cgi?id=24437)
> /var/log/messages portion pertaining to firewire

Interesting, they implement an enclosure services logical unit, in
addition to the usual block device logical unit.  Perhaps this logical
unit is used to drive the capacity gauge which the WD5000H1CS-00 manual
mentions.

But I do not see any hint to a problem source in the log.

> 
> --- Comment #10 from Timothy Miller <theosib@gmail.com>  2010-01-05 15:00:28
> ---
> Ok.  I was kinda baffled as to why I had found that the drive would spin down
> after 11.75 minutes, which was inconsistent with what I'd seen earlier. 
> Maybe
> something had changed?  But then I looked at it this morning, after no
> activity
> over night, and the disk was spinning (and accessible, so no funny hanging or
> anything).  Just in case, it had just spun up, I set a timer.  15 minutes
> later, it was still spinning, but I could find no activity.
> 
> So what I've done is manually spun it down (sdparm) and turned on debugging. 
> I'm not going to access the drive (intentionally), and I'll see if anything
> happens between now and like tomorrow.
> 
> I'm starting to feel that this isn't a Linux problem at all but really screwy
> firmware, and that script posted here is the solution.
> 
> Also, I had wondered why I could get it to spin down after 10 minutes
> initially
> when I wasn't able to do that before.  I had never before used this drive in
> USB mode.  One of you guys mentioned that the firmwares for USB and FireWire
> are separate.  So maybe the USB firmware is better, and when I used it for
> the
> first time, it programmed something nonvolatile into the drive that persisted
> when I later connected it by FireWire.  That would account for the spindown,
> so
> now we just have to figure out the cause of the spurious spinup.
> 

Sounds all rather random.

By the way, I just now discovered another way in the SCSI specs to
control power conditions of devices.  This involves writing to the
so-called Power Condition mode page and can be done with sdparm or with
sg_wr_mode from sg3_utils.  The Power Condition mode page contains timer
fields which specify the idle period before the device shall enter a low
power state.

"sg_modes -a /dev/sdX" lists all mode pages which a device supports.
Support of Power Condition mode page is not mandatory.
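
To check for just that one page instead of reading through the full listing, the page can be requested by its code (0x1a for Power Condition). This is a sketch under assumptions: it relies on sg_modes accepting --page=0x1a and on sg_modes exiting with a non-zero status when the device rejects the request; the function name is hypothetical.

```shell
#!/bin/sh
# Program used to read mode pages; overridable through the environment.
SG_MODES=${SG_MODES:-sg_modes}

# has_power_condition_page DEVICE
# Request only mode page 0x1a (Power Condition) from DEVICE.
# Assumption: a non-zero exit status means the page is not supported or the
# request failed for another reason; the two cases are not distinguished.
has_power_condition_page() {
    if $SG_MODES --page=0x1a "$1" > /dev/null 2>&1; then
        echo "$1: Power Condition mode page reported"
    else
        echo "$1: Power Condition mode page not available (or sg_modes failed)"
        return 1
    fi
}

# Example (as root, not run automatically):
#   has_power_condition_page /dev/sdc
```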
Comment 12 Timothy Miller 2010-01-07 01:51:58 UTC
Since my last report, I just left the disk idle with debugging on.  The volume is mounted, but no accesses are being performed.  Since then, I've found this in the logs:

Jan  6 03:10:52 compute0 kernel: firewire_ohci: AT spd 2 tl 20, ffc1 -> ffc0, ack_pending , BW req, fffff0100008 8,0
Jan  6 03:10:52 compute0 kernel: firewire_ohci: AR spd 2 tl 20, ffc0 -> ffc1, ack_complete, W resp
Jan  6 03:10:52 compute0 kernel: firewire_ohci: AR spd 2 tl 20, ffc0 -> ffc1, ack_complete, BW req, 000100000000 10,0
Jan  6 03:10:52 compute0 kernel: firewire_ohci: AT spd 2 tl 21, ffc1 -> ffc0, ack_pending , BW req, fffff0100008 8,0
Jan  6 03:10:52 compute0 kernel: firewire_ohci: AR spd 2 tl 21, ffc0 -> ffc1, ack_complete, W resp
Jan  6 03:10:58 compute0 kernel: firewire_ohci: AR spd 2 tl 28, ffc0 -> ffc1, ack_complete, BW req, 000100000000 8,0
Jan  6 03:10:58 compute0 kernel: firewire_ohci: AT spd 2 tl 22, ffc1 -> ffc0, ack_pending , BW req, fffff0100008 8,0
Jan  6 03:10:58 compute0 kernel: firewire_ohci: AR spd 2 tl 22, ffc0 -> ffc1, ack_complete, W resp
Jan  6 03:10:58 compute0 kernel: firewire_ohci: AR spd 2 tl 28, ffc0 -> ffc1, ack_complete, BW req, 000100000000 8,0
Jan  6 03:10:58 compute0 kernel: firewire_ohci: AT spd 2 tl 23, ffc1 -> ffc0, ack_pending , BW req, fffff0100008 8,0
Jan  6 03:10:58 compute0 kernel: firewire_ohci: AR spd 2 tl 23, ffc0 -> ffc1, ack_complete, W resp
Jan  6 03:10:58 compute0 kernel: firewire_ohci: AR spd 2 tl 06, ffc0 -> ffc1, ack_complete, BW req, 000100000000 8,0
Jan  6 03:10:58 compute0 kernel: firewire_ohci: AT spd 2 tl 24, ffc1 -> ffc0, ack_pending , BW req, fffff0100008 8,0
Jan  6 03:10:58 compute0 kernel: firewire_ohci: AR spd 2 tl 24, ffc0 -> ffc1, ack_complete, W resp
Jan  6 03:10:58 compute0 kernel: firewire_ohci: AR spd 2 tl 10, ffc0 -> ffc1, ack_complete, BW req, 000100000000 8,0
Jan  6 03:11:31 compute0 kernel: firewire_ohci: AT spd 2 tl 25, ffc1 -> ffc0, ack_pending , BW req, fffff0100008 8,0
Jan  6 03:11:31 compute0 kernel: firewire_ohci: AR spd 2 tl 25, ffc0 -> ffc1, ack_complete, W resp
Jan  6 03:11:31 compute0 kernel: firewire_ohci: AR spd 2 tl 1a, ffc0 -> ffc1, ack_complete, BW req, 000100000000 8,0
Jan  6 03:11:31 compute0 kernel: firewire_ohci: AT spd 2 tl 26, ffc1 -> ffc0, ack_pending , BW req, fffff0100008 8,0
Jan  6 03:11:31 compute0 kernel: firewire_ohci: AR spd 2 tl 26, ffc0 -> ffc1, ack_complete, W resp
Jan  6 03:11:31 compute0 kernel: firewire_ohci: AR spd 2 tl 04, ffc0 -> ffc1, ack_complete, BW req, 000100000000 8,0

Tell you anything?
Comment 13 Anonymous Emailer 2010-01-10 12:47:52 UTC
Reply-To: stefanr@s5r6.in-berlin.de

bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=14981
> 
> --- Comment #12 from Timothy Miller <theosib@gmail.com>  2010-01-07 01:51:58 ---
> Since my last report, I just left the disk idle with debugging on.  The volume
> is mounted, but no accesses are being performed.  Since then, I've found this
> in the logs:
> 
> Jan  6 03:10:52 compute0 kernel: firewire_ohci: AT spd 2 tl 20, ffc1 -> ffc0,
> ack_pending , BW req, fffff0100008 8,0
> Jan  6 03:10:52 compute0 kernel: firewire_ohci: AR spd 2 tl 20, ffc0 -> ffc1,
> ack_complete, W resp
> Jan  6 03:10:52 compute0 kernel: firewire_ohci: AR spd 2 tl 20, ffc0 -> ffc1,
> ack_complete, BW req, 000100000000 10,0
> Jan  6 03:10:52 compute0 kernel: firewire_ohci: AT spd 2 tl 21, ffc1 -> ffc0,
> ack_pending , BW req, fffff0100008 8,0
> Jan  6 03:10:52 compute0 kernel: firewire_ohci: AR spd 2 tl 21, ffc0 -> ffc1,
> ack_complete, W resp
> Jan  6 03:10:58 compute0 kernel: firewire_ohci: AR spd 2 tl 28, ffc0 -> ffc1,
> ack_complete, BW req, 000100000000 8,0
[...]
> Jan  6 03:10:58 compute0 kernel: firewire_ohci: AT spd 2 tl 24, ffc1 -> ffc0,
> ack_pending , BW req, fffff0100008 8,0
> Jan  6 03:10:58 compute0 kernel: firewire_ohci: AR spd 2 tl 24, ffc0 -> ffc1,
> ack_complete, W resp
> Jan  6 03:10:58 compute0 kernel: firewire_ohci: AR spd 2 tl 10, ffc0 -> ffc1,
> ack_complete, BW req, 000100000000 8,0
> Jan  6 03:11:31 compute0 kernel: firewire_ohci: AT spd 2 tl 25, ffc1 -> ffc0,
> ack_pending , BW req, fffff0100008 8,0
> Jan  6 03:11:31 compute0 kernel: firewire_ohci: AR spd 2 tl 25, ffc0 -> ffc1,
> ack_complete, W resp
> Jan  6 03:11:31 compute0 kernel: firewire_ohci: AR spd 2 tl 1a, ffc0 -> ffc1,
> ack_complete, BW req, 000100000000 8,0
> Jan  6 03:11:31 compute0 kernel: firewire_ohci: AT spd 2 tl 26, ffc1 -> ffc0,
> ack_pending , BW req, fffff0100008 8,0
> Jan  6 03:11:31 compute0 kernel: firewire_ohci: AR spd 2 tl 26, ffc0 -> ffc1,
> ack_complete, W resp
> Jan  6 03:11:31 compute0 kernel: firewire_ohci: AR spd 2 tl 04, ffc0 -> ffc1,
> ack_complete, BW req, 000100000000 8,0
> 
> Tell you anything?

This is the part of the FireWire traffic that is associated with SCSI
requests and involves CPU interrupts.  (Host ffc1 writes a notification
about a new SCSI request to ffc0.fffff0100008; the target performs all
required remote DMA to fetch the command and transfer data without CPU
interrupts; finally, target ffc0 writes the completion status to
ffc1.000100000000.  That is, three consecutive log lines belong to one
SCSI transaction.  Actual FireWire addresses may vary from session to
session.  Some targets handle the notification in such a way that a
transaction causes only two interrupts instead of three.)
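
As a rough sketch of that grouping, the following Python snippet sorts firewire_ohci debug lines into transactions.  The regular expression is only fitted to the line format seen in the log excerpt in comment 12, and the role names ("notify", "notify_ack", "status") are labels made up for this example, not driver terminology.  Roles are assigned by direction and transaction code rather than by address, since the addresses may change between sessions.

```python
import re

# Fitted to the firewire_ohci debug lines quoted in this bug; other kernel
# versions may format these lines differently.
LINE_RE = re.compile(
    r"firewire_ohci: (?P<dir>A[TR]) spd \d+ tl (?P<tl>[0-9a-f]+), "
    r"(?P<src>[0-9a-f]{4}) -> (?P<dst>[0-9a-f]{4}), "
    r"(?P<ack>\w+)\s*, (?P<tcode>[^,]+?)(?:, (?P<offset>[0-9a-f]{12}) \S+)?$"
)


def classify(fields):
    """Map one parsed log line to its role within a SCSI transaction."""
    if fields["dir"] == "AT" and fields["tcode"] == "BW req":
        return "notify"        # host tells the target about a new request
    if fields["dir"] == "AR" and fields["tcode"] == "W resp":
        return "notify_ack"    # target acknowledges the notification
    if fields["dir"] == "AR" and fields["tcode"] == "BW req":
        return "status"        # target writes the completion status
    return "other"


def group_transactions(lines):
    """Group consecutive log lines into lists of roles, one per transaction."""
    transactions, current = [], []
    for line in lines:
        match = LINE_RE.search(line.strip())
        if match is None:
            continue
        role = classify(match.groupdict())
        if role == "notify" and current:
            transactions.append(current)  # previous one never saw a status
            current = []
        current.append(role)
        if role == "status":
            transactions.append(current)
            current = []
    if current:
        transactions.append(current)
    return transactions


SAMPLE = """\
Jan  6 03:10:52 compute0 kernel: firewire_ohci: AT spd 2 tl 20, ffc1 -> ffc0, ack_pending , BW req, fffff0100008 8,0
Jan  6 03:10:52 compute0 kernel: firewire_ohci: AR spd 2 tl 20, ffc0 -> ffc1, ack_complete, W resp
Jan  6 03:10:52 compute0 kernel: firewire_ohci: AR spd 2 tl 20, ffc0 -> ffc1, ack_complete, BW req, 000100000000 10,0
Jan  6 03:10:52 compute0 kernel: firewire_ohci: AT spd 2 tl 21, ffc1 -> ffc0, ack_pending , BW req, fffff0100008 8,0
Jan  6 03:10:52 compute0 kernel: firewire_ohci: AR spd 2 tl 21, ffc0 -> ffc1, ack_complete, W resp
Jan  6 03:10:58 compute0 kernel: firewire_ohci: AR spd 2 tl 28, ffc0 -> ffc1, ack_complete, BW req, 000100000000 8,0
""".splitlines()

transactions = group_transactions(SAMPLE)
for number, roles in enumerate(transactions, start=1):
    print(number, roles)
```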

So this means that there was some regular I/O going on shortly after 03:10.
The timestamps show that the first command was completed in under a
second, the next one took six seconds (perhaps because this one required
the target to switch on the motor and spin up the disk), then a few more
requests followed, then the host left the disk alone for 33 seconds
(03:10:58 to 03:11:31), then issued another two SCSI requests.
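
The timing can be checked with a few lines of Python.  This is only a sketch: the table below was transcribed by hand from the log in comment 12, pairing each notification (AT line) with the following status write (AR line to 000100000000), and the variable names are made up for this example.

```python
from datetime import datetime

# (notification time, status time) for each SCSI transaction in the log
TRANSACTIONS = [
    ("03:10:52", "03:10:52"),
    ("03:10:52", "03:10:58"),
    ("03:10:58", "03:10:58"),
    ("03:10:58", "03:10:58"),
    ("03:10:58", "03:10:58"),
    ("03:11:31", "03:11:31"),
    ("03:11:31", "03:11:31"),
]


def parse(stamp):
    """Turn an HH:MM:SS syslog timestamp into a datetime (date ignored)."""
    return datetime.strptime(stamp, "%H:%M:%S")


# How long each request took, at the one-second resolution of syslog.
durations = [
    (parse(done) - parse(start)).total_seconds()
    for start, done in TRANSACTIONS
]

# How long the host left the disk alone between one status and the
# next notification.
idle_gaps = [
    (parse(TRANSACTIONS[i + 1][0]) - parse(TRANSACTIONS[i][1])).total_seconds()
    for i in range(len(TRANSACTIONS) - 1)
]

print("longest request:", max(durations), "s")
print("longest idle gap:", max(idle_gaps), "s")
```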

Now, the question is what caused those requests.  This is most probably
a userland issue rather than a kernel issue, and it is obviously not a
FireWire driver issue.  The first place to look is the crontab.

So, there are two problems:
  - Disk does not automatically spin down (if, and only if, attached via
    FireWire and on Linux and filesystem is mounted).  Can only be a
    firmware bug, but if we knew more about it, a workaround might be
    possible.
  - Disk is woken up when no access was supposed to happen.  Surely a
    userland issue.
(The autospindown script which Bill posted works around problem 1 and
somewhat mitigates problem 2.)
