Bug 219362 - USB SATA does not correctly shutdown the SSD upon poweroff causing data loss
Summary: USB SATA does not correctly shutdown the SSD upon poweroff causing data loss
Status: NEW
Alias: None
Product: Power Management
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P3 normal
Assignee: Default virtual assignee for Drivers/USB
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-10-08 11:43 UTC by Tomas Mudrunka
Modified: 2024-10-10 07:10 UTC (History)
2 users (show)

See Also:
Kernel Version: 6.1.1
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Tomas Mudrunka 2024-10-08 11:43:02 UTC
I have SATA SSD drive connected via TUSB9261 converter. Eventualy i experience filesystem inconsistency when i repeatedly shutdown the device like this:

echo s > /proc/sysrq-trigger;
echo u > /proc/sysrq-trigger;
echo o > /proc/sysrq-trigger;

Same problem does not occur when i do this:

echo s > /proc/sysrq-trigger;
echo u > /proc/sysrq-trigger;
for i in /sys/block/*/device/delete; do echo 1 > "$i"; done;
echo o > /proc/sysrq-trigger;

This tells me, sysrq power(o)ff does not shutdown the devices in correct order. It shuts down the USB hub before it sends SATA shutdown to the SSD, therefore failing to notify SSD to write its internal caches before PMIC shuts down the power completely.
Comment 1 gregkh 2024-10-08 11:53:30 UTC
On Tue, Oct 08, 2024 at 11:43:02AM +0000, bugzilla-daemon@kernel.org wrote:
> I have SATA SSD drive connected via TUSB9261 converter. Eventualy i
> experience
> filesystem inconsistency when i repeatedly shutdown the device like this:
> 
> echo s > /proc/sysrq-trigger;
> echo u > /proc/sysrq-trigger;
> echo o > /proc/sysrq-trigger;

Don't do that, you will end up with an unstable system.  Please always
shutdown like normal.

> Same problem does not occur when i do this:
> 
> echo s > /proc/sysrq-trigger;
> echo u > /proc/sysrq-trigger;
> for i in /sys/block/*/device/delete; do echo 1 > "$i"; done;
> echo o > /proc/sysrq-trigger;

Yes, that is because you are shutting down things in the proper order,
like a normal shutdown will do.

sysrq is not for a normal shutdown, again, do not do that and expect to
have data saved properly.

good luck!

greg k-h
Comment 2 Tomas Mudrunka 2024-10-08 12:00:33 UTC
> Don't do that, you will end up with an unstable system.

It's an emergency shutdown in embedded scenario, we have capacitors that are able to sustain the system for 1 second in case of unexpected power loss. It's plenty of time to shutdown filesystems and drives properly. But we simply cannot afford to wait for userspace. Works nicely so far.

> Yes, that is because you are shutting down things in the proper order,
like a normal shutdown will do.

Is there a reason for sysrq shutdown not to do that? I mean... c'mon... shutting down SSDs cleanly is the bare minimum we should do... if we fail to do that, we can as well just unplug the power without trying to unmount or sync anything...
Comment 3 Greg Kroah-Hartman 2024-10-08 12:08:59 UTC
On Tue, Oct 08, 2024 at 12:00:33PM +0000, bugzilla-daemon@kernel.org wrote:
> It's an emergency shutdown in embedded scenario, we have capacitors that are
> able to sustain the system for 1 second in case of unexpected power loss.
> It's
> plenty of time to shutdown filesystems and drives properly. But we simply
> cannot afford to wait for userspace. Works nicely so far.

"so far" seems to mean "data loss" so I doubt that :)

> > Yes, that is because you are shutting down things in the proper order,
> like a normal shutdown will do.
> 
> Is there a reason for sysrq shutdown not to do that? I mean... c'mon...
> shutting down SSDs cleanly is the bare minimum we should do... if we fail to
> do
> that, we can as well just unplug the power without trying to unmount or sync
> anything...

Because the sysrq code does NOT go through the normal device shutdown
process, you are circumventing it, because it is only for emergencies.
Don't rely on it for a normal shutdown, as that is not what it is
designed for at all.

good luck!

greg k-h
Comment 4 Tomas Mudrunka 2024-10-08 12:22:26 UTC
Anyway... My coworker says he triggered the same error by proper shutdown. He gets sata command fail somewhere near the end of the log when shutting down.
Comment 5 oneukum 2024-10-08 13:27:32 UTC
On 08.10.24 14:22, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=219362
> 
> --- Comment #4 from Tomas Mudrunka (harviecz@gmail.com) ---
> Anyway... My coworker says he triggered the same error by proper shutdown. He
> gets sata command fail somewhere near the end of the log when shutting down.
> 

Then please provide full dmesg. We need to know what is happening.
You should definitely get a cache flush command during the shotdown.

	Regards
		Olver
Comment 6 Alan Stern 2024-10-08 15:04:34 UTC
The 'u' sysrq code attempts to remount all mounted filesystems read-only.  Shouldn't that send a synchronize-cache command to the SSD?  If it doesn't, there's something wrong.

It's also possible that Tomas is sending the 'o' poweroff sysrq code too soon after the 'u' code, leaving the system insufficient time to flush the cache before it powers off.

Tomas, the 'o' code does not shut down devices at all, let alone give them a chance to clean up first.  As Greg said, it is meant for emergency use.  All it does is turn off the system power -- it does not perform a clean shutdown.
Comment 7 Tomas Mudrunka 2024-10-08 15:15:09 UTC
what is the point of having 'u' and 's' keys in sysrq to sync filesystems, when we're not able to sync controler of underlying harddrive?
Comment 8 Tomas Mudrunka 2024-10-08 15:17:43 UTC
And it's not just about "sync" of the harddrive. It's about stopping it wear leveling operations. Modern storages DO shuffle data between blocks even when you don't write anything to them, power loss will cause data loss even on read-only systems. (users of raspberry pi with read-only SD cards can tell you lots of stories on this topic). Therefore we need to signal the device in advance we're going to remove its power.
Comment 9 Alan Stern 2024-10-08 16:13:43 UTC
The sysrq commands were originally implemented before drive controllers for PCs contained any significant state.  Maybe they need some updating.

More generally, these questions should be posed to the SCSI maintainers, because it is the SCSI subsystem which is responsible for sending the commands you're talking about.  The USB driver does nothing more than relay the commands and their results between the SCSI subsystem and the storage device.
Comment 10 Tomas Mudrunka 2024-10-09 10:08:41 UTC
Would it be appropriate solution to add new sysrq command to call device_shutdown().
defined in drivers/base/core.c

or perhaps call it directly in 'o' and 'b' sysrq handlers?
Comment 11 gregkh 2024-10-09 10:34:38 UTC
On Wed, Oct 09, 2024 at 10:08:41AM +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=219362
> 
> --- Comment #10 from Tomas Mudrunka (harviecz@gmail.com) ---
> Would it be appropriate solution to add new sysrq command to call
> device_shutdown().
> defined in drivers/base/core.c

That is the normal shutdown path (i.e. the syscall), please use that
instead if you wish to have devices shutdown correctly.

sysrq sync and shutdown is for extraordinary situations where the system
is broken and you are attempting to shut it down in a way that is better
than just pulling the power at that point in time.  Don't attempt to
turn it into something else.

thanks,

greg k-h
Comment 12 Tomas Mudrunka 2024-10-09 10:41:40 UTC
But i want to be 100% sure no userspace will be involved in the shutdown. If i do proper shutdown the init will be doing stuff and i cannot achieve the shutdown on time before the backup capacitors discharge...
Comment 13 Tomas Mudrunka 2024-10-09 10:44:55 UTC
For example manapage for "poweroff -f" says following:

       -f, --force
           Force immediate power-off, halt, or reboot. If specified, the command does not contact the init system. In most cases, filesystems are not properly unmounted before shutdown. For example, the command
           reboot -f is mostly equivalent to systemctl reboot -ff, instead of systemctl reboot -f.



when compared to sysrq... sysrq can at least unmount the filesystems...
Comment 14 Tomas Mudrunka 2024-10-09 12:54:16 UTC
Also weird thing is that when i do following:

echo s > /proc/sysrq-trigger
echo u > /proc/sysrq-trigger
echo o > /proc/sysrq-trigger

i see message printed by watchdog driver .shutdown callback. which means devices are shut down after all... unless there is some special case that only shuts down watchdog drivers during sysrq poweroff...
Comment 15 Tomas Mudrunka 2024-10-09 13:32:21 UTC
Ok, i've just went through elixir and found out that
echo o > /proc/sysrq-trigger

actualy leads to kernel_power_off() which is function described as "Shutdown everything and perform a clean system power_off." and which then calls kernel_shutdown_prepare() which calls device_shutdown()

weirdly enough this is only true for 'o' and not for 'b', because 'o' handler is not pre-registered in sysrq.c (unlike 'b') and is additionaly registered by pm_sysrq_init() in kernel/power/poweroff.c. which seems bit inconsistent at least, but i am not aware of reasons behind this design.
Comment 16 Alan Stern 2024-10-09 14:19:51 UTC
Now you're discussing issues involving power management.  These are not appropriate for either the USB or the SCSI people; you should contact the linux-pm@vger.kernel.org mailing list about these questions.
Comment 17 Michał Pecio 2024-10-09 17:44:50 UTC
These operations are asynchronous and they don't even seem to be synchronized against each other. This is what I got from s followed u:

[ 9oct 19:21] sysrq: Emergency Sync
[  +0,000028] sysrq: Emergency Remount R/O
[  +0,019142] EXT4-fs (sda2): re-mounted ef717e3d-539d-4988-9a02-ca541d643379 ro. Quota mode: disabled.
[  +0,002201] Emergency Sync complete
--- some long error message here
              Emergency Remount complete

If subsequent o would execute before the u is done, the u would only increase risk of data loss by making the disk busy right before powerdown and by exercising the remount code against a concurrent loss of power, which might be a neglected case due to its rarity.

You will never know whom to blame without hours of debugging. Myself, I would try
sync ; poweroff -f
Comment 18 Tomas Mudrunka 2024-10-10 07:10:07 UTC
But sync followed by poweroff -f  does not unmount the filesystems...

Note You need to log in before you can comment on or make changes to this bug.