Bug 219773 - External USB disk drive and SSD corruption while connected to USB 3 ports.
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB
Hardware: All Linux
Importance: P3 blocking
Assignee: Default virtual assignee for Drivers/USB
Reported: 2025-02-11 21:51 UTC by Frederic Bezies
Modified: 2025-02-27 13:25 UTC

Regression: No


Attachments
dmesg log with errors (3.07 KB, text/plain)
2025-02-11 21:51 UTC, Frederic Bezies
dmidecode log (17.22 KB, text/plain)
2025-02-11 21:52 UTC, Frederic Bezies
smartctl infos (14.29 KB, text/plain)
2025-02-12 18:28 UTC, Frederic Bezies
/proc/dynamic_debug/control content (799.80 KB, text/plain)
2025-02-15 15:09 UTC, Frederic Bezies
dmesg log using an USB HDD (1.47 KB, text/plain)
2025-02-27 13:22 UTC, Frederic Bezies
usbmon output (1.07 MB, application/gzip)
2025-02-27 13:25 UTC, Frederic Bezies

Description Frederic Bezies 2025-02-11 21:51:00 UTC
Created attachment 307616
dmesg log with errors

Hello.

I recently noticed that on my motherboard (MSI A520M Pro), partition tables on external drives get corrupted over time. When I connect an external SSD, my dmesg log is flooded with lines like:


[  114.674453] usb 6-2: reset SuperSpeed Plus Gen 2x1 USB device number 2 using xhci_hcd

I see it with both the 6.12.13 and 6.13.1 kernels on Arch Linux.

I am attaching the full dmesg log and the dmidecode log.
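
To gauge how often the resets happen, the matching lines can be counted from dmesg output. The following is only a sketch: the function name `count_usb_resets` is made up for illustration, and the pattern is derived from the single sample line above, so resets at other speeds or on other host controller drivers may be worded differently.

```shell
# Hypothetical helper: count xHCI device-reset lines read from stdin.
# The pattern is an assumption based on the one sample line in this report.
count_usb_resets() {
    grep -c -E 'usb [0-9]+-[0-9.]+: reset .* USB device number [0-9]+ using xhci_hcd'
}

# Usage (dmesg may require root on some systems):
#   dmesg | count_usb_resets
```

Comparing the count before and after a large copy gives a rough idea of whether the resets correlate with I/O load.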
Comment 1 Frederic Bezies 2025-02-11 21:52:11 UTC
Created attachment 307617
dmidecode log
Comment 2 Frederic Bezies 2025-02-12 18:27:44 UTC
I also ran both of these commands:

sudo fdisk -l /dev/sdb
sudo smartctl -x /dev/sdb

First command:

sudo fdisk -l /dev/sdb
Disk /dev/sdb: 465,76 GiB, 500107862016 bytes, 976773168 sectors
Disk model: MobileDataStar  
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 0D820E40-5858-9090-8081-828310111213

Device     Start       End   Sectors   Size Type
/dev/sdb1   2048 976773119 976771072 465,8G Microsoft basic data

Second command: see the attached file.
Comment 3 Frederic Bezies 2025-02-12 18:28:05 UTC
Created attachment 307639
smartctl infos
Comment 4 Frederic Bezies 2025-02-14 10:00:13 UTC
Someone in this Arch Linux forum thread (https://bbs.archlinux.org/viewtopic.php?pid=2226102) suggested disabling UAS for the external SSD/HDD. No luck: I still get the USB reset messages.
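
For reference, UAS is usually disabled per device through the `quirks` parameter of the usb-storage module, using the `u` flag. The sketch below only prints the corresponding modprobe.d line. The helper name and the file name are made up, and the ID 0dd8:0562 is the one quoted in comment 15; that it belongs to the affected SSD is an assumption.

```shell
# Hypothetical helper: build a modprobe.d line that tells usb-storage to
# ignore UAS (quirk flag "u") for one device, given as VID:PID in hex.
uas_blacklist_line() {
    printf 'options usb-storage quirks=%s:u\n' "$1"
}

# Usage; the file name is an arbitrary choice and the ID is an assumption:
#   uas_blacklist_line 0dd8:0562 | sudo tee /etc/modprobe.d/disable-uas.conf
# The option takes effect the next time the usb-storage module is loaded.
```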
Comment 5 Michał Pecio 2025-02-15 12:38:42 UTC
Hi,

1. According to your dmesg snippet UAS was already disabled back then.

2. AFAIK usb-storage uses device reset to recover from various errors, maybe this is simply a matter of poor USB link quality. Try:

echo 'func handle_tx_event +p' >/proc/dynamic_debug/control

3. Corruption sounds bad. Is it reproducible, i.e. you write more data and more problems show up? Are things still broken when the disk is read by other machines?

4. Does the same disk work any better on other machines?

5. FYI, some buggy USB SATA bridges report smaller than actual capacity, which can cause problems with reading GPT tables at the end of the disk.
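
Point 5 can be checked with simple arithmetic. GPT keeps its backup header in the last logical block, so if a bridge reports fewer sectors than the disk really has, the backup header is no longer where the kernel expects it. A sketch using the numbers fdisk printed in comment 2; the helper names are made up for illustration.

```shell
# Hypothetical helpers for sanity-checking a reported disk size.

# Total size in bytes from sector count and logical sector size.
disk_bytes() {
    echo $(( $1 * $2 ))
}

# LBA of the GPT backup header: the last addressable sector.
backup_gpt_lba() {
    echo $(( $1 - 1 ))
}

# Values printed by fdisk in comment 2:
#   disk_bytes 976773168 512     # compare with fdisk's byte count
#   backup_gpt_lba 976773168     # where the backup header is expected
# The size the kernel currently sees, in 512-byte sectors:
#   sudo blockdev --getsz /dev/sdb
```

If the sector count differs between two machines or two enclosures for the same disk, that would point at the bridge rather than at the host controller.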
Comment 6 Frederic Bezies 2025-02-15 15:09:24 UTC
Hello.

1. OK. I did not notice it.

2. When I try this command, I get: zsh: permission denied: /proc/dynamic_debug/control

3. I tried on other computers and no corruption occurs.

4. Yes.

I ran less /proc/dynamic_debug/control and got an enormous output. I am attaching it in case it helps to understand what is going on.
Comment 7 Frederic Bezies 2025-02-15 15:09:52 UTC
Created attachment 307663
/proc/dynamic_debug/control content
Comment 8 Michał Pecio 2025-02-15 15:58:10 UTC
> 2. When I try this command line, I got zsh: permission denied:
> /proc/dynamic_debug/control
You need to be root for this and sudo won't help you without extra steps:
https://stackoverflow.com/questions/82256/how-do-i-use-sudo-to-redirect-output-to-a-location-i-dont-have-permission-to-wr

Once this works please run dmesg again and see if something new shows up between those "reset USB device" messages.
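
The usual workaround for the redirection problem is to pipe into `tee` and run only `tee` with elevated privileges. A minimal sketch, not the only way to do it; the helper name `write_privileged` is made up, and the query string is the one from comment 5.

```shell
# Hypothetical helper: write one line to a file, optionally through a
# privilege wrapper such as sudo.
#   $1 = text to write, $2 = target file, remaining args = wrapper command
write_privileged() {
    text=$1
    target=$2
    shift 2
    # With `sudo echo ... > file` the unprivileged shell opens the file.
    # Here tee opens it, so the wrapper's privileges apply to the write.
    printf '%s\n' "$text" | "$@" tee "$target" >/dev/null
}

# Usage with the query from comment 5:
#   write_privileged 'func handle_tx_event +p' /proc/dynamic_debug/control sudo
```

Running the original `echo ... >` command from a root shell (for example after `sudo -i`) should work equally well.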
Comment 9 Frederic Bezies 2025-02-15 16:51:03 UTC
Done. I tested with my external USB HDD, copying big files. I no longer have access to the previous USB device.

Here is the output while copying 2 big tar.xz archives (6 GB each).


[11012.004194] sd 6:0:0:0: [sdb] 976773164 512-byte logical blocks: (500 GB/466 GiB)
[11012.004519] sd 6:0:0:0: [sdb] Write Protect is off
[11012.004523] sd 6:0:0:0: [sdb] Mode Sense: 23 00 00 00
[11012.004846] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11012.070815]  sdb: sdb1
[11012.070970] sd 6:0:0:0: [sdb] Attached SCSI disk
[11012.324232] xhci_hcd 0000:30:00.3: Stalled endpoint for slot 1 ep 2
[11355.077282] usb 4-1: USB disconnect, device number 2
[11355.166387] sd 6:0:0:0: [sdb] Synchronizing SCSI cache
[11355.166436] sd 6:0:0:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK


Strangely, there is no other output. I'll try another external USB SSD as soon as possible.
Comment 10 Alan Stern 2025-02-15 18:41:05 UTC
Sometimes intermittent errors are caused by a marginal or insufficient power supply.  Maybe the USB-3 ports on your computer don't provide quite enough power for the drive to work properly.

Does the SSD drive have its own power supply?  If it doesn't, have you tried putting a powered USB hub between the computer and the drive?
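
As a rough first check, the kernel exposes through sysfs the maximum current each device declares in its configuration descriptor. This is only what the device asks for, not what it actually draws or what the port can deliver, so it cannot confirm or rule out a power problem by itself. A sketch, assuming the drive shows up as 6-2 as in the original dmesg line; the helper name is made up.

```shell
# Hypothetical helper: print the declared maximum current of a USB device.
#   $1 = device directory under /sys/bus/usb/devices
declared_max_power() {
    cat "$1/bMaxPower"
}

# Usage; "6-2" is an assumption taken from the "usb 6-2" dmesg line:
#   declared_max_power /sys/bus/usb/devices/6-2
```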
Comment 11 Frederic Bezies 2025-02-15 18:45:31 UTC
(In reply to Alan Stern from comment #10)
> Sometimes intermittent errors are caused by a marginal or insufficient power
> supply.  Maybe the USB-3 ports on your computer don't provide quite enough
> power for the drive to work properly.

The motherboard is new (3 months old), and the power supply is the same age.

> 
> Does the SSD drive have its own power supply?  If it doesn't, have you tried
> putting a powered USB hub between the computer and the drive?

The USB SSD was powered by the motherboard, and I do not own a powered USB hub.
Comment 12 Alan Stern 2025-02-15 20:08:18 UTC
The age doesn't matter.

When you say the SSD was powered by the motherboard, do you mean there was a separate connection to the motherboard (not part of the USB cable) providing power for the drive?  Or do you mean that the drive received its power over the USB cable, which was plugged into the motherboard?

Even if you don't own a powered USB hub, you may be able to borrow one or buy one cheaply.  (There are some available on Amazon for under $15.)

I admit there's a good chance that this is not the explanation for your problems.  But it might be.  It would explain why the drive works with other computers but not with yours.
Comment 13 Frederic Bezies 2025-02-15 20:25:32 UTC
(In reply to Alan Stern from comment #12)
[...]
> 
> When you say the SSD was powered by the motherboard, do you mean there was a
> separate connection to the motherboard (not part of the USB cable) providing
> power for the drive?  Or do you mean that the drive received its power over
> the USB cable, which was plugged into the motherboard?

No separate connection. The power was received through the USB cable.

[...]
> 
> I admit there's a good chance that this is not the explanation for your
> problems.  But it might be.  It would explain why the drive works with other
> computers but not with yours.

It could explain my problem, although I doubt it.
Comment 14 Frederic Bezies 2025-02-16 16:24:09 UTC
I ran an experiment. As a test, a friend of mine swapped the NVMe drive in my PC (the one with Arch Linux on it) for one with Windows 11. I plugged the SSD into one of the motherboard's USB ports and no corruption or data loss occurred. I plugged and unplugged it a few times and nothing went wrong.

So it looks like a bug in how Linux manages my motherboard's USB ports. I also plugged the SSD into one of the USB ports on my Pi 4 and accessed it over NFS. No problems at all.
Comment 15 Alan Stern 2025-02-17 14:36:23 UTC
One possibility is that the SSD doesn't like LPM.  You can disable LPM by writing

   0dd8:0562:k

to /sys/module/usbcore/parameters/quirks before plugging in the drive.
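
Written out as a command, this could look like the sketch below. The quirk string is the one given above; the helper name and the loose format check are illustrative only. Because the target file is root-owned, the write goes through `tee` under a privilege wrapper.

```shell
# Hypothetical helper: validate a usbcore quirk entry (VID:PID:flags)
# and write it to the given file.
#   $1 = entry, $2 = target file, remaining args = wrapper (e.g. sudo)
set_usb_quirk() {
    entry=$1
    target=$2
    shift 2
    # Loose check only: two 4-digit hex IDs followed by lowercase flag letters.
    if ! printf '%s\n' "$entry" | grep -Eq '^[0-9a-fA-F]{4}:[0-9a-fA-F]{4}:[a-z]+$'; then
        echo "malformed quirk entry: $entry" >&2
        return 1
    fi
    printf '%s\n' "$entry" | "$@" tee "$target" >/dev/null
}

# Usage, before plugging in the drive:
#   set_usb_quirk 0dd8:0562:k /sys/module/usbcore/parameters/quirks sudo
```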

If that doesn't make any difference, you can try collecting a usbmon trace that shows the error occurring.  Warning: The usbmon output file is likely to be enormous, and the interesting part will be only the stuff that gets written when the error happens.
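
A capture along these lines might look like the sketch below. It assumes debugfs is mounted at /sys/kernel/debug and that the bus number can be read from the `usb <bus>-<port>` prefix of a dmesg line; the helper names are made up.

```shell
# Hypothetical helpers for locating the usbmon text interface.

# Extract the bus number from a dmesg line such as "usb 6-2: reset ...".
bus_from_dmesg() {
    sed -n 's/.*usb \([0-9][0-9]*\)-[0-9.]*: .*/\1/p'
}

# Path of the text-format usbmon node for one bus.
usbmon_node() {
    printf '/sys/kernel/debug/usb/usbmon/%su\n' "$1"
}

# Usage (start before reproducing the error, stop with Ctrl-C afterwards):
#   sudo modprobe usbmon
#   sudo cat "$(usbmon_node 6)" > usbmon.out
# Compressing the result keeps the attachment small:
#   gzip usbmon.out
```

Capturing a single bus rather than all buses keeps the trace smaller and makes the interesting part easier to find.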
Comment 16 Frederic Bezies 2025-02-17 15:53:13 UTC
(In reply to Alan Stern from comment #15)
> One possibility is that the SSD doesn't like LPM.  You can disable LPM by
> writing
> 
>    0dd8:0562:k
> 
> to /sys/module/usbcore/parameters/quirks before plugging in the drive.

As I said in comment 9, I no longer have access to the external SSD that caused the USB reset spam in dmesg.

> 
> If that doesn't make any difference, you can try collecting a usbmon trace
> that shows the error occurring.  Warning: The usbmon output file is likely
> to be enormous, and the interesting part will be only the stuff that gets
> written when the error happens.

I will need to buy another external SSD and see if I still hit this bug. But it will take me around two weeks to do so :/
Comment 17 Frederic Bezies 2025-02-17 17:48:14 UTC
Some more information. I did some research, searching for A520M USB problems, and found these forum threads:

* https://forums.tomshardware.com/threads/about-amd-usb-issues.3698102/ (from April 2021). No solution found.
* https://community.amd.com/t5/general-discussions/a520m-boards-usb/td-p/545490 (from 2022). Suggested solution: buy a PCIe USB card to avoid using the motherboard's ports.

I am not a big fan of avoiding the motherboard's USB ports.
Comment 18 Frederic Bezies 2025-02-20 09:32:39 UTC
Some additional information. I thought it might be an Arch Linux bug, so I tried both Manjaro and Fedora live USB images.

Nothing changed. It really is bad management of this motherboard's USB ports.

The hardware is not at fault here.
Comment 19 Frederic Bezies 2025-02-20 09:34:01 UTC
Oops. I meant that the USB HDD/SSD peripherals are not at fault here.
Comment 20 Frederic Bezies 2025-02-27 13:21:19 UTC
So I found an old 320 GB USB HDD and connected it to a USB port on my motherboard.

It is nearly empty, so corruption is not really a problem for it. I copied a 3 GB ISO image.

I'm adding both dmesg.log and usbmon output.

It looks like the "reset" lines only occur with a USB SSD, and I don't have one at hand for now. This annoying bug is not easy to reproduce.
Comment 21 Frederic Bezies 2025-02-27 13:22:39 UTC
Created attachment 307720
dmesg log using an USB HDD
Comment 22 Frederic Bezies 2025-02-27 13:25:50 UTC
Created attachment 307721
usbmon output
