Bug 151631
Summary: | "Synchronizing SCSI cache" fails during (and delays) reboot/shutdown | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | elvis has left the building (icanrealizeum+bugzillakernelorg) |
Component: | SCSI | Assignee: | linux-scsi (linux-scsi) |
Status: | NEW --- | ||
Severity: | normal | CC: | dennyvatwork, fatalfeel, forums0, gianpaoloc, icanrealizeum+bugzillakernelorg, sdiconov |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | <=4.7.0-g0cbbc42 4.7.0-g52ddb7e | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
for systemd, /usr/lib/systemd/system-shutdown/debug.sh
Patch reverting commit 2c85025c75dfe7ddc2bb33363a998dad59383f94 |
Description
elvis has left the building
2016-08-06 17:17:15 UTC
I tried one shutdown, and after the above it did shut down, as expected. However, on power-on the SSD was not detected by the BIOS until I pressed ctrl+alt+del during POST's attempt to boot from LAN; the second time it was detected. This usually happens after I power down from the button (hold 4 sec), because I imagine the SSD is busy recovering internally from a sudden power loss. So I can only conclude that the kernel didn't safely power down the SSD (aside from the above failure to flush the cache).
(Was definitely OK in 4.7.0-rc6-ga99cde4.)

I did, however, have a debug.sh script in /usr/lib/systemd/system-shutdown/debug.sh which did this:

    sync && sdparm --command=sync /dev/sda && sleep 1
    mount -o remount,ro /
    hdparm -F /dev/sda
    hdparm -f /dev/sda
    sleep 1

So in a way, the SSD cache was supposedly flushed regardless, and it doesn't seem to me that anything else tried to remount,rw or do writes after this.

Created attachment 227801 [details]
for systemd, /usr/lib/systemd/system-shutdown/debug.sh

Found a workaround, thanks to this article:
https://unix.stackexchange.com/questions/55281/how-to-stop-waking-all-attached-drives-on-reboot-deactivating-swap/55417#55417

If I put the following in the aforementioned debug.sh file, then reboot is instant (after the SCSI bug, of course):

    #THIS:
    #turn off drive cache
    hdparm -W0 /dev/sda
    #^^^^^^^^^^^^^^^^^^^^^^
    #this was already here before:
    #flush drive cache:
    hdparm -F /dev/sda
    hdparm -f /dev/sda

OK, now that the cache error doesn't happen and reboot works, I've tested shutdown, and it now fails at the "Stopping disk" part, which means this line:

    sd_start_stop_device(sdkp, 0);

    ata4.00: failed command: STANDBY IMMEDIATE

screenshot: https://i.imgur.com/2yvAMSR.jpg

So now we know that it did in fact not safely shut down the drive before, when I thought it was just the drive cache flushing that failed. So the drive is doing recovery internally upon laptop power-on, which is why it doesn't identify itself in time but only on the next ctrl+alt+del boot.
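For anyone wanting to try the same steps, the debug.sh commands quoted above can be collected into one hook roughly as follows. This is only a sketch: the `run` helper, the `flush_disk` function and the `APPLY` and `DISK` variables are hypothetical names made up here, not part of the attached script, and /dev/sda is just the device from this report. By default it only prints the commands; it executes them only when APPLY=1 is set.

```shell
#!/bin/sh
# Sketch of a systemd system-shutdown hook based on the commands quoted
# in this report. Helper names and the APPLY switch are assumptions.

# Print a command; execute it only when APPLY=1 (hypothetical safety switch).
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

# Flush and disable the write cache of the given disk, in the order
# described in the report.
flush_disk() {
    dev="$1"
    run sync && run sdparm --command=sync "$dev" && run sleep 1
    run mount -o remount,ro /
    run hdparm -W0 "$dev"   # turn off the drive write cache (the workaround)
    run hdparm -F "$dev"    # flush the on-drive write cache
    run hdparm -f "$dev"    # flush the buffer cache for the device
    run sleep 1
}

flush_disk "${DISK:-/dev/sda}"
```

Running it without APPLY=1 is a cheap way to check the command order before placing it in /usr/lib/systemd/system-shutdown/.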
(see comment #1)
I have worked around this issue (for me) by adding a few lines in debug.sh like:

    if test "$1" == "poweroff"; then
        hdparm -Y /dev/sda
        echo o > /proc/sysrq-trigger ; sleep 5
    fi

That puts the drive to sleep and triggers shutdown via sysrq.
It seems to have worked just fine: no drive issues on startup anymore (as I mentioned before, I had to ctrl+alt+del once to get the BIOS to detect the drive), and no extra delays.

Also, the above requires a kernel patch:

    //sd_start_stop_device(sdkp, 0);

That is, commenting out that call in the sd_shutdown function in drivers/scsi/sd.c, or else it will fail like in comment #3.

Created attachment 243891 [details]
Patch reverting commit 2c85025c75dfe7ddc2bb33363a998dad59383f94

This patch solved bug https://bugzilla.kernel.org/show_bug.cgi?id=187061, which I suspect is a duplicate of this bug. Could someone affected by this bug test it and confirm it works? (It should work with both v4.8.6 and v4.9-rc4.)

I can confirm exactly the same bug on my Dell Latitude E5450 with an SSD (Samsung 840) running Fedora 24 and kernel 4.8.6 (4.7.9 was fine). The patch provided by Gianpaolo solved the issue on my machine.

A very similar (if not the same) problem is happening on my work PC, a Dell Precision T1700. I will try to get a screenshot and add it to this thread. The problem began with kernel 4.8.6 and is still present (currently running kernel 4.8.11). Very annoying on shutdown. I have absolutely no idea whether the disk's cache is actually flushing or not, so data integrity is a concern. It's a magnetic disk, and I have been running fsck on it regularly (no issues found as of yet) and avoiding shutting the PC down.
Point of this comment: it's still an issue on some PCs. Anyone else finding the same?

@Rich, have a look at https://bugzilla.kernel.org/show_bug.cgi?id=187061
The bug has been resolved in 4.9-rc7.

I am experiencing the Synchronize Cache problem with both 4.4.43 and 4.8.17.
My dmesg yields:

    [ 428.593067] sd 5:0:0:0: [sdd] Synchronizing SCSI cache
    [ 428.716643] sd 5:0:0:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK

This happens when I try to attach an external USB3 drive (over a USB2 interface). It works well the first time, but later, after I unmount and disconnect it, any subsequent attempt to attach the same HDD results in this bug. The disk cannot be mounted a second time. The filesystem is reiserfs. It happens with kernel 4.4.43 too.

One further observation. The drive was attached to a USB2 controller

    00:1d.0 USB controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller

through a USB hub which has both USB2 and USB3 ports. The bug occurs when I try to reattach the HDD to the same type of port. The HDD works when I change the type of port (USB2 > USB3 or USB3 > USB2). It happens again if I do not change the type of connection.
It might be a different USB-related problem, but the error message is the same: "Synchronize Cache(10) failed".

I met this problem too. Before reboot, run the following command:

    echo "1" > /sys/block/[sdx]/device/delete

With the device removed, reboot works. Tested OK.

    // For Linux 4.12 and earlier (4.12 included), apply this patch
    void __scsi_remove_device(struct scsi_device *sdev)
    {
        struct device *dev = &sdev->sdev_gendev;
        int res;
        int ret;

        /*
         * This cleanup path is not reentrant and while it is impossible
         * to get a new reference with scsi_device_get() someone can still
         * hold a previously acquired one.
         */
        if (sdev->sdev_state == SDEV_DEL)
            return;

        if (sdev->is_visible) {
            /*
             * If scsi_internal_target_block() is running concurrently,
             * wait until it has finished before changing the device state.
             */
            mutex_lock(&sdev->state_mutex);
            /*
             * If blocked, we go straight to DEL and restart the queue so
             * any commands issued during driver shutdown (like sync
             * cache) are errored immediately.
             */
            res = scsi_device_set_state(sdev, SDEV_CANCEL);
            //if (res != 0) {
                ret = scsi_device_set_state(sdev, SDEV_DEL);
                if (!ret) {
                    scsi_start_queue(sdev);
                }
            //}
            mutex_unlock(&sdev->state_mutex);

            if (res != 0)
                return;

            bsg_unregister_queue(sdev->request_queue);
            device_unregister(&sdev->sdev_dev);
            transport_remove_device(dev);
            scsi_dh_remove_device(sdev);
            device_del(dev);
        } else {
            put_device(&sdev->sdev_dev);
        }

        /*
         * Stop accepting new requests and wait until all queuecommand() and
         * scsi_run_queue() invocations have finished before tearing down the
         * device.
         */
        mutex_lock(&sdev->state_mutex);
        scsi_device_set_state(sdev, SDEV_DEL);
        mutex_unlock(&sdev->state_mutex);

        blk_cleanup_queue(sdev->request_queue);
        cancel_work_sync(&sdev->requeue_work);

        if (sdev->host->hostt->slave_destroy) {
            sdev->host->hostt->slave_destroy(sdev);
        }
        transport_destroy_device(dev);

        /*
         * Paired with the kref_get() in scsi_sysfs_initialize(). We have
         * removed sysfs visibility from the device, so make the target
         * invisible if this was the last device underneath it.
         */
        scsi_target_reap(scsi_target(sdev));

        put_device(dev);
    }

    // For Linux 4.13 and later (4.13 included)
    void __scsi_remove_device(struct scsi_device *sdev)
    {
        struct device *dev = &sdev->sdev_gendev;
        int res;
    #ifdef MY_PATCH
        int ret;
    #endif

        /*
         * This cleanup path is not reentrant and while it is impossible
         * to get a new reference with scsi_device_get() someone can still
         * hold a previously acquired one.
         */
        if (sdev->sdev_state == SDEV_DEL)
            return;

        if (sdev->is_visible) {
            /*
             * If scsi_internal_target_block() is running concurrently,
             * wait until it has finished before changing the device state.
             */
            mutex_lock(&sdev->state_mutex);
            /*
             * If blocked, we go straight to DEL and restart the queue so
             * any commands issued during driver shutdown (like sync
             * cache) are errored immediately.
             */
            res = scsi_device_set_state(sdev, SDEV_CANCEL);
    #ifdef MY_PATCH
            ret = scsi_device_set_state(sdev, SDEV_DEL);
            if (!ret)
                scsi_start_queue(sdev);
    #else
            if (res != 0) {
                res = scsi_device_set_state(sdev, SDEV_DEL);
                if (res == 0)
                    scsi_start_queue(sdev);
            }
    #endif
            mutex_unlock(&sdev->state_mutex);

            if (res != 0)
                return;

            if (sdev->host->hostt->sdev_groups)
                sysfs_remove_groups(&sdev->sdev_gendev.kobj,
                                    sdev->host->hostt->sdev_groups);

            bsg_unregister_queue(sdev->request_queue);
            device_unregister(&sdev->sdev_dev);
            transport_remove_device(dev);
            device_del(dev);
        } else
            put_device(&sdev->sdev_dev);

        /*
         * Stop accepting new requests and wait until all queuecommand() and
         * scsi_run_queue() invocations have finished before tearing down the
         * device.
         */
        mutex_lock(&sdev->state_mutex);
        scsi_device_set_state(sdev, SDEV_DEL);
        mutex_unlock(&sdev->state_mutex);

        blk_cleanup_queue(sdev->request_queue);
        cancel_work_sync(&sdev->requeue_work);

        if (sdev->host->hostt->slave_destroy)
            sdev->host->hostt->slave_destroy(sdev);
        transport_destroy_device(dev);

        /*
         * Paired with the kref_get() in scsi_sysfs_initialize(). We have
         * removed sysfs visibility from the device, so make the target
         * invisible if this was the last device underneath it.
         */
        scsi_target_reap(scsi_target(sdev));

        put_device(dev);
    }
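To check whether a machine is hitting the same failure, the "Synchronize Cache(10) failed" lines quoted in the dmesg output earlier in this thread can be picked out of the kernel log with a small filter. This is a rough sketch: the function name `sync_cache_failures` is made up here, and the pattern assumes the log line has the same layout as the one quoted in this report.

```shell
#!/bin/sh
# Read kernel log lines on stdin and print "device hostbyte driverbyte"
# for each "Synchronize Cache(NN) failed" line. The function name is
# hypothetical; the pattern follows the dmesg line quoted in this report.
sync_cache_failures() {
    sed -n 's/.*\[\(sd[a-z]*\)] Synchronize Cache([0-9]*) failed: Result: hostbyte=\([A-Z_]*\) driverbyte=\([A-Z_]*\).*/\1 \2 \3/p'
}
```

Typical use would be `dmesg | sync_cache_failures`; lines that do not match the pattern are silently skipped.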