Bug 79901 - [BISECTED]Extremely slow boot on Promise VTrak E610f due to sd_mod RSOC usage
Summary: [BISECTED]Extremely slow boot on Promise VTrak E610f due to sd_mod RSOC usage
Status: NEW
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: SCSI
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: linux-scsi@vger.kernel.org
Depends on:
Reported: 2014-07-10 13:23 UTC by Janusz Dziemidowicz
Modified: 2014-07-29 20:26 UTC
CC List: 1 user

See Also:
Kernel Version: 3.14.7
Tree: Mainline
Regression: Yes

Patch adding BLIST_NO_RSOC scsi scan flag (2.59 KB, patch)
2014-07-24 14:07 UTC, Janusz Dziemidowicz

Description Janusz Dziemidowicz 2014-07-10 13:23:57 UTC
Recently I started upgrading all of my machines to kernel 3.14 (from Debian wheezy backports, to be precise). Mostly there were no problems, but I stumbled upon weird behavior on Fibre Channel servers (QLogic cards inside HP blades) using Promise VTrak E610f arrays.

As soon as the SCSI subsystem tries to detect partitions, a lot of SCSI errors are reported. The system stalls (although the initramfs stays responsive) for about 20-30 minutes, depending on the number of arrays and LUNs. After that, disk detection finishes and the system continues booting as usual. Everything works perfectly afterwards.

I spent some time fiddling with qla2xxx driver versions, SCSI scanning options and anything else I could think of. Finally, I was able to find the culprit. The problem lies in sd_mod's use of scsi_report_opcode(). This function is used to determine whether the disk supports the WRITE SAME command, which it does by issuing a REPORT SUPPORTED OPERATION CODES (RSOC) command. Unfortunately, it seems the Promise VTrak E610f really, really does not like RSOC. As soon as RSOC is issued, the array stalls for a while, then the kernel tries to abort the command and finally has to reset the port. Fortunately, the array starts working again after the reset. I've also verified this behavior with the sg_opcodes utility.

The commit that introduced RSOC usage in sd_mod: http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=98dcc2946adbe4349ef1ef9b99873b912831edd4
Removing it fixes the issue.

I'm not sure what the correct way to fix this is, as I'm not very familiar with the SCSI spec. If RSOC support cannot be reliably determined, then some kind of blacklist should probably be introduced.

As a workaround, I've modified the qla2xxx driver to set the 'no_write_same' flag. While not directly related, it forces sd_mod not to issue RSOC, and it is easier for me to ship a Debian package with a single modified driver (I'd prefer not to manage my own kernel packages).
Comment 1 Janusz Dziemidowicz 2014-07-24 14:07:07 UTC
Created attachment 144101
Patch adding BLIST_NO_RSOC scsi scan flag

As discussed on the list, attached is a simple patch that blacklists RSOC on the Promise VTrak E610f.
Comment 2 Alan 2014-07-29 16:05:09 UTC
See Documentation/SubmittingPatches - we need a valid email/Signed-off-by for submissions
Comment 3 Janusz Dziemidowicz 2014-07-29 19:07:51 UTC
I read that document before creating the patch. Now I've read it again, and I must say I'm at a loss: the patch attached to this bug entry already includes a Signed-off-by line.
Can anyone point out precisely what is wrong here?
Comment 4 Alan 2014-07-29 20:26:34 UTC
It needs to go via email. If you send it to linux-scsi@vger.kernel.org then the right things should happen (feel free to cc me as well so I can help keep an eye on it)
