Running a fio test on an initiator system with an 8 Gb/s QLogic FC adapter revealed a bottleneck in the qla2xxx initiator driver: lock contention on ha->hardware_lock. The test that revealed this is as follows:

- On a target system with 4 CPU threads (Intel i5), an 8 Gb/s QLogic FC HBA and kernel 3.12.7, download the SCST trunk r5194, build it in release mode, load the brd kernel module and configure SCST such that it exports /dev/ram[0123] via the vdisk_blockio driver. Set the vdisk_blockio parameter threads_num to 2. Export these four RAM disks as LUNs 0..3.

- On an initiator system with 12 CPU threads (Intel Core i7 with hyperthreading enabled), an 8 Gb/s QLogic HBA and kernel 3.12.7, run the following fio job (where /dev/sd[cdef] corresponds to the SCST LUNs):

    fio --bs=4K --ioengine=libaio --rw=randrw --buffered=0 --numjobs=12 \
        --iodepth=16 --iodepth_batch=8 --iodepth_batch_complete=8 \
        --thread --loops=$((2**31)) --runtime=60 --group_reporting \
        --gtod_reduce=1 --invalidate=1 \
        $(for d in /dev/sd[cdef]; do echo --name=$d --filename=$d; done)

- While this fio job is running, run the following commands:

    perf record -ag sleep 10
    perf report --stdio >perf-report-fc.txt

The perf report shows that quite some time is spent in the spin_lock_irqsave() call invoked from qla24xx_dif_start_scsi(). Does this mean that this test revealed lock contention on ha->hardware_lock?
Created attachment 123011: perf report --stdio output
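
To put a rough number on "quite some time", the spin-lock share can be summed from the saved report. The following is only a sketch: the helper name spin_overhead is made up for illustration, and it assumes the classic "perf report --stdio" layout in which each summary line starts with an overhead percentage and ends with the symbol name. Call-graph lines are skipped because their first field is not a bare percentage. If the perf version in use prints an additional "Children" column, the first column would need a different interpretation.

```shell
# Hypothetical helper: sum the overhead of all spin-lock symbols in a
# "perf report --stdio" output file.
# Assumed summary-line layout: "<pct>%  <command>  <object>  [k] <symbol>"
spin_overhead() {
    awk '
        # Summary lines: first field is a bare percentage, last field is the symbol.
        $1 ~ /^[0-9.]+%$/ && $NF ~ /spin_lock/ {
            sub(/%/, "", $1)   # strip the percent sign so the value is numeric
            sum += $1
        }
        END { printf "%.2f\n", sum }
    ' "$1"
}
```

Usage, with the file name from the commands above:

    spin_overhead perf-report-fc.txt

The result is the total time attributed to spin-lock functions across all callers. It does not by itself say which lock is involved; the call graph under each symbol is still needed to see how much of it comes from qla24xx_dif_start_scsi().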