Created attachment 293673 [details]
sysrq-l-t-1.log

Linux boot sometimes hangs around scsi_try_target_reset on a system with 10 x 1.92T SATA SSDs.

Console output:
A start job is running for dev-disk-by\x2dlabel-OS_T604.device
*** always waiting here

Login was not possible, so I could only gather SysRq info (saved in sysrq-l-t-1.log and sysrq-l-t-2.log):
- sysrq l command
- sysrq t command

From the SysRq info, it seems to be a problem around scsi_try_target_reset():

[  744.388123] scsi_eh_9       S    0   604      2 0x80004000
[  744.388124] Call Trace:
[  744.388125]  __schedule+0x285/0x6e0
[  744.388126]  ? scsi_try_target_reset+0x90/0x90
[  744.388127]  schedule+0x2f/0xa0
[  744.388128]  scsi_error_handler+0x1c4/0x500
[  744.388129]  ? scsi_eh_get_sense+0x220/0x220
[  744.388130]  kthread+0x112/0x130
[  744.388131]  ? kthread_park+0x80/0x80
[  744.388132]  ret_from_fork+0x1f/0x40

Frequency: about 10%.
It has only happened on one server with btrfs RAID0 and 10 SSDs. It has not yet happened on two other servers with the same kernel.

OS: centos 7.8/7.9
kernel version: 5.4.76, 5.4.74, 5.4.73, and other 5.4.x

# lsscsi
[14:0:0:0]   disk    ATA      MK1920GFDKU      HPG0  /dev/sda
[14:0:1:0]   disk    ATA      MK1920GFDKU      HPG0  /dev/sdb
[14:0:2:0]   disk    ATA      MK1920GFDKU      HPG0  /dev/sdc
[14:0:3:0]   disk    ATA      MK1920GFDKU      HPG0  /dev/sdd
[14:0:4:0]   disk    ATA      MK1920GFDKU      HPG0  /dev/sde
[14:0:6:0]   disk    ATA      SAMSUNG MZ7KM1T9 003Q  /dev/sdf
[14:0:7:0]   disk    ATA      SAMSUNG MZ7KM1T9 003Q  /dev/sdg
[14:0:8:0]   disk    ATA      SAMSUNG MZ7KM1T9 003Q  /dev/sdh
[14:0:9:0]   disk    ATA      SAMSUNG MZ7KM1T9 003Q  /dev/sdi
[14:0:10:0]  disk    ATA      SAMSUNG MZ7KM1T9 003Q  /dev/sdj
Created attachment 293675 [details] sysrq-l-t-2.log
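For anyone going through the attached sysrq-t logs, here is a minimal Python sketch that lists the tasks in a dump and picks out those in uninterruptible sleep (state D), which are the usual suspects in a boot hang. It assumes every task header line has the same layout as the scsi_eh_9 line quoted above (name, state, free stack, PID, parent PID, flags). The names parse_tasks and blocked_tasks are hypothetical helpers, not part of any existing tool.

```python
import re

# Task header line as printed by sysrq-t, e.g.
# [  744.388123] scsi_eh_9       S    0   604      2 0x80004000
# Assumption: name, state letter, free stack, PID, parent PID, flags.
TASK_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<comm>\S+)\s+(?P<state>[A-Z])\s+"
    r"(?P<stack>\d+)\s+(?P<pid>\d+)\s+(?P<ppid>\d+)\s+"
    r"(?P<flags>0x[0-9a-fA-F]+)\s*$"
)


def parse_tasks(lines):
    """Return one dict per task header line; stack-trace lines are skipped."""
    tasks = []
    for line in lines:
        m = TASK_RE.match(line.strip())
        if m:
            tasks.append({
                "comm": m.group("comm"),
                "state": m.group("state"),
                "pid": int(m.group("pid")),
                "ppid": int(m.group("ppid")),
            })
    return tasks


def blocked_tasks(tasks):
    """Keep only tasks in uninterruptible sleep (state D)."""
    return [t for t in tasks if t["state"] == "D"]
```

To run it on an attachment, something like `blocked_tasks(parse_tasks(open("sysrq-l-t-1.log")))` should work, assuming the log is plain text with one kernel message per line.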
Or maybe it is a problem in megaraid_sas.ko?

This is a Dell PERC H730P (Broadcom / LSI MegaRAID SAS-3 3108) with the latest firmware from Dell.

# modinfo megaraid_sas
filename:       /lib/modules/5.4.76-1.2.el7.x86_64/kernel/drivers/scsi/megaraid/megaraid_sas.ko.xz
description:    Broadcom MegaRAID SAS Driver
author:         megaraidlinux.pdl@broadcom.com
version:        07.710.50.00-rc1
I tested with 5.10-rc4 too.

5.10-rc4 always failed to boot on centos 7.9, but it boots with x-systemd.device-timeout=120, so we can log in to check more.

blkid shows that megaraid_sas works well, so could it be a problem in systemd or D-Bus on centos 7.9?

Although this is not yet fully resolved, let's close it for now.
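For reference, x-systemd.device-timeout= is a mount option that goes in the options field of /etc/fstab. The entry below is only a sketch: the label OS_T604 comes from the device unit name in the console output, while the mount point, the filesystem type and the other options are assumptions and will differ per system.

```
# /etc/fstab (illustrative; mount point, fs type and "defaults" are assumptions)
LABEL=OS_T604  /  btrfs  defaults,x-systemd.device-timeout=120  0  0
```

A value without a unit is read as seconds, so this makes systemd wait up to 120 seconds for the device to appear before giving up on the mount.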