Created attachment 282263 [details]
dmesg file

I am noticing jbd2 getting blocked in commit_transaction, causing processes to go into D state for a good amount of time; they come out of that state automatically. I am dumping the sysrq-w and sysrq-l output:

Apr 10 15:47:23 ex-etp3 kernel:sysrq: SysRq : Show Blocked State
Apr 10 15:47:23 ex-etp3 kernel:  task                        PC stack   pid father
Apr 10 15:47:23 ex-etp3 kernel:jbd2/sda10-8    D    0  2573      2 0x00000000
Apr 10 15:47:23 ex-etp3 kernel: ffff8818347c97c0 0000000000000000 ffff88101c9d0140 ffff8817ff898080
Apr 10 15:47:23 ex-etp3 kernel: ffff88101fd17e40 ffff8817fec47b38 ffffffff816bcf06 ffff8807ff66b040
Apr 10 15:47:23 ex-etp3 kernel: ffff8817ff898080 7fffffffffffffff ffffffff81c06930 ffffffff816bdae0
Apr 10 15:47:23 ex-etp3 kernel:Call Trace:
Apr 10 15:47:23 ex-etp3 kernel: [<ffffffff816bcf06>] ? __schedule+0x1b6/0x570
Apr 10 15:47:23 ex-etp3 kernel: [<ffffffff816bdae0>] ? bit_wait+0x50/0x50
Apr 10 15:47:23 ex-etp3 kernel: [<ffffffff816bd2f4>] ? schedule+0x34/0x80
Apr 10 15:47:23 ex-etp3 kernel: [<ffffffff816bfd25>] ? schedule_timeout+0x165/0x290
Apr 10 15:47:23 ex-etp3 kernel: [<ffffffff81264366>] ? generic_make_request+0x116/0x290
Apr 10 15:47:23 ex-etp3 kernel: [<ffffffff816bdae0>] ? bit_wait+0x50/0x50
Apr 10 15:47:23 ex-etp3 kernel: [<ffffffff816bcce9>] ? io_schedule_timeout+0x99/0x100
Apr 10 15:47:23 ex-etp3 kernel: [<ffffffff816bdaf2>] ? bit_wait_io+0x12/0x60
Apr 10 15:47:23 ex-etp3 kernel: [<ffffffff816bd725>] ? __wait_on_bit+0x55/0x80
Apr 10 15:47:23 ex-etp3 kernel: [<ffffffff816bdae0>] ? bit_wait+0x50/0x50
Apr 10 15:47:23 ex-etp3 kernel: [<ffffffff816bd919>] ? out_of_line_wait_on_bit+0x79/0xa0
Apr 10 15:47:23 ex-etp3 kernel: [<ffffffff8109d970>] ? autoremove_wake_function+0x30/0x30
Apr 10 15:47:23 ex-etp3 kernel: [<ffffffff8122301f>] ? jbd2_journal_commit_transaction+0x198f/0x1b00
Apr 10 15:47:23 ex-etp3 kernel: [<ffffffff81085b41>] ? finish_task_switch+0x81/0x1f0
Apr 10 15:47:23 ex-etp3 kernel: [<ffffffff81226150>] ? kjournald2+0xc0/0x260
Apr 10 15:47:23 ex-etp3 kernel: [<ffffffff8109d940>] ? wake_up_bit+0x20/0x20
Apr 10 15:47:23 ex-etp3 kernel: [<ffffffff81226090>] ? commit_timeout+0x10/0x10
Apr 10 15:47:23 ex-etp3 kernel: [<ffffffff81080922>] ? kthread+0xd2/0xf0
Apr 10 15:47:23 ex-etp3 kernel: [<ffffffff816c5645>] ? __clear_rsb+0x25/0x3d
Apr 10 15:47:23 ex-etp3 kernel: [<ffffffff81080850>] ? kthread_park+0x50/0x50
Apr 10 15:47:23 ex-etp3 kernel: [<ffffffff81068beb>] ? SyS_exit_group+0xb/0x10
Apr 10 15:47:23 ex-etp3 kernel: [<ffffffff816c0ff4>] ? ret_from_fork+0x44/0x70
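Besides SysRq-w, a quick way to see which tasks are currently stuck in uninterruptible sleep is a one-line ps filter. This is a sketch for anyone else chasing the same symptom, not part of the original report; the jbd2 thread shows up here while it is blocked:

```shell
#!/bin/sh
# List tasks currently in D state (uninterruptible sleep).
# Column 2 of "ps -o stat" is the state letter; D-state tasks are the
# ones SysRq-w reports as blocked.
ps -eo pid,stat,comm --no-headers | awk '$2 ~ /^D/ {print $1, $3}'
```

Running this in a loop (e.g. under `watch`) during a stall shows how long each task stays blocked without needing console access for SysRq.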
Created attachment 282265 [details]
kern log

Created attachment 282267 [details]
kern log

Created attachment 282269 [details]
kern log
What is your hardware configuration --- in particular, what sort of HDD or SSD are you using? And what are you running on the system? This looks very much like I/O is just getting very badly bottlenecked. When did this start? If it was OK on an older kernel version, which one? Are you using some Linux distribution?
Here is the information:

# show raid
Raid configuration:
Model: LSI MegaRAID SAS 9260-4i
Serial: SK72468112
Firmware Version: 2.130.383-2315

Apr 9 00:44:22 ex-etp3 kernel:scsi host0: Avago SAS based MegaRAID driver
Apr 9 00:44:22 ex-etp3 kernel:scsi 0:2:0:0: Direct-Access LSI MR9260-4i 2.13 PQ: 0 ANSI: 5
Apr 9 00:44:22 ex-etp3 kernel:sd 0:2:0:0: [sda] 1171062784 512-byte logical blocks: (600 GB/558 GiB)
Apr 9 00:44:22 ex-etp3 kernel:sd 0:2:0:0: Attached scsi generic sg0 type 0
Apr 9 00:44:22 ex-etp3 kernel:sd 0:2:0:0: [sda] Write Protect is off
Apr 9 00:44:22 ex-etp3 kernel:sd 0:2:0:0: [sda] Mode Sense: 1f 00 10 08
Apr 9 00:44:22 ex-etp3 kernel:sd 0:2:0:0: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA

[admin@ex-etp3 ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5       2.0G  1.8G  106M  95% /
tmpfs            63G  3.0G   60G   5% /dev/shm
/dev/sda2       120M   30M   84M  27% /boot
/dev/sda1       119M  1.8M  111M   2% /bootmgr
/dev/sda8       182M   50M  123M  29% /config
/dev/sda9        30G   22G  6.3G  78% /var
/dev/sda10      453G  329G  102G  77% /data
tmpfs            63G   24K   63G   1% /run
tmpfs            64M  980K   64M   2% /vtmp

[admin@ex-etp3 ~]# mount
/dev/sda5 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sda2 on /boot type ext3 (ro,noatime,noexec)
/dev/sda1 on /bootmgr type ext3 (ro,noatime,noexec)
/dev/sda8 on /config type ext3 (rw,noatime,noexec)
/dev/sda9 on /var type ext3 (rw,noatime)
/dev/sda10 on /data type ext3 (rw,noatime,barrier=0)
tmpfs on /dev/shm type tmpfs (rw)
tmpfs on /run type tmpfs (rw,nosuid,nodev,strictatime,mode=0755)
tmpfs on /vtmp type tmpfs (rw,size=64M)

description: Motherboard
product: H8DGU
vendor: Supermicro
physical id: 0
version: 1234567890
serial: NM179S600489
slot: To Be Filled By O.E.M.
*-firmware
     description: BIOS
     vendor: American Megatrends Inc.
     physical id: 0
     version: FI3.0c
     date: 12/31/2013
cpu:0
     description: CPU
     product: (To Be Filled By O.E.M.)
     vendor: Advanced Micro Devices [AMD]
     physical id: 4
     bus info: cpu@0
     version: AMD Opteron(tm) Processor 6380
*-storage
     description: RAID bus controller
     product: MegaRAID SAS 2108 [Liberator]
     vendor: LSI Logic / Symbios Logic
     physical id: 0
     bus info: pci@0000:02:00.0
     logical name: scsi0
     version: 05
     width: 64 bits
     clock: 33MHz
     capabilities: storage pm pciexpress vpd msi msix bus_master cap_list rom
     configuration: driver=megaraid_sas latency=0
     resources: irq:19 ioport:c000(size=256) memory:fd59c000-fd59ffff memory:fd5c0000-fd5fffff memory:fd5a0000-fd5bffff
*-disk
     description: SCSI Disk
     product: MR9260-4i
     vendor: LSI
     physical id: 2.0.0
     bus info: scsi@0:2.0.0
     logical name: /dev/sda
     version: 2.13
     serial: 003307152fc3005222f0e91a0db00506
     size: 558GiB (599GB)

kernel:Linux version 4.9.86 (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP PREEMPT

The system has a lot of I/O traffic, and this happens intermittently. We started seeing the issue mainly after upgrading from 4.1 to 4.9.75 and 4.9.86.
I have tried the noop, CFQ, and deadline schedulers, with the same results. The I/O is mainly on the /data partition, which is mounted as:

/dev/sda9 on /var type ext3 (rw,noatime)
/dev/sda10 on /data type ext3 (rw,noatime,barrier=0)

/var mainly holds logging. We have a lot of database traffic (postgres) going to /data. I have also tried data=writeback instead of the default data=ordered for /data.
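For completeness, here is how I check and switch the scheduler per device via sysfs (a sketch; "sda" is the device from this report, substitute your own):

```shell
#!/bin/sh
# Show the I/O scheduler for a block device; the active one appears
# in [brackets] in the sysfs file.
show_scheduler() {
    sched=/sys/block/$1/queue/scheduler
    if [ -r "$sched" ]; then
        cat "$sched"
    else
        echo "no scheduler file for $1"
    fi
}
show_scheduler sda
# To switch (as root):  echo deadline > /sys/block/sda/queue/scheduler
```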
Created attachment 282273 [details] iostat
The issue is reproducible: I use a dd loop to generate files under /data to compete with the postgres workload, and iostat -x to monitor the I/O. Here is how I run dd, with variations of the count value and sleep time:

for i in {1..100}; do nice -n 10 dd if=/dev/zero of=/data/tmp/testonly/tmpfile$i.txt count=1024 bs=1048576; echo "done tmpfile $i"; sleep 10; done &

For the transaction load I use the postgresql utility pgbench, paired with other in-house SQL commands:

su -s /bin/bash postgres -c "/usr/postgresql/bin/pgbench -T 90000 -c 8 bench2" &

When I run pgbench and dd together, I see processes going into D state and getting stuck there for a long time (at least 140 seconds).
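A scaled-down sketch of that dd reproduction loop, for anyone who wants to try it elsewhere: the real test writes 100 x 1 GiB files into /data, while here the file sizes, count, and target directory are shrunk so it can run anywhere without filling a disk:

```shell
#!/bin/sh
# Miniature version of the reproduction loop: write a few files and
# sync after each, so every pass forces a journal commit.
TESTDIR=$(mktemp -d)
for i in 1 2 3; do
    dd if=/dev/zero of="$TESTDIR/tmpfile$i.txt" bs=65536 count=1 2>/dev/null
    echo "done tmpfile $i"
    sync
done
```

Scale bs/count back up (and point TESTDIR at the affected filesystem) to reproduce the original load.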
Here is the complete information about the disk:

# /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -aALL
Adapter #0

Enclosure Device ID: 252
Slot Number: 0
Drive's position: DiskGroup: 0, Span: 0, Arm: 0
Enclosure position: N/A
Device Id: 6
WWN: 5000C500A167B624
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Non Coerced Size: 558.411 GB [0x45cd2fb0 Sectors]
Coerced Size: 558.406 GB [0x45cd0000 Sectors]
Sector Size: 0
Logical Sector Size: 0
Physical Sector Size: 0
Firmware state: Online, Spun Up
Commissioned Spare : No
Emergency Spare : No
Device Firmware Level: N003
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x5000c500a167b625
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST600MM0088 N003W420R7PJ
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: Unknown
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive: Not Certified
Drive Temperature :28C (82.40 F)
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: Unknown
Drive has flagged a S.M.A.R.T alert : No
(In reply to Theodore Tso from comment #4)
> What is your hardware configuration --- in particular, what sort of HDD or
> SSD are you using? And what are you running on the system?
>
> This looks very much like I/O is just getting very badly bottlenecked.
> When did this start? If it was OK on an older kernel version, which one?
> Are you using some Linux distribution?

Any update or suggestions please?
Looking at your iostat -x output, /dev/sda is completely utilized:

Device: rrqm/s  wrqm/s   r/s    w/s  rkB/s    wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sda       0.00 1154.00 47.00 836.00 864.00 10076.00    24.78   136.46 116.05  331.72  103.93   1.13 100.00

Note the 100% utilization and the average queue size of 136.46 I/O requests. So as I said, this sounds like I/O overload: your workload is simply issuing more I/O than the disk can handle. Why that is happening is going to be a userspace issue.

If you don't care about exposing stale data (e.g., data belonging to deleted files, which may have privacy implications if you are subject to GDPR, HIPAA, PCI, etc. mandates), you could try mounting with the mount option "data=writeback".

It could also be that updating to a newer kernel (4.9.86 is quite antique, and you're probably missing all sorts of security and other bug fixes) and a newer userspace, and then setting up a new set of file systems formatted with ext4, will help performance. Or there might be userspace writes to /dev/sda that shouldn't be happening or which could be optimized away. But I don't see anything here which is a kernel bug.
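For anyone wanting to try the data=writeback suggestion, the change could look like the fragment below (a sketch; the device and options are taken from the mount output earlier in this report, and note that ext3 generally refuses to change the data= journalling mode on a live remount, so an unmount or reboot is needed):

```
# /etc/fstab entry for /data with writeback journalling
# (ordering between data and metadata writes is no longer guaranteed):
/dev/sda10  /data  ext3  rw,noatime,barrier=0,data=writeback  0  2

# one-off, as root (unmount first if a plain remount is rejected):
#   umount /data
#   mount -o rw,noatime,barrier=0,data=writeback /dev/sda10 /data
```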