Bug 213311
| Summary: | Resctrl monitoring groups do not work properly on AMD EPYC 7742 | | |
|---|---|---|---|
| Product: | Platform Specific/Hardware | Reporter: | Paweł Szulik (pawel.szulik) |
| Component: | x86-64 | Assignee: | platform_x86_64 (platform_x86_64) |
| Status: | NEW --- | | |
| Severity: | normal | CC: | babu.moger, bp |
| Priority: | P1 | | |
| Hardware: | x86-64 | | |
| OS: | Linux | | |
| Kernel Version: | 5.12.7 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | attachment-27277-0.html | | |
Description
Paweł Szulik
2021-06-01 15:03:33 UTC
This https://github.com/intel/intel-cmt-cat tool programs the monitoring groups via Model Specific Registers (MSR) on a per-hardware-thread basis and it's working on that hardware. The feature for monitoring a group of pids uses a kernel interface and it's also not working. Changing the NUMA node topology via BIOS does not help.

Some basic information:

CPU family: 23
Model: 49
Model name: AMD EPYC 7742 64-Core Processor
Stepping: 0
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca sme sev sev_es

$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.12.7-051207-generic root=UUID=36253b6e-960e-4faf-ae6e-f546225ee9b4 ro rdt=cmt,mbmtotal,mbmlocal,l3cat,l3cdp,mba

Need more details.
1. Dmesg
2. .config
3. Output of "pqos -d"

I will try to reproduce on my system here.

Created attachment 297111 [details]
attachment-8531-0.html

I'm on holidays till 04.11 !
(In reply to Paweł Szulik from comment #3)
> Created attachment 297111 [details]
> attachment-8531-0.html
>
> I'm on holidays till 04.11 !

This message is outdated. I'm out of office. I'll be available from 8.06.

Pawel,
That is fine. Please update the bug with more details when you come back.

I am able to create the monitor groups on my system. It is working as expected. This is how I created the new groups.

1. # mount -t resctrl resctrl /sys/fs/resctrl/
2. # cd /sys/fs/resctrl/
3. # cd mon_groups/
4. # mkdir test
5. # cd test/
6. # cd mon_data/
7. # ps
   PID TTY TIME CMD
   2375 pts/0 00:00:00 bash
   2690 pts/0 00:00:00 ps
8. Assigning the bash pid to tasks
   # echo 2375 > tasks
9. Set the task to run on cpu 0. Cpu 0 is on resource id 0.
   # taskset -p 0x1 2375
10. # cd mon_data/
11. # cd mon_L3_00/
12. # cat mbm_local_bytes
    11840
    # cat mbm_total_bytes
    18560

Also https://github.com/intel/intel-cmt-cat is working fine.

1. git clone https://github.com/intel/intel-cmt-cat.git
2. make; make install
3. export LD_LIBRARY_PATH=/usr/local/lib
4. pqos -d

Hardware capabilities
    Monitoring
        Cache Monitoring Technology (CMT) events:
            LLC Occupancy (LLC)
        Memory Bandwidth Monitoring (MBM) events:
            Total Memory Bandwidth (TMEM)
            Local Memory Bandwidth (LMEM)
            Remote Memory Bandwidth (RMEM) (calculated)
    Allocation
        Cache Allocation Technology (CAT)
            L3 CAT
                CDP: disabled
                Num COS: 16
        Memory Bandwidth Allocation (MBA)
            Num COS: 16

Please give us more information about what is not working.
Thanks

Created attachment 297121 [details]
attachment-27277-0.html

I'm out of office. I'll be available from 8.06
If I run "pqos", stop it, and then make a mon group, I get values instead of "Unavailable".

I don't see that problem. Can you please list the steps to re-create the issue?

1. # mount -t resctrl resctrl /sys/fs/resctrl/
2. # cd /sys/fs/resctrl/
3. # cd mon_groups/
4. # mkdir test
5. # cd test/
6. # cd mon_data/
7. # ps
   PID TTY TIME CMD
   4129 tty1 00:00:00 bash
   4138 tty1 00:00:00 ps
8. Assigning the bash pid to tasks
   # echo 4129 > tasks
9. Set the task to run on cpu 0. Cpu 0 is on resource id 0.
   # taskset -p 0x1 4129
10. # cd mon_data/
11. # cd mon_L3_00/
12. # cat mbm_local_bytes
    Unavailable
    # cat mbm_total_bytes
    Unavailable

pqos -d

Hardware capabilities
    Monitoring
        Cache Monitoring Technology (CMT) events:
            LLC Occupancy (LLC)
        Memory Bandwidth Monitoring (MBM) events:
            Total Memory Bandwidth (TMEM)
            Local Memory Bandwidth (LMEM)
            Remote Memory Bandwidth (RMEM) (calculated)
    Allocation
        Cache Allocation Technology (CAT)
            L3 CAT
                CDP: disabled
                Num COS: 16
        Memory Bandwidth Allocation (MBA)
            Num COS: 16

(In reply to Paweł Szulik from comment #9)
> 1. #mount -t resctrl resctrl /sys/fs/resctrl/
> 2. #cd /sys/fs/resctrl/
> 3. #cd mon_groups/
> 4. #mkdir test
> 5. #cd test/
> 6. #cd mon_data/
> 7. # ps
> PID TTY TIME CMD
> 4129 tty1 00:00:00 bash
> 4138 tty1 00:00:00 ps
>
> 8. Assigning the bash pid to tasks
> #echo 4129 > tasks
>
> 9. Set the task to run on cpu 0. Cpu 0 is on resource id 0.
> taskset -p 0x1 4129
>
> 10. cd mon_data/
>
> 11. cd mon_L3_00/
>
> 12 # cat mbm_local_bytes
> Unavailable
> #cat mbm_total_bytes
> Unavailable

I followed the same steps and cannot reproduce. I have the latest kernel.

# uname -r
5.13.0-rc4+
# pwd
/sys/fs/resctrl/mon_groups/test/mon_data/mon_L3_00
# cat mbm_local_bytes
5847808
# cat mbm_total_bytes
7722240

Can you please try with the latest kernel? I will try your 5.12.7 kernel and see if I can reproduce.

I can't reproduce with 5.12.7 either.
# uname -r 5.12.7 # cat mon_data/mon_L3_00/mbm_local_bytes 973504 # cat mon_data/mon_L3_00/mbm_total_bytes 4049664 Please try with the latest kernel and let me know. I tried. What BIOS options you have? I have an internal bios here. I dont think it matters here. I changed my machine. Now everything seems to be working despite one difference between AMD and Intel which is described by this example: AMD: # cat mon_data/*/mbm_total_bytes Unavailable Unavailable Unavailable Unavailable Unavailable Unavailable 1939904 Unavailable Unavailable Unavailable Unavailable Unavailable Unavailable Unavailable Unavailable Unavailable Unavailable Unavailable Unavailable Unavailable Unavailable Unavailable Unavailable Unavailable Unavailable Unavailable Unavailable Unavailable Unavailable Unavailable Unavailable Unavailable Intel: # cat mon_data/*/mbm_total_bytes 1239501 0 And making any mon group leads to having "Unavailable" values in the root control group. (In reply to Paweł Szulik from comment #15) > I changed my machine. Now everything seems to be working despite one > difference between AMD and Intel which is described by this example: > > AMD: > # cat mon_data/*/mbm_total_bytes > Unavailable > > Unavailable > > Unavailable > > Unavailable > > Unavailable > > Unavailable > > 1939904 > > Unavailable > > Unavailable > > Unavailable > > Unavailable > > Unavailable > > Unavailable > > Unavailable > > Unavailable > > Unavailable > > Unavailable > > Unavailable > > Unavailable > > Unavailable > > Unavailable > > Unavailable > > Unavailable > > Unavailable > > Unavailable > > Unavailable > > Unavailable > > Unavailable > > Unavailable > > Unavailable > > Unavailable > > Unavailable > > Intel: > # cat mon_data/*/mbm_total_bytes > 1239501 > 0 There is a difference between AMD and Intel. In Intel there will only one resource id under one socket. But, in AMD there will be multiple resource ids under one socket. There will be one resource id for each core complex. 
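The per-core-complex layout described above can be inspected directly: each mon_L3_NN directory under mon_data corresponds to one resource id. A minimal sketch, assuming resctrl is mounted at /sys/fs/resctrl; the function name resctrl_dump_event is made up for illustration, and the group directory is passed as an argument so the function can also be pointed at a test tree:

```shell
#!/bin/sh
# Hypothetical helper (not part of resctrl or pqos): print one line per
# L3 domain of a monitoring group, as "<domain> <value>".
# $1 = group directory (e.g. /sys/fs/resctrl or /sys/fs/resctrl/mon_groups/test)
# $2 = event file name (e.g. mbm_total_bytes)
resctrl_dump_event() {
    group_dir=$1
    event=$2
    for domain in "$group_dir"/mon_data/mon_L3_*; do
        # Skip the unexpanded glob when no domain directories exist.
        [ -d "$domain" ] || continue
        # The file holds either a byte count or the string "Unavailable".
        value=$(cat "$domain/$event" 2>/dev/null) || value="(unreadable)"
        printf '%s %s\n' "${domain##*/}" "$value"
    done
}
```

Calling `resctrl_dump_event /sys/fs/resctrl mbm_total_bytes` on the AMD machine should list 32 domains, while the Intel machine in the example would list two.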
You have 32 core complexes here (mon_L3_00 to mon_L3_31). The values in these monitor groups are initialized to "Unavailable" when they are created. Let's start here.

# mount -t resctrl resctrl /sys/fs/resctrl/
# cd resctrl/mon_groups
# mkdir test
# cd test
# cat mon_data/mon_L3_*/mbm_local_bytes
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
# cat mon_data/mon_L3_*/mbm_total_bytes
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable

It is initialized to Unavailable. See above. Then start assigning the tasks to a specific group. Then the values will start changing.

# ps
PID TTY TIME CMD
70743 pts/0 00:00:00 bash
70991 pts/0 00:00:00 ps
# echo 70743 > tasks
# cat mon_data/mon_L3_*/mbm_total_bytes
1536320
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
[root@ethanolx1cc4-babu-gn-b1 test]# cat mon_data/mon_L3_*/mbm_local_bytes
4962368
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable

Look at the numbers above. When I updated the task, it started running on core complex 0 (mon_L3_00). The kernel updated the values only on mon_L3_00.

Now pin the task to another cpu which is in a different core complex. Moving the task to core complex 1 (mon_L3_01). Run "pqos -s" to find out which core is in which core complex (L3ID will show that).
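The masks passed to taskset in this walkthrough are plain bit masks: bit N set means CPU N, so 0x1 is CPU 0, 0x10 is CPU 4 and 0x100 is CPU 8. As an alternative to "pqos -s", the L3 cache id of a CPU can usually be read from sysfs. A rough sketch under stated assumptions: the function names are invented for illustration, the path /sys/devices/system/cpu/cpuN/cache/index3/id assumes that index3 is the L3 cache on the running kernel, and that this id matches the NN in mon_L3_NN is likewise an assumption:

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Print the taskset-style hex mask that selects a single CPU.
# Limited to CPUs 0..62 so the shift stays within shell integer range.
cpu_affinity_mask() {
    cpu=$1
    if [ "$cpu" -lt 0 ] || [ "$cpu" -gt 62 ]; then
        echo "cpu out of range for this sketch: $cpu" >&2
        return 1
    fi
    printf '0x%x\n' $((1 << cpu))
}

# Print the L3 cache id of a CPU as reported by sysfs.
# $2 optionally overrides the sysfs CPU root (useful for testing).
cpu_l3_id() {
    cpu=$1
    root=${2:-/sys/devices/system/cpu}
    cat "$root/cpu$cpu/cache/index3/id"
}
```

For example, `taskset -p "$(cpu_affinity_mask 4)" 70743` is the same as `taskset -p 0x10 70743`. For CPU numbers above 62, `taskset -cp N PID` takes a CPU list instead of a mask.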
# taskset -p 0x10 70743

# taskset -p 0x10 70743
pid 70743's current affinity mask: 1
pid 70743's new affinity mask: 10
# cat mon_data/mon_L3_*/mbm_total_bytes
1536320
2159872
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
[root@ethanolx1cc4-babu-gn-b1 test]# cat mon_data/mon_L3_*/mbm_local_bytes
4962368
4009536
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable

Now pin the task to another core which is in core complex 02.

# taskset -p 0x100 70743

test]# taskset -p 0x100 70743
pid 70743's current affinity mask: 10
pid 70743's new affinity mask: 100
# cat mon_data/mon_L3_*/mbm_total_bytes
1536320
2159872
2941888
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
[root@ethanolx1cc4-babu-gn-b1 test]# cat mon_data/mon_L3_*/mbm_local_bytes
4962368
4009536
2941888
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable

Because the task is running on core complex 02, now you see the values in mon_L3_02.

Please check and let me know. I also noticed "Unavailable" becomes all 0 while testing. I don't know how that happened.

I'm aware of the difference between AMD and Intel. For me, the strange behavior is that the values in these monitor groups are initialized to "Unavailable" when they are created. On Intel, they are 0.

The other problem is that making any mon group leads to having "Unavailable" values in the root group.

(In reply to Paweł Szulik from comment #18)
> I'm aware of the difference between AMD and Intel. For me, the strange
> behavior is that the values in these monitor groups are initialized to
> "Unavailable" when they are created. On Intel, they are 0.
>
> The other problem is that making any mon group leads to having "Unavailable"
> values in the root group.

Yes, I see the problem now. Investigating it right now.

(In reply to Paweł Szulik from comment #18)
> I'm aware of the difference between AMD and Intel. For me, the strange
> behavior is that the values in these monitor groups are initialized to
> "Unavailable" when they are created. On Intel, they are 0.

When we create a new group, normally a new rmid is assigned to the new group. But the rmid is not active yet. When we read the events on that rmid, it is expected to show as Unavailable.

> The other problem is that making any mon group leads to having "Unavailable"
> values in the root group.

I kind of know what is going on here. This is a bug in the rmid monitoring code. I will send a fix proposal for this next week.
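Since a freshly created group is expected to report "Unavailable" until its rmid becomes active, a script that reads these counters has to tolerate non-numeric values. A minimal sketch, assuming the mon_data/mon_L3_* layout shown in this thread; resctrl_sum_event is a made-up name and the output format is arbitrary:

```shell
#!/bin/sh
# Hypothetical helper: add up one event across all L3 domains of a group,
# counting (not summing) domains that report something non-numeric such
# as "Unavailable".
# $1 = group directory, $2 = event file name
# Prints: "<sum> <number of unavailable domains>"
resctrl_sum_event() {
    group_dir=$1
    event=$2
    sum=0
    unavailable=0
    for file in "$group_dir"/mon_data/mon_L3_*/"$event"; do
        # Skip the unexpanded glob when nothing matches.
        [ -f "$file" ] || continue
        value=$(cat "$file")
        case $value in
            # Empty or containing any non-digit: treat as unavailable.
            ''|*[!0-9]*) unavailable=$((unavailable + 1)) ;;
            *) sum=$((sum + value)) ;;
        esac
    done
    printf '%s %s\n' "$sum" "$unavailable"
}
```

A call such as `resctrl_sum_event /sys/fs/resctrl/mon_groups/test mbm_total_bytes` keeps the two cases apart, so "no data yet" is not silently reported as zero traffic.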