Bug 213311 - Resctrl monitoring groups do not work properly on AMD EPYC 7742
Summary: Resctrl monitoring groups do not work properly on AMD EPYC 7742
Status: NEW
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: platform_x86_64@kernel-bugs.osdl.org
Reported: 2021-06-01 15:03 UTC by Paweł Szulik
Modified: 2021-06-25 22:19 UTC
CC List: 2 users

Kernel Version: 5.12.7
Tree: Mainline
Regression: No


Attachments
attachment-27277-0.html (575 bytes, text/html)
2021-06-02 19:06 UTC, Paweł Szulik

Description Paweł Szulik 2021-06-01 15:03:33 UTC
Creating a new monitoring group in the root resctrl group causes mbm_total_bytes and mbm_local_bytes to read "Unavailable" across the entire filesystem.

Useful link: https://elixir.bootlin.com/linux/v5.12.7/source/arch/x86/kernel/cpu/resctrl/ctrlmondata.c#L482
Comment 1 Paweł Szulik 2021-06-01 19:28:10 UTC
The intel-cmt-cat tool (https://github.com/intel/intel-cmt-cat) programs the monitoring groups via Model Specific Registers (MSRs) on a per-hardware-thread basis, and it works on this hardware.

The feature for monitoring a group of PIDs uses the kernel interface, and it is also not working.

Changing the NUMA node topology in the BIOS does not help.

Some basic information:

CPU family:                      23
Model:                           49
Model name:                      AMD EPYC 7742 64-Core Processor
Stepping:                        0
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca sme sev sev_es

$ cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-5.12.7-051207-generic root=UUID=36253b6e-960e-4faf-ae6e-f546225ee9b4 ro rdt=cmt,mbmtotal,mbmlocal,l3cat,l3cdp,mba
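
The monitoring features listed in the Flags output can be checked from a script before touching resctrl. This is a minimal sketch, assuming the flag names shown above. check_mon_flags is a hypothetical helper name, and the optional file argument exists only so the function can be pointed at a saved copy of /proc/cpuinfo.

```shell
# check_mon_flags is a hypothetical helper, not part of any existing tool.
# Usage: check_mon_flags [cpuinfo_file]
check_mon_flags() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    local flags flag
    # Use the flags line of the first CPU; all CPUs normally report the same set.
    flags=$(grep -m1 '^flags' "$cpuinfo" | cut -d: -f2)
    for flag in cqm cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local; do
        # Pad with spaces so that "cqm" does not match inside "cqm_llc".
        case " $flags " in
            *" $flag "*) echo "$flag: present" ;;
            *)           echo "$flag: missing" ;;
        esac
    done
}
```

Calling check_mon_flags with no argument reads /proc/cpuinfo on the running system.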
Comment 2 Babu Moger 2021-06-01 23:07:07 UTC
Need more details.
1. Dmesg
2. .config
3. Output of "pqos -d"

I will try to reproduce on my system here.
Comment 3 Paweł Szulik 2021-06-01 23:07:29 UTC
Created attachment 297111
attachment-8531-0.html

I'm on holidays till 04.11 !

Comment 4 Paweł Szulik 2021-06-02 06:50:04 UTC
(In reply to Paweł Szulik from comment #3)

This message is outdated.

I'm out of office. I'll be available from 8.06.
Comment 5 Babu Moger 2021-06-02 19:03:16 UTC
Pawel, that is fine. Please update the bug with more details when you come back.

I am able to create the monitor groups on my system. It is working as expected. This is how I have created the new groups.

1. #mount -t resctrl resctrl /sys/fs/resctrl/
2. #cd /sys/fs/resctrl/
3. #cd mon_groups/
4. #mkdir test
5. #cd test/
6. #cd mon_data/
7. # ps
    PID TTY          TIME CMD
   2375 pts/0    00:00:00 bash
   2690 pts/0    00:00:00 ps

8. Assigning the bash pid to tasks
   #echo 2375 >tasks

9. Set the task to run on cpu 0. Cpu 0 is on resource id 0.
   taskset -p 0x1 2375 

10. cd mon_data/

11. cd mon_L3_00/

12. # cat mbm_local_bytes
    11840
   #cat mbm_total_bytes
    18560
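
The manual steps above can be scripted. This is a sketch only, assuming resctrl is already mounted at /sys/fs/resctrl. The helper names, argument order, and the optional root argument are hypothetical and exist only for illustration.

```shell
# Hypothetical helpers that script the manual steps; not an existing tool.

# Usage: mon_group_add_task <group> <pid> [resctrl_root]
mon_group_add_task() {
    local group="$1" pid="$2" root="${3:-/sys/fs/resctrl}"
    local dir="$root/mon_groups/$group"
    # Create the monitoring group if it does not exist yet.
    mkdir -p "$dir" || return 1
    # The tasks file lives in the group directory itself, not under mon_data/.
    echo "$pid" > "$dir/tasks"
}

# Usage: mon_group_read <group> <domain> [resctrl_root]
# <domain> is a directory name such as mon_L3_00.
mon_group_read() {
    local group="$1" domain="$2" root="${3:-/sys/fs/resctrl}"
    local dir="$root/mon_groups/$group/mon_data/$domain"
    local event
    for event in mbm_local_bytes mbm_total_bytes; do
        printf '%s: %s\n' "$event" "$(cat "$dir/$event")"
    done
}
```

For the session above this would be mon_group_add_task test 2375 followed by mon_group_read test mon_L3_00, with the taskset step still done by hand.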


Also https://github.com/intel/intel-cmt-cat is working fine.
  1. git clone https://github.com/intel/intel-cmt-cat.git
  2. make; make install
  3. export LD_LIBRARY_PATH=/usr/local/lib
  4. pqos -d
   
    Hardware capabilities
    Monitoring
        Cache Monitoring Technology (CMT) events:
            LLC Occupancy (LLC)
        Memory Bandwidth Monitoring (MBM) events:
            Total Memory Bandwidth (TMEM)
            Local Memory Bandwidth (LMEM)
            Remote Memory Bandwidth (RMEM) (calculated)
    Allocation
        Cache Allocation Technology (CAT)
            L3 CAT
                CDP: disabled
                Num COS: 16
        Memory Bandwidth Allocation (MBA)
            Num COS: 16

 
Please give us more information about what is not working.
Thanks
Comment 6 Paweł Szulik 2021-06-02 19:06:08 UTC
Created attachment 297121
attachment-27277-0.html

I'm out of office. I'll be available from 8.06
Comment 7 Paweł Szulik 2021-06-08 15:49:11 UTC
If I run "pqos", stop it, and then create a mon group, I get values instead of "Unavailable".
Comment 8 Babu Moger 2021-06-08 15:59:43 UTC
I don't see that problem. Can you please list the steps to reproduce the issue?
Comment 9 Paweł Szulik 2021-06-09 09:58:40 UTC
1. #mount -t resctrl resctrl /sys/fs/resctrl/
2. #cd /sys/fs/resctrl/
3. #cd mon_groups/
4. #mkdir test
5. #cd test/
6. #cd mon_data/
7. # ps
    PID TTY          TIME CMD
   4129 tty1    00:00:00 bash
   4138 tty1    00:00:00 ps

8. Assigning the bash pid to tasks
   #echo 4129 > tasks

9. Set the task to run on cpu 0. Cpu 0 is on resource id 0.
   taskset -p 0x1 4129 

10. cd mon_data/

11. cd mon_L3_00/

12. # cat mbm_local_bytes
    Unavailable
   #cat mbm_total_bytes
    Unavailable
Comment 10 Paweł Szulik 2021-06-09 10:58:54 UTC
pqos -d
   
    Hardware capabilities
    Monitoring
        Cache Monitoring Technology (CMT) events:
            LLC Occupancy (LLC)
        Memory Bandwidth Monitoring (MBM) events:
            Total Memory Bandwidth (TMEM)
            Local Memory Bandwidth (LMEM)
            Remote Memory Bandwidth (RMEM) (calculated)
    Allocation
        Cache Allocation Technology (CAT)
            L3 CAT
                CDP: disabled
                Num COS: 16
        Memory Bandwidth Allocation (MBA)
            Num COS: 16
Comment 11 Babu Moger 2021-06-09 14:19:48 UTC
(In reply to Paweł Szulik from comment #9)

I followed the same steps and cannot reproduce. I have the latest kernel.

# uname -r
5.13.0-rc4+
# pwd
/sys/fs/resctrl/mon_groups/test/mon_data/mon_L3_00
# cat mbm_local_bytes
5847808
# cat mbm_total_bytes
7722240

Can you please try with the latest kernel? I will try your 5.12.7 kernel and see if I can reproduce it.
Comment 12 Babu Moger 2021-06-09 19:09:35 UTC
I can't reproduce it with 5.12.7 either.
# uname -r
5.12.7
# cat mon_data/mon_L3_00/mbm_local_bytes
973504
# cat mon_data/mon_L3_00/mbm_total_bytes
4049664

Please try with the latest kernel and let me know.
Comment 13 Paweł Szulik 2021-06-10 15:10:29 UTC
I tried.

What BIOS options do you have?
Comment 14 Babu Moger 2021-06-12 01:47:01 UTC
I have an internal BIOS here. I don't think it matters in this case.
Comment 15 Paweł Szulik 2021-06-16 11:19:06 UTC
I switched to a different machine. Now everything seems to be working, except for one difference between AMD and Intel, illustrated by this example:

AMD:
# cat mon_data/*/mbm_total_bytes 
Unavailable                                                                         
Unavailable                                                                      
Unavailable                                                                      
Unavailable                                                                      
Unavailable                                                                      
Unavailable                                                                      
1939904                                                                          
Unavailable                                                                      
Unavailable                                                                      
Unavailable                                                                      
Unavailable                                                                      
Unavailable                                                                      
Unavailable                                                                       
Unavailable                                                                      
Unavailable                                                                      
Unavailable                                                                      
Unavailable                                                                      
Unavailable                                                                      
Unavailable                                                                      
Unavailable                                                                      
Unavailable                                                                      
Unavailable                                                                      
Unavailable                                                                      
Unavailable                                                                      
Unavailable                                                                      
Unavailable                                                                      
Unavailable                                                                      
Unavailable                                                                      
Unavailable                                                                      
Unavailable                                                                      
Unavailable                                                                      
Unavailable

Intel:
# cat mon_data/*/mbm_total_bytes
1239501
0
Comment 16 Paweł Szulik 2021-06-16 16:59:04 UTC
Also, creating any mon group leads to "Unavailable" values in the root control group.
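
One way to make this observation repeatable is to count how many domains of a group report "Unavailable" before and after creating a mon group. This is a sketch under the assumption that domains appear as mon_data/mon_L3_* directories, as in the listings in this report. count_unavailable is a hypothetical helper name.

```shell
# count_unavailable is a hypothetical helper, not an existing tool.
# Prints how many mon_L3_* domains of a group report "Unavailable" for an event.
# Usage: count_unavailable <group_dir> [event]
count_unavailable() {
    local group_dir="$1" event="${2:-mbm_total_bytes}"
    local f n=0
    for f in "$group_dir"/mon_data/mon_L3_*/"$event"; do
        # Skip the unexpanded pattern when no domain directory matches.
        [ -r "$f" ] || continue
        if [ "$(cat "$f")" = "Unavailable" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

Running count_unavailable /sys/fs/resctrl, then mkdir /sys/fs/resctrl/mon_groups/test, then count_unavailable /sys/fs/resctrl again would show whether the root group's count changes.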
Comment 17 Babu Moger 2021-06-16 22:02:49 UTC
(In reply to Paweł Szulik from comment #15)

There is a difference between AMD and Intel. On Intel, there is only one resource ID per socket. On AMD, there are multiple resource IDs per socket, one for each core complex.
You have 32 core complexes here (mon_L3_00 to mon_L3_31).
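
Because cat mon_data/*/mbm_total_bytes prints unlabeled lines, it can help to print each resource ID next to its value. This is a small sketch, assuming the mon_L3_* directory naming shown in this report. show_domains is a hypothetical helper name.

```shell
# show_domains is a hypothetical helper, not an existing tool.
# Prints "<domain> <value>" for every mon_L3_* domain of a group.
# Usage: show_domains <group_dir> [event]
show_domains() {
    local group_dir="$1" event="${2:-mbm_total_bytes}"
    local d
    for d in "$group_dir"/mon_data/mon_L3_*/; do
        # Skip the unexpanded pattern when no domain directory matches.
        [ -r "$d$event" ] || continue
        printf '%s %s\n' "$(basename "$d")" "$(cat "$d$event")"
    done
}
```

For example, show_domains /sys/fs/resctrl/mon_groups/test lists one line per core complex on AMD.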

The values in these monitor groups are initialized to "Unavailable" when the groups are created.

Let's start here.

#mount -t resctrl resctrl /sys/fs/resctrl/
#cd /sys/fs/resctrl/mon_groups
#mkdir test
#cd test
# cat mon_data/mon_L3_*/mbm_local_bytes
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
# cat mon_data/mon_L3_*/mbm_total_bytes
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable

It is initialized to Unavailable. See above.


Then start assigning tasks to the specific group. The values will then start changing.

# ps
PID TTY          TIME CMD
70743 pts/0    00:00:00 bash
70991 pts/0    00:00:00 ps
# echo 70743 > tasks

# cat mon_data/mon_L3_*/mbm_total_bytes
1536320
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
[root@ethanolx1cc4-babu-gn-b1 test]# cat mon_data/mon_L3_*/mbm_local_bytes
4962368
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable

Look at the numbers above. When I assigned the task, it was running on core complex 0 (mon_L3_00), so the kernel updated the values only in mon_L3_00.


Now pin the task to another CPU that is in a different core complex, moving the task to core complex 1 (mon_L3_01). Run "pqos -s" to find out which core belongs to which core complex (L3ID shows that).

# taskset -p 0x10 70743
pid 70743's current affinity mask: 1
pid 70743's new affinity mask: 10
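
The affinity mask passed to taskset is a bitmask with bit N set for CPU N, so 0x10 selects CPU 4. The sketch below computes that mask and, as an alternative to "pqos -s", reads the L3 cache ID from sysfs. It assumes that cache/index3 is the L3 cache and that its id file is present, which should be verified on the target system. Both helper names are hypothetical.

```shell
# Hypothetical helpers, not existing tools.

# Print the single-CPU affinity mask for taskset, e.g. CPU 4 -> 0x10.
# Usage: cpu_mask <cpu>
cpu_mask() {
    printf '0x%x\n' $((1 << $1))
}

# Print the L3 cache ID of a CPU, which corresponds to the NN in mon_L3_NN.
# Assumes cache/index3 describes the L3 cache on this system.
# Usage: cpu_l3_id <cpu> [sysfs_cpu_root]
cpu_l3_id() {
    local cpu="$1" root="${2:-/sys/devices/system/cpu}"
    cat "$root/cpu$cpu/cache/index3/id"
}
```

With these, taskset -p "$(cpu_mask 4)" 70743 is equivalent to the command above, and cpu_l3_id 4 shows which mon_L3_* directory should start counting.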

# cat mon_data/mon_L3_*/mbm_total_bytes
1536320
2159872
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
[root@ethanolx1cc4-babu-gn-b1 test]# cat mon_data/mon_L3_*/mbm_local_bytes
4962368
4009536
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable


Now pin the task to another core, one that is in core complex 2 (mon_L3_02).

# taskset -p 0x100 70743
pid 70743's current affinity mask: 10
pid 70743's new affinity mask: 100


# cat mon_data/mon_L3_*/mbm_total_bytes
1536320
2159872
2941888
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable

[root@ethanolx1cc4-babu-gn-b1 test]# cat mon_data/mon_L3_*/mbm_local_bytes
4962368
4009536
2941888
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable
Unavailable

Because the task is now running on core complex 2, you see values in mon_L3_02. Please check and let me know. I also noticed the "Unavailable" values all becoming 0 while testing. I don't know how that happened.
Comment 18 Paweł Szulik 2021-06-17 09:05:25 UTC
I'm aware of the difference between AMD and Intel. For me, the strange behavior is that the values in these monitor groups are initialized to "Unavailable" when the groups are created. On Intel, they are 0.

The other problem is that creating any mon group leads to "Unavailable" values in the root group.
Comment 19 Babu Moger 2021-06-17 21:07:32 UTC
(In reply to Paweł Szulik from comment #18)
> I'm aware of the difference between AMD and Intel. For me, the strange
> behavior is that the values in this Monitor groups are initialized to
> "Unavailable" when they are created. On Intel, they are 0.
> 
> The other problem is that making any mon group leads to having "Unavailable"
> values in the root group.

Yes, I see the problem now. Investigating it right now.
Comment 20 Babu Moger 2021-06-25 22:19:46 UTC
(In reply to Paweł Szulik from comment #18)
> I'm aware of the difference between AMD and Intel. For me, the strange
> behavior is that the values in this Monitor groups are initialized to
> "Unavailable" when they are created. On Intel, they are 0.

When we create a new group, a new RMID is normally assigned to it. But that RMID is not active yet, so reading the events for that RMID is expected to show "Unavailable".
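
A script that samples these counters therefore has to treat "Unavailable" as "no data yet" rather than as 0. This is a minimal sketch of that handling. Both helper names and the "n/a" marker are hypothetical choices made for illustration.

```shell
# Hypothetical helpers, not existing tools.

# Succeeds only if the argument is a non-empty string of decimal digits.
is_count() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *)           return 0 ;;
    esac
}

# Print the byte delta between two samples of the same counter,
# or "n/a" if either sample is not a number (for example "Unavailable").
# Usage: mbm_delta <previous> <current>
mbm_delta() {
    if is_count "$1" && is_count "$2"; then
        echo $(( $2 - $1 ))
    else
        echo "n/a"
    fi
}
```

Keeping "n/a" distinct from 0 lets a caller tell an inactive RMID apart from a group that generated no traffic.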

> 
> The other problem is that making any mon group leads to having "Unavailable"
> values in the root group.

I have a rough idea of what is going on here. This is a bug in the RMID monitoring code. I will send a proposed fix for this next week.
