Bug 209163 - amdgpu: The CS has been cancelled because the context is lost
Summary: amdgpu: The CS has been cancelled because the context is lost
Status: NEW
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Other
Hardware: x86-64 Linux
Importance: P1 high
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-09-05 13:56 UTC by Satish patel
Modified: 2020-09-10 08:39 UTC
CC List: 3 users

See Also:
Kernel Version: 4.9.118
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg log (69.40 KB, text/plain)
2020-09-05 13:56 UTC, Satish patel
AMDGPU version information (377 bytes, text/plain)
2020-09-05 14:05 UTC, Satish patel
Mesa_opencl version information (589 bytes, text/plain)
2020-09-05 14:05 UTC, Satish patel
lspci information (55.25 KB, text/plain)
2020-09-05 14:15 UTC, Satish patel
VRAM Utilization screen shot (80.04 KB, image/jpeg)
2020-09-10 04:08 UTC, Satish patel

Description Satish patel 2020-09-05 13:56:17 UTC
Created attachment 292355
dmesg log

I am getting this error after running the application continuously.
Comment 1 Satish patel 2020-09-05 14:05:09 UTC
Created attachment 292357
AMDGPU version information
Comment 2 Satish patel 2020-09-05 14:05:42 UTC
Created attachment 292359
Mesa_opencl version information
Comment 3 Satish patel 2020-09-05 14:15:41 UTC
Created attachment 292361
lspci information
Comment 4 Christian König 2020-09-09 11:03:54 UTC
This is expected behavior, your application tries to use more memory than physical available:

[71804.930003] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!

That is most likely a bug in the application, e.g. a memory leak.
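
To check how often this message occurs in a saved dmesg log, something like the following could be used. It is only a sketch: the function name `find_cs_oom` is made up for illustration, and it assumes the log uses the default `[seconds.microseconds]` timestamp prefix shown in the line above.

```python
import re
from typing import List, Tuple

# Kernel timestamp prefix, e.g. "[71804.930003] ...".
_LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")

# Message text taken from the dmesg line quoted in this bug.
_NEEDLE = "Not enough memory for command submission"


def find_cs_oom(dmesg_text: str) -> List[Tuple[float, str]]:
    """Return (seconds_since_boot, message) for each matching dmesg line."""
    hits = []
    for line in dmesg_text.splitlines():
        match = _LINE_RE.match(line)
        if match and _NEEDLE in match.group(2):
            hits.append((float(match.group(1)), match.group(2)))
    return hits
```

Calling `find_cs_oom(open("dmesg.log").read())` on the attached log (the file name is a placeholder) would give the time of the first failure, which can be compared against how long the application had been running.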
Comment 5 Satish patel 2020-09-09 18:17:09 UTC
(In reply to Christian König from comment #4)
> This is expected behavior, your application tries to use more memory than
> physical available:
> 
> [71804.930003] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for
> command submission!
> 
> That is most likely a bug in the application, e.g. a memory leak.

Dear Mr. König,

Thanks for your reply. I would like to point out that the same application runs for up to 10 days without exhausting physical memory or swap on CentOS 7 (GNOME display) with kernel 3.10.0-1127.el7.x86_64.

On kernel 4.9.118, however, the same application fails with "amdgpu: The CS has been cancelled because the context is lost" even though the system is using only 75% of its 5.83 GB of physical memory and 1% of its 15 GB swap partition. Why does the system crash (the display flickers and the touch screen stops responding) instead of using the swap area? CPU and memory utilization are still reported when monitoring from another system.
Comment 6 Christian König 2020-09-09 18:26:48 UTC
You are running out of VRAM, not system memory.

Can you test this on an up to date kernel as well?
Comment 7 Satish patel 2020-09-10 04:08:29 UTC
Created attachment 292449
VRAM Utilization screen shot

Attached is a screenshot of the VRAM utilization at the time of the error, taken from the output of: cat /sys/kernel/debug/dri/0/amdgpu_vram_mm
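
Instead of a single screenshot, VRAM use could be sampled over time to see whether it grows steadily. The sketch below assumes the driver exposes `mem_info_vram_used` and `mem_info_vram_total` (byte counts) in the device's sysfs directory, e.g. `/sys/class/drm/card0/device`. Newer amdgpu kernels do, but the 4.9 kernel in this report may not, in which case the debugfs file above is the only source. The function names and the growth heuristic are illustrative assumptions, not part of any driver tooling.

```python
from pathlib import Path
from typing import Sequence, Tuple


def read_vram_usage(device_dir: str) -> Tuple[int, int]:
    """Return (used_bytes, total_bytes) read from the given sysfs directory.

    Assumes the files mem_info_vram_used and mem_info_vram_total exist there.
    """
    used = int(Path(device_dir, "mem_info_vram_used").read_text().strip())
    total = int(Path(device_dir, "mem_info_vram_total").read_text().strip())
    return used, total


def vram_percent(used: int, total: int) -> float:
    """VRAM utilization as a percentage; 0.0 if total is reported as zero."""
    return 100.0 * used / total if total else 0.0


def looks_like_leak(samples: Sequence[int], min_growth_bytes: int) -> bool:
    """Crude heuristic: usage never drops and grows by at least the threshold."""
    if len(samples) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(samples, samples[1:]))
    return never_drops and samples[-1] - samples[0] >= min_growth_bytes
```

Sampling `read_vram_usage("/sys/class/drm/card0/device")` every few minutes while the application runs, and passing the collected "used" values to `looks_like_leak`, would show whether usage climbs toward the limit over the 19 hours before the failure.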
Comment 8 Satish patel 2020-09-10 04:19:29 UTC
(In reply to Christian König from comment #6)
> You are running out of VRAM, not system memory.
> 
> Can you test this on an up to date kernel as well?

Is there an amdgpu module parameter that prevents the driver from using the full VRAM? The same application runs on the same hardware under the GNOME desktop (CentOS 7) with kernel 3.10.xx.1127.

I get the error after 19 hours when running the same application under X Windows, whereas the same application runs for more than 7 days with the operating system and kernel version above.
Comment 9 Christian König 2020-09-10 08:39:09 UTC
Try amdgpu.vramlimit=512 on the kernel command line to limit the available VRAM to 512MB.

The problem is certainly some kind of memory leak.

You need to test an up to date kernel, like 5.8 or even better the latest bleeding edge amd-staging-drm-next branch.
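
After rebooting with the suggested option, it may be worth confirming that the kernel actually received it. A minimal sketch, assuming the option is written as `amdgpu.vramlimit=<decimal MB>` as in the comment above; the helper name `vramlimit_from_cmdline` is hypothetical, and taking the last occurrence when the option is repeated is a choice made here, not documented driver behavior.

```python
from typing import Optional


def vramlimit_from_cmdline(cmdline: str) -> Optional[int]:
    """Return the amdgpu.vramlimit value (in MB) from a kernel command line.

    Returns None if the option is absent or its value is not a decimal
    integer. If the option appears more than once, the last one is used.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "amdgpu.vramlimit" and sep:
            try:
                value = int(val)
            except ValueError:
                value = None
    return value
```

On the running system, `vramlimit_from_cmdline(open("/proc/cmdline").read())` should then return 512.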
