Created attachment 287567 [details]
dmesg of the system with kernel 5.4 and NVIDIA GTX 1660 card

We got an Acer desktop equipped with an Intel i7-8700 CPU and an NVIDIA GTX 1660 card. We found that it takes a long time (more than 50 seconds) to resume after suspend.

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU116 [GeForce GTX 1660] [10de:2184] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: PC Partner Limited / Sapphire Technology TU116 [GeForce GTX 1660] [174b:a544]
	Flags: bus master, fast devsel, latency 0, IRQ 130
	Memory at a3000000 (32-bit, non-prefetchable) [size=16M]
	Memory at 90000000 (64-bit, prefetchable) [size=256M]
	Memory at a0000000 (64-bit, prefetchable) [size=32M]
	I/O ports at 4000 [size=128]
	[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nouveau

01:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device [10de:1aed] (rev a1)
	Subsystem: PC Partner Limited / Sapphire Technology Device [174b:a544]
	Flags: bus master, fast devsel, latency 0, IRQ 126
	Memory at a4084000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: <access denied>
	Kernel driver in use: nvidia-gpu
	Kernel modules: i2c_nvidia_gpu

To diagnose this issue, I made the system boot into multi-user.target, then loaded the nvidia driver manually and did a suspend & resume.

[ 49.735998] nvidia: loading out-of-tree module taints kernel.
[ 49.736017] nvidia: module license 'NVIDIA' taints kernel.
[ 49.736029] Disabling lock debugging due to kernel taint
[ 49.744215] nvidia-nvlink: Nvlink Core is being initialized, major device number 239
[ 49.744559] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[ 49.794435] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 440.59 Thu Jan 30 01:00:41 UTC 2020
[ 49.816029] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 440.59 Thu Jan 30 00:59:18 UTC 2020
[ 49.855229] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 50.506324] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 50.506324] [drm] No driver support for vblank timestamp query.
[ 50.630054] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
[ 64.198505] r8169 0000:02:00.0 enp2s0: Link is Down
[ 64.250963] PM: suspend entry (deep)
[ 64.414745] Filesystems sync: 0.163 seconds
[ 64.415905] Freezing user space processes ...
[ 84.422985] Freezing of tasks failed after 20.007 seconds (1 tasks refusing to freeze, wq_busy=0):
[ 84.423009] systemd-udevd D 0 478 427 0x80004324
[ 84.423010] Call Trace:
[ 84.423019] __schedule+0x2e5/0x6e0
[ 84.423027] schedule+0x33/0xa0
[ 84.423035] __rt_mutex_slowlock+0xa9/0xf0
[ 84.423045] rt_mutex_slowlock+0xd5/0x1e0
[ 84.423055] rt_mutex_lock+0x3c/0x40
[ 84.423064] i2c_adapter_lock_bus+0x12/0x20
[ 84.423074] i2c_transfer+0x7a/0x100
[ 84.423084] ccg_read+0x11e/0x170 [ucsi_ccg]
[ 84.423095] ? __queue_work+0x106/0x3f0
[ 84.423105] ucsi_ccg_probe+0x19a/0x210 [ucsi_ccg]
[ 84.423116] ? ucsi_ccg_init+0xe0/0xe0 [ucsi_ccg]
[ 84.423127] i2c_device_probe+0x190/0x250
[ 84.423137] really_probe+0x1c8/0x3f0
[ 84.423146] driver_probe_device+0xbb/0x100
[ 84.423155] device_driver_attach+0x58/0x60
[ 84.423165] __driver_attach+0x8f/0x150
[ 84.423174] ? device_driver_attach+0x60/0x60
[ 84.423184] bus_for_each_dev+0x79/0xc0
[ 84.423193] ? kmem_cache_alloc_trace+0x15e/0x230
[ 84.423204] driver_attach+0x1e/0x20
[ 84.423213] bus_add_driver+0x154/0x1f0
[ 84.423222] ? 0xffffffffc05a3000
[ 84.423230] driver_register+0x70/0xc0
[ 84.423238] ? 0xffffffffc05a3000
[ 84.423246] i2c_register_driver+0x42/0x90
[ 84.423255] ? 0xffffffffc05a3000
[ 84.423264] ucsi_ccg_driver_init+0x1c/0x1000 [ucsi_ccg]
[ 84.423276] do_one_initcall+0x4a/0x1fa
[ 84.423285] ? kfree+0x228/0x240
[ 84.423293] ? _cond_resched+0x19/0x30
[ 84.423302] ? kmem_cache_alloc_trace+0x15e/0x230
[ 84.423313] do_init_module+0x60/0x230
[ 84.423322] load_module+0x178a/0x1a10
[ 84.423332] __do_sys_finit_module+0xbd/0x120
[ 84.423556] ? __do_sys_finit_module+0xbd/0x120
[ 84.423800] __x64_sys_finit_module+0x1a/0x20
[ 84.424019] do_syscall_64+0x57/0x190
[ 84.424253] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 84.424491] RIP: 0033:0x7f19be8902a9
[ 84.424716] Code: Bad RIP value.
[ 84.424966] RSP: 002b:00007ffe369e6a08 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 84.425206] RAX: ffffffffffffffda RBX: 000055e3ef887fa0 RCX: 00007f19be8902a9
[ 84.425450] RDX: 0000000000000000 RSI: 00007f19be794cad RDI: 0000000000000010
[ 84.425726] RBP: 00007f19be794cad R08: 0000000000000000 R09: 0000000000000000
[ 84.425974] R10: 0000000000000010 R11: 0000000000000246 R12: 0000000000000000
[ 84.426249] R13: 000055e3efa606f0 R14: 0000000000020000 R15: 000055e3ef887fa0
[ 84.426500] OOM killer enabled.
[ 84.426500] Restarting tasks ... done.
[ 84.427668] PM: suspend exit
[ 84.427703] PM: suspend entry (s2idle)
[ 84.463704] Filesystems sync: 0.035 seconds
[ 84.464776] Freezing user space processes ... (elapsed 15.857 seconds) done.
...
[ 221.762502] ucsi_ccg 0-0008: failed to reset PPM!
[ 221.762508] ucsi_ccg 0-0008: PPM init failed (-110)
[ 260.986511] ucsi_ccg 0-0008: PPM NOT RESPONDING
[ 260.986516] PM: dpm_run_callback(): ucsi_ccg_resume+0x0/0x20 [ucsi_ccg] returns -110
[ 260.986517] PM: Device 0-0008 failed to resume: error -110
[ 260.989188] OOM killer enabled.
[ 260.989188] Restarting tasks ... done.
[ 261.047545] PM: suspend exit

According to the log, we see "Freezing of tasks failed after 20.007 seconds (1 tasks refusing to freeze, wq_busy=0)" during suspend and "PM: Device 0-0008 failed to resume: error -110" during resume. The full dmesg is attached.
Created attachment 287569 [details]
dmesg of the system with kernel 5.6-rc2 and NVIDIA GTX 1660 card

I also tested the latest mainline kernel, 5.6-rc2.

[ 28.060831] PM: suspend entry (deep)
[ 28.144260] Filesystems sync: 0.083 seconds
[ 28.150219] Freezing user space processes ...
[ 48.153282] Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
[ 48.153447] systemd-udevd D13440 382 330 0x80004124
[ 48.153457] Call Trace:
[ 48.153504] ? __schedule+0x272/0x5a0
[ 48.153558] ? hrtimer_start_range_ns+0x18c/0x2c0
[ 48.153622] schedule+0x45/0xb0
[ 48.153668] schedule_hrtimeout_range_clock+0x8f/0x100
[ 48.153738] ? hrtimer_init_sleeper+0x80/0x80
[ 48.153798] usleep_range+0x5a/0x80
[ 48.153850] gpu_i2c_check_status.isra.0+0x3a/0xa0 [i2c_nvidia_gpu]
[ 48.153933] gpu_i2c_master_xfer+0x155/0x20e [i2c_nvidia_gpu]
[ 48.154012] __i2c_transfer+0x163/0x4c0
[ 48.154067] i2c_transfer+0x6e/0xc0
[ 48.154120] ccg_read+0x11f/0x170 [ucsi_ccg]
[ 48.154182] get_fw_info+0x17/0x50 [ucsi_ccg]
[ 48.154242] ucsi_ccg_probe+0xf4/0x200 [ucsi_ccg]
[ 48.154312] ? ucsi_ccg_init+0xe0/0xe0 [ucsi_ccg]
[ 48.154377] i2c_device_probe+0x113/0x210
[ 48.154435] really_probe+0xdf/0x280
[ 48.154487] driver_probe_device+0x4b/0xc0
[ 48.154545] device_driver_attach+0x4e/0x60
[ 48.154604] __driver_attach+0x44/0xb0
[ 48.154657] ? device_driver_attach+0x60/0x60
[ 48.154717] bus_for_each_dev+0x6c/0xb0
[ 48.154772] bus_add_driver+0x172/0x1c0
[ 48.154824] driver_register+0x67/0xb0
[ 48.154877] i2c_register_driver+0x39/0x70
[ 48.154932] ? 0xffffffffc00ac000
[ 48.154978] do_one_initcall+0x3e/0x1d0
[ 48.155032] ? free_vmap_area_noflush+0x8d/0xe0
[ 48.155093] ? _cond_resched+0x10/0x20
[ 48.155145] ? kmem_cache_alloc_trace+0x3a/0x1b0
[ 48.155208] do_init_module+0x56/0x200
[ 48.155260] load_module+0x21fe/0x24e0
[ 48.155322] ? __do_sys_finit_module+0xbf/0xe0
[ 48.155381] __do_sys_finit_module+0xbf/0xe0
[ 48.155441] do_syscall_64+0x3d/0x130
[ 48.156841] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 48.158074] RIP: 0033:0x7fba3b4bc2a9
[ 48.158707] Code: Bad RIP value.
[ 48.158990] RSP: 002b:00007ffe1da3a6d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 48.159259] RAX: ffffffffffffffda RBX: 000055ca6922c470 RCX: 00007fba3b4bc2a9
[ 48.159566] RDX: 0000000000000000 RSI: 00007fba3b3c0cad RDI: 0000000000000010
[ 48.159842] RBP: 00007fba3b3c0cad R08: 0000000000000000 R09: 0000000000000000
[ 48.160117] R10: 0000000000000010 R11: 0000000000000246 R12: 0000000000000000
[ 48.160412] R13: 000055ca6922f940 R14: 0000000000020000 R15: 000055ca6922c470

The problem seems to be in gpu_i2c_check_status().
Created attachment 287571 [details]
Debug to get time expending analyzing

I added some debug messages to analyze where the time is spent. The result is as follows.
Created attachment 287573 [details]
dmesg including debug message in comment #2 of the system with kernel 5.6-rc2 and NVIDIA GTX 1660 card

(In reply to jian-hong from comment #2)

[ 24.905116] ucsi_ccg 1-0008: get_fw_info: CCGX_RAB_READ_ALL_VER
[ 24.905966] ucsi_ccg 1-0008: ccg_read: goint to read 4 bytes
[ 24.906833] i2c i2c-1: __i2c_transfer: calling master_xfer 00000000108da070
[ 24.907742] nvidia-gpu 0000:01:00.3: gpu_i2c_master_xfer: test pm_runtime_get_sync start
[ 24.908704] nvidia-gpu 0000:01:00.3: gpu_i2c_master_xfer: test pm_runtime_get_sync finshed
[ 24.909641] nvidia-gpu 0000:01:00.3: gpu_i2c_master_xfer: calling gpu_i2c_start
[ 24.910582] nvidia-gpu 0000:01:00.3: gpu_i2c_start: timing start
[ 24.911508] nvidia-gpu 0000:01:00.3: gpu_i2c_start: timing stop
[ 25.912527] nvidia-gpu 0000:01:00.3: gpu_i2c_check_status: timing start. val=0xe0000000
[ 25.913610] nvidia-gpu 0000:01:00.3: gpu_i2c_check_status: timing stop. val=0xe0000000
[ 25.914677] nvidia-gpu 0000:01:00.3: gpu_i2c_master_xfer: gpu_i2c_start returns 0
[ 25.915754] nvidia-gpu 0000:01:00.3: gpu_i2c_master_xfer: calling gpu_i2c_write
[ 25.916737] nvidia-gpu 0000:01:00.3: gpu_i2c_write: timing start
[ 25.917707] nvidia-gpu 0000:01:00.3: gpu_i2c_write: timing stop
[ 25.918667] nvidia-gpu 0000:01:00.3: gpu_i2c_write: timing start
[ 25.919597] nvidia-gpu 0000:01:00.3: gpu_i2c_write: timing stop
[ 26.920960] nvidia-gpu 0000:01:00.3: gpu_i2c_check_status: timing start. val=0xe0000040
[ 26.922047] nvidia-gpu 0000:01:00.3: gpu_i2c_check_status: timing stop. val=0xe0000040
[ 26.923119] nvidia-gpu 0000:01:00.3: gpu_i2c_master_xfer: gpu_i2c_write returns 0
[ 26.924211] nvidia-gpu 0000:01:00.3: gpu_i2c_master_xfer: calling gpu_i2c_write
[ 26.925316] nvidia-gpu 0000:01:00.3: gpu_i2c_write: timing start
[ 26.926403] nvidia-gpu 0000:01:00.3: gpu_i2c_write: timing stop
[ 26.927479] nvidia-gpu 0000:01:00.3: gpu_i2c_write: timing start
[ 26.928433] nvidia-gpu 0000:01:00.3: gpu_i2c_write: timing stop
[ 27.929884] nvidia-gpu 0000:01:00.3: gpu_i2c_check_status: timing start. val=0xe0000040
[ 27.930994] nvidia-gpu 0000:01:00.3: gpu_i2c_check_status: timing stop. val=0xe0000040
[ 27.932061] nvidia-gpu 0000:01:00.3: gpu_i2c_master_xfer: gpu_i2c_write returns 0
[ 27.933069] nvidia-gpu 0000:01:00.3: gpu_i2c_master_xfer: calling gpu_i2c_write
[ 27.934093] nvidia-gpu 0000:01:00.3: gpu_i2c_write: timing start
[ 27.935130] nvidia-gpu 0000:01:00.3: gpu_i2c_write: timing stop
[ 27.936090] nvidia-gpu 0000:01:00.3: gpu_i2c_write: timing start
[ 27.936964] nvidia-gpu 0000:01:00.3: gpu_i2c_write: timing stop
[ 28.937895] nvidia-gpu 0000:01:00.3: gpu_i2c_check_status: timing start. val=0xe0000040
[ 28.938919] nvidia-gpu 0000:01:00.3: gpu_i2c_check_status: timing stop. val=0xe0000040
[ 28.939926] nvidia-gpu 0000:01:00.3: gpu_i2c_master_xfer: gpu_i2c_write returns 0
[ 28.941727] nvidia-gpu 0000:01:00.3: gpu_i2c_master_xfer: going to write address
[ 28.943337] nvidia-gpu 0000:01:00.3: gpu_i2c_master_xfer: going to read 4 bytes
[ 28.944173] nvidia-gpu 0000:01:00.3: gpu_i2c_read: calling gpu_i2c_check_status
[ 29.944660] nvidia-gpu 0000:01:00.3: gpu_i2c_check_status: timing start. val=0xf0000100
[ 29.945641] nvidia-gpu 0000:01:00.3: gpu_i2c_check_status: timing stop. val=0xf0000100
[ 29.946555] nvidia-gpu 0000:01:00.3: gpu_i2c_read: gpu_i2c_check_status returns 0
[ 29.948271] nvidia-gpu 0000:01:00.3: gpu_i2c_master_xfer: gpu_i2c_read returns 0
[ 29.949993] nvidia-gpu 0000:01:00.3: gpu_i2c_stop: timing start
[ 29.950940] nvidia-gpu 0000:01:00.3: gpu_i2c_stop: timing stop
[ 30.952891] nvidia-gpu 0000:01:00.3: gpu_i2c_check_status: timing start. val=0xe0000000
[ 30.953869] nvidia-gpu 0000:01:00.3: gpu_i2c_check_status: timing stop. val=0xe0000000
[ 30.954806] i2c i2c-1: __i2c_transfer: ret=2
[ 30.956451] ucsi_ccg 1-0008: ccg_read: read 20 bytes, remain 20 bytes
...
[ 61.170336] ucsi_ccg 1-0008: get_fw_info: CCGX_RAB_READ_ALL_VER finished. err=0

According to the log, we can see:
* get_fw_info() takes 36 seconds for "CCGX_RAB_READ_ALL_VER". It performs many i2c actions, and most of them call gpu_i2c_check_status() to check the i2c communication status.
* The do-while loop in gpu_i2c_check_status() always takes around 1 second and never breaks early. The 1 second comes from the msecs_to_jiffies(1000) timeout target in gpu_i2c_check_status(). Therefore, every additional call to gpu_i2c_check_status() adds about 1 more second of waiting, which makes the issue happen.

The state of the I2C_MST_CNTL register bits, which is read into the variable val and checked in the do-while loop of gpu_i2c_check_status(), is also interesting. The macros are defined at [1], but we are not clear about their meaning.

[1] https://elixir.bootlin.com/linux/v5.6-rc2/source/drivers/i2c/busses/i2c-nvidia-gpu.c#L27
We found that an ASUS UX581LV equipped with an NVIDIA GeForce RTX 2060 and an Acer Predator PH315-52 equipped with an NVIDIA GeForce RTX 2060 Mobile also hit this issue.
The log in comment #3 shows that the i2c transaction never completes. Bit 31 (I2C_MST_CNTL_CYCLE_TRIGGER) remains set, and the bus status (bits [30:29], I2C_MST_CNTL_STATUS = 0x3) is always busy. This leads to the 1-second timeout.

It appears we are not catching this timeout error because the loop ends at exactly 1 second. The recent fix at [1] has been merged for this. With that fix, -ETIMEDOUT is returned and no further i2c transfer is initiated.

The switch statement that checks I2C_MST_CNTL_STATUS does not handle the bus always being busy (I2C_MST_CNTL_STATUS_BUS_BUSY). In that case it returns success (0) from the default case. We should add a case to handle this:

---------------------------- Fix [2] -------------------
--- a/drivers/i2c/busses/i2c-nvidia-gpu.c
+++ b/drivers/i2c/busses/i2c-nvidia-gpu.c
@@ -100,6 +100,7 @@ static int gpu_i2c_check_status(struct gpu_i2c_dev *i2cd)
 	case I2C_MST_CNTL_STATUS_NO_ACK:
 		return -ENXIO;
 	case I2C_MST_CNTL_STATUS_TIMEOUT:
+	case I2C_MST_CNTL_STATUS_BUS_BUSY:
 		return -ETIMEDOUT;
 	default:
 		return 0;

I would recommend trying the fixes at [1] and [2] to see if they help.

Hi Jian,
We also need to see why i2c transactions are failing on those GPUs. Do you know whether you plan to use the USB Type-C interface on the GPU card?

Thanks
Ajay

1: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d944b27df121e2ee854a6c2fad13d6c6300792d4
Created attachment 288405 [details]
dmesg with commit d944b27df121

(In reply to Ajay Gupta from comment #5)
Thanks for Ajay's comment!

After applying commit d944b27df121 ("i2c: nvidia-gpu: Handle timeout correctly in gpu_i2c_check_status()"), the system gets ETIMEDOUT from the ucsi_ccg probe:

[ 8.841157] nvidia-gpu 0000:01:00.3: i2c timeout error e0000000
[ 8.841159] ucsi_ccg 0-0008: i2c_transfer failed -110
[ 8.841161] ucsi_ccg 0-0008: ucsi_ccg_init failed - -110
[ 8.841165] ucsi_ccg: probe of 0-0008 failed with error -110

Resume seems fine, although we have no screen after resume. Since the ucsi_ccg probe failed, no related actions follow it. So I like this probe error; it helps with this issue.

> I would recommend to use fix at [1] and [2] and see if it helps.
> Hi Jian,
> We also need to see why i2c transactions are failing for those GPUs. Do you
> know if you plan to use USB Type-C interface on GPU card?

Uh ... There is no Type-C port on the NVIDIA GTX 1660 card, and we do not have a display with a Type-C interface.
Hello,

I don't have these issues. Resume after suspend is instantaneous.

My graphics card (with VirtualLink port):

>01:00.0 VGA compatible controller: NVIDIA Corporation TU104GL [Quadro RTX 4000] (rev a1) (prog-if 00 [VGA controller])
> Subsystem: NVIDIA Corporation TU104GL [Quadro RTX 4000]
> Flags: bus master, fast devsel, latency 0, IRQ 50
> Memory at ee000000 (32-bit, non-prefetchable) [size=16M]
> Memory at d0000000 (64-bit, prefetchable) [size=256M]
> Memory at e0000000 (64-bit, prefetchable) [size=32M]
> I/O ports at e000 [size=128]
> [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
> Capabilities: [60] Power Management version 3
> Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
> Capabilities: [78] Express Legacy Endpoint, MSI 00
> Capabilities: [100] Virtual Channel
> Capabilities: [250] Latency Tolerance Reporting
> Capabilities: [258] L1 PM Substates
> Capabilities: [128] Power Budgeting <?>
> Capabilities: [420] Advanced Error Reporting
> Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
> Capabilities: [900] #19
> Capabilities: [bb0] #15
> Kernel driver in use: nvidia
> Kernel modules: nvidia_drm, nvidia
>01:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
> Subsystem: NVIDIA Corporation TU104 HD Audio Controller
> Flags: bus master, fast devsel, latency 0, IRQ 17
> Memory at ef080000 (32-bit, non-prefetchable) [size=16K]
> Capabilities: [60] Power Management version 3
> Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
> Capabilities: [78] Express Endpoint, MSI 00
> Capabilities: [100] Advanced Error Reporting
> Kernel driver in use: snd_hda_intel
>01:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1) (prog-if 30 [XHCI])
> Subsystem: NVIDIA Corporation TU104 USB 3.1 Host Controller
> Flags: fast devsel, IRQ 30
> Memory at e2000000 (64-bit, prefetchable) [size=256K]
> Memory at e2040000 (64-bit, prefetchable) [size=64K]
> Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
> Capabilities: [78] Express Endpoint, MSI 00
> Capabilities: [b4] Power Management version 3
> Capabilities: [100] Advanced Error Reporting
> Kernel driver in use: xhci_hcd
>01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)
> Subsystem: NVIDIA Corporation TU104 USB Type-C UCSI Controller
> Flags: bus master, fast devsel, latency 0, IRQ 48
> Memory at ef084000 (32-bit, non-prefetchable) [size=4K]
> Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
> Capabilities: [78] Express Endpoint, MSI 00
> Capabilities: [b4] Power Management version 3
> Capabilities: [100] Advanced Error Reporting
> Kernel driver in use: nvidia-gpu

Drivers:
>x11-drivers/nvidia-drivers-440.82-r3::gentoo

Session:
>systemctl get-default
>graphical.target

These services are not active on my system:
>systemctl status nvidia-suspend.service
>nvidia-suspend.service - NVIDIA system suspend actions
> Loaded: loaded (/lib/systemd/system/nvidia-suspend.service; enabled; vendor preset: disabled)
> Active: inactive (dead)
>localhost /usr/share/i18n/locales # systemctl status nvidia-hibernate.service
>nvidia-hibernate.service - NVIDIA system hibernate actions
> Loaded: loaded (/lib/systemd/system/nvidia-hibernate.service; enabled; vendor preset: disabled)
> Active: inactive (dead)
>localhost /usr/share/i18n/locales # systemctl status nvidia-resume.service
> nvidia-resume.service - NVIDIA system resume actions
> Loaded: loaded (/lib/systemd/system/nvidia-resume.service; enabled; vendor preset: disabled)
> Active: inactive (dead)

Kernel/distro:
>uname -r
>5.4.38-gentoo

One thing I did notice, though I don't know if it's related: right or left clicks on menus take 2-3 seconds to respond. Examples are changing tabs in my terminal (left click), opening menus in any window (left click), and right-clicking a web link in my browser.
>We also need to see why i2c transactions are failing for those GPUs.Do you know if you plan to use USB Type-C interface on GPU card?

I do. I have purchased a USB hub (Startech HB31C4AB). I'll get my hands on it in about a week and test it on my card.
I have tested the unit. The hub and all devices connected through it were detected, but after a while they suddenly disconnect. I don't have this problem if I connect the devices directly to the motherboard's USB ports.

I have the I2C_NVIDIA_GPU, TYPEC, TYPEC_UCSI and UCSI_CCG modules built into the kernel. How can I be of help?

>One thing I did notice though, but I don't know if it's related. Right or left clicks to menus take 2-3 seconds to respond e.g. change tabs on my terminal (left click), opening menus on any windows (left click), right clicking on a web link in my browser.

This problem went away.
Hello,

Why are there such errors with an NVIDIA GTX 1660 Ti?

nvidia-gpu 0000:01:00.3: i2c timeout error e0000000
ucsi_ccg 0-0008: i2c_transfer failed -110
ucsi_ccg 0-0008: ucsi_ccg_init failed - -110
ucsi_ccg: probe of 0-0008 failed with error -110

These errors are displayed at every system boot. Although everything seems to work well, they spoil the Plymouth splash screen.
I can confirm this issue on the latest Mint 20 with 5.4.0-42-generic, and it does NOT actually seem harmless. My second monitor, which is connected via DisplayPort, stopped working at the same time this error started to appear. I'm also running a 1660 Ti GPU.
Hello,

Why on earth are these timeouts displayed at error level when they are purely informational? This spoils Plymouth at every system boot. Also, these timeouts are displayed only at system boot.

Everything on my NVIDIA graphics card runs fine with the NVIDIA drivers. It is a GeForce GTX 1660 Ti with only DisplayPort outputs and no USB-C connectors. 3D acceleration is OK, suspend/resume is OK, and all DisplayPorts are OK. So why error level?
The NVIDIA GTX 1660 Ti doesn't have a USB Type-C interface (see https://www.nvidia.com/en-us/geforce/graphics-cards/gtx-1660-ti/).

The NVIDIA I2C driver is loaded based on "PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, PCI_CLASS_SERIAL_UNKNOWN" (see [A]). It then loads the ucsi_ccg driver, which fails after i2c transfer timeouts because there is no Type-C interface. The messages below are harmless and shouldn't impact any functionality:

nvidia-gpu 0000:01:00.3: i2c timeout error e0000000
ucsi_ccg 0-0008: i2c_transfer failed -110
ucsi_ccg 0-0008: ucsi_ccg_init failed - -110
ucsi_ccg: probe of 0-0008 failed with error -110

-----------------------------------------------[A] ------
drivers/i2c/busses/i2c-nvidia-gpu.c

/*
 * This driver is for Nvidia GPU cards with USB Type-C interface.
 * We want to identify the cards using vendor ID and class code only
 * to avoid dependency of adding product id for any new card which
 * requires this driver.
 * Currently there is no class code defined for UCSI device over PCI
 * so using UNKNOWN class for now and it will be updated when UCSI
 * over PCI gets a class code.
 * There is no other NVIDIA cards with UNKNOWN class code. Even if the
 * driver gets loaded for an undesired card then eventually i2c_read()
 * (initiated from UCSI i2c_client) will timeout or UCSI commands will
 * timeout.
 */
#define PCI_CLASS_SERIAL_UNKNOWN	0x0c80
static const struct pci_device_id gpu_i2c_ids[] = {
	{ PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
		PCI_CLASS_SERIAL_UNKNOWN << 8, 0xffffff00},
	{ }
};
Hi Ajay,

I know the GeForce GTX 1660 Ti has no Type-C interface and that these messages are harmless. But they are displayed as errors and break the Plymouth splash screen. This is not user friendly.

If a timeout message is harmless because no functionality is lost, it should be logged as an informational message rather than an error:

No functionality lost = info level
Minor functionality lost = warning level
Major functionality loss / breakage = error or emergency level

The code should be corrected so that it does not log these harmless messages as errors.
I can try changing it with the patch below. Does it look fine to you? Can you test with it?

--- a/drivers/i2c/busses/i2c-nvidia-gpu.c
+++ b/drivers/i2c/busses/i2c-nvidia-gpu.c
@@ -85,7 +85,7 @@ static int gpu_i2c_check_status(struct gpu_i2c_dev *i2cd)
 				 500, 1000 * USEC_PER_MSEC);
 
 	if (ret) {
-		dev_err(i2cd->dev, "i2c timeout error %x\n", val);
+		dev_info(i2cd->dev, "i2c timeout error %x\n", val);
 		return -ETIMEDOUT;
 	}
 
diff --git a/drivers/usb/typec/ucsi/ucsi_ccg.c b/drivers/usb/typec/ucsi/ucsi_ccg.c
index 2b7b5de..3948ee0 100644
--- a/drivers/usb/typec/ucsi/ucsi_ccg.c
+++ b/drivers/usb/typec/ucsi/ucsi_ccg.c
@@ -252,7 +252,7 @@ static int ccg_read(struct ucsi_ccg *uc, u16 rab, u8 *data, u32 len)
 	put_unaligned_le16(rab, buf);
 	status = i2c_transfer(client->adapter, msgs, ARRAY_SIZE(msgs));
 	if (status < 0) {
-		dev_err(uc->dev, "i2c_transfer failed %d\n", status);
+		dev_info(uc->dev, "i2c_transfer failed %d\n", status);
 		pm_runtime_put_sync(uc->dev);
 		return status;
 	}
@@ -289,7 +289,7 @@ static int ccg_write(struct ucsi_ccg *uc, u16 rab, const u8 *data, u32 len)
 	pm_runtime_get_sync(uc->dev);
 	status = i2c_transfer(client->adapter, msgs, ARRAY_SIZE(msgs));
 	if (status < 0) {
-		dev_err(uc->dev, "i2c_transfer failed %d\n", status);
+		dev_info(uc->dev, "i2c_transfer failed %d\n", status);
 		pm_runtime_put_sync(uc->dev);
 		kfree(buf);
 		return status;
@@ -1346,8 +1346,8 @@ static int ucsi_ccg_probe(struct i2c_client *client,
 	/* reset ccg device and initialize ucsi */
 	status = ucsi_ccg_init(uc);
 	if (status < 0) {
-		dev_err(uc->dev, "ucsi_ccg_init failed - %d\n", status);
+		dev_info(uc->dev, "ucsi_ccg_init failed - %d\n", status);
 		return status;
 	}
 
 	status = get_fw_info(uc);
This patch should work. I'll ask our Mageia kernel maintainers to test it in our distribution.

Thanks,
@ Ajay Gupta,

The patch proposed in comment 14 really works on our latest Linux 5.8.2 Mageia kernels. This could go mainline.

Errors are no longer displayed on the console and no longer break the Plymouth splash. The lines are still present in the journal at info level, which is good.

Also, suspend and resume work great on our systems with these graphics cards.

Should this be resolved as fixed?
This bug is still present on kernel 5.8.10. I have just changed my GPU to an NVIDIA GTX 1660 Ti on Arch Linux and I get the same error.
I posted the changes below to resolve the issue.
I2C: https://marc.info/?l=linux-i2c&m=160070968116942&w=2
USB: https://marc.info/?l=linux-usb&m=160071015217052&w=2

Thanks.
The changes in comment #18 are not acceptable as-is.

Hello Oudelet,
Please help test the change below. If it works, I can post it for review.

diff --git a/drivers/i2c/busses/i2c-nvidia-gpu.c b/drivers/i2c/busses/i2c-nvidia-gpu.c
index f9a69b1..b5a54aa 100644
--- a/drivers/i2c/busses/i2c-nvidia-gpu.c
+++ b/drivers/i2c/busses/i2c-nvidia-gpu.c
@@ -84,11 +84,6 @@ static int gpu_i2c_check_status(struct gpu_i2c_dev *i2cd)
 				 (val & I2C_MST_CNTL_STATUS) != I2C_MST_CNTL_STATUS_BUS_BUSY,
 				 500, 1000 * USEC_PER_MSEC);
 
-	if (ret) {
-		dev_err(i2cd->dev, "i2c timeout error %x\n", val);
-		return -ETIMEDOUT;
-	}
-
 	val = readl(i2cd->regs + I2C_MST_CNTL);
 	switch (val & I2C_MST_CNTL_STATUS) {
 	case I2C_MST_CNTL_STATUS_OKAY:
@@ -97,6 +92,8 @@ static int gpu_i2c_check_status(struct gpu_i2c_dev *i2cd)
 		return -ENXIO;
 	case I2C_MST_CNTL_STATUS_TIMEOUT:
 		return -ETIMEDOUT;
+	case I2C_MST_CNTL_STATUS_BUS_BUSY:
+		return -EBUSY;
 	default:
 		return 0;
 	}
diff --git a/drivers/usb/typec/ucsi/ucsi_ccg.c b/drivers/usb/typec/ucsi/ucsi_ccg.c
index 2b7b5de..93c6ffa 100644
--- a/drivers/usb/typec/ucsi/ucsi_ccg.c
+++ b/drivers/usb/typec/ucsi/ucsi_ccg.c
@@ -252,7 +252,10 @@ static int ccg_read(struct ucsi_ccg *uc, u16 rab, u8 *data, u32 len)
 	put_unaligned_le16(rab, buf);
 	status = i2c_transfer(client->adapter, msgs, ARRAY_SIZE(msgs));
 	if (status < 0) {
-		dev_err(uc->dev, "i2c_transfer failed %d\n", status);
+		if (uc->fw_build != CCG_FW_BUILD_NVIDIA ||
+		    status != -EBUSY)
+			dev_err(uc->dev, "i2c_transfer failed %d\n",
+				status);
 		pm_runtime_put_sync(uc->dev);
 		return status;
 	}
@@ -289,7 +292,8 @@ static int ccg_write(struct ucsi_ccg *uc, u16 rab, const u8 *data, u32 len)
 	pm_runtime_get_sync(uc->dev);
 	status = i2c_transfer(client->adapter, msgs, ARRAY_SIZE(msgs));
 	if (status < 0) {
-		dev_err(uc->dev, "i2c_transfer failed %d\n", status);
+		if (uc->fw_build != CCG_FW_BUILD_NVIDIA || status != -EBUSY)
+			dev_err(uc->dev, "i2c_transfer failed %d\n", status);
 		pm_runtime_put_sync(uc->dev);
 		kfree(buf);
 		return status;
@@ -1346,7 +1350,10 @@ static int ucsi_ccg_probe(struct i2c_client *client,
 	/* reset ccg device and initialize ucsi */
 	status = ucsi_ccg_init(uc);
 	if (status < 0) {
-		dev_err(uc->dev, "ucsi_ccg_init failed - %d\n", status);
+		if (uc->fw_build == CCG_FW_BUILD_NVIDIA && status == -EBUSY)
+			dev_info(uc->dev, "USB typec not present\n");
+		else
+			dev_err(uc->dev, "ucsi_ccg_init failed - %d\n", status);
 		return status;
 	}
 
Forwarding to our Mageia kernel devs.

(In reply to Ajay Gupta from comment #19)
> Changes in comment#18 is not acceptable as-is.
>
> Hello Oudelet
> Please help test below change, If this works then I can post them for review.
Same bug happening with a GTX 1650 SUPER on Mint 20 with kernel 5.4.0-52-generic.

How can I test that patch?
(In reply to Ajay Gupta from comment #19)
> Please help test below change, If this works then I can post them for review.

Hello,

I found this thread while searching for a solution for the following messages:

```
kernel: nvidia-gpu 0000:01:00.3: i2c timeout error e0000000
kernel: ucsi_ccg 1-0008: i2c_transfer failed -1101
```

My system contains an RTX 2060 and a Ryzen 7 4800H and currently runs Manjaro with kernel v5.9.8.

I applied the patches yesterday and have rebooted several times. I can confirm that those messages do not appear in the log after any of the reboots, and no instability was introduced.

Cheers!
(In reply to nopej92270 from comment #21)
> Same bug happening with a GTX 1650 SUPER in MINT 20 with kernel
> 5.4.0-52-generic
> 
> How can I test that patch?

Same here. Can we get some help, please?
I have the same problem on my desktop:
- Kubuntu 18.04
- GTX 2070 Super + AMD Ryzen 3 3600

I applied the patch mentioned in comment 19 to the Ubuntu HWE kernel 5.4.57, but it did not work.
(In reply to Tu Phung Van from comment #24)
> I have the same problem in my Destktop
> - Kubuntu 18.04
> - GTX 2070 Super + AMD Ryzen 3 3600.
> I have applied patch which was mentioned in Comment 19 to ubuntu kernel hwe
> 5.4.57.
> But it not worked.

Hi Tu Phung Van,

Comment #22 confirms that the error log no longer appears with the patch from comment #19. Could you tell us what is not working for you?

Thanks,
Ajay
(In reply to Ajay Gupta from comment #25)
> (In reply to Tu Phung Van from comment #24)
> > I have the same problem in my Destktop
> > - Kubuntu 18.04
> > - GTX 2070 Super + AMD Ryzen 3 3600.
> > I have applied patch which was mentioned in Comment 19 to ubuntu kernel hwe
> > 5.4.57.
> > But it not worked.
> 
> Hi Tu Phung Van
> Comment #22 confirms that no more error log seen with patch in comment #19.
> Please update on what is not working for you?
> 
> Thanks
> Ajay

Hi Ajay,

I still had the same problem after applying your patch to the HWE kernel 5.4.57. Then I rebooted about 5 times and, surprisingly, it worked. I don't know why; maybe my hardware was not stable. I have since updated to Ubuntu's 5.4.0-58-generic kernel, and there are no more errors.

Thanks,
Tu
Hi, I am on kernel 5.11.8 and I have the same issue. I have an ASUS ROG Zephyrus G14 with an NVIDIA GTX 1660 Ti. It has a USB-C port, but I don't know whether it is connected to the NVIDIA GPU or to the internal AMD GPU.

[ +0.004231] nvidia-gpu 0000:01:00.3: i2c timeout error e0000000
[ +0.001641] ucsi_ccg 1-0008: i2c_transfer failed -110
[ +0.001287] ucsi_ccg 1-0008: ucsi_ccg_init failed - -110
[ +0.000948] ucsi_ccg: probe of 1-0008 failed with error -110
Created attachment 296055 [details] dmesg kernel 5.11.8 ubuntu 20.10
Kernel 5.11.15, Arch Linux, GTX 1660 Ti 6GB: exactly the same error text as Nicola Lunghi's. If that patch was applied to kernel 5.9.x and fixed the problem, it must have been removed in 5.11, which doesn't make much sense. Why remove a fix?
Created attachment 296853 [details] dmesg 5.12.4-arch1-2 RTX 2060

I also noticed that error message on 5.12.4-arch1-2 (RTX 2060 card).
Same error message as Nicola Lunghi's with a 5.12.5-arch1-1 kernel and an RTX 2060 card. X will not start: the Xorg process falls into the D (uninterruptible sleep) state, and although the mouse and keyboard do not respond, I can still connect to the computer via SSH. Here is my Xorg.0.log: https://pastebin.com/pdUvrT9D
It happens using the LTS kernel and LTS drivers (although these ones are

(In reply to Luis Ángel Fernández Fernández from comment #31)
> Same error message than Nicola Lunghi in a 5.12.5-arch1-1 kernel and a
> 2060RTX card. X will not start. Xorg process falls into a D state and
> although mouse and keyboard will not respond I can connect via ssh to the
> computer. Here my Xorg.0.log https://pastebin.com/pdUvrT9D

Here is my dmesg: https://termbin.com/ronu0