Created attachment 136101 [details] 0001-drm-i915-don-t-wait-for-power-cycle-when-waiting-for.patch On a Dell XPS11 convertable (Intel HSW-YLT), i915 resume from S3 got slower in 3.15-rc1 Linux 3.14: 400ms Linux 3.15-rc1: 1200ms If the attached patch is reverted, 400ms resume time is restored. Unfortunately, the attached patch also had the dramatic _benefit_ of reducing i915 suspend latency on the same XPS11 to under 150ms from over 1300ms. The numbers are different when measured on a Lenovo Yoga2 Pro HSW-ULT, but the trend is the same. The patch increases resume time to about 700ms from under 400ms, and the patch reduces suspend time to under 300ms from over 1100ms. Obviously, it would be ideal to capture the suspend speed benefit of this patch without having the resume cost. However, if we have to pick between suspend and resume performance, resume performance is more important. Note that it is easy to graphically measure resume latency with this tool: https://01.org/suspendresume if analyze_suspend.py is invoked with the -f argument, it also produces an ftrace that can show exactly where the time goes. In the case of 3.15-rc1 on the Dell XPS11: intel_edp_panel_on takes > 1000ms wait_panel_power_cycle takes 800ms wait_panel_status takes 200ms
The regressing commit is commit 1a5ef5b7b4ee4a9514ca10057b08d978431a03e2 Author: Paulo Zanoni <paulo.r.zanoni@intel.com> Date: Thu Dec 19 14:29:43 2013 -0200 drm/i915: don't wait for power cycle when waiting for power off
(In reply to Len Brown from comment #0) > if analyze_suspend.py is invoked with the -f argument, it also > produces an ftrace that can show exactly where the time goes. > In the case of 3.15-rc1 on the Dell XPS11: > > intel_edp_panel_on takes > 1000ms > wait_panel_power_cycle takes 800ms > wait_panel_status takes 200ms Please share these figures with the regressing commit reverted, with the time hogs there.
Created attachment 136261 [details] avoid full hotplug detect on resume This one may be sufficient. If not, try the next one (or just try the next one for fun, should make things really fast).
Created attachment 136271 [details] make crtc enable asynchronous
Re: patch in comment #3 No measurable difference. resume in 1246ms becomes 1233ms Same time-consumers as above. Here's a copy/paste of the anayze_suspend.py ftrace output, sorry the indenting is not retained. I'll see if i can attach the whole compressed output. 0000:00:02.0 resume_general (1212.094 ms @ 141.618 to 1353.712) dpm_run_callback (1212.093 ms @ 141.618) initcall_debug_start (0.023 ms @ 141.619) pm_dev_dbg (0.000 ms @ 141.644) pci_pm_resume (1212.052 ms @ 141.645) pci_has_legacy_pm_support (0.000 ms @ 141.646) pci_pm_default_resume (0.003 ms @ 141.647) i915_pm_resume [i915] (1212.045 ms @ 141.651) dev_get_drvdata (0.000 ms @ 141.651) i915_resume [i915] (1212.043 ms @ 141.653) __i915_drm_thaw [i915] (1212.041 ms @ 141.653) mutex_lock (0.001 ms @ 141.654) i915_gem_restore_gtt_mappings [i915] (88.342 ms @ 141.656) mutex_unlock (0.000 ms @ 229.999) i915_restore_state [i915] (0.710 ms @ 230.000) intel_opregion_setup [i915] (6.124 ms @ 230.710) intel_init_pch_refclk [i915] (0.073 ms @ 236.835) drm_mode_config_reset [drm] (0.001 ms @ 236.909) mutex_lock (0.001 ms @ 236.910) i915_gem_init_hw [i915] (4.437 ms @ 236.911) mutex_unlock (0.000 ms @ 241.349) drm_irq_install [drm] (0.197 ms @ 241.349) intel_modeset_init_hw [i915] (0.469 ms @ 241.546) drm_modeset_lock_all [drm] (0.004 ms @ 242.016) intel_modeset_setup_hw_state [i915] (1071.893 ms @ 242.020) haswell_get_pipe_config [i915] (0.011 ms @ 242.020) haswell_get_pipe_config [i915] (0.013 ms @ 242.031) haswell_get_pipe_config [i915] (0.012 ms @ 242.045) intel_ddi_setup_hw_pll_state [i915] (0.000 ms @ 242.058) intel_ddi_get_hw_state [i915] (0.006 ms @ 242.058) intel_ddi_get_hw_state [i915] (0.005 ms @ 242.064) intel_ddi_get_hw_state [i915] (0.005 ms @ 242.070) intel_ddi_connector_get_hw_state [i915] (0.008 ms @ 242.075) intel_ddi_connector_get_hw_state [i915] (0.008 ms @ 242.084) intel_ddi_connector_get_hw_state [i915] (0.008 ms @ 242.092) intel_ddi_connector_get_hw_state [i915] (0.009 ms @ 242.101) gen6_read32 [i915] (0.002 ms @ 242.110) hsw_write32 [i915] (0.003 ms @ 242.112) intel_crtc_update_dpms [i915] (0.001 ms @ 242.115) intel_dump_pipe_config [i915] (0.001 ms @ 242.117) gen6_read32 [i915] (0.002 ms @ 242.119) hsw_write32 [i915] (0.003 ms @ 242.121) intel_crtc_update_dpms [i915] (0.001 ms @ 242.124) intel_dump_pipe_config [i915] (0.001 ms @ 242.126) gen6_read32 [i915] (0.002 ms @ 242.127) hsw_write32 [i915] (0.003 ms @ 242.129) intel_crtc_update_dpms [i915] (0.001 ms @ 242.132) intel_dump_pipe_config [i915] (0.001 ms @ 242.133) ilk_wm_get_hw_state [i915] (0.031 ms @ 242.135) i915_redisable_vga [i915] (0.007 ms @ 242.167) __intel_set_mode [i915] (1071.285 ms @ 242.174) kmem_cache_alloc_trace (0.009 ms @ 242.174) kmem_cache_alloc_trace (0.005 ms @ 242.183) drm_mode_copy [drm] (0.000 ms @ 242.188) drm_mode_copy [drm] (0.000 ms @ 242.189) drm_mode_set_crtcinfo [drm] (0.000 ms @ 242.190) drm_mode_set_crtcinfo [drm] (0.000 ms @ 242.191) intel_ddi_compute_config [i915] (0.041 ms @ 242.191) intel_pipe_has_type [i915] (0.000 ms @ 242.233) hsw_crtc_supports_ips [i915] (0.000 ms @ 242.233) intel_dump_pipe_config [i915] (0.001 ms @ 242.234) drm_calc_timestamping_constants [drm] (0.000 ms @ 242.236) drm_object_property_set_value [drm] (0.000 ms @ 242.237) haswell_modeset_global_resources [i915] (0.039 ms @ 242.237) drm_vblank_pre_modeset [drm] (0.004 ms @ 242.277) haswell_crtc_mode_set [i915] (0.141 ms @ 242.281) drm_vblank_post_modeset [drm] (0.001 ms @ 242.422) intel_ddi_mode_set [i915] (0.001 ms @ 242.424) haswell_crtc_enable [i915] (1071.018 ms @ 242.425) intel_set_cpu_fifo_underrun_reporting [i915] (0.001 ms @ 242.426) intel_ddi_pre_enable [i915] (1035.452 ms @ 242.428) intel_ddi_get_encoder_port [i915] (0.000 ms @ 242.428) intel_edp_panel_on [i915] (1015.037 ms @ 242.429) edp_have_panel_power [i915] (0.003 ms @ 242.429) wait_panel_power_cycle [i915] (806.108 ms @ 242.433) msecs_to_jiffies (0.000 ms @ 242.433) schedule_timeout_uninterruptible (806.101 ms @ 242.434) wait_panel_status [i915] (0.005 ms @ 1048.536) _pp_ctrl_reg [i915] (0.000 ms @ 1048.541) ironlake_get_pp_control [i915] (0.003 ms @ 1048.542) hsw_write32 [i915] (0.003 ms @ 1048.545) gen6_read32 [i915] (0.002 ms @ 1048.549) wait_panel_status [i915] (208.914 ms @ 1048.552) hsw_write32 [i915] (0.005 ms @ 1257.467) intel_dp_sink_dpms [i915] (1.040 ms @ 1257.473) intel_dp_start_link_train [i915] (11.334 ms @ 1258.514) intel_dp_complete_link_train [i915] (8.031 ms @ 1269.849) intel_ddi_enable_pipe_clock [i915] (0.002 ms @ 1277.882) ironlake_pfit_enable [i915] (0.000 ms @ 1277.885) intel_crtc_load_lut [i915] (1.639 ms @ 1277.886) intel_ddi_set_pipe_settings [i915] (0.007 ms @ 1279.526) intel_ddi_enable_transcoder_func [i915] (0.008 ms @ 1279.534) intel_update_watermarks [i915] (0.086 ms @ 1279.543) intel_enable_pipe [i915] (0.028 ms @ 1279.630) intel_enable_ddi [i915] (0.505 ms @ 1279.659) intel_opregion_notify_encoder [i915] (0.002 ms @ 1280.165) intel_enable_primary_hw_plane [i915] (17.246 ms @ 1280.168) intel_enable_planes [i915] (0.002 ms @ 1297.415) intel_crtc_update_cursor [i915] (0.000 ms @ 1297.417) hsw_enable_ips [i915] (15.990 ms @ 1297.418) mutex_lock (0.001 ms @ 1313.409) intel_update_fbc [i915] (0.031 ms @ 1313.411) mutex_unlock (0.000 ms @ 1313.443) kfree (0.007 ms @ 1313.445) kfree (0.007 ms @ 1313.452) __intel_set_mode [i915] (0.080 ms @ 1313.461) __intel_set_mode [i915] (0.077 ms @ 1313.542) intel_modeset_check_state [i915] (0.292 ms @ 1313.620) drm_modeset_unlock_all [drm] (0.004 ms @ 1313.914) intel_hpd_init [i915] (0.030 ms @ 1313.918) drm_sysfs_hotplug_event [drm] (0.156 ms @ 1313.950) intel_opregion_init [i915] (38.115 ms @ 1314.107) console_trylock (0.005 ms @ 1352.222) queue_work_on (0.031 ms @ 1352.228) mutex_lock (1.432 ms @ 1352.259) mutex_unlock (0.000 ms @ 1353.692) intel_runtime_pm_put [i915] (0.001 ms @ 1353.693) drm_kms_helper_poll_enable [drm_kms_helper] (0.000 ms @ 1353.696) __suspend_report_result (0.000 ms @ 1353.698) initcall_debug_report (0.013 ms @ 1353.699)
bugzilla accepts 5MB attachments, but the compressed trace is 9MB
the patch in comment #4 did not apply to upstream Linux. What should I apply it to?
It was based off the drm-intel-nightly branch from git://anongit.freedesktop.org/drm-intel, but that was last week, we've probably had 100 patches land since then...
right, it doesn't apply to today's drm-inte-nightly either
drm-intel-nightly doesn't boot. So even if you patch did apply there, that branch isn't useful. Also, the fix to this bug must be with a patch that applies to Linux-3.15-rc. So if you can send a patch that applies to upstream Linux, I'll be more than happy to test it.
Can you try the fast-modeset branch of my kernel.org tree? git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/drm-intel
tested: git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/drm-intel fast-modeset branch, as of top commit: commit 895d2bde2fc067dcc73edfa4857cd6b088fceabb Author: Jesse Barnes <jbarnes@virtuousgeek.org> Date: Thu May 15 12:18:41 2014 -0700 drm/i915: make CRTC enable/disable asynchronous it panics on boot.
> it panics on boot. not always. sometimes system freeze, sometime system reset. once I got it to boot and was able to measure suspend/resume. i915 suspend: 115ms This is in parallel with the Lynx Point Audio, which is currently 224ms i915 resume: invisible! I ran this back-to-back with analyze_syspend -x2, and did not detect any deferred i915 resume delaying a subsequent suspend. (what happens if a suspend request occurs when your deferred resume has not completed?) So this patch looks like a very interesting prototype, but very unstable.
Yeah I don't defer the whole suspend, just the mode set part. And that gets synchronized later on (or at suspend if we suspend again that fast), but as part of a sync_work() call rather than a sync on the whole resume. For the panics, do you have any logs? I didn't see panics here on my MBA when testing, but I probably have a different config than you.
unfortunately, no logs on the panics, and death seemed to come in a variety of ways... They seemed to happen when the screen was not printing characters and then when things stopped all the only thing on the screen was something about i915 switching back to text mode -- presumably because of the panic that had since scrolled off the screen. drm-intel-nightly has died on boot all my recent attempts to use it -- so you might be inheriting something bad from drm-intel tree. If you can re-base base your patch on upstream, then apples/apples may be more reliable and objective. If there are any knobs to turn that may make the failure more visible, I'll be happy to turn them. I can also try this tree on another box.
Lenovo Yoga2 Pro had no problems booting the same kernel binary (built from kernel in comment #11) that the Dell XPS11 struggled with. I rebooted the Yoga2 5 times w/ no issues. suspend/resume also worked fine on the Yoga2. i915 supsend time was 277ms, resume time not visible (/me wonders if we should add trace points to this patch so that analyze_suspend can see when the i915 does this work). Though stable, the Yoga2 277ms is considerably slower than on the Dell XPS11, which suspended in 115ms. Further, this 277ms is the long-pole in suspend on the Yoga2, with the next slowest parallel device taking 25ms.
Created attachment 137281 [details] screen image of i915 kernel fault from Dell XPS11 The Lenovo had vt.handoff=7 on the cmdline and the Dell did not, so I slapped that on the Dell cmdline and it booted, suspended/resumed, but didn't reboot. The good news is that the resulting crash showed on the screen -- image attached. Hopefully it gives a clue.
On the XPS11, in about 10 boot requests, I get 3-4 successful boots, a couple of resets, and twice the stack trace attached in comment #17. If you can't read it in the image, the fault address is 0x00..000088
(In reply to Len Brown from comment #18) > On the XPS11, in about 10 boot requests, I get 3-4 successful boots, > a couple of resets, and twice the stack trace attached in comment #17. > > If you can't read it in the image, the fault address is 0x00..000088 commit 46470fc932ac8a0e8317a220b3f4ea4ed903338e Author: Mika Kuoppala <mika.kuoppala@linux.intel.com> Date: Wed May 21 19:01:06 2014 +0300 drm/i915: Add null state batch to active list should fix the oops.
I extracted comment #19's 46470fc932a from drm-intel tree, and applied it on top of jbarnes branch in comment #11. On the Lenovo Yoga2 Pro, it works fine -- though it worked fine before. On the Del XPS11, the system is no longer bringing up the GUI -- just the background and the X cursor -- so it seems that the crashes hosed the graphics config on this system somewhat. I'm going to re-test after a re-install.
On fresh install of Ubuntu 14.04 on the Dell XPS11, the patch from comment #19 applied on top of the tree from comment #11 has booted without problems 5 times in 5 attempts.
Jesse, what's the status with the patches?
Daniel rejected them based on the assumption that we'd have async atomic mode setting at some point.
Well then, what's the status of this bug?
We need to re-measure, but iirc the regression was caused by us honoring panel timings, so it was really fixing a bug. We can keep this bug open to track the resume performance in general, or just close it. The resume time should be fixed up after we land the async atomic stuff.
(In reply to Jesse Barnes from comment #25) > We need to re-measure, but iirc the regression was caused by us honoring > panel timings, so it was really fixing a bug. > > We can keep this bug open to track the resume performance in general, or > just close it. The resume time should be fixed up after we land the async > atomic stuff. Closing. The power sequencing code has changed so much in the past months that any measurements here are not valid anymore. Feel free to reopen with current data.
Created attachment 241901 [details] current analyze_suspend output This problem still exists in Linux-4.8.0, though the delay on this machine is now 909 ms, rather than 1,200 ms. Attached, please find the analzye_suspend.py output as run on the same laptop as above, installed with Ubuntu 16.10 and its 4.8.0-22-generic kernel. Curiously, i915 resume invokes intel_hdmi_detect FOUR TIMES -- twice in a kthread, and again twice in-line. This costs 160ms each time it is invoked.
This bug has fallen below the radar, not least because we've consolidated all bug handling to the freedesktop.org bugzilla. Created a new bug there, and resolving this as moved. https://bugs.freedesktop.org/show_bug.cgi?id=100540