Bug 69521 - [ums OOPS] i915 warning during resume from s3
Summary: [ums OOPS] i915 warning during resume from s3
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - Intel)
Hardware: i386 Linux
Importance: P1 blocking
Assignee: Rodrigo Vivi
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-01-27 16:29 UTC by Valerio Vanni
Modified: 2014-03-11 22:06 UTC
CC List: 3 users

See Also:
Kernel Version: 3.12.6-3.13.2
Subsystem:
Regression: No
Bisected commit-id:


Attachments
logs and configs (152.78 KB, text/plain), 2014-01-27 16:29 UTC, Valerio Vanni
logs with 3.12.8 on squeeze (12.31 KB, text/plain), 2014-01-29 22:22 UTC, Valerio Vanni
logs with 3.12.9 on squeeze (11.99 KB, text/plain), 2014-01-29 23:51 UTC, Valerio Vanni
logs 3.12.9 on Lenny (84.12 KB, text/plain), 2014-01-31 02:31 UTC, Valerio Vanni
logs nightly on lenny (79.41 KB, text/plain), 2014-01-31 02:35 UTC, Valerio Vanni
logs with 3.13.1 on Lenny - suspend (40.13 KB, text/plain), 2014-02-03 01:26 UTC, Valerio Vanni
logs with 3.13.1 on Lenny - CTRL+ALT+F1 (40.65 KB, text/plain), 2014-02-03 01:26 UTC, Valerio Vanni
guess-patch (1.61 KB, patch), 2014-02-06 16:36 UTC, Rodrigo Vivi
fix UMS Oops (1.47 KB, patch), 2014-02-07 15:39 UTC, Daniel Vetter
i915 crash on 3.13.5 (2.87 KB, text/plain), 2014-03-04 22:44 UTC, Valerio Vanni
i915 crash with 3.13.6 (starting google earth) (37.25 KB, text/plain), 2014-03-11 22:05 UTC, Valerio Vanni

Description Valerio Vanni 2014-01-27 16:29:58 UTC
Created attachment 123601
logs and configs

I already submitted this bug here, because I see it during resume:
https://bugzilla.kernel.org/show_bug.cgi?id=69351
but I was told to submit these driver issues one at a time, each in its own driver section.


[1.] One line summary of the problem:

Kernel 3.12.8 gives a warning during resume from S3 sleep

[2.] Full description of the problem/report:

It happens also with 3.12.7, 3.12.6 and 3.13. It doesn't happen with 2.6.24.7.
OS is Debian Lenny, with vanilla kernel. It happens the same after upgrade to Squeeze.

I suspend the machine with s2ram and it goes off.
During resume it prints that warning, then it seems to work normally,
except for the serial redirection of the console.

I redirect the console to a serial port (with the lilo directive
append="console=ttyS0 console=tty0") and I read the messages on another machine.
This stops working as soon as I suspend. It starts sending mangled lines and
does not work again until the next full restart.
Only the serial redirection has this problem; the local console works.

I have attached some logs.
Comment 1 Valerio Vanni 2014-01-29 22:22:31 UTC
Created attachment 123831
logs with 3.12.8 on squeeze

I tried kernel 3.12.9, and as far as I can see the i915 warning is gone.

I have attached the log (like the 3.12.8 one, it covers the entire suspend and resume process) so that you can look and possibly find something I might have missed.
Comment 2 Valerio Vanni 2014-01-29 22:54:57 UTC
Comment on attachment 123831
logs with 3.12.8 on squeeze

Sorry, I sent this as "3.12.9", but it was "3.12.8" (the only difference from the log at the start of the thread is an upgrade from lenny to squeeze).
Comment 3 Valerio Vanni 2014-01-29 23:51:53 UTC
Created attachment 123861
logs with 3.12.9 on squeeze

This is 3.12.9 on squeeze, later I'll try also 3.12.9 on Lenny.
Comment 4 Rodrigo Vivi 2014-01-30 08:53:05 UTC
Could you please verify whether this issue is present on our development tree, specifically the drm-intel-nightly branch:
http://cgit.freedesktop.org/~danvet/drm-intel/log/?h=drm-intel-nightly

Also, please boot your kernel with drm.debug=0xe on the kernel command line and attach the new dmesg.

Thanks,
Rodrigo.
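
For reference, drm.debug is a bitmask. The sketch below decodes 0xe, assuming the DRM_UT_* bit assignments from the DRM core headers of that era (0x01 core, 0x02 driver, 0x04 KMS, 0x08 PRIME). The helper name decode_drm_debug is made up for illustration and is not part of any kernel tool.

```python
# Assumed bit assignments (DRM_UT_* in the DRM core headers of the 3.12/3.13 era).
DRM_DEBUG_BITS = {
    0x01: "core",
    0x02: "driver",
    0x04: "kms",
    0x08: "prime",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by the given mask."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


print(decode_drm_debug(0xe))  # ['driver', 'kms', 'prime']
```

Under that assumption, 0xe turns on driver, KMS and PRIME messages while leaving the noisier core messages off.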
Comment 5 Valerio Vanni 2014-01-30 15:54:28 UTC
Should I boot with drm.debug=0xe on 3.12.8, on the nightly or on both?
Comment 6 Rodrigo Vivi 2014-01-30 16:31:07 UTC
on -nightly please
Comment 7 Valerio Vanni 2014-01-31 02:31:08 UTC
Created attachment 123911
logs 3.12.9 on Lenny

Logs from 3.12.9
Comment 8 Valerio Vanni 2014-01-31 02:35:17 UTC
Created attachment 123921
logs nightly on lenny

And these are logs with the nightly kernel.
The crash still happens, just as with 3.12.8 and 3.12.9, but the i915 warning no longer shows.
Comment 9 Rodrigo Vivi 2014-01-31 11:57:00 UTC
Apparently the i915 crash is gone on the development tree. The remaining crash is from the v4l driver. In that case I'm afraid you will have to file a new bug against that component.
Comment 10 Rodrigo Vivi 2014-01-31 12:11:15 UTC
I just noticed you already filed another bug for v4l. So I'm going to close this bug for now on our side. Feel free to reopen if you need further info.
Also, please keep in mind that the drm-intel-nightly branch is the development branch for the i915 driver. These changes will probably land only in 3.14 or 3.15.
And this is only the i915 development branch. So the v4l devs probably don't even know about -nightly and will probably ask you to test their own dev branch.
Comment 11 Valerio Vanni 2014-02-03 01:22:43 UTC
I've tried 3.13.1 and it behaves much worse than the previous kernels (and than the nightly, which as we saw seems to resolve the i915 part of the crash).

If I try to suspend with 3.13.1, it doesn't even reach the point of suspending the machine because it crashes first.
The machine remains alive but the screen goes blank and it no longer responds to mouse and keyboard. I can telnet into it from another machine, and the serial console continues to transfer messages.

Nearly the same crash happens if I try to switch to text consoles (CTRL+ALT+Fn).

I have attached logs from both events.
Comment 12 Valerio Vanni 2014-02-03 01:26:07 UTC
Created attachment 124231
logs with 3.13.1 on Lenny - suspend
Comment 13 Valerio Vanni 2014-02-03 01:26:59 UTC
Created attachment 124241
logs with 3.13.1 on Lenny - CTRL+ALT+F1
Comment 14 Chris Wilson 2014-02-03 22:00:42 UTC
The simplest fix for the OOPS would appear to be to not call drm_vblank_init() for UMS. I cannot spot anything in the kernel that requires vblank for UMS, but it feels like a feature that would have been used...
Comment 15 Rodrigo Vivi 2014-02-06 16:36:53 UTC
Created attachment 124811
guess-patch

My guess is that this patch fixes the oops you are seeing, and it was already merged upstream.
Could you please try this patch on your 3.13.1 kernel?
Comment 16 Valerio Vanni 2014-02-07 03:18:45 UTC
I've tried it on 3.13.1 and on 3.13.2. It doesn't fix the problem.
I said the crash happens only when suspending or switching to a text console, but I'm noticing that it also happens in other conditions (randomly), with or without the patch.

The only 3.13.* kernel that works is the nightly.
Comment 17 Daniel Vetter 2014-02-07 15:39:34 UTC
Created attachment 125121
fix UMS Oops

This patch should fix the regression introduced in 3.13. Like Rodrigo said, all other WARNs are in v4l, so you need to file a new bug report for those.
Comment 18 Valerio Vanni 2014-02-08 01:53:43 UTC
This fixes the crash (I've tried on 3.13.2).

I've already filed a bug in the v4l section, and another in the serial one.
Unfortunately, they do not seem to have received much attention.
I cannot do anything other than try newer kernels as they come out.
Comment 19 Daniel Vetter 2014-02-11 10:52:40 UTC
I'll send out the pull request for this today, it should show up in stable kernels soonish (but there's a bit of a process-enforced delay).

Unfortunately I can't help you with the v4l and serial issues. If they're regressions and repeated poking doesn't help, just send out a mail to lkml + Linus Torvalds with cc: to all the relevant maintainers. scripts/get_maintainer.pl can help you with the maintainer lists.

Linus takes regressions _very_ seriously, so that should get things rolling. But only do it as a last resort, people are occasionally just busy.
Comment 20 Valerio Vanni 2014-02-23 01:37:01 UTC
I've tried 3.13.5 and the UMS Oops does not happen anymore.

For the other bugs, I'm beginning to lose hope :-(
For the v4l one I've found the point of regression (latest working - first failing). I've sent the mail (to linux-media, Linus and the maintainers) but it seems to have been ignored.

I'm new to reporting kernel bugs, so I'm asking whether I'm doing it right or whether I can do better.
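
Finding the "latest working - first failing" pair is a bisection over an ordered list of kernels. A minimal sketch, assuming the list is ordered oldest to newest, the first entry is known good and the last is known bad; the version labels and the is_bad callback are hypothetical stand-ins for building, booting and testing each kernel. For commit-level searches, git bisect automates the same idea.

```python
def find_regression(versions, is_bad):
    """Return (latest working, first failing) from an ordered version list.

    Assumes versions[0] is good, versions[-1] is bad, and that once a
    version is bad every later one is bad too.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[lo], versions[hi]


# Hypothetical labels; in practice each test means booting that kernel.
kernels = ["k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7"]
bad = {"k5", "k6", "k7"}
print(find_regression(kernels, lambda v: v in bad))  # ('k4', 'k5')
```

With eight candidates this needs three test boots instead of up to six for a linear walk.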
Comment 21 Daniel Vetter 2014-03-03 07:34:42 UTC
Your bug report here was spot-on, and so easy for me to resolve. But the different subsystems in the kernel have different ways to track and handle bugs. Best to start here, then escalate to mailing lists, and if it's a regression include Linus Torvalds as a last resort - he takes regressions _really_ seriously, at least if you report them promptly (like this issue here).

Besides that I can't help you more with issues outside of drm/i915.
Comment 22 Valerio Vanni 2014-03-04 22:44:54 UTC
Created attachment 128031
i915 crash on 3.13.5

Things are moving in the other bug reports.

I've just encountered an i915 crash with 3.13.5 on Debian Squeeze.
I don't know whether it would be better to open another bug report; for now I'm putting it here.

So far it has happened only once: the pointer was inside a VMware Player window, and then X went down.
Comment 23 Chris Wilson 2014-03-05 00:27:02 UTC
That wasn't an i915 crash per se, just a very verbose warning that userspace just shot itself in the head. It appears your ddx is severely buggy - that error would be indicative of a use-after-free bug.
Comment 24 Valerio Vanni 2014-03-11 22:05:28 UTC
Created attachment 129021
i915 crash with 3.13.6 (starting google earth)

I just got this in 3.13.6.
To reproduce it I simply have to start GoogleEarth.
The machine gets stuck and I have to power it off with the button.

I've tried with 3.13, and it's the same.
With 3.12.14 it's a bit better: the machine is reachable from another machine with telnet (but it's not able to shut down), and it responds to SysRq.

The nightly does not crash.

Please tell me if I should open another bug, or if I'm making some mistake.
