Bug 106431

Summary: On ASUS A8JN laptop with G72M GPU dmesg is flooded with error messages
Product: Drivers
Reporter: RussianNeuroMancer (russianneuromancer)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW ---
Severity: normal
CC: daniel, ikey, imirkin, thierry.reding
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.3rc6
Tree: Mainline
Regression: No
Attachments: dmesg
attempt at fixing pre-nv50 pageflip events
Revised version of Daniel's patch

Description RussianNeuroMancer 2015-10-21 15:25:56 UTC
Created attachment 190751
dmesg

On an ASUS A8JN laptop with an NV46 (G72, GeForce Go 7300) GPU, dmesg is flooded with errors like this one:

[   84.362383] WARNING: CPU: 0 PID: 1136 at /home/kernel/COD/linux/drivers/gpu/drm/drm_irq.c:924 drm_vblank_count_and_time+0x73/0x80 [drm]()
[   84.362386] Modules linked in: drbg ansi_cprng ctr ccm gspca_vc032x gspca_main videodev media snd_hda_codec_analog snd_hda_codec_generic snd_hda_intel snd_hda_codec arc4 r852 snd_hda_core sm_common nand iwl3945 snd_hwdep iwlegacy nand_ecc coretemp snd_pcm nand_bch mac80211 snd_seq_midi bch asus_laptop nand_ids snd_seq_midi_event mtd snd_rawmidi input_leds joydev cfg80211 snd_seq serio_raw lpc_ich snd_seq_device snd_timer r592 snd soundcore memstick sparse_keymap input_polldev irda shpchp crc_ccitt mac_hid parport_pc ppdev lp parport autofs4 btrfs xor raid6_pq nouveau psmouse mxm_wmi firewire_ohci wmi i2c_algo_bit pata_acpi ttm sdhci_pci drm_kms_helper sdhci syscopyarea firewire_core sysfillrect sysimgblt crc_itu_t fb_sys_fops drm r8169 mii video fjes
[   84.362465] CPU: 0 PID: 1136 Comm: gnome-shell Not tainted 4.3.0-040300rc6-generic #201510182030
[   84.362467] Hardware name: ASUSTeK Computer Inc.  A8JN                /A8JN      , BIOS A8JncAS.211  03/02/2007
[   84.362470]  ffffffffc00719a8 ffff8800b7a03a90 ffffffff813a62ed 0000000000000000
[   84.362474]  ffff8800b7a03ac8 ffffffff8107a586 ffff880035a3e000 ffff8800b2239780
[   84.362478]  ffff8800b7a03b00 00000000ffffffff ffff880035a32000 ffff8800b7a03ad8
[   84.362482] Call Trace:
[   84.362485]  <IRQ>  [<ffffffff813a62ed>] dump_stack+0x44/0x57
[   84.362497]  [<ffffffff8107a586>] warn_slowpath_common+0x86/0xc0
[   84.362500]  [<ffffffff8107a67a>] warn_slowpath_null+0x1a/0x20
[   84.362513]  [<ffffffffc0041d33>] drm_vblank_count_and_time+0x73/0x80 [drm]
[   84.362526]  [<ffffffffc00420c5>] drm_send_vblank_event+0x65/0x70 [drm]
[   84.362530]  [<ffffffff813b6802>] ? __sg_alloc_table+0x72/0x140
[   84.362583]  [<ffffffffc02a2b1b>] nouveau_finish_page_flip+0x7b/0x150 [nouveau]
[   84.362619]  [<ffffffffc02a2c1b>] nouveau_flip_complete+0x2b/0x1f0 [nouveau]
[   84.362623]  [<ffffffff81552b54>] ? scsi_init_sgtable+0x44/0x70
[   84.362626]  [<ffffffff81552bca>] ? scsi_init_io+0x4a/0x1c0
[   84.362631]  [<ffffffff815811cf>] ? ata_sff_exec_command+0x2f/0x40
[   84.362651]  [<ffffffffc0202cd3>] nvif_notify+0xa3/0x180 [nouveau]
[   84.362655]  [<ffffffff8154a9e7>] ? scsi_host_alloc_command+0x47/0xc0
[   84.362691]  [<ffffffffc029787f>] nvkm_client_ntfy+0x5f/0x70 [nouveau]
[   84.362711]  [<ffffffffc0202fb2>] nvkm_client_notify+0x22/0x30 [nouveau]
[   84.362731]  [<ffffffffc0205d66>] nvkm_notify_send+0x86/0x140 [nouveau]
[   84.362751]  [<ffffffffc0203ddc>] nvkm_event_send+0xcc/0xf0 [nouveau]
[   84.362787]  [<ffffffffc029429d>] nvkm_sw_chan_mthd+0x4d/0x60 [nouveau]
[   84.362823]  [<ffffffffc0293851>] nvkm_sw_mthd+0xc1/0x120 [nouveau]
[   84.362859]  [<ffffffffc02642c9>] nv04_fifo_intr+0x6b9/0x820 [nouveau]
[   84.362863]  [<ffffffff81551b31>] ? scsi_run_queue+0x211/0x2a0
[   84.362866]  [<ffffffff8154bd39>] ? scsi_put_command+0x79/0xc0
[   84.362901]  [<ffffffffc0263464>] nvkm_fifo_intr+0x14/0x20 [nouveau]
[   84.362921]  [<ffffffffc020369f>] nvkm_engine_intr+0x1f/0x30 [nouveau]
[   84.362942]  [<ffffffffc0207477>] nvkm_subdev_intr+0x17/0x20 [nouveau]
[   84.362972]  [<ffffffffc0244292>] nvkm_mc_intr+0x72/0x100 [nouveau]
[   84.363003]  [<ffffffffc0248736>] nvkm_pci_intr+0x46/0x70 [nouveau]
[   84.363008]  [<ffffffff810d0469>] handle_irq_event_percpu+0x39/0x180
[   84.363012]  [<ffffffff810d05f5>] handle_irq_event+0x45/0x70
[   84.363015]  [<ffffffff810d3356>] handle_fasteoi_irq+0x96/0x150
[   84.363019]  [<ffffffff810191bd>] handle_irq+0x1d/0x30
[   84.363024]  [<ffffffff817b6a7d>] do_IRQ+0x4d/0xd0
[   84.363027]  [<ffffffff817b4702>] common_interrupt+0x82/0x82
[   84.363029]  <EOI> 
[   84.363032] ---[ end trace a2c6130042049fde ]---

Although dmesg is flooded, this doesn't seem to affect the running GNOME Shell Wayland session.
Comment 1 Ilia Mirkin 2015-10-21 17:43:59 UTC
Looks like this is because the pre-Tesla code path supplies a crtcid of -1 (your GPU is pre-Tesla), and "pipe" was recently made an unsigned int, which causes the WARN to trigger.

J'accuse ("I accuse"):

commit cc1ef118fc099295ae6aabbacc8af94d8d8885eb
Author: Thierry Reding <treding@nvidia.com>
Date:   Wed Aug 12 17:00:31 2015 +0200

    drm/irq: Make pipe unsigned and name consistent
    
    Name all references to the pipe number (CRTC index) consistently to make
    it easier to distinguish which is a pipe number and which is a pointer
    to struct drm_crtc.
    
    While at it also make all references to the pipe number unsigned because
    there is no longer any reason why it should ever be negative.
    
    Signed-off-by: Thierry Reding <treding@nvidia.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Comment 2 Daniel Vetter 2015-10-21 18:25:42 UTC
Three ways to fix this:

- open-code the old behaviour (without the warning) again in nouveau, just to shut it up.

- don't register vblanks with the drm core (i.e. no call to drm_vblank_init) on pre-tesla if the hw/driver can't do it, since it's apparently just a lie. At least according to the following commit:

commit af4870e406126b7ac0ae7c7ce5751f25ebe60f28
Author: Mario Kleiner <mario.kleiner.de@gmail.com>
Date:   Tue May 13 00:42:08 2014 +0200

    drm/nouveau/kms/nv04-nv40: fix pageflip events via special case.

- fix up the underlying issue of sending the vblank events before they happen. Old intel hw is similar in that the pageflip irq from the hw happens when the flip arms, not when it completes at the next vblank. We solve that by enabling the corresponding vblank, and from the flip handler stuff the event someplace where the vblank handler can pick it up. Then on the next vblank we'll send out the event (knowing the timestamp to be accurate) and drop the vblank reference.
Comment 3 Daniel Vetter 2015-10-30 21:57:28 UTC
Created attachment 191661
attempt at fixing pre-nv50 pageflip events

Ok, I made an attempt at correctly fixing this, i.e. the 3rd option I laid out above. Patch attached, please test.

Note that I'll be on vacation for the next 2 weeks, so Thierry needs to take over any follow-up work.
Comment 4 Ikey Doherty 2015-11-02 20:38:35 UTC
Hi, we've tested this on Solus Operating System, as we encountered the same issue [1]. The patch required a slight modification (exposing drm_arm_vblank_event via drmP.h).
Comment 5 Ikey Doherty 2015-11-02 20:40:33 UTC
Created attachment 191861
Revised version of Daniel's patch

Added a revised version of Daniel's patch to include the symbol in the header.

Validated on Solus, where the bug was actually worse and caused graphical "static" artefacts.

[1] https://plus.google.com/114713706129194876663/posts/DrUqQypp24w
Comment 6 Ikey Doherty 2015-11-02 20:50:55 UTC
The above fix is confirmed to restore nouveau functionality in X. However, the 304-series driver from NVIDIA is not functional (whereas it works on 4.1.12). (7600 GT)

P.S. I know the NVIDIA driver isn't anyone's issue here; this is more of a heads-up for those who come looking for answers.
Comment 7 poma 2015-11-07 05:15:34 UTC
See Also:
https://bugs.freedesktop.org/show_bug.cgi?id=92852
Comment 8 poma 2015-11-07 17:09:55 UTC
Tested with:
4.3.0-2.fc22.i686
i.e. 4.3.0-1.fc24.i686 + nv-drm-vblank.patch
https://bugzilla.kernel.org/attachment.cgi?id=191861

Tested-by: poma <pomidorabelisima@gmail.com>
Comment 9 poma 2015-11-07 17:12:44 UTC
Please push to linux-next and mainline 4.3.