Bug 202365

Summary: broken nvidia and vbox drivers
Product: Memory Management
Reporter: Cristian Crinteanu (crinteanu.cristian)
Component: Other
Assignee: Andrew Morton (akpm)
Status: NEW
Severity: normal
CC: bormant, kazakevichilya, kjhambrick, marcop, Wayne
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.4.168
Subsystem:
Regression: No
Bisected commit-id:

Description Cristian Crinteanu 2019-01-21 17:12:48 UTC
The mm patches introduced in 4.4.168 (and maybe in other LTS kernels too) broke the nvidia and virtualbox drivers.
Here is part of the log:
While installing package dkms-nvidia-current-390.87-4pclos2019:


Creating symlink /var/lib/dkms/nvidia-current/390.87-4pclos2019/source ->
                 /usr/src/nvidia-current-390.87-4pclos2019

DKMS: add Completed.

Preparing kernel 4.4.170-pclos1 for module build:
(This is not compiling a kernel, just preparing kernel symbols)
Storing current .config to be restored when complete
Running Generic preparation routine
make clean....
using /boot/config-4.4.170-pclos1
make oldconfig....
make prepare....

Building module:
cleaning build area....
export IGNORE_CC_MISMATCH=1;'make' KERNEL_UNAME=4.4.170-pclos1 modules SKIP_STACK_VALIDATION=1..............(bad exit status: 2)

Error! Bad return status for module build on kernel: 4.4.170-pclos1 (x86_64)
Consult the make.log in the build directory
/var/lib/dkms/nvidia-current/390.87-4pclos2019/build/ for more information.

and from make.log:

.......
In file included from /var/lib/dkms/nvidia-current/390.87-4pclos2019/build/common/inc/nv-linux.h:21:0,
                 from /var/lib/dkms/nvidia-current/390.87-4pclos2019/build/nvidia/os-mlock.c:15:
/var/lib/dkms/nvidia-current/390.87-4pclos2019/build/nvidia/os-mlock.c: In function ‘os_lock_user_pages’:
/var/lib/dkms/nvidia-current/390.87-4pclos2019/build/nvidia/os-mlock.c:120:48: warning: passing argument 6 of ‘get_user_pages’ makes pointer from integer without a cast [-Wint-conversion]
                             page_count, write, force, user_pages, NULL);
                                                ^
/var/lib/dkms/nvidia-current/390.87-4pclos2019/build/common/inc/nv-mm.h:44:70: note: in definition of macro ‘NV_GET_USER_PAGES’
         get_user_pages(current, current->mm, start, nr_pages, write, force, pages, vmas)
                                                                      ^~~~~
In file included from /var/lib/dkms/nvidia-current/390.87-4pclos2019/build/common/inc/nv-pgprot.h:17:0,
                 from /var/lib/dkms/nvidia-current/390.87-4pclos2019/build/common/inc/nv-linux.h:20,
                 from /var/lib/dkms/nvidia-current/390.87-4pclos2019/build/nvidia/os-mlock.c:15:
/usr/src/kernel-devel-4.4.170-pclos1/include/linux/mm.h:1222:6: note: expected ‘struct page **’ but argument is of type ‘NvBool {aka unsigned char}’
 long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
      ^~~~~~~~~~~~~~
In file included from /var/lib/dkms/nvidia-current/390.87-4pclos2019/build/common/inc/nv-linux.h:21:0,
                 from /var/lib/dkms/nvidia-current/390.87-4pclos2019/build/nvidia/os-mlock.c:15:
/var/lib/dkms/nvidia-current/390.87-4pclos2019/build/nvidia/os-mlock.c:120:55: warning: passing argument 7 of ‘get_user_pages’ from incompatible pointer type [-Wincompatible-pointer-types]
                             page_count, write, force, user_pages, NULL);
                                                       ^
/var/lib/dkms/nvidia-current/390.87-4pclos2019/build/common/inc/nv-mm.h:44:77: note: in definition of macro ‘NV_GET_USER_PAGES’
         get_user_pages(current, current->mm, start, nr_pages, write, force, pages, vmas)
                                                                             ^~~~~
In file included from /var/lib/dkms/nvidia-current/390.87-4pclos2019/build/common/inc/nv-pgprot.h:17:0,
                 from /var/lib/dkms/nvidia-current/390.87-4pclos2019/build/common/inc/nv-linux.h:20,
                 from /var/lib/dkms/nvidia-current/390.87-4pclos2019/build/nvidia/os-mlock.c:15:
/usr/src/kernel-devel-4.4.170-pclos1/include/linux/mm.h:1222:6: note: expected ‘struct vm_area_struct **’ but argument is of type ‘struct page **’
 long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
      ^~~~~~~~~~~~~~
In file included from /var/lib/dkms/nvidia-current/390.87-4pclos2019/build/common/inc/nv-linux.h:21:0,
                 from /var/lib/dkms/nvidia-current/390.87-4pclos2019/build/nvidia/os-mlock.c:15:
/var/lib/dkms/nvidia-current/390.87-4pclos2019/build/common/inc/nv-mm.h:44:9: error: too many arguments to function ‘get_user_pages’
         get_user_pages(current, current->mm, start, nr_pages, write, force, pages, vmas)
         ^
/var/lib/dkms/nvidia-current/390.87-4pclos2019/build/nvidia/os-mlock.c:119:11: note: in expansion of macro ‘NV_GET_USER_PAGES’
     ret = NV_GET_USER_PAGES((unsigned long)address,
           ^~~~~~~~~~~~~~~~~
In file included from /var/lib/dkms/nvidia-current/390.87-4pclos2019/build/common/inc/nv-pgprot.h:17:0,
                 from /var/lib/dkms/nvidia-current/390.87-4pclos2019/build/common/inc/nv-linux.h:20,
                 from /var/lib/dkms/nvidia-current/390.87-4pclos2019/build/nvidia/os-mlock.c:15:
/usr/src/kernel-devel-4.4.170-pclos1/include/linux/mm.h:1222:6: note: declared here
 long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
      ^~~~~~~~~~~~~~
make[3]: *** [/usr/src/kernel-devel-4.4.170-pclos1/scripts/Makefile.build:278: /var/lib/dkms/nvidia-current/390.87-4pclos2019/build/nvidia/os-mlock.o] Error 1
make[2]: *** [/usr/src/kernel-devel-4.4.170-pclos1/Makefile:1429: _module_/var/lib/dkms/nvidia-current/390.87-4pclos2019/build] Error 2
make[2]: Leaving directory '/usr/src/kernel-devel-4.4.170-pclos1'


thx!
Comment 1 Serg Bormant 2019-01-31 18:40:15 UTC
There is an API change between 4.4.167 and 4.4.168:

4.4.167:
long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
		    unsigned long start, unsigned long nr_pages,
		    int write, int force, struct page **pages,
		    struct vm_area_struct **vmas);
4.4.168:
long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
		    unsigned long start, unsigned long nr_pages,
		    unsigned int gup_flags, struct page **pages,
		    struct vm_area_struct **vmas);

-		    int write, int force, struct page **pages,
+		    unsigned int gup_flags, struct page **pages,

Builds are broken with this API change.
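
For out-of-tree modules, the usual workaround for a change like this is a small compatibility layer: pick the call form based on the kernel version and fold the old write/force pair into a single flags word. Below is a minimal user-space sketch of that idea, not any vendor's actual patch. The helper names (gup_flags_from_write_force, v44_uses_gup_flags) and the KVER macro are hypothetical; the FOLL_WRITE and FOLL_FORCE values are copied here as an assumption of what include/linux/mm.h defines, and a real module would take them from the kernel headers. The version check only covers the 4.4.y series discussed in this report.

```c
#include <stdbool.h>

/* Assumed to mirror include/linux/mm.h; copied here only so the sketch
 * builds in user space. A real module must use the kernel's own values. */
#define FOLL_WRITE 0x01u
#define FOLL_FORCE 0x10u

/* Hypothetical stand-in using the same encoding as KERNEL_VERSION(). */
#define KVER(a, b, c) \
    (((unsigned int)(a) << 16) + ((unsigned int)(b) << 8) + (unsigned int)(c))

/* Hypothetical helper: translate the old (write, force) argument pair
 * into the single gup_flags argument of the new signature. */
unsigned int gup_flags_from_write_force(int write, int force)
{
    unsigned int flags = 0;

    if (write)
        flags |= FOLL_WRITE;
    if (force)
        flags |= FOLL_FORCE;
    return flags;
}

/* Hypothetical helper: true for 4.4.y releases that carry the new
 * get_user_pages() signature (4.4.168 and later within 4.4.y).
 * Other kernel series are deliberately not covered by this sketch. */
bool v44_uses_gup_flags(unsigned int version_code)
{
    return version_code >= KVER(4, 4, 168) &&
           version_code < KVER(4, 5, 0);
}
```

In a real module the same decision would be made at compile time with #if on LINUX_VERSION_CODE, so that only the matching call form of get_user_pages() is ever seen by the compiler.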
Comment 2 Konrad J Hambrick 2019-02-01 13:12:46 UTC
Nvidia has released 418.40 Beta to address this issue:  https://www.nvidia.com/download/driverResults.aspx/142166/en-us


And mimochodem shared a patch for VirtualBox on LQ:  https://www.linuxquestions.org/questions/slackware-14/virtualbox-fails-to-build-drivers-for-kernel-4-4-172-a-4175647407/#post5956289

I've got a similar issue with VMWare Workstation 12.1 that I've not had time to research yet so I am 'stuck on' 4.4.167.

Is there a way to address this in 4.4.y so that it 'does not break userspace' ?

Thanks.

-- kjh
Comment 3 Marco 2019-02-07 13:30:00 UTC
We're also stuck on 4.4.167 as many drivers can't be compiled anymore since the API breakage in >=4.4.168. And I'm still wondering why the API of an LTS kernel has been changed?

There's also an entry about this in Gentoo's own bugtracker:

https://bugs.gentoo.org/675310

Regards,

Marco
Comment 4 Wayne Sallee 2019-02-15 14:55:03 UTC
Wow so much for LTS. I could see this on a new kernel, but for an update on an lts kernel???

Any plans to fix this bug?

Wayne Sallee
Wayne@WayneSallee.com
http://www.WayneSallee.com
Comment 5 Wayne Sallee 2019-02-22 15:03:21 UTC
Is this bug also in 3.16 kernels?

And if not are there plans to infect that kernel too?

Wayne Sallee
Wayne@WayneSallee.com
http://www.WayneSallee.com
Comment 6 Ilya 2019-02-25 22:56:43 UTC
Konrad, it is not actually userspace; it is the kernel module API, isn't it?

However, it broke Vbox drivers for many distros that use bleeding-edge kernels (like Slackware) :(
Comment 7 Wayne Sallee 2019-02-25 23:18:30 UTC
LTS kernel is not bleeding-edge.

Wayne Sallee
Wayne@WayneSallee.com
Comment 8 Konrad J Hambrick 2019-02-26 15:54:19 UTC
Ilya --

Yes -and- no, I suppose ...

While get_user_pages( ) is not a 'user-facing' interface, as others have mentioned it does seem very disruptive to modify kernel APIs in an LTS Kernel.

In the meantime, I am stuck with 4.4.167 until I can upgrade VMWare Workstation and verify that their Kernel Modules Compile against 4.4.168+

-- kjh