Thanks for the details. I'll check the options and see how to
proceed with this.

Best,
Nicolas

On 5/2/19 6:11 PM, Kevin Keane wrote:
> It looks to me like this is expected behavior.
> 
> A compiled Nvidia driver will only work with the kernel version that
> was running when it was compiled (not any other kernels that were
> installed at the same time), AND only if that kernel's headers were
> available. By default, you have to recompile the driver after every
> kernel update, *after* rebooting into the new kernel. It has been a
> while since I worked with NVidia drivers,
> but I recall that there actually was a command-line option that let you
> compile it for a different kernel, but I'm not sure how well that works;
> you would also need the correct kernel headers, kernel-devel RPM, and
> probably more. I never bothered trying to make that work.
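A minimal sketch of building for a kernel other than the running one, assuming the option referred to above is the installer's --kernel-name, paired with --kernel-source-path pointing at the matching kernel-devel tree under /usr/src/kernels. The helper name and the .run file name are hypothetical; only the driver version and kernel release come from the original report. This is untested against 418.40.04, so check the installer's advanced options help before relying on it.

```python
import shlex


def installer_command(run_file, kernel_release):
    """Return the argv for building the driver against a specific kernel.

    run_file is the NVIDIA .run installer (hypothetical name in the demo),
    kernel_release is the target kernel as `uname -r` would report it.
    """
    return [
        "sh", run_file,
        "--silent",                                   # no interactive prompts
        "--kernel-name=" + kernel_release,            # build for this kernel, not the running one
        "--kernel-source-path=/usr/src/kernels/" + kernel_release,  # kernel-devel tree (assumed location)
    ]


if __name__ == "__main__":
    cmd = installer_command(
        "NVIDIA-Linux-x86_64-418.40.04.run",          # assumed file name
        "3.10.0-514.26.2.el7.x86_64",
    )
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The kernel-devel RPM for the target kernel still has to be installed before the driver is built, otherwise the source path will not exist.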
> 
> In your case, the proper procedure should be (caveat: this is theory; I
> did not actually test this):
> 
> - Create a driver-build machine that has the correct *old* kernel
> installed. I would recommend doing this away from xCAT. A virtual
> machine is fine for the purpose.
> - Recompile the driver.
> - Build an RPM from the driver's binaries, not the source code. This RPM
> should include a dependency on the correct kernel version.
> - On the build machine, upgrade the kernel and all other RPMs.
> - Build another RPM from these binaries. Again, make sure this RPM
> depends on the correct kernel version.
> - Create a repository (or use an existing one) and put both RPMs in (and
> any future ones you create this way).
> - Make this repository available to xCAT
> 
> This will allow you to initially install the old nvidia driver into your
> osimage (because of the dependency, it will pick the one for the old
> kernel), and then when you update the kernel, the nvidia driver will be
> updated along with it from your repository.
> 
> If you want to avoid a few steps, at the expense of more manual work
> later, you can also install this RPM into your osimage *after* you
> update the kernel RPM to the correct version.
> 
> You have to rebuild the RPM with every new kernel version. Since you are
> using an older version of CentOS, that shouldn't be too frequent.
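To see which installed kernels still need a rebuilt driver, something like the following could be run on a node or inside the osimage. It is only a sketch: it assumes the module is installed as an uncompressed nvidia.ko somewhere under /lib/modules/<release>/ (as in the error output quoted below), and the function name is made up for illustration.

```python
import os


def kernels_missing_module(modules_root="/lib/modules", module_name="nvidia.ko"):
    """List kernel releases under modules_root that have no module_name file."""
    missing = []
    for release in sorted(os.listdir(modules_root)):
        tree = os.path.join(modules_root, release)
        if not os.path.isdir(tree):
            continue
        # Walk the whole per-kernel tree, since the install subdirectory varies.
        found = any(module_name in files for _dir, _subdirs, files in os.walk(tree))
        if not found:
            missing.append(release)
    return missing


if __name__ == "__main__":
    for release in kernels_missing_module():
        print("no nvidia.ko for kernel", release)
```

Any kernel listed here would fail to load the driver after a reboot, which matches the symptom in the original report.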
> 
> There is another option, but it may be less desirable in an xCAT
> system: you can install DKMS to automatically recompile the driver every
> time a kernel is updated. That means that you will have to have a lot of
> extra stuff (kernel headers, gcc, various devel RPMs) on each node.
> 
> _______________________________________________________________________
> Kevin Keane | Systems Architect | University of San Diego ITS |
> [email protected] <mailto:[email protected]>
> Maher Hall, 192 |5998 Alcalá Park | San Diego, CA 92110-2492 |
> 619.260.6859 | Text: 760-721-8339
> 
> 
> 
> 
> On Thu, May 2, 2019 at 6:31 AM Roosen, Nicolas <[email protected]
> <mailto:[email protected]>> wrote:
> 
>     Hello, I have some trouble installing the Nvidia drivers into a compute
>     node, using a custom script.
> 
>     Using xcat 2.13.5 on Centos 7.3
> 
>     We repackaged the Nvidia driver in a RPM, which installs fine when the
>     node is up.
> 
>     But when we install it during a node re-image, it fails, because there
>     are two different kernel versions.
> 
>     Below are more details. Does anyone have experience with the Nvidia
>     driver?
> 
>     +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>     This RPM is installed during the deployment process, which uses the
>     default Centos 7.3 kernel (3.10.0-514.el7). The kernel is also updated
>     during the installation process (but *before* the Nvidia driver
>     installation).
> 
>     Once the node deployment is finished, it reboots into the latest kernel
>     (3.10.0-514.26.2.el7), and the Nvidia driver fails to load. If I reboot
>     into the older kernel, it works.
> 
>     So I'd like to know if there is an option to install the Nvidia driver
>     for a kernel other than the running one?
> 
>     I have this error, if that helps:
> 
>     Making nvidia.ko silently in
>     /opt/sgi/Factory-Install/nvidia/NVIDIA-Linux-x86_64-418.40.04/kernel
>     Module nvidia.ko from kernel 3.10.0-514.el7.x86_64 is not compatible
>     with kernel 3.10.0-514.26.2.el7.x86_64 in symbols:
>     acpi_bus_register_driver acpi_bus_get_device acpi_bus_unregister_driver
>     nvidia.ko:
>     /lib/modules/3.10.0-514.el7.x86_64/video/nvidia.ko
> 
>     Curiously enough, if I re-install the same RPM by hand while running the
>     latest kernel, it works ... So I'm a bit lost here ...
>     +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 
>     Thanks.
>     -- 
>     Nicolas
> 
>     _______________________________________________
>     xCAT-user mailing list
>     [email protected] <mailto:[email protected]>
>     https://lists.sourceforge.net/lists/listinfo/xcat-user

-- 
Nicolas Roosen
Technical Consultant

+33 970010023  Office
+33 777161256  Mobile

Les Ulis
hpe.com


