Thanks for the details. I'll look into the options and see how to proceed.
Best,
Nicolas

On 5/2/19 6:11 PM, Kevin Keane wrote:
> It looks to me like this is expected behavior.
>
> Nvidia drivers only work with the kernel that was running when they were compiled (not with any other kernels installed at the same time), and only if that kernel's headers were available. You have to recompile the driver after every kernel update, *after* rebooting into the new kernel. It has been a while since I worked with Nvidia drivers, but I recall that there was a command-line option to compile the driver for a different kernel. I'm not sure how well that works; you would also need the correct kernel headers, the kernel-devel RPM, and probably more. I never bothered trying to make that work.
>
> In your case, the proper procedure should be (caveat: this is theory; I have not actually tested it):
>
> - Create a driver-build machine that has the correct *old* kernel installed. I would recommend doing this outside of xCAT. A virtual machine is fine for the purpose.
> - Recompile the driver.
> - Build an RPM from the driver's binaries, not from the source code. This RPM should depend on the exact kernel version it was built for.
> - On the build machine, upgrade the kernel and all other RPMs, reboot into the new kernel, and recompile the driver.
> - Build another RPM from these binaries. Again, make sure this RPM depends on the correct kernel version.
> - Create a repository (or use an existing one) and put both RPMs in it, along with any future ones you create this way.
> - Make this repository available to xCAT.
>
> This will allow you to initially install the old Nvidia driver into your osimage (because of the dependency, it will pick the one for the old kernel); then, when you update the kernel, the Nvidia driver will be updated along with it from your repository.
>
> If you want to skip a few steps at the expense of more manual work later, you can instead install this RPM into your osimage *after* you update the kernel RPM to the correct version.
>
> You have to rebuild the RPM for every new kernel version. Since you are using an older version of CentOS, that shouldn't happen too often.
>
> There is another option, but it may be less desirable in an xCAT system: you can install DKMS to recompile the driver automatically every time a kernel is updated. That means you will need a lot of extra software (kernel headers, gcc, various devel RPMs) on each node.
>
> _______________________________________________________________________
> Kevin Keane | Systems Architect | University of San Diego ITS
> Maher Hall, 192 | 5998 Alcalá Park | San Diego, CA 92110-2492
> 619.260.6859 | Text: 760-721-8339
>
> On Thu, May 2, 2019 at 6:31 AM Roosen, Nicolas wrote:
>
> > Hello, I'm having trouble installing the Nvidia drivers on a compute node using a custom script.
> >
> > We're using xCAT 2.13.5 on CentOS 7.3.
> >
> > We repackaged the Nvidia driver into an RPM, which installs fine when the node is up.
> >
> > But when we install it during a node re-image, it fails because there are two different kernel versions.
> >
> > Below are more details. Does anyone have experience with the Nvidia driver?
> >
> > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > This RPM is installed during the deployment process, which uses the default CentOS 7.3 kernel (3.10.0-514.el7). The kernel is also updated during the installation (but *before* the Nvidia driver is installed).
> >
> > Once the node deployment is finished, it reboots into the latest kernel (3.10.0-514.26.2.el7), and the Nvidia driver fails to load.
> > If I reboot into the older kernel, it works.
> >
> > So I'd like to know: is there an option to install the Nvidia driver for a kernel other than the running one?
> >
> > I get this error, in case it helps:
> >
> >     Making nvidia.ko silently in
> >     /opt/sgi/Factory-Install/nvidia/NVIDIA-Linux-x86_64-418.40.04/kernel
> >     Module nvidia.ko from kernel 3.10.0-514.el7.x86_64 is not compatible
> >     with kernel 3.10.0-514.26.2.el7.x86_64 in symbols:
> >     acpi_bus_register_driver acpi_bus_get_device acpi_bus_unregister_driver
> >     nvidia.ko:
> >     /lib/modules/3.10.0-514.el7.x86_64/video/nvidia.ko
> >
> > Curiously enough, if I reinstall the same RPM by hand while running the latest kernel, it works... so I'm a bit lost here.
> > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
> > Thanks.
> > --
> > Nicolas
> >
> > _______________________________________________
> > xCAT-user mailing list
> > https://lists.sourceforge.net/lists/listinfo/xcat-user

-- 
Nicolas Roosen
Technical Consultant
+33 970010023 Office
+33 777161256 Mobile
Les Ulis
hpe.com
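Kevin's suggested procedure relies on each prebuilt module RPM being tied to exactly one kernel. The Python sketch below (not tested against a real rpmbuild) prints the spec-file lines that do the tying for each kernel mentioned in this thread. The package name `nvidia-kmod-<kernel>` is hypothetical, the `/lib/modules/<kernel>/video` path is taken from the error log, and the rest of the spec (Summary, License, %description, %install) is omitted. `Requires: kernel = <version>-<release>` is an ordinary RPM versioned dependency, and `depmod -a <kernel>` is given the target kernel explicitly instead of relying on the running one.

```python
# Sketch: spec-file lines that pin one prebuilt Nvidia module RPM to one kernel.
# Hypothetical: the "nvidia-kmod-<kernel>" package name; the install path follows
# the /lib/modules/<kernel>/video/nvidia.ko location seen in the error log.
from typing import Tuple


def split_uname_r(uname_r: str) -> Tuple[str, str, str]:
    """Split '3.10.0-514.26.2.el7.x86_64' into ('3.10.0', '514.26.2.el7', 'x86_64')."""
    base, arch = uname_r.rsplit(".", 1)
    version, release = base.split("-", 1)
    return version, release, arch


def kmod_spec_fragment(driver_version: str, uname_r: str) -> str:
    """Header, %post and %files lines for a binary-only module RPM built on uname_r."""
    kver, krel, _arch = split_uname_r(uname_r)
    return "\n".join([
        # One package name per kernel, so several builds can share one repository.
        f"Name: nvidia-kmod-{kver}-{krel}",
        f"Version: {driver_version}",
        "Release: 1",
        # Exact versioned dependency on the kernel these binaries were compiled for.
        f"Requires: kernel = {kver}-{krel}",
        "",
        "%post",
        # Refresh module dependencies for the target kernel, not the running one.
        f"/sbin/depmod -a {uname_r}",
        "",
        "%files",
        f"/lib/modules/{uname_r}/video/nvidia*.ko",
    ])


if __name__ == "__main__":
    # The two kernels from this thread: deployment kernel and updated kernel.
    for kernel in ("3.10.0-514.el7.x86_64", "3.10.0-514.26.2.el7.x86_64"):
        print(kmod_spec_fragment("418.40.04", kernel))
        print()
```

Giving each kernel build its own package name lets both RPMs sit in the same repository and be installed side by side, which matches the "put both RPMs in it" step.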
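On the question of building for a kernel other than the running one: the log shows nvidia.ko landing in /lib/modules/3.10.0-514.el7.x86_64, the deployment kernel, which suggests the RPM's build step targets whatever `uname -r` reports at install time. That would also explain why reinstalling by hand on 3.10.0-514.26.2.el7 works. The Nvidia .run installer accepts `--kernel-name=<uname -r of the target kernel>` for building against a non-running kernel (running the installer with `--advanced-options` should confirm it for your driver version). Below is a minimal Python sketch of picking that target from the kernels present in the image. It assumes the newest kernel with headers (/lib/modules/<kernel>/build, normally provided by kernel-devel) is the one the node will boot; the .run file name and helper names are hypothetical, and the version ordering is a simplified take on RPM's rules, not a full rpmvercmp.

```python
# Sketch: build the Nvidia module for the kernel the image will boot, not for
# the kernel that happens to be running during deployment.
# Hypothetical: INSTALLER file name, helper names, "newest kernel = boot kernel".
import os
import platform
import re
from typing import List, Optional, Tuple, Union

INSTALLER = "NVIDIA-Linux-x86_64-418.40.04.run"  # hypothetical .run file name


def rpm_like_key(release: str) -> List[Tuple[int, Union[int, str]]]:
    """Simplified RPM-style ordering: split into digit/letter runs and rank
    numeric runs above alphabetic ones (e.g. 514.26.2.el7 > 514.el7)."""
    return [
        (1, int(seg)) if seg.isdigit() else (0, seg)
        for seg in re.findall(r"\d+|[A-Za-z]+", release)
    ]


def kernels_with_headers(modules_root: str = "/lib/modules") -> List[str]:
    """Kernel releases under modules_root whose build/ link (kernel-devel) resolves."""
    if not os.path.isdir(modules_root):
        return []
    return [
        name for name in os.listdir(modules_root)
        if os.path.isdir(os.path.join(modules_root, name, "build"))
    ]


def newest(kernels: List[str]) -> Optional[str]:
    """Newest kernel release by the simplified ordering, or None if empty."""
    return max(kernels, key=rpm_like_key) if kernels else None


def installer_command(kernel: str, installer: str = INSTALLER) -> List[str]:
    """Non-interactive installer run that targets `kernel` instead of `uname -r`."""
    return ["sh", installer, "--silent", f"--kernel-name={kernel}"]


if __name__ == "__main__":
    target = newest(kernels_with_headers())
    print("running kernel:", platform.release())
    if target is None:
        print("no kernel headers found under /lib/modules")
    else:
        # Pass this list to subprocess.run(..., check=True) to actually build.
        print("would run:", " ".join(installer_command(target)))
```

Building for a single target keeps the sketch simple; covering several installed kernels at once is what Kevin's one-RPM-per-kernel approach handles.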
