Hello, I am having trouble installing the Nvidia driver on a compute node using a custom script.
We are using xCAT 2.13.5 on CentOS 7.3. We repackaged the Nvidia driver as an RPM, which installs fine when the node is up. But when we install it during a node re-image, it fails, because two different kernel versions are involved. More details are below. Does anyone have experience with the Nvidia driver?

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

This RPM is installed during the deployment process, which runs under the default CentOS 7.3 kernel (3.10.0-514.el7). The kernel is also updated during the installation process (but *before* the Nvidia driver installation).

Once the node deployment is finished, the node reboots into the latest kernel (3.10.0-514.26.2.el7), and the Nvidia driver fails to load. If I reboot into the older kernel, it works.

So I'd like to know whether there is an option to install the Nvidia driver for a kernel other than the running one.

I get this error, if that helps:

    Making nvidia.ko silently in /opt/sgi/Factory-Install/nvidia/NVIDIA-Linux-x86_64-418.40.04/kernel
    Module nvidia.ko from kernel 3.10.0-514.el7.x86_64 is not compatible with kernel 3.10.0-514.26.2.el7.x86_64 in symbols: acpi_bus_register_driver acpi_bus_get_device acpi_bus_unregister_driver
    nvidia.ko: /lib/modules/3.10.0-514.el7.x86_64/video/nvidia.ko

Curiously enough, if I re-install the same RPM by hand while running the latest kernel, it works. So I'm a bit lost here.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Thanks.

-- 
Nicolas

_______________________________________________
xCAT-user mailing list
https://lists.sourceforge.net/lists/listinfo/xcat-user
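To make the mismatch described above easier to check on a node, here is a rough Python sketch that lists which installed kernels have an nvidia.ko under /lib/modules and which do not. It is only an illustration under stated assumptions: the function names are hypothetical, the module is assumed to live somewhere under /lib/modules/<kernel release>/ (as in the error output), and the version ordering is a simplified approximation, not RPM's real comparison algorithm.

```python
import os
import re


def kernel_sort_key(version):
    """Rough ordering key for kernel release strings (not a full rpmvercmp)."""
    key = []
    for part in re.split(r"[.\-]", version):
        if part.isdigit():
            key.append((1, int(part)))  # numeric chunks rank above text chunks
        elif part:
            key.append((0, part))
    return key


def newest_kernel(versions):
    """Return the release string that sorts highest under kernel_sort_key."""
    return max(versions, key=kernel_sort_key)


def kernels_with_module(modules_root="/lib/modules", module_name="nvidia.ko"):
    """List kernel releases whose module tree contains module_name."""
    found = []
    if not os.path.isdir(modules_root):
        return found
    for version in os.listdir(modules_root):
        tree = os.path.join(modules_root, version)
        for _dirpath, _dirnames, filenames in os.walk(tree):
            if module_name in filenames:
                found.append(version)
                break
    return sorted(found, key=kernel_sort_key)


def kernels_missing_module(installed, with_module):
    """Installed kernel releases that have no copy of the module."""
    have = set(with_module)
    return [v for v in sorted(installed, key=kernel_sort_key) if v not in have]


if __name__ == "__main__":
    root = "/lib/modules"  # assumption: standard module location
    installed = os.listdir(root) if os.path.isdir(root) else []
    with_module = kernels_with_module(root)
    print("running kernel:", os.uname().release)
    if installed:
        print("newest installed kernel:", newest_kernel(installed))
    print("kernels with nvidia.ko:", with_module)
    print("kernels without nvidia.ko:", kernels_missing_module(installed, with_module))
```

Run on a freshly deployed node, this would be expected to show nvidia.ko only under the kernel that was running during deployment, which matches the path in the error message.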
