Some details on our setup in light of the approach Jarrod outlined …

On Wed, 29 Mar 2023 17:37:30 +0000, Jarrod Johnson <jjohns...@lenovo.com> wrote:

> On confluent diskless, there is an interesting benefit that becomes a
> challenge for bittorrent: a typical diskless node never downloads the
> whole diskless image. This means less ram sucked up by the diskless
> image, and also that the diskless image can be large without pruning.

I guess this is mitigated by our OS image being rather minimal to begin with. It only has the basic system software and drivers, up to a working C/C++ compiler setup that is able to bootstrap further software. Such further software is provided in a versioned tree via NFS and managed via environment modules.

So an approach that optimizes the use of a large OS image by keeping only the necessary parts in memory would not benefit us much. The squashfs is below 1G, which is no big deal for our compute nodes with 64G of RAM. A full image of 10G would be annoying. Rather, the split for us is this system image below 1G and the software tree on NFS with 421G, grown over about 8 years of system lifetime. Add to that an uncounted number of anaconda, spack, whatever trees that users installed into their storage shares.

Getting whole images out to the cluster nodes quickly is very much valid for this scenario, and also for the next system we will set up. Of course one could imagine full-on NFS root, but there are reasons why that is out of fashion, and with a minimal main system image, loading it whole can be considered a mode of aggressive client caching. It might not matter much with 10G network or IB on the image server, but any avoidable bottleneck sucks, even if it does not hurt right now in practice.

> trick were done to only torrent the parts as needed locally

That does sound like a complexity nightmare … but it might still provide benefit, assuming that nodes mostly need the same parts. You'd need to do a lot of work to integrate those layers. Not worth it, I guess.

> the diskless images are now encrypted […] by node TPM

Hm. Use of TPMs on cluster nodes. Didn't think about that much, yet.

Another point: I'd love vendors to finally implement safeguards to ensure that root on a server cannot manipulate any firmware (be it network card or hard disk) from userspace, and especially cannot access the BMC, which should only answer to external IPMI requests. Can Secure Boot really ensure nothing has been messed with through a root exploit? I'd love a simple switch that only allows certain platform changes in the pre-boot environment (BIOS, UEFI … and IPMI from the outside) and has things locked down once the kernel boots.

I still don't see how you can really trust a machine once someone had root on it, if you're really paranoid. The whole machinery of crypto checking (Secure Boot) is a rather elaborate mess which could be avoided if there was a clear hardware barrier that only allows certain modifications (also to PCIe and SATA devices, at least onboard devices) outside the booted Linux context. If there's no way for rogue users/hackers to modify things, then you know the system is clean on a fresh boot from the network, maybe after replacing any SATA or USB devices that just cannot be protected that way.

Is any vendor for compute nodes offering this kind of manipulation protection? I'd love that kind of security to start with. Not having to theoretically trash the hardware once someone possibly got a root exploit. Then talk about encrypting images and securing userspace …

> if [ "untethered" = "$(getarg confluent_imagemethod)" ]; then
>     mount -t tmpfs untethered /mnt/remoteimg
>     curl https://$confluent_whost/confluent-public/os/$confluent_profile/rootimg.sfs -o /mnt/remoteimg/rootimg.sfs
> else
>     confluent_urls="$confluent_urls https://$confluent_whost/confluent-public/os/$confluent_profile/rootimg.sfs"
>     /opt/confluent/bin/urlmount $confluent_urls /mnt/remoteimg
> fi

Looks easy enough.

> Is the logic for getting the image. One thing to note is that a
> typical diskless image boot in confluent, the booted system does not
> see rootimg.sfs, so the torrent execution would have to stay in the
> 'initramfs' world (which does persist after boot, as a separate mount
> namespace)

I think that is why I hooked the rootimg.sfs up to /dev/loop0 back in the day, and hacked ctorrent to accept the block device as data source. The loop device stays accessible.

Anyone from xCAT with thoughts on this? Should I work on a patch for current xCAT (not sure where I'd find time to test that, though)? I don't know which kind of cluster management our next system will have. It could be that my path of least resistance is a quick hack on that one, like I did with xCAT back in 2015 …

Alrighty then,

Thomas

-- 
Dr. Thomas Orgis
HPC @ Universität Hamburg

_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user