Re: [lustre-discuss] Lustre 2.15.1 server with ZFS nothing provides ksym

2023-07-11 Thread Feng Zhang via lustre-discuss
The detailed solutions I found are here: https://github.com/prod-feng/Luste-KMOD-2.12.9-with-ZFS-0.7.13-on-Centos-7.9 Best, Feng On Tue, Jul 11, 2023 at 11:38 AM Jon Marshall via lustre-discuss <lustre-discuss@lists.lustre.org> wrote: > Hi, > > I'm having a bit of a nightmar

Re: [lustre-discuss] OSS on compute node

2023-10-13 Thread Feng Zhang via lustre-discuss
In theory it should work, though it may have some performance issues. The idea is similar to BeeGFS BeeOND (BeeGFS on-demand). You should run the compute (client) side in a VM or container, so that a user's app cannot crash the whole node, intentionally or not. Best, Feng On Fri, Oct 13, 2023 at

Re: [lustre-discuss] OSS on compute node

2023-10-13 Thread Feng Zhang via lustre-discuss
Yes. I have not had a chance to use BeeGFS, though I know some institutes use it. https://io500.org/ Best, Feng On Fri, Oct 13, 2023 at 3:49 PM Fedele Stabile wrote: > > I have to study in detail beegfs, > Is it usable on a little HPC Cluster? > > > From: Feng Zha

Re: [lustre-discuss] OSS on compute node

2023-10-13 Thread Feng Zhang via lustre-discuss
As far as I can remember, BeeGFS is free, but some features, like BeeOND, are not. I did some detailed research on this years ago for my project. Best, Feng On Fri, Oct 13, 2023 at 3:54 PM Feng Zhang wrote: > > Yes. I do not have a chance to use BeeGFS, while I know some > institutes use it.https://io5

Re: [lustre-discuss] lustre-client-dkms-2.15.4 is still checking for python2

2024-01-19 Thread Feng Zhang via lustre-discuss
For lustre-client-dkms, maybe you can use --skip-broken to force it to install. It will then automatically compile its source code, and you can check the error messages from that build. Use the dkms command, or go directly to the source folder, to check. If there's no error message there, then you s
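One way to do the check described above is to read the DKMS build log. This is a minimal sketch, assuming the default DKMS tree under /var/lib/dkms and its usual <module>/<version>/build/make.log layout. The helper name find_dkms_logs and the *lustre* path filter are illustrative and not part of any Lustre tooling:

```shell
# Hypothetical helper: list DKMS build logs for Lustre modules.
# Assumes the default DKMS tree (/var/lib/dkms); pass another root to override.
find_dkms_logs() {
    root="${1:-/var/lib/dkms}"
    # DKMS keeps the compiler output of each build in build/make.log.
    # A missing tree simply yields no output.
    find "$root" -type f -name make.log -path '*lustre*' 2>/dev/null || true
}

# Show the tail of each log found, where build errors usually appear.
find_dkms_logs | while IFS= read -r log; do
    echo "== $log"
    tail -n 20 "$log"
done
```

Running `dkms status` first shows which module versions are registered and whether they were built.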

Re: [lustre-discuss] lustre and pytorch

2024-07-24 Thread Feng Zhang via lustre-discuss
I have a vague memory from many years ago of an issue with a file lock on Lustre that caused a Python app to get stuck (or fail). Best, Feng On Wed, Jul 24, 2024 at 2:40 PM Michael DiDomenico via lustre-discuss wrote: > > what it's worth here's the chunk of debug that seemed suspect

Re: [lustre-discuss] lustre and pytorch

2024-07-24 Thread Feng Zhang via lustre-discuss
Not sure if it is related, but I dug this out of my notes: "Looks like the Lustre storage does not support the HDF5 lock. The local hard drive($TMPDIR), or NFS works fine. If you need to read HDF5 files on /lustre, just add: export HDF5_USE_FILE_LOCKING=FALSE " (older versions of netcdf4 and
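The workaround in the note can also be applied to a single command instead of the whole shell session. This is a minimal sketch: the wrapper name run_without_hdf5_locking is hypothetical, and HDF5_USE_FILE_LOCKING is the variable named in the note above:

```shell
# Hypothetical wrapper: run one command with HDF5 file locking disabled,
# leaving the rest of the shell session untouched.
run_without_hdf5_locking() {
    env HDF5_USE_FILE_LOCKING=FALSE "$@"
}

# Example: confirm the variable is visible to the child process.
run_without_hdf5_locking sh -c 'echo "HDF5_USE_FILE_LOCKING=$HDF5_USE_FILE_LOCKING"'
```

In practice the wrapped command would be the application that reads HDF5 files on /lustre, for example `run_without_hdf5_locking python train.py`, where train.py stands in for your own script.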