If it's your input/output data, presumably you could implement a FileSystem (https://beam.apache.org/releases/javadoc/2.3.0/org/apache/beam/sdk/io/FileSystem.html) for NFS. (I don't know what all that would entail, though.)
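To make the suggestion a bit more concrete, here is a rough, stdlib-only sketch of the kind of operations such a filesystem would have to provide (match a glob, create, open, rename, delete). It is not an implementation of Beam's FileSystem class and does not use any Beam API. The class name, the "nfs" scheme, and the idea of mapping URIs onto a path where the export is already mounted on every worker are all assumptions made for illustration.

```python
import glob
import os
import shutil
import tempfile


class MountedNfsFileSystemSketch:
    """Hypothetical illustration only; NOT a subclass of Beam's FileSystem."""

    SCHEME = "nfs"  # assumed URI scheme, not something defined by Beam

    def __init__(self, mount_root):
        # Assumption: the NFS export is already mounted at mount_root on every worker.
        self.mount_root = mount_root

    def _to_local(self, uri):
        # Map "nfs://some/path" to "<mount_root>/some/path".
        prefix = self.SCHEME + "://"
        if not uri.startswith(prefix):
            raise ValueError("expected an nfs:// URI, got %r" % uri)
        return os.path.join(self.mount_root, uri[len(prefix):].lstrip("/"))

    def _to_uri(self, local_path):
        # Inverse of _to_local.
        rel = os.path.relpath(local_path, self.mount_root)
        return self.SCHEME + "://" + rel.replace(os.sep, "/")

    def match(self, pattern):
        # Expand a glob pattern into a sorted list of matching URIs.
        return sorted(self._to_uri(p) for p in glob.glob(self._to_local(pattern)))

    def create(self, uri):
        # Open a new file for binary writing, creating parent directories.
        local = self._to_local(uri)
        os.makedirs(os.path.dirname(local), exist_ok=True)
        return open(local, "wb")

    def open(self, uri):
        # Open an existing file for binary reading.
        return open(self._to_local(uri), "rb")

    def rename(self, src, dst):
        shutil.move(self._to_local(src), self._to_local(dst))

    def delete(self, uri):
        os.remove(self._to_local(uri))


if __name__ == "__main__":
    # A temporary directory stands in for the NFS mount point.
    fs = MountedNfsFileSystemSketch(tempfile.mkdtemp())
    with fs.create("nfs://shots/a.txt") as f:
        f.write(b"frame data")
    print(fs.match("nfs://shots/*.txt"))
```

A real implementation would additionally need to follow whatever the SDK's FileSystem base class requires (scheme registration, its own match and resource types, and so on), which this sketch leaves out entirely.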

On Mon, Jan 30, 2023 at 9:04 AM Chad Dombrova <[email protected]> wrote:
>
> Hi Israel,
> Thanks for responding.
>
>> And could not the dataset be accessed from Cloud Storage? Does it need to be
>> specifically NFS?
>
> No, unfortunately it can't be accessed from Cloud Storage. Our data resides
> on high-performance Isilon [1] servers using a POSIX filesystem, and NFS is
> the tried and true protocol for this. This configuration cannot be changed
> for a multitude of reasons, not least of which is the fact that these
> servers outperform cloud storage at a fraction of the cost of cloud offerings
> (which is a very big difference for multiple petabytes of storage). If you'd
> like more details on why this is not possible I'm happy to explain, but for
> now let's just say that it's been investigated and it's not practical. The
> use of fast POSIX filers over NFS is fairly ubiquitous in the media and
> entertainment industry (if you want to know more about how we use Beam, I
> gave a talk at the Beam Summit a few years ago [2]).
>
> thanks!
> -chad
>
> [1] https://www.dell.com/en-hk/dt/solutions/media-entertainment.htm
> [2] https://www.youtube.com/watch?v=gvbQI3I03a8&t=644s&ab_channel=ApacheBeam
>
