I'm guessing you have existing code that expects random-access IO on the
files through the Java IO or NIO interfaces (the common pattern of blocking
IO inside a DoFn), so switching to a Beam IO, which is what we recommend and
are discussing here, would be a significant rewrite?
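
For reference, the kind of random-access read I have in mind looks roughly like the sketch below. It is plain Java NIO with no Beam dependency; the class name RandomAccessRead and the helper readRange are made up for illustration, standing in for whatever your DoFn does per element today.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class RandomAccessRead {
    // Hypothetical helper: blocking read of up to `length` bytes starting at `offset`.
    // Returns fewer bytes if the file ends before the range does.
    static byte[] readRange(Path file, long offset, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(length);
            long position = offset;
            while (buffer.hasRemaining()) {
                // Positional read: does not move the channel's own position.
                int n = channel.read(buffer, position);
                if (n < 0) {
                    break; // end of file
                }
                position += n;
            }
            buffer.flip();
            byte[] out = new byte[buffer.remaining()];
            buffer.get(out);
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".bin");
        Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
        byte[] slice = readRange(tmp, 3, 4);
        System.out.println(new String(slice, StandardCharsets.US_ASCII)); // prints 3456
        Files.delete(tmp);
    }
}
```

Code like this assumes the path is reachable through the local filesystem (for example an OS-level mount), which is exactly the assumption that a Beam IO or a userspace client would replace.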

I worked on Isilon from 6.5 to 7.2, and in those days NFS and SMB were the
high-performance options. However, these are complex protocols, and userspace
implementations tend to be missing the features that enable that
performance. If you do go down the path of userspace IO (either a Beam IO or
a Java NIO wrapper), you'd possibly get better results with FTP, SFTP, or
HTTP. It looks like Isilon added the ability to expose the filesystem via
the S3 protocol in 9.0
<http://doc.isilon.com/onefs/9.0.0/help/en-us/ifs_c_s3_support_intro.html>
and there is a Beam S3 connector
<https://beam.apache.org/releases/javadoc/2.41.0/org/apache/beam/sdk/io/aws/s3/package-summary.html>,
but I would be amazed if you were on Isilon 9.0+.

On Tue, Jan 31, 2023 at 3:56 PM Luke Cwik via user <user@beam.apache.org>
wrote:

> I would also suggest looking at NFS client implementations in Java that
> would allow you to talk to the NFS server without needing to mount it
> within the OS. A quick search yielded
> https://github.com/raisercostin/yanfs or
> https://github.com/EMCECS/nfs-client-java
>
> On Tue, Jan 31, 2023 at 3:31 PM Chad Dombrova <chad...@gmail.com> wrote:
>
>> Thanks for the info.  We are going to test this further and we'll let you
>> know how it goes.
>>
>> -chad
>>
>>
>> On Mon, Jan 30, 2023 at 2:14 PM Valentyn Tymofieiev <valen...@google.com>
>> wrote:
>>
>>> It applies to custom containers as well. You can find the container
>>> manifest in the GCE VM metadata, and it should have an entry for privileged
>>> mode. The reason for this was to enable GPU accelerator support, but I agree
>>> with Robert that it is not part of any contract, so in theory this could
>>> change or perhaps be more strictly limited to accelerator support. In fact,
>>> originally, this was only enabled for pipelines using accelerators but for
>>> purely internal implementation details I believe it is currently enabled
>>> for all pipelines.
>>>
>>> So for prototyping purposes I think you could try it, but I can't make
>>> any guarantees in this thread that privileged mode will continue to work.
>>>
>>> cc: @Aaron Li <aaronle...@google.com> FYI
>>>
>>>
>>> On Mon, Jan 30, 2023 at 12:16 PM Robert Bradshaw <rober...@google.com>
>>> wrote:
>>>
>>>> I'm also not sure it's part of the contract that the containerization
>>>> technology we use will always have these capabilities.
>>>>
>>>> On Mon, Jan 30, 2023 at 10:53 AM Chad Dombrova <chad...@gmail.com>
>>>> wrote:
>>>> >
>>>> > Hi Valentyn,
>>>> >
>>>> >>
>>>> >> Beam SDK docker containers on Dataflow VMs are currently launched in
>>>> >> privileged mode.
>>>> >
>>>> >
>>>> > Does this only apply to stock sdk containers?  I'm asking because we
>>>> > use a custom sdk container that we build.  We've tried various ways of
>>>> > running mount from within our custom beam container in Dataflow and we
>>>> > could not get it to work, while the same thing succeeds in local tests and
>>>> > in our CI (gitlab).  The assessment at the time (this was maybe a year ago)
>>>> > was that the container was not running in privileged mode, but if you think
>>>> > that's incorrect we can revisit this and report back with some error logs.
>>>> >
>>>> > -chad
>>>> >
>>>>
>>>
