Hello!
I dug a bit into this (not a FileIO expert), and it looks like
LocalFileSystem only matches globs in file names (not directories):
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/LocalFileSystem.java#L251
Perhaps related:
I am still having the problem that the local file system (DirectRunner) will
not allow a local GLOB string to be passed as a file source. I have tried
both relative and fully qualified paths.
I can confirm the same inputFile source GLOB returns data on a simple cat
command. So I know the GLOB is
Clarification on my previous message: this only happens on the local file
system, where it is unable to match a pattern string. Via a `gs://` link it is
able to match multiple files.
On Fri, Jul 12, 2019 at 1:36 PM Shannon Duncan wrote:
Awesome. I got it working for a single file, but for a structure of:
/part-0001/index
/part-0001/data
/part-0002/index
/part-0002/data
I tried /part-* and /part-*/data.
It does not find the multipart files. However, if I just use /part-0001/data,
it finds it and reads it.
Any ideas why?
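
For illustration, one possible workaround for the layout above is to expand the pattern outside of Beam with the JDK's own glob matcher and then hand the resulting concrete paths to the pipeline one by one. This is only a sketch: `LocalGlobExpander` and `expand` are hypothetical names, not part of Beam, and it uses `java.nio` only, so it says nothing about how LocalFileSystem itself matches patterns.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not a Beam API): expands a glob that contains
// wildcards in directory names into a list of concrete file paths.
public class LocalGlobExpander {

    // Returns the regular files under root whose path relative to root
    // matches the glob, e.g. "part-*/data". Results are sorted.
    public static List<Path> expand(Path root, String glob) throws IOException {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> matcher.matches(root.relativize(p)))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Recreate the part-0001/part-0002 layout in a temp directory.
        Path root = Files.createTempDirectory("seqfile");
        for (String part : new String[] {"part-0001", "part-0002"}) {
            Path dir = Files.createDirectories(root.resolve(part));
            Files.write(dir.resolve("index"), new byte[0]);
            Files.write(dir.resolve("data"), new byte[0]);
        }
        // Print every data file matched by the directory-level wildcard.
        for (Path p : expand(root, "part-*/data")) {
            System.out.println(root.relativize(p));
        }
    }
}
```

Each returned path is a plain file path without wildcards, which is the form that was reported to work when passed as the input.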
If I wanted to go ahead and include this within a new Java Pipeline, what
would I be looking at for level of work to integrate?
On Wed, Jul 3, 2019 at 3:54 AM Ismaël Mejía wrote:
That's great. I can help whenever you need. We just need to choose its
destination. Both the `hadoop-format` and `hadoop-file-system` modules
are good candidates. I would even feel inclined to put it in its own
module `sdks/java/extensions/sequencefile` to make it easier to
discover by the
It would be great if it were available for both Java and Python.
On Tue, Jul 2, 2019, 3:57 AM Ismaël Mejía wrote:
(Adding dev@ and Solomon Duskis to the discussion)
I was not aware of these, thanks for sharing, David. It would definitely
be a great addition if we could have those donated as an extension on
the Beam side. We can even evolve them in the future to be more FileIO
like. Any chance this can happen?