Re: [Python] Read Hadoop Sequence File?

2019-07-17 Thread Ryan Skraba
Hello! I dug a bit into this (not a FileIO expert), and it looks like LocalFileSystem only matches globs in file names (not directories): https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/LocalFileSystem.java#L251 Perhaps related:

Re: [Python] Read Hadoop Sequence File?

2019-07-16 Thread Shannon Duncan
I am still having the problem that the local file system (DirectRunner) will not allow a local GLOB string to be passed as a file source. I have tried both relative and fully qualified paths. I can confirm the same inputFile source GLOB returns data on a simple cat command. So I know the GLOB is

Re: [Python] Read Hadoop Sequence File?

2019-07-12 Thread Shannon Duncan
Clarification on my previous message: this only happens on the local file system, where it is unable to match a pattern string. Via a `gs://` link it is able to do multiple file matching. On Fri, Jul 12, 2019 at 1:36 PM Shannon Duncan wrote: > Awesome. I got it working for a single file, but for a structure

Re: [Python] Read Hadoop Sequence File?

2019-07-12 Thread Shannon Duncan
Awesome. I got it working for a single file, but for a structure of:

/part-0001/index
/part-0001/data
/part-0002/index
/part-0002/data

I tried /part-* and /part-*/data, and it does not find the multipart files. However, if I just use /part-0001/data, it will find it and read it. Any ideas why?
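
One possible workaround for the local case is to expand the pattern outside the pipeline and pass explicit file paths instead of a glob. The sketch below is only an illustration using Python's standard `glob` module; it is not part of Beam, and the helper names `expand_local_glob` and `make_sample_layout` are hypothetical. It rebuilds the part-*/data layout from the message in a temporary directory and shows that a wildcard in a directory component is expanded. How the resulting paths are then handed to the source depends on the reader in use and is not shown here.

```python
import glob
import os
import tempfile


def expand_local_glob(pattern):
    """Return a sorted list of existing files matching a local glob pattern."""
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


def make_sample_layout(root, parts=("part-0001", "part-0002")):
    """Create the part-*/{index,data} layout from the message (hypothetical helper)."""
    for part in parts:
        os.makedirs(os.path.join(root, part), exist_ok=True)
        for name in ("index", "data"):
            with open(os.path.join(root, part, name), "w") as f:
                f.write(name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_sample_layout(root)
        # The wildcard sits in a directory component, not in the file name.
        for path in expand_local_glob(os.path.join(root, "part-*", "data")):
            print(path)
```

Each explicit path matches the single-file case that is reported to work above, so reading the files one by one sidesteps the directory wildcard entirely.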

Re: [Python] Read Hadoop Sequence File?

2019-07-10 Thread Shannon Duncan
If I wanted to go ahead and include this within a new Java Pipeline, what would I be looking at for level of work to integrate? On Wed, Jul 3, 2019 at 3:54 AM Ismaël Mejía wrote: > That's great. I can help whenever you need. We just need to choose its > destination. Both the `hadoop-format` and

Re: [Python] Read Hadoop Sequence File?

2019-07-03 Thread Ismaël Mejía
That's great. I can help whenever you need. We just need to choose its destination. Both the `hadoop-format` and `hadoop-file-system` modules are good candidates. I would even be inclined to put it in its own module, `sdks/java/extensions/sequencefile`, to make it easier to discover by the

Re: [Python] Read Hadoop Sequence File?

2019-07-02 Thread Shannon Duncan
It would be great if it were available for both Java and Python. On Tue, Jul 2, 2019, 3:57 AM Ismaël Mejía wrote: > (Adding dev@ and Solomon Duskis to the discussion) > > I was not aware of these thanks for sharing David. Definitely it would > be a great addition if we could have those donated

Re: [Python] Read Hadoop Sequence File?

2019-07-02 Thread Ismaël Mejía
(Adding dev@ and Solomon Duskis to the discussion) I was not aware of these; thanks for sharing, David. It would definitely be a great addition if we could have those donated as an extension on the Beam side. We can even evolve them in the future to be more FileIO-like. Any chance this can happen?