date:20180316

Re: Looking for I/O transform to untar a tar.gz

2018-03-16 Thread Sajeevan Achuthan

Thanks Cham On 16 March 2018 at 23:28, Chamikara Jayalath wrote: > Actually, I could assign it to you. > > On Fri, Mar 16, 2018 at 4:27 PM Chamikara Jayalath > wrote: > >> Of course. Feel free to add a comment to JIRA and send out a pull request >>

Re: Looking for I/O transform to untar a tar.gz

2018-03-16 Thread Chamikara Jayalath

Actually, I could assign it to you. On Fri, Mar 16, 2018 at 4:27 PM Chamikara Jayalath wrote: > Of course. Feel free to add a comment to JIRA and send out a pull request > for this. > Can one of the JIRA admins assign this to Sajeevan ? > > Thanks, > Cham > > On Fri, Mar

Re: Looking for I/O transform to untar a tar.gz

2018-03-16 Thread Chamikara Jayalath

Of course. Feel free to add a comment to JIRA and send out a pull request for this. Can one of the JIRA admins assign this to Sajeevan ? Thanks, Cham On Fri, Mar 16, 2018 at 4:22 PM Sajeevan Achuthan < achuthan.sajee...@gmail.com> wrote: > Hi Guys, > > Can I take a look at this issue? If you

Re: Design specs for portable Combine

2018-03-16 Thread Daniel Oliveira

So since I made some updates to the doc I feel like this is a good time to add a summary (I didn't know I needed to do that when I originally sent it out). Structure and Lifting of Combines (In Apache Beam Portability) This doc covers how Combines will be modeled in the Runner API and Fn API, as

Re: Looking for I/O transform to untar a tar.gz

2018-03-16 Thread Sajeevan Achuthan

Hi Guys, Can I take a look at this issue? If you agree, my Jira id is eachsaj thanks Saj On 16 March 2018 at 22:13, Chamikara Jayalath wrote: > Created https://issues.apache.org/jira/browse/BEAM-3867. > > Thanks, > Cham > > On Fri, Mar 16, 2018 at 3:00 PM Eugene

Re: (java) stream & beam?

2018-03-16 Thread Jean-Baptiste Onofré

Big +1 Regards JB On 16 March 2018 at 15:59, Reuven Lax wrote: >BTW while it's true that raw GBK can't be fluent (due to constraint on >element type). once we have schema support we can introduce >groupByField, >and that can be fluent. > > >On Wed, Mar 14, 2018 at

Re: (java) stream & beam?

2018-03-16 Thread Reuven Lax

BTW, while it's true that raw GBK can't be fluent (due to the constraint on element type), once we have schema support we can introduce groupByField, and that can be fluent. On Wed, Mar 14, 2018 at 11:50 PM Robert Bradshaw wrote: > On Wed, Mar 14, 2018 at 11:04 PM Romain

Re: Looking for I/O transform to untar a tar.gz

2018-03-16 Thread Chamikara Jayalath

Created https://issues.apache.org/jira/browse/BEAM-3867. Thanks, Cham On Fri, Mar 16, 2018 at 3:00 PM Eugene Kirpichov wrote: > Reading can not be parallelized, but processing can be - so there is value > in having our file-based sources automatically decompress .tar and

Re: Looking for I/O transform to untar a tar.gz

2018-03-16 Thread Eugene Kirpichov

Reading can not be parallelized, but processing can be - so there is value in having our file-based sources automatically decompress .tar and .tar.gz. (also, I suspect that many people use Beam even for cases with a modest amount of data, that don't have or need parallelism, just for the sake of

Re: Looking for I/O transform to untar a tar.gz

2018-03-16 Thread Jean-Baptiste Onofré

Gzip is supported by TextIO. However, you are right, tar is not yet supported. It's similar in the way of dealing with entries. Could you please create a Jira about that? Thanks Regards JB On 16 March 2018 at 14:50, Chamikara Jayalath wrote: >FWIW, if you have

Re: Looking for I/O transform to untar a tar.gz

2018-03-16 Thread Chamikara Jayalath

FWIW, if you have a concat gzip file [1] TextIO and other file-based sources should be able to read that. But we don't support tar files. Is it possible to perform tar extraction before running the pipeline ? This step probably cannot be parallelized. So not much value in performing within the
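
As a rough illustration of what a "concat gzip file" is, the sketch below uses only the JDK's java.util.zip classes, not Beam. It gzips two strings separately, joins the compressed bytes, and reads the result back as one stream. The class and method names (ConcatGzipSketch, gzip, concat, gunzipAll) are hypothetical and exist only for this example; how TextIO handles such files internally is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only (plain JDK, no Beam).
class ConcatGzipSketch {

    // Compresses one string into a complete, standalone gzip member.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    // Byte-level concatenation, equivalent to `cat a.gz b.gz > both.gz`.
    static byte[] concat(byte[] a, byte[] b) {
        byte[] joined = new byte[a.length + b.length];
        System.arraycopy(a, 0, joined, 0, a.length);
        System.arraycopy(b, 0, joined, a.length, b.length);
        return joined;
    }

    // Decompresses every gzip member in the input, one after another.
    static String gunzipAll(byte[] bytes) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(bytes))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Calling gunzipAll(concat(gzip("line one\n"), gzip("line two\n"))) returns both lines in order, because each gzip member is self-delimiting. A tar archive, by contrast, adds its own per-entry headers that a gzip reader knows nothing about.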

Re: Routines intermittently not being executed on Apache Beam code

2018-03-16 Thread Lukasz Cwik

I asked the same question on the stack overflow question. Also, adding u...@beam.apache.org On Fri, Mar 16, 2018 at 2:03 PM Reuven Lax wrote: > Can you explain what you mean? Are you saying that you call > waitUntilFinish(), then execute some other code, and you think some

Re: Routines intermittently not being executed on Apache Beam code

2018-03-16 Thread Reuven Lax

Can you explain what you mean? Are you saying that you call waitUntilFinish(), then execute some other code, and you think some of that other code is not being executed? On Fri, Mar 16, 2018 at 1:46 PM Lucas Arruda wrote: > I have an Apache Beam pipeline written on Java.

Routines intermittently not being executed on Apache Beam code

2018-03-16 Thread Lucas Arruda

I have an Apache Beam pipeline written in Java. I have a problem where some routines are not being executed on all instances of that pipeline. Those routines are as simple as logging messages or excluding a file in GCS. They are all put to run after the following code: More at

Re: Looking for I/O transform to untar a tar.gz

2018-03-16 Thread Sajeevan Achuthan

Eugene - Yes, you are correct. I tried with a text file & Beam wordcount example. The TextIO reader reads some illegal characters as seen below. here’s: 1 addiction: 1 new: 1 we: 1 mood: 1 an: 1 incredible: 1 swings,: 1 known: 1 choices.: 1

Re: Using the Go Beam SDK

2018-03-16 Thread Henning Rohde

Hi Philip, Thanks for expressing interest in the Go SDK! The documentation is indeed still incomplete (BEAM-3826) and the main design document is probably the best starting point right now: https://s.apache.org/beam-go-sdk-design-rfc It also contains links to some of the better

Re: Looking for I/O transform to untar a tar.gz

2018-03-16 Thread Eugene Kirpichov

The code behaves as I expected, and the output is corrupt. Beam unzipped the .gz, but then interpreted the .tar as a text file, and split the .tar file by \n. E.g. the first file of the output starts with lines:
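
To make the failure mode concrete, here is a minimal sketch of what sits inside a .tar.gz once the gzip layer is removed. It uses only the JDK, not Beam, and it is not the transform tracked in BEAM-3867. It assumes a simplified tar layout: 512-byte blocks, the entry name at header offset 0, the size as octal text at offset 124, and no checksum. All names (TarGzSketch, addEntry, readEntries, sampleArchive) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only (plain JDK, no Beam).
class TarGzSketch {
    static final int BLOCK = 512; // tar stores headers and data in 512-byte blocks

    // Appends one simplified entry: header block, then data padded to a block boundary.
    // The checksum and most header fields stay zero, so real tar tools would reject it.
    static void addEntry(ByteArrayOutputStream tar, String name, String text) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        byte[] header = new byte[BLOCK];
        byte[] nameBytes = name.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(nameBytes, 0, header, 0, nameBytes.length);
        byte[] size = String.format("%011o", data.length).getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(size, 0, header, 124, size.length);
        tar.write(header, 0, BLOCK);
        tar.write(data, 0, data.length);
        int pad = (BLOCK - data.length % BLOCK) % BLOCK;
        tar.write(new byte[pad], 0, pad);
    }

    // Builds a two-file archive, terminates it with two zero blocks, and gzips it.
    static byte[] sampleArchive() throws IOException {
        ByteArrayOutputStream tar = new ByteArrayOutputStream();
        addEntry(tar, "a.txt", "hello\n");
        addEntry(tar, "b.txt", "world\n");
        tar.write(new byte[2 * BLOCK], 0, 2 * BLOCK);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(tar.toByteArray());
        }
        return out.toByteArray();
    }

    // Reads a NUL-terminated ASCII field from a header block.
    static String field(byte[] header, int offset, int length) {
        int end = offset;
        while (end < offset + length && header[end] != 0) {
            end++;
        }
        return new String(header, offset, end - offset, StandardCharsets.US_ASCII).trim();
    }

    // Gunzips, then walks the tar entries; returns "name=content" for each file.
    static List<String> readEntries(byte[] tarGz) throws IOException {
        List<String> entries = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(tarGz)))) {
            byte[] header = new byte[BLOCK];
            while (true) {
                in.readFully(header);
                if (header[0] == 0) {
                    break; // an all-zero block marks the end of the archive
                }
                String name = field(header, 0, 100);
                int size = Integer.parseInt(field(header, 124, 12), 8);
                byte[] data = new byte[size];
                in.readFully(data);
                in.readFully(new byte[(BLOCK - size % BLOCK) % BLOCK]); // skip padding
                entries.add(name + "=" + new String(data, StandardCharsets.UTF_8));
            }
        }
        return entries;
    }
}
```

With the sample archive, readEntries yields one item per file. Treating the same decompressed bytes as text instead, the first "line" would begin with the header block (the name a.txt followed by NUL padding and the octal size) before the file's real first line, which matches the kind of corruption described above.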

Re: Looking for I/O transform to untar a tar.gz

2018-03-16 Thread Sajeevan Achuthan

Eugene, I ran the code and it works fine. I am very confident in this case. I appreciate the great work you guys do. The code is supposed to show that Beam TextIO can read double-compressed files and write the output without any processing, so I ignored the processing steps. I agree with you the

Re: Looking for I/O transform to untar a tar.gz

2018-03-16 Thread Eugene Kirpichov

Sajeevan - I'm quite confident that TextIO can handle .gz, but cannot properly handle .tar. Did you run this code? Did your test .tar.gz file contain multiple files? Did you obtain the expected output, identical to the input except for order of lines? (also, the ParDo in this code doesn't do

Re: Looking for I/O transform to untar a tar.gz

2018-03-16 Thread Sajeevan Achuthan

Hi Guys, TextIO can handle double-compressed files of type tar.gz. See the test code. PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create(); Pipeline p = Pipeline.create(options); * p.apply("ReadLines", TextIO.read().from("/dataset.tar.gz"))*
