Re: Merge more than 127 map-reduce jobs not supported

2015-11-20 Thread Vitalii Tymchyshyn
Well, I think you should be able to store intermediate results to force a multi-level job to be generated. On Thu, 19 Nov 2015 22:27, Binal Jhaveri wrote: > Hi Daniel/Arvind, > > Thanks for the suggestions ! > > I understand the check is there for a long time. Is there a way to get > around this ?

Re: Documentation bug in REGEX_EXTRACT

2013-10-09 Thread Vitalii Tymchyshyn
Well, usually for regexps, group 0 is the whole match and numbered groups start from 1. Are you sure you are getting a group (the thing in brackets) with 0? On 8 Oct 2013 13:04, "Steve Bernstein" wrote: > Apologies if this is captured elsewhere. In the Pig 0.11.1 documentation > for the builtin
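To illustrate the group numbering, here is a minimal Java sketch using java.util.regex. It assumes (not confirmed in this thread) that REGEX_EXTRACT behaves like Matcher.find() followed by Matcher.group(index); the helper name regexExtract and the sample input are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexGroupDemo {
    // Hypothetical helper, NOT Pig's REGEX_EXTRACT source: returns the requested
    // group of the first match found by Matcher.find(), or null if nothing matches.
    static String regexExtract(String input, String regex, int index) {
        if (input == null) {
            return null;
        }
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        // group(0) is the whole match; bracketed groups are numbered from 1.
        // An index larger than m.groupCount() throws IndexOutOfBoundsException.
        return m.group(index);
    }

    public static void main(String[] args) {
        String ip = "192.168.1.5:8020";
        String re = "(.*):(.*)";
        System.out.println(regexExtract(ip, re, 0)); // 192.168.1.5:8020 (whole match)
        System.out.println(regexExtract(ip, re, 1)); // 192.168.1.5 (first bracketed group)
        System.out.println(regexExtract(ip, re, 2)); // 8020 (second bracketed group)
    }
}
```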

Re: pig script - failed reading input from s3

2013-04-09 Thread Vitalii Tymchyshyn
Have you tried it with the native S3 file system? AFAIR the limit was raised to 5 TB a few years ago. On 8 Apr 2013 18:30, "Panshul Whisper" wrote: > Thank you for the advice David. > > I tried this ant it works with the native system. But my problem is not > solved yet, because I have to work with files much bigg

Re: Aggregation for chronologically ordered dataset

2013-03-17 Thread Vitalii Tymchyshyn
I'd use RANK to join each row with the previous and next rows, then filter out the middle rows, then join the first row to the last one and calculate the time. On 15 Mar 2013 19:04, "pranjal rajput" wrote: > Hi, > I am new to Pig. > I have a dataset from a time-tracker application. > It records the the time that users spend on vari
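The RANK-based approach can be sketched in Java over an in-memory, already time-sorted list: list position stands in for the rank, and neighbouring indices stand in for the joins on rank - 1 and rank + 1. This is a hypothetical illustration, not Pig code; the Event/Run types, the (timestamp, key) layout and the idea of a "run" of consecutive rows with the same key are assumptions, since the original question is cut off.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RunDurations {
    // Hypothetical record layout: a timestamp plus the activity/user key being tracked.
    static final class Event {
        final long ts;
        final String key;
        Event(long ts, String key) { this.ts = ts; this.key = key; }
    }

    // One contiguous run of rows sharing the same key.
    static final class Run {
        final String key;
        final long start;
        final long end;
        Run(String key, long start, long end) { this.key = key; this.start = start; this.end = end; }
        long duration() { return end - start; }
        @Override public String toString() { return key + ":" + duration(); }
    }

    // Input must already be sorted by ts, so list position plays the role of the rank.
    static List<Run> runs(List<Event> sorted) {
        List<Run> out = new ArrayList<>();
        Event first = null;
        for (int i = 0; i < sorted.size(); i++) {
            Event cur = sorted.get(i);
            Event prev = i > 0 ? sorted.get(i - 1) : null;                 // "join" on rank - 1
            Event next = i + 1 < sorted.size() ? sorted.get(i + 1) : null; // "join" on rank + 1
            boolean isFirst = prev == null || !prev.key.equals(cur.key);
            boolean isLast = next == null || !next.key.equals(cur.key);
            // Rows that are neither first nor last are the "middle rows" and are skipped.
            if (isFirst) {
                first = cur;
            }
            if (isLast) {
                // Pair the first row of the run with its last row and compute the elapsed time.
                out.add(new Run(cur.key, first.ts, cur.ts));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
            new Event(0, "a"), new Event(5, "a"), new Event(9, "a"),
            new Event(12, "b"), new Event(20, "b"), new Event(25, "a"));
        System.out.println(runs(events)); // [a:9, b:8, a:0]
    }
}
```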

Re: S3 store and load

2013-03-02 Thread Vitalii Tymchyshyn
Well, simply use s3:// URIs. It works. On 2 Mar 2013 21:51, "Mohit Anchlia" wrote: > Does anyone know of any storefunc/loadfunc for AWS S3 that is available? >

Re: Parse data with multiple delimiters and transpose rows and columns

2013-03-02 Thread Vitalii Tymchyshyn
You can try reading each line as a single column, then splitting it with a regexp and then flattening the result. On 2 Mar 2013 00:29, "Mix Nin" wrote: > -- Forwarded message -- > From: Mix Nin > Date: Fri, Mar 1, 2013 at 2:26 PM > Subject: > To: user@pig.apache.org > > > Hi, > > I have a file that
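A minimal Java sketch of the read-as-one-column, split, then flatten idea. The input layout (a key followed by values separated by |, comma or ;) is hypothetical because the original question is truncated; in Pig the same steps would run per record rather than over an in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitFlatten {
    // Hypothetical input layout: the first token of a line is the key and the remaining
    // tokens are values, separated by any of several delimiters matched by one regex.
    static List<List<String>> splitAndFlatten(List<String> lines, String delimiterRegex) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : lines) {
            // Step 1: the whole line is treated as a single column.
            // Step 2: split it with one regular expression covering every delimiter.
            String[] tokens = line.split(delimiterRegex);
            // Step 3: "flatten" the values, emitting one (key, value) row per value.
            for (int i = 1; i < tokens.length; i++) {
                rows.add(Arrays.asList(tokens[0], tokens[i]));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("k1|a,b;c", "k2|d");
        System.out.println(splitAndFlatten(lines, "[|,;]"));
        // [[k1, a], [k1, b], [k1, c], [k2, d]]
    }
}
```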

Re: COUNT() thinks non-null tuples are null if the first field is null?

2013-02-07 Thread Vitalii Tymchyshyn
&& t.size() > 0 && t.get(0) != null ) > > > > cnt++; > > > > but it just seems bizarre and inconvenient to me. Nor is it mentioned in the documentation, unless the bit written for people who are good at SQL implies it. Now I'm wondering which of my past scripts might be buggy because I didn't expect this behavior. > > > > Anyone have an explanation? > > > > Thanks, > > > > Adair > > -- > > *Note that I'm no longer using my Yahoo! email address. Please email me at billgra...@gmail.com going forward.* -- Best regards, Vitalii Tymchyshyn
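The behaviour Adair describes follows from the quoted condition: a tuple is counted only when its first field is non-null. The Java sketch below uses plain lists as stand-ins for Pig tuples and bags (an assumption for illustration, not Pig's classes) and contrasts that rule with counting every tuple, which is roughly what Pig's COUNT_STAR is documented to do.

```java
import java.util.Arrays;
import java.util.List;

class CountSemantics {
    // Mirrors the condition quoted in the thread: a tuple is counted only if it is
    // non-empty and its FIRST field is non-null, whatever the other fields contain.
    static long countLikeCount(List<List<Object>> bag) {
        long cnt = 0;
        for (List<Object> t : bag) {
            if (t != null && t.size() > 0 && t.get(0) != null) {
                cnt++;
            }
        }
        return cnt;
    }

    // COUNT_STAR-style counting: every tuple in the bag counts, nulls included.
    static long countLikeCountStar(List<List<Object>> bag) {
        return bag.size();
    }

    public static void main(String[] args) {
        List<List<Object>> bag = Arrays.asList(
            Arrays.<Object>asList(null, "a"),   // first field null: skipped by COUNT-style logic
            Arrays.<Object>asList("x", null),   // first field set: counted
            Arrays.<Object>asList(null, null)); // first field null: skipped
        System.out.println(countLikeCount(bag));     // 1
        System.out.println(countLikeCountStar(bag)); // 3
    }
}
```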

Re: Using UDF to process whole record

2013-01-22 Thread Vitalii Tymchyshyn
BTW: http://pig.apache.org/docs/r0.10.0/basic.html has the following example: C = FOREACH A GENERATE name, age, MyUDF(*); That looks like exactly what you need. On 22 Jan 2013 09:35, "Young Ng" wrote: > try something like: > raw_data = load ... as record; > > .. com.udf.SomeUDF(record) > > or you c

Re: Using UDF to process whole record

2013-01-22 Thread Vitalii Tymchyshyn
Try an explicit TOTUPLE call. On 22 Jan 2013 08:51, "Stanley Xu" wrote: > Dear all, > > We are using thrift and elephant-bird to store our logs. And I wanted to > use some UDF to do complex processing on a single record, so I write some > pig like the following: > >

Re: Parallelism for small input data

2013-01-14 Thread Vitalii Tymchyshyn
; === > > Forgive me if i am getting into wrong direction, feel free to correct me > and suggest your ways. > > Thanks in advance! > > > Regards, > Dipesh > -- > Dipesh Kr. Singh > -- Best regards, Vitalii Tymchyshyn

Re: XML -> Pig UDF

2012-12-29 Thread Vitalii Tymchyshyn
rney http://datasyndrome.com > > On Dec 24, 2012, at 12:10 AM, Vitalii Tymchyshyn wrote: > > > I was doing such a thing in my previous project, but I did parse on > demand. > > What I mean is that I've created set of xml-processing functions, each > can > > take a s

Re: XML -> Pig UDF

2012-12-24 Thread Vitalii Tymchyshyn
I was doing such a thing in my previous project, but I parsed on demand. What I mean is that I created a set of XML-processing functions, each of which can take either a string or a DOM as input, plus an explicit parse function. I did this because I was usually using concatenation/grouping on parsed input files and
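A rough Java sketch of the parse-on-demand design: each XML function accepts either a raw string or an already-built DOM, and a separate parse function lets you parse once and reuse the result. The class and method names (XmlOnDemand, asDocument, rootName) are hypothetical and this is not the UDF code from that project; only standard JDK XML APIs are used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XmlOnDemand {
    // Explicit parse function: turns raw XML text into a DOM once, so the result can be reused.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Accepts either an already-parsed DOM or a raw string, parsing only when needed.
    static Document asDocument(Object input) {
        if (input instanceof Document) {
            return (Document) input;
        }
        if (input instanceof String) {
            return parse((String) input);
        }
        throw new IllegalArgumentException("Expected String or Document, got " + input);
    }

    // Hypothetical example of an XML-processing function built on asDocument.
    static String rootName(Object input) {
        return asDocument(input).getDocumentElement().getTagName();
    }

    public static void main(String[] args) {
        System.out.println(rootName("<order><item/></order>")); // parses the string: order
        Document doc = parse("<invoice/>");
        System.out.println(rootName(doc));                      // reuses the DOM: invoice
    }
}
```

Parsing explicitly once and passing the Document along avoids re-parsing the same input in every function call, while accepting a raw string keeps one-off calls simple.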

Re: Re: Re: Pig UT last nearly 8 hours and TestEvalPipeline2 lasts for 37 minutes

2012-11-21 Thread Vitalii Tymchyshyn
Well, you could try multi-JVM parallelization. AFAIR Maven has an option for that.

Re: How to create an empty alias

2012-11-06 Thread Vitalii Tymchyshyn
mpty alias with column id, > c1, c2; > GROUP = COGROUP A BY id, B BY id, C BY id; > GROUP2 = FOREACH GROUP GENERATE TOTUPLE(A, B, C) AS ALL; > OPERATION = FOREACH GROUP2 GENERATE MyUDF(ALL); > -- Best regards, Vitalii Tymchyshyn