Well, I think you should be able to store intermediate results to force
multiple job levels to be generated.
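Roughly like this (a minimal sketch; the aliases and the temp path are just
placeholders):

A = LOAD 'input' AS (id:int, val:chararray);
B = GROUP A BY id;
-- materializing the intermediate result with a STORE/LOAD pair splits the plan into separate jobs
STORE B INTO 'tmp/intermediate';
B2 = LOAD 'tmp/intermediate' AS (group:int, A:bag{t:(id:int, val:chararray)});
C = FOREACH B2 GENERATE group, COUNT(A) AS cnt;
STORE C INTO 'output';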
Thu, 19 Nov 2015 22:27, Binal Jhaveri wrote:
> Hi Daniel/Arvind,
>
> Thanks for the suggestions !
>
> I understand the check is there for a long time. Is there a way to get
> around this ?
Well, for regular expressions, group 0 is usually the whole match and capture
groups start from 1. Are you sure you are getting the group (the part in
brackets) with index 0?
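For example, with the REGEX_EXTRACT builtin (a quick sketch; the field name
and pattern are made up):

A = LOAD 'logs' AS (line:chararray);
-- index 0 is the whole match; index 1 is the first parenthesized group
B = FOREACH A GENERATE REGEX_EXTRACT(line, 'user=(\\w+)', 1) AS user;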
8 Oct 2013 13:04, "Steve Bernstein" wrote:
> Apologies if this is captured elsewhere. In the Pig 0.11.1 documentation
> for the builtin
Have you tried it with native? AFAIR the limit was raised to 5 TB a few
years ago.
8 Apr 2013 18:30, "Panshul Whisper" wrote:
> Thank you for the advice David.
>
> I tried this and it works with the native system. But my problem is not
> solved yet, because I have to work with files much bigg
I'd use the RANK function to join each row with the previous and next row,
then filter out the middle rows, then join the first to the last and
calculate the time.
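Something along these lines (an untested sketch; the field names are
assumptions and per-user handling is left out):

events  = LOAD 'tracker' AS (user:chararray, event:chararray, ts:long);
ranked  = RANK events BY ts ASC;                  -- RANK prepends a rank column (rank_events)
shifted = FOREACH ranked GENERATE rank_events - 1 AS rank_events, ts AS next_ts;
paired  = JOIN ranked BY rank_events, shifted BY rank_events;
deltas  = FOREACH paired GENERATE ranked::user, ranked::event, shifted::next_ts - ranked::ts AS delta;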
15 Mar 2013 19:04, "pranjal rajput" wrote:
> Hi,
> I am new to Pig.
> I have a dataset from a time-tracker application.
> It records the time that users spend on vari
Well, simply use s3:// URIs. It works.
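For example (a sketch; the bucket and paths are placeholders, and the AWS
credentials have to be configured in your Hadoop/EMR setup):

A = LOAD 's3://my-bucket/input/*.log' USING PigStorage('\t') AS (id:int, msg:chararray);
STORE A INTO 's3://my-bucket/output' USING PigStorage('\t');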
2 Mar 2013 21:51, "Mohit Anchlia" wrote:
> Does anyone know of any storefunc/loadfunc for AWS S3 that is available?
>
You can try reading each line as a single column, then splitting it with a
regex and then flattening the result.
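Something like this (a sketch; the split pattern and the output field names
are just examples):

A = LOAD 'data' USING TextLoader() AS (line:chararray);
-- STRSPLIT returns a tuple; FLATTEN expands it into separate fields
B = FOREACH A GENERATE FLATTEN(STRSPLIT(line, '\\s+')) AS (f1:chararray, f2:chararray, f3:chararray);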
2 Mar 2013 00:29, "Mix Nin" wrote:
> -- Forwarded message --
> From: Mix Nin
> Date: Fri, Mar 1, 2013 at 2:26 PM
> Subject:
> To: user@pig.apache.org
>
>
> Hi
>
> I have a file that
&& t.size() > 0 && t.get(0) != null )
> > > > cnt++;
> > > >
> > > > but it just seems bizarre and inconvenient to me. Nor is it mentioned
> > in
> > > > the documentation, unless the bit written for people who are good at
> > SQL
> > > > implies it. Now I'm wondering which of my past scripts might be buggy
> > > > because I didn't expect this behavior.
> > > >
> > > > Anyone have an explanation?
> > > >
> > > > Thanks,
> > > >
> > > > Adair
> > > >
> > >
> >
> >
> >
> > --
> > *Note that I'm no longer using my Yahoo! email address. Please email me
> at
> > billgra...@gmail.com going forward.*
> >
>
--
Best regards,
Vitalii Tymchyshyn
BTW: http://pig.apache.org/docs/r0.10.0/basic.html has the following example:
C = FOREACH A GENERATE name, age, MyUDF(*);
It looks like exactly what you need.
22 Jan 2013 09:35, "Young Ng" wrote:
> try something like:
> raw_data = load ... as record;
>
> .. com.udf.SomeUDF(record)
>
> or you c
Try the explicit TOTUPLE function.
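For example (a sketch; com.example.MyUDF stands in for your own UDF and the
schema is made up):

A = LOAD 'data' AS (name:chararray, age:int, gpa:float);
-- TOTUPLE wraps the fields into a single tuple argument for the UDF
B = FOREACH A GENERATE com.example.MyUDF(TOTUPLE(name, age, gpa));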
22 Jan 2013 08:51, "Stanley Xu" wrote:
> Dear all,
>
> We are using thrift and elephant-bird to store our logs. And I wanted to
> use some UDF to do complex processing on a single record, so I write some
> pig like the following:
>
>
; ===
>
> Forgive me if i am getting into wrong direction, feel free to correct me
> and suggest your ways.
>
> Thanks in advance!
>
>
> Regards,
> Dipesh
> --
> Dipesh Kr. Singh
>
--
Best regards,
Vitalii Tymchyshyn
rney http://datasyndrome.com
>
> On Dec 24, 2012, at 12:10 AM, Vitalii Tymchyshyn wrote:
>
> > I was doing such a thing in my previous project, but I did parse on
> demand.
> > What I mean is that I've created set of xml-processing functions, each
> can
> > take a s
I was doing such a thing in my previous project, but I parsed on demand.
What I mean is that I created a set of XML-processing functions, each of
which can take either a string or a DOM as input, plus an explicit parse
function. I did this because I was usually using concatenation/grouping on
parsed input files and
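In Pig terms it looked roughly like this (hypothetical UDF names here, since
mine were project-specific):

raw    = LOAD 'docs' USING TextLoader() AS (xml:chararray);
-- parse once, explicitly, then pass the parsed document to the other functions
parsed = FOREACH raw GENERATE ParseXml(xml) AS doc;
fields = FOREACH parsed GENERATE XPathText(doc, '/order/id') AS id, XPathText(doc, '/order/total') AS total;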
Well, you could try multi-JVM parallelization. AFAIR Maven has that option.
mpty alias with column id,
> c1, c2;*
> GROUP = COGROUP A BY id, B BY id, C BY id;
> GROUP2 = FOREACH GROUP GENERATE TOTUPLE(A, B, C) AS ALL;
> OPERATION = FOREACH GROUP2 GENERATE MyUDF(ALL);
>
--
Best regards,
Vitalii Tymchyshyn