Re: How to use TOP?

2012-05-22 Thread Mohammad Tariq
Hi Abhinav, thanks a lot for the valuable response. Actually, I was thinking of doing the same thing, but being new to Pig I thought I would ask on the mailing list first. As far as the data is concerned, the second column will always be in ascending order, but I don't think it will be of any help..

Re: [pig-0.10.0]: what jar are included in pig-0.10.0.jar and pig-0.10.0-withouthadoop.jar?

2012-05-22 Thread IGZ Nick
1. I guess so, not sure though; I haven't checked. 2. You can run jar -tvf to find out which classes are present in a particular jar, and jar -xvf to extract the jar file to the current dir. 3. If point no. 1 is true, this should surely work! On Mon, May 21, 2012 at 8:33 AM, lulynn_20
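The jar -tvf check mentioned above works because a jar is an ordinary zip archive. A minimal Python sketch of the same listing follows; the helper names list_jar_classes and jar_contains_package are hypothetical and not part of Pig or the JDK.

```python
import zipfile


def list_jar_classes(jar_path):
    """Return the sorted .class entries in a jar (a jar is a zip archive)."""
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(name for name in jar.namelist() if name.endswith(".class"))


def jar_contains_package(jar_path, package_prefix):
    """True if any class sits under package_prefix, e.g. 'org/apache/hadoop/'."""
    return any(name.startswith(package_prefix) for name in list_jar_classes(jar_path))
```

Calling jar_contains_package on a locally built jar with the prefix 'org/apache/hadoop/' is one way to see whether Hadoop classes are bundled in it, assuming the jar is on the local disk.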

Re: How to use TOP?

2012-05-22 Thread Abhinav Neelam
Doing it in the pig script is not feasible because pig doesn't have any notion of sequentiality - to maintain it, you'd need to have access to state that's shared globally by all the mappers and reducers. One way I can think of doing this is to have a UDF that maintains state - perhaps it can maint

Re: How to use TOP?

2012-05-22 Thread Mohammad Tariq
Yes, it would be better if I do it at the time of insertion. I just have to add one more column. Thanks again. Regards, Mohammad Tariq On Tue, May 22, 2012 at 2:36 PM, Abhinav Neelam wrote: > Doing it in the pig script is not feasible because pig doesn't have any > notion of sequentiality - to

Re: [pig-0.10.0]: what jar are included in pig-0.10.0.jar and pig-0.10.0-withouthadoop.jar?

2012-05-22 Thread praveenesh kumar
As far as I know, pig-0.10.0-withouthadoop.jar is generally used when you want to use your own distribution of Hadoop with Pig. Having said that, yes, it won't include any Hadoop dependencies. If you want to use Hadoop jar files from an external Hadoop package, just use pig-0.10.0-withouthadoo

RE: Design issue, need feedback

2012-05-22 Thread Ruslan Al-fakikh
Hey Nerius, As for changes in the number of columns - yes, Avro or Thrift can handle that. As for transforming a row value from 'Age=23' to '23' - this is what Pig can do for you. Try something like b = FOREACH a GENERATE SUBSTRING($1, 4, 6) AS Age; --I haven't tested it, there can be typ

Re: Design issue, need feedback

2012-05-22 Thread Andy Schlaikjer
Another possible solution: Use json for your storage and load via JsonLoader: https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/load/JsonLoader.java Then you could project the fields you'd like to use out of the loaded maps via Pig's map dereference oper

Re: Parse XML file with PIG

2012-05-22 Thread Francisco Javier Gonzalez Garcia
Hi, these symbols are documented in Java's regex Pattern class: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html 2012/5/18 krishnan N > Hi , > Thanks so much It worked for me but can you please explain ([^<]*) > and \\n\\s* part by symbols from below. > > > (RegexExtractAll(revision,'([^<]*)
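To make the two pieces asked about concrete: ([^<]*) captures a run of zero or more characters that are not '<', and \\n\\s* (written with doubled backslashes inside a Pig string literal) matches a newline followed by any amount of whitespace. A small sketch in Python, whose regex syntax agrees with Java's for these constructs; the sample input and the <text> tag are assumptions for illustration, not the poster's data.

```python
import re

# Hypothetical sample input; the <text> tag is an assumption for illustration.
sample = "<text>\n   hello world</text>"

# \n\s* skips the newline and the indentation after it;
# ([^<]*) then captures everything up to (not including) the next '<'.
pattern = re.compile(r"<text>\n\s*([^<]*)")

match = pattern.search(sample)
captured = match.group(1)
print(captured)  # prints: hello world
```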

Re: UDF FilterFunc and logical OR

2012-05-22 Thread Johannes Schwenk
Thank you for your quick suggestions! - I am now using local mode - good point! - I know of the builtin matches; the CONTAINS filter was just to get into programming UDFs... - Whatever I do, the problem persists. I tried: * turning off all optimizations (-t All): no effect * reordering the statement

Re: Design issue, need feedback

2012-05-22 Thread Nerius Landys
> There are a couple of ways that you can do this. One, is that you could > make a special loader that converts your format to a map of (key,value) > pairs, and then you can project however you want. > > Another (better, if at all possible) way would be to use something like > Avro or Thrift that a

Re: Design issue, need feedback

2012-05-22 Thread Jonathan Coveney
Why not convert your file format into a Map, which is essentially what it is (key, value) pairs, and then you can just project out the values that you want? It's a little annoying to have to project everything manually, but will be a lot more maintainable (and this is essentially what JSONStorage d
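The map idea above can be sketched outside Pig as well. The following Python example assumes a comma-separated row of key=value fields; that layout and the helper names parse_row and project are assumptions for illustration, since the actual file format is not shown in this thread.

```python
def parse_row(line, sep=","):
    """Turn 'Name=Bob,Age=23' into {'Name': 'Bob', 'Age': '23'}."""
    row = {}
    for field in line.split(sep):
        key, _, value = field.partition("=")
        row[key.strip()] = value.strip()
    return row


def project(row, keys):
    """Pick values by key; a missing key yields None, so rows may vary in width."""
    return [row.get(k) for k in keys]


rows = [parse_row("Name=Bob,Age=23"), parse_row("Age=41,City=Oslo,Name=Ann")]
projected = [project(r, ["Name", "Age"]) for r in rows]
```

Because each value is looked up by key rather than by position, adding or reordering columns in the input does not change the projection.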

Re: UDF FilterFunc and logical OR

2012-05-22 Thread Jonathan Coveney
If this is a bug, it's an annoying one, so I definitely appreciate your help in getting to the bottom of it. So let's get to the bottom of it :) First, I would clone the trunk version of pig and run the same tests against it and compare. Always good to test any bugs against trunk to see if it is v

Re: Design issue, need feedback

2012-05-22 Thread Nerius Landys
> Why not convert your file format into a Map, which is essentially what it > is (key, value) pairs, and then you can just project out the values that > you want? It's a little annoying to have to project everything manually, > but will be a lot more maintainable (and this is essentially what > JSO

Re:Re: [pig-0.10.0]: what jar are included in pig-0.10.0.jar and pig-0.10.0-withouthadoop.jar?

2012-05-22 Thread lulynn_2008
Thanks. Totally agree with you. At 2012-05-22 18:25:13,"praveenesh kumar" wrote: >As far as I know pig-0.10.0-withouthadoop.jar is generally used when you >want to use your own distribution of hadoop to use with PIG. Having said >that, yes it won't include any hadoop dependencies. If you want

Re: why pig did not compile contrib directory during "ant jar"? How to compile contrib directory?

2012-05-22 Thread Prashant Kommireddi
You can navigate to piggybank directory and run the ant target (ant jar) from there. You should find "build.xml" for piggybank in there. $PIG_HOME/contrib/piggybank/java (where PIG_HOME is the parent directory for pig) That will create piggybank.jar. Thanks, Prashant On Tue, May 22, 2012 at 8

Re:Re: why pig did not compile contrib directory during "ant jar"? How to compile contrib directory?

2012-05-22 Thread lulynn_2008
Yes, I found build.xml for each part: penny, piggybank and zebra. Do you know what the 3 parts are used for? At 2012-05-23 11:17:29,"Prashant Kommireddi" wrote: >You can navigate to piggybank directory and run the ant target (ant jar) >from there. You should find "build.xml" for piggybank in

Re: why pig did not compile contrib directory during "ant jar"? How to compile contrib directory?

2012-05-22 Thread Bill Graham
You need to cd into the contrib root dir and build from there. Contribs have their own build.xml. On Tuesday, May 22, 2012, lulynn_2008 wrote: > Hi, > During generating pig jar files, I found the contrib directory is not > compiled. I assume maybe this is because the contrib directory is not for

Re:Re: why pig did not compile contrib directory during "ant jar"? How to compile contrib directory?

2012-05-22 Thread lulynn_2008
Thanks Bill. Do you know what the contrib directory is used for? At 2012-05-23 11:35:23,"Bill Graham" wrote: >You need to cd into the contrib root dir and build from there. Contribs >have their own build.xml. > >On Tuesday, May 22, 2012, lulynn_2008 wrote: > >> Hi, >> During generating pig j

Re:Re: why pig did not compile contrib directory during "ant jar"? How to compile contrib directory?

2012-05-22 Thread lulynn_2008
Hey, I found the document for contrib directory: https://cwiki.apache.org/confluence/display/PIG/User+Documentation At 2012-05-23 11:35:23,"Bill Graham" wrote: >You need to cd into the contrib root dir and build from there. Contribs >have their own build.xml. > >On Tuesday, May 22, 2012, luly

Re:Re: why pig did not compile contrib directory during "ant jar"? How to compile contrib directory?

2012-05-22 Thread lulynn_2008
Hey, I found the document for contrib directory: https://cwiki.apache.org/confluence/display/PIG/User+Documentation At 2012-05-23 11:17:29,"Prashant Kommireddi" wrote: >You can navigate to piggybank directory and run the ant target (ant jar) >from there. You should find "build.xml" for piggyb

Re: Re: why pig did not compile contrib directory during "ant jar"? How to compile contrib directory?

2012-05-22 Thread Bill Graham
The contrib directories are where the source of the various contribs live. You only need to compile them if you plan to use a given contrib. Piggybank is a collection of useful UDFs: https://cwiki.apache.org/confluence/display/PIG/PiggyBank Zebra: http://pig.apache.org/docs/r0.7.0/zebra_overview.

While/CROSS/FOREACH loop

2012-05-22 Thread Russell Jurney
I need to repeatedly CROSS a data set, then FOREACH it, reduce it with a filter, then group/test it to test if it's done yet, then repeat until it is baked. How do I do that with pig, and maybe some other tool? Twitter has some ruby stuff that can do this, I think, but is there some way with neste