Reading Avro files with Pig

2012-11-19 Thread Bart Verwilst
Hi, I'm trying to read the Avro file I stored on HDFS, but I seem to be hitting a snag. I'm hoping some of you will be able to shed some light on this and allow me to continue my adventure! REGISTER 'hdfs:///lib/avro-1.7.2.jar'; REGISTER 'hdfs:///lib/json-simple-1.1.1.jar'; REGISTER

Re: How do I load JSON in Pig?

2012-11-19 Thread Russell Jurney
It seems that everyone can build elephant-bird but me: https://github.com/kevinweil/elephant-bird/issues/272 On Sun, Nov 18, 2012 at 7:31 PM, Arian Pasquali ar...@arianpasquali.com wrote: I don't think you really need to build it. You can find it at any maven repository. Arian Rodrigo

Re: PigStorage

2012-11-19 Thread Bill Graham
+1 as well, but I'd suggest we do the following: - Keep mProtoTuple private and add protected getters/setters instead with javadocs describing expected usage. - Rename mProtoTuple and the getters/setters to something more descriptive than mProtoTuple. On Fri, Nov 16, 2012 at 2:15 PM, Dmitriy

Re: PigStorage

2012-11-19 Thread pablomar
sure. My initial (and dirty) idea changed only 2 lines. I completely agree with you On Mon, Nov 19, 2012 at 12:16 PM, Bill Graham billgra...@gmail.com wrote: +1 as well, but I'd suggest we do the following: - Keep mProtoTuple private and add protected getters/setters instead with javadocs

Re: Reading Avro files with Pig

2012-11-19 Thread Cheolsoo Park
Hi Bart, Please try to print out the schema of 'avro' using 'DESCRIBE avro'. This will show you the field names in the relation. avro = load '/import/2012-01-04-deflate.**avro' USING AvroStorage(); DESCRIBE avro; Given your description, I suppose that changing 'trace.terminalid' to

Re: How do I load JSON in Pig?

2012-11-19 Thread Russell Jurney
Got it building. Are google collections and json-simple external deps? On Mon, Nov 19, 2012 at 11:23 AM, Russell Jurney russell.jur...@gmail.com wrote: It seems that everyone can build elephant-bird but me: https://github.com/kevinweil/elephant-bird/issues/272 On Sun, Nov 18, 2012 at 7:31

Re: How do I load JSON in Pig?

2012-11-19 Thread Russell Jurney
Talking to myself... never mind, guava and json-simple are included with Pig. On Mon, Nov 19, 2012 at 2:27 PM, Russell Jurney russell.jur...@gmail.com wrote: Got it building. Are google collections and json-simple external deps? On Mon, Nov 19, 2012 at 11:23 AM, Russell Jurney

Re: How do I load JSON in Pig?

2012-11-19 Thread Russell Jurney
Wait... com.twitter.elephantbird.pig.load.JsonLoader() does not infer the schema from a record. This is what I was looking for. Looks like I have to write that myself. And yes, I understand the tradeoffs in doing so. Assuming a sample is the overall schema is a big assumption. On Mon, Nov 19,
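The tradeoff mentioned above (treating one sample record as the overall schema) can be illustrated with a minimal sketch. This is not elephant-bird code and not a Pig loader; `infer_schema` is a hypothetical helper, and the type strings are only Pig-like, not exact Pig schema syntax.

```python
import json


def infer_schema(value):
    """Map a parsed JSON value to a Pig-like type string (illustrative only)."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "chararray"
    if isinstance(value, dict):
        fields = ", ".join(f"{k}: {infer_schema(v)}" for k, v in value.items())
        return f"tuple({fields})"
    if isinstance(value, list):
        # Assumption: the first element is representative of the whole array.
        inner = infer_schema(value[0]) if value else "bytearray"
        return f"bag{{{inner}}}"
    # null or anything unrecognised falls back to an untyped field.
    return "bytearray"


# Hypothetical sample record; a real data set may contain other shapes.
sample = json.loads('{"id": 7, "tags": ["a", "b"], "geo": {"lat": 1.5}}')
schema = infer_schema(sample)
print(schema)
```

A second record with an extra or differently typed field would yield a different schema, which is why inferring from a single sample is a big assumption.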

Re: How do I load JSON in Pig?

2012-11-19 Thread Russell Jurney
Ok, it's even worse. My data is a big array. Am I being negative in saying that JSON and Pig is like a nightmare? On Mon, Nov 19, 2012 at 2:33 PM, Russell Jurney russell.jur...@gmail.com wrote: Wait... com.twitter.elephantbird.pig.load.JsonLoader() does not infer the schema from a record. This

Re: How do I load JSON in Pig?

2012-11-19 Thread Deepak Tiwari
I also ran into the same dilemma. Here is something that I found easier and that works for me: I compiled some sources from http://www.json.org/java/ import java.io.IOException; import java.io.UnsupportedEncodingException; import java.util.List; import org.apache.pig.EvalFunc; import

RE: Intermittent NullPointerException

2012-11-19 Thread Malcolm Tye
Hi Cheolsoo, The patch works as expected. We've not seen one error in the test system since we installed the new jar file. We're only processing ~200 rows at the most when we run the script, not sure if that helps you narrow down the cause. I assume we just use the patch you gave

Re: PigStorage

2012-11-19 Thread Jonathan Coveney
Make a JIRA and attach the patch, please. 2012/11/19 pablomar pablo.daniel.marti...@gmail.com Hi all, I did it as simply as I could. What about these changes? PigStorage.java original: private void readField(byte[] buf, int start, int end) { if (start == end) {

Re: PigStorage

2012-11-19 Thread pablomar
Done. PIG-3057 https://issues.apache.org/jira/browse/PIG-3057 On Mon, Nov 19, 2012 at 6:32 PM, Jonathan Coveney jcove...@gmail.com wrote: Make a JIRA and attach the patch, please. 2012/11/19 pablomar pablo.daniel.marti...@gmail.com Hi all, I did it as simply as I could. What about

[pig trunk]:We need to upgrade junit to at least 4.8

2012-11-19 Thread lulynn_2008
Hi Cheolsoo, I think we need to upgrade junit to at least 4.8, as HBase-0.94 is using junit-4.10 and we got the following warnings during ant ... javadoc. [javadoc] org/apache/hadoop/hbase/mapreduce/TestWALPlayer.class(org/apache/hadoop/hbase/mapreduce:TestWALPlayer.class): warning: Cannot

Re: need help about pig script on this case

2012-11-19 Thread Jonathan Coveney
In pure Pig, you wouldn't do something like this. However, Pig supports control flow in Python (I really should get on making the JRuby wrapper, but I digress). You can find docs for this on the Pig website. Basically the control flow is in Python, and you launch jobs from there. 2012/11/19

Re: need help about pig script on this case

2012-11-19 Thread jamal sasha
In a different context, I was once stuck with the same problem but was able to navigate it using the bincond operator. http://ofps.oreilly.com/titles/9781449302641/intro_pig_latin.html Not sure how you would hack it in here, but I have a feeling it can be pulled off. On Mon, Nov 19, 2012 at 8:49

Re: Intermittent NullPointerException

2012-11-19 Thread Cheolsoo Park
Hi Malcolm, Thank you for sharing it. I am glad to hear that it worked. :-) We're only processing ~200 rows at the most when we run the script, not sure if that helps you narrow down the cause. Very interesting. That's surprisingly small. In my test, I used 10m rows of random integers as

Re:Re: Re: Pig UT last nearly 8 hours and TestEvalPipeline2 lasts for 37 minutes

2012-11-19 Thread lulynn_2008
Maybe we can run some UTs in parallel. At 2012-11-15 03:12:27, Johnny Zhang xiao...@cloudera.com wrote: Hi, lulynn_2008: I am not aware of how to shorten the time. Johnny On Tue, Nov 13, 2012 at 7:27 PM, lulynn_2008 lulynn_2...@163.com wrote: Thanks. Then my environment is normal. Is

[#PIG-3059] Global configurable minimum 'bad record' thresholds - ASF JIRA

2012-11-19 Thread Russell Jurney
https://issues.apache.org/jira/browse/PIG-3059 I wanted to make sure people saw this JIRA, as I think it will dramatically improve Pig. Discussion of this issue is available here: http://www.quora.com/Big-Data/In-Big-Data-ETL-how-many-records-are-an-acceptable-loss Russell Jurney