Re: Pig/Avro Question

2012-02-02 Thread Russell Jurney
I have the same bug. I read the code... there is no obvious fix. Arg. On Feb 2, 2012, at 10:07 PM, Something Something wrote: > In my Pig script I have something like this... > > %default MY_SCHEMA '/user/xyz/my-schema.json'; > > %default MY_AVRO > 'org.apache.pig.piggybank.storage.avro.Avr
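
Reconstructing from the truncated quote, the question appears to involve parameterizing the Avro loader via %default. A minimal sketch of that pattern (paths and the piggybank class name are assumptions, since the quote is cut off before the class name completes):

  -- Hypothetical sketch: %default substitution is textual, so a parameter
  -- holding a loader class name can be expanded inside a USING clause.
  REGISTER /path/to/piggybank.jar;  -- path is a placeholder

  %default MY_SCHEMA '/user/xyz/my-schema.json';
  %default MY_AVRO 'org.apache.pig.piggybank.storage.avro.AvroStorage';

  records = LOAD '/user/xyz/input.avro' USING $MY_AVRO();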

Re: Non-standard grouping

2012-02-02 Thread Dmitriy Ryaboy
Ah, yeah, if you can shrink data down that much, going outside of Pig (or doing things in a UDF) is the way to go. D On Thu, Feb 2, 2012 at 3:45 PM, Grig Gheorghiu wrote: > Hey Dmitriy! Unfortunately that's the requirement. The solution I > found so far is to do all the pre-filtering and groupin
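
For reference, a minimal sketch (field names hypothetical) of the pattern this thread settles on: do the heavy filtering and grouping in Pig, store the now-small result, and post-process it outside Pig:

  logs    = LOAD 'input' AS (key:chararray, value:long);
  wanted  = FILTER logs BY value > 0;        -- pre-filtering in Pig
  grouped = GROUP wanted BY key;             -- grouping in Pig
  small   = FOREACH grouped GENERATE group, COUNT(wanted);
  STORE small INTO 'pre_grouped';            -- ~300 MB output, then Python takes over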

Way of determining the source of data

2012-02-02 Thread Ranjan Bagchi
Hi, I've a bunch of [for example] apache logfiles that I'm searching through. I can process them with: logs = load 's3://bucket/directory/*' USING LogLoader as (remoteAddr, remoteLogname, user, time :chararray, method, uri :chararray, proto, status, bytes, referer, userAgent); Is there any w
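
One possible answer, not from the thread itself: Pig 0.10 added a -tagsource option to PigStorage that prepends the source file name to each record; a custom loader like LogLoader would need equivalent support, and this sketch assumes tab-delimited input rather than raw Apache logs:

  logs = LOAD 's3://bucket/directory/*'
         USING PigStorage('\t', '-tagsource')
         AS (filename:chararray, remoteAddr, remoteLogname, user, time:chararray,
             method, uri:chararray, proto, status, bytes, referer, userAgent);
  bySource = GROUP logs BY filename;   -- records now carry their source file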

Re: Problem with Pig AvroStorage, with Avros that work in Ruby and Python

2012-02-02 Thread Russell Jurney
Cleaned up my environment by unsetting HADOOP_HOME and removing some old jackson jars from my CLASSPATH, and Pig's AvroStorage works again. Woot! On Thu, Feb 2, 2012 at 3:47 PM, Russell Jurney wrote: > Spoken too soon... this happens no matter what avros I load now. I can't > figure that anything has

Re: Cannot STORE from Pig to HBase 0.9x

2012-02-02 Thread Royston Sellman
Thanks for your reply. WritableByteArrayComparable is in the same place in HBase 0.93. I also registered my HBase jar from within Pig, i.e. register /opt/hbase/hbase-0.93.jar, and that command returned without error. I also edited the line in the pig startup script that specifies the path to HBase.
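
A sketch of the registration approach being described (jar paths and dependency versions are placeholders): HBaseStorage needs the HBase jar plus its runtime dependencies visible to Pig, either via REGISTER statements like these or via PIG_CLASSPATH before launching Pig:

  REGISTER /opt/hbase/hbase-0.93.jar;
  REGISTER /opt/hbase/lib/zookeeper-3.4.2.jar;   -- version is a placeholder
  REGISTER /opt/hbase/lib/guava-r09.jar;         -- version is a placeholder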

Re: Problem with Pig AvroStorage, with Avros that work in Ruby and Python

2012-02-02 Thread Russell Jurney
Spoken too soon... this happens no matter what avros I load now. I can't figure that anything has changed regarding jars, etc. Confused. I think this happens when Avro is parsing the schema? Pig Stack Trace --- ERROR 2998: Unhandled internal error. org.codehaus.jackson.JsonFactory.e

Re: Non-standard grouping

2012-02-02 Thread Grig Gheorghiu
Hey Dmitriy! Unfortunately that's the requirement. The solution I found so far is to do all the pre-filtering and grouping I can in Pig, and then run Python on the output file generated by Pig. That file is ~300 MB, so it's not a problem to just run it through Python. Thanks for getting back to me.

Re: Non-standard grouping

2012-02-02 Thread Dmitriy Ryaboy
"records before" is kind of hard do define in an MR paradigm. I suppose you could group and then run the records through an accumulative UDF. But this is feeling very hacky. Is there a more scalable (order-independent) way you can do what you need? On Thu, Jan 26, 2012 at 4:32 PM, Grig Gheorghiu w

Re: Cannot STORE from Pig to HBase 0.9x

2012-02-02 Thread Dmitriy Ryaboy
"Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.filter.WritableByteArrayComparable" indicates that you don't have HBase on your classpath, or that the version of HBase you are testing against moved this class someplace else. We've tested against the 0.90 series, but not 0.92+.

Cannot STORE from Pig to HBase 0.9x

2012-02-02 Thread Royston Sellman
Hi, I'm trying to use Pig 0.9.2 with HBase 0.93 (i.e. the latest from HBase trunk) and following the tutorial. This line loads the sample file from HDFS successfully: raw = LOAD 'test/excite-small.log' USING PigStorage('\t') AS (user, time, query); This line seems to work: T = FOREACH
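
The step that presumably fails is the STORE into HBase; a hedged sketch of that step with the built-in HBaseStorage (table and column names are hypothetical; the first field of the relation becomes the row key):

  raw = LOAD 'test/excite-small.log' USING PigStorage('\t') AS (user, time, query);
  T   = FOREACH raw GENERATE user, query;
  STORE T INTO 'hbase://excite_log'
        USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('log:query');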

Re: Question on generate semantics

2012-02-02 Thread Xiaomeng Wan
A bag of one tuple is still a bag; you need to flatten it: generate group, FLATTEN(biggest); Shawn On Wed, Feb 1, 2012 at 10:07 PM, Sid Stuart wrote: > Hi, > > I'm using Pig to analyze some log files. We would like to find the last > time a URL has been accessed. I've pulled out the path and the
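
A minimal sketch of Shawn's point (relation and field names assumed from the quoted question): the single-tuple bag produced inside the nested FOREACH still needs FLATTEN to become plain fields:

  grouped = GROUP hits BY path;
  latest  = FOREACH grouped {
              ordered = ORDER hits BY time DESC;
              biggest = LIMIT ordered 1;           -- a bag of one tuple is still a bag
              GENERATE group, FLATTEN(biggest);    -- FLATTEN unwraps it into fields
            };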

Re: issue with partioning sdf

2012-02-02 Thread Alan Gates
On Feb 1, 2012, at 5:04 PM, Aleksandr Elbakyan wrote: > Hello All, > > I am trying to understand how pig group partitioning works; I was not > able to find any documentation regarding what happens under the hood. > > > For example > > B = GROUP A BY age; > > Does pig partition data by a
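
For concreteness, a sketch of what the question is about: conceptually, GROUP hashes the group key to choose a reduce partition, so all records sharing an age value meet on one reducer, and the PARALLEL clause controls how many partitions there are:

  B = GROUP A BY age PARALLEL 10;  -- 10 reduce partitions, keys hash-distributed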

Re: How to use tuples ?

2012-02-02 Thread praveenesh kumar
Okie, so it's weird. I was able to run a Pig query using $0.$0. The Pig script I wrote for the data (tmp.txt): (1,2,3) (2,4,5) (2,3,4) (2,3,5) z = load 'tmp.txt'; x = foreach z generate $0.$0; dump x; It ran fine the first time, but now it's giving me an error: ERROR 1066: Unable to open iterator

Re: How to use tuples ?

2012-02-02 Thread praveenesh kumar
Okie, got it. Thanks for guiding me. Without a schema, we can refer through $0.$0 or $1.$0 and so on, based on the positions. Thanks, Praveenesh On Thu, Feb 2, 2012 at 3:28 PM, praveenesh kumar wrote: > One more thing, suppose I have data - tmp.txt like > (1,2,3) (2,4,5) > (2,3,4) (2,3,5) > > So if I w
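
A sketch of the positional access being summarized (assuming each line of tmp.txt holds two tab-separated tuples, e.g. "(1,2,3)<TAB>(2,4,5)"):

  z = LOAD 'tmp.txt';             -- no schema: $0 and $1 are the two tuples
  a = FOREACH z GENERATE $0.$0;   -- first element of the first tuple
  b = FOREACH z GENERATE $1.$0;   -- first element of the second tuple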

Re: How to use tuples ?

2012-02-02 Thread praveenesh kumar
One more thing: suppose I have data - tmp.txt like (1,2,3) (2,4,5) (2,3,4) (2,3,5) So if I use Z1 = Load 'tmp.txt', the data will get stored in a bag (right?) ( (1,2,3), (2,4,5) ) ( (2,3,4), (2,3,5) ) Now can I refer to the fields in this case (without a schema)? B = Foreach Z1 generate

Re: How to use tuples ?

2012-02-02 Thread praveenesh kumar
Thanks Daniel, so it means for all other complex datatypes, we need the file contents to be in that format: tuples in ( ), bags in { }, maps in [ ]. On Thu, Feb 2, 2012 at 2:49 PM, Daniel Dai wrote: > Hi, Praveenesh, > Your tmp.txt should be: > (1,2,3,4) > (2,3,4,5) > (4,5,5,6) > > And you c

Re: How to use tuples ?

2012-02-02 Thread Daniel Dai
Hi, Praveenesh, Your tmp.txt should be: (1,2,3,4) (2,3,4,5) (4,5,5,6) And you cannot use ',' as the delimiter for PigStorage; otherwise PigStorage will split the line on commas first and then try to parse the tuple. Daniel On Thu, Feb 2, 2012 at 1:05 AM, praveenesh kumar wrote: > Hi, > > I am trying to lear
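
A sketch of Daniel's point: with the default tab delimiter, each line "(1,2,3,4)" arrives as a single field that Pig can parse as a tuple, and a declared schema makes the elements addressable by name:

  z = LOAD 'tmp.txt' AS (t:tuple(a:int, b:int, c:int, d:int));
  x = FOREACH z GENERATE t.a;     -- or $0.$0 when no schema is declared
  DUMP x;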