Re: Reading SequenceFile

2014-05-25 Thread abhishek dodda
Please try this. The Elephant Bird project can read sequence files: https://github.com/kevinweil/elephant-bird You can get these jars from the Maven Central repository: http://search.maven.org/#artifactdetails%7Ccom.twitter.elephantbird%7Celephant-bird-pig%7C4.5%7Cjar REGISTER /home/xyz/elephant-bird-pi

Re: Reading sequence file in pig

2014-05-21 Thread abhishek dodda
hishek On Wed, May 21, 2014 at 12:14 PM, abhishek dodda wrote: > Not working yet > > A = load '/etl/table=04' using > com.twitter.elephantbird.pig.load.SequenceFileLoader > >> ('-c com.twitter.elephantbird.pig.util.TextConverter','-c > com.twitter

Re: Reading sequence file in pig

2014-05-21 Thread abhishek dodda
at 11:54 AM, Pradeep Gollakota wrote: > That is because null is not a datatype in Pig. > http://pig.apache.org/docs/r0.12.1/basic.html#data-types > > In fact, you don't need to specify a type at all for aliases. > > Try, (key, value: chararray). > > > On Wed, May 21, 2014
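
The schema suggested in the quoted reply, written out as a complete load statement. This is only a minimal sketch: the jar location is hypothetical, the input path and converter classes are the ones that appear elsewhere in this thread, and it assumes both key and value can be read as text. Other Elephant Bird jars may also need to be registered, and the thread does not confirm that this resolved the problem.

```pig
-- Hypothetical jar location; adjust to wherever the jar was downloaded.
REGISTER /path/to/elephant-bird-pig-4.5.jar;

-- The first '-c' option names the key converter, the second the value converter.
-- 'key' has no declared type, so Pig treats it as bytearray by default.
A = LOAD '/etl/table=04'
    USING com.twitter.elephantbird.pig.load.SequenceFileLoader(
        '-c com.twitter.elephantbird.pig.util.TextConverter',
        '-c com.twitter.elephantbird.pig.util.TextConverter')
    AS (key, value: chararray);

-- Print the resulting schema to confirm the declared types.
DESCRIBE A;
```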

Re: Reading sequence file in pig

2014-05-21 Thread abhishek dodda
ed to build it. The > artifact exists in maven central. > > > http://search.maven.org/#artifactdetails%7Ccom.twitter.elephantbird%7Celephant-bird-pig%7C4.5%7Cjar > > Hope this helps. > > > On Tue, May 20, 2014 at 1:44 PM, abhishek dodda > wrote: >

Re: Reading sequence file in pig

2014-05-20 Thread abhishek dodda
This file is output from the org.apache.hcatalog.pig.HCatStorer function. On Tue, May 20, 2014 at 10:44 AM, abhishek dodda wrote: > I am getting this error > > A = load '/a/part-m-' using > org.apache.pig.piggybank.storage.SequenceFileLoader(); > > org.apache.pig.backe

Re: Reading sequence file in pig

2014-05-20 Thread abhishek dodda
at java.security.AccessController.doPrivileged(Native Method) On Tue, May 20, 2014 at 5:41 AM, Pradeep Gollakota wrote: > You can use the SequenceFileLoader from the piggybank. > > > http://pig.apache.org/docs/r0.12.0/api/org/apache/pig/piggybank/storage/SequenceFileLoader.html > > > On Tue, May 20,

Reading sequence file in pig

2014-05-19 Thread abhishek dodda
Hi All, I am having trouble building the code for this project: https://github.com/kevinweil/elephant-bird Can someone tell me how to read sequence files in Pig? -- Thanks, Abhishek

Pig Job Failure With More Number Of Input Files

2014-05-06 Thread abhishek dodda
Hi all, There is a pig job which is failing. *Pig Script* Register /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/lib/pig/piggybank.jar; /* READ LAST 30 DAYS DATA */ /* xyz table is partitioned with dt*/ au = LOAD 'xyz' USING org.apache.hcatalog.pig.HCatLoader(); ad = FILTER au by ($a) and $

Re: pig union with avro

2013-06-15 Thread abhishek dodda
ig.tools.grunt.Grunt - ERROR 1051 : cannot cast to byte array* In the pig logs the error is *ERROR 1056 problem while casting inputs of union*. The script was running fine before, but it now fails with the above error. Regards abhishek On Sat, Jun 15, 2013 at 7:44 PM, abhishek dodda wrote: >

pig union with avro

2013-06-15 Thread abhishek dodda
hello, I am doing this: DEFINE AVRO_LOAD org.apache.pig.piggybank.storage.avro.AvroStorage(); A = load '/user/abhi/a.txt' using AVRO_LOAD; B = load '/user/abhi/b.txt' using AVRO_LOAD; C = UNION A , B; The script fails here with the following error: ERROR org.apache.pig.tools.grunt.Grunt - ER

Re: Pig and Avro Error

2013-06-09 Thread abhishek dodda
"int"]. According to the error message, one record includes a > float (23.0) instead of integer, and thus, it fails. > > I would try to DESCRIBE and DUMP on final and find which column is causing > the mismatch. It's hard to tell what the exact problem is without seeing > your
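
The debugging step suggested in the quoted reply could look like the sketch below. `final` is the relation named in the reply; the field names `id` and `amount` are hypothetical, since the original script is not shown here. The cast is one possible fix and assumes the column really should be an integer; note that casting a float to int drops the fractional part.

```pig
-- Show the Pig type of every column in the relation being stored.
DESCRIBE final;

-- Inspect the rows and look for values such as 23.0 in a column
-- that the Avro schema declares as ["null","int"].
DUMP final;

-- Hypothetical fix: cast the offending column to int before storing.
-- 'id' and 'amount' are placeholder field names, not taken from the thread.
fixed = FOREACH final GENERATE id, (int) amount AS amount;
```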

Pig and Avro Error

2013-06-09 Thread abhishek dodda
hi all, I am running Pig 0.10 with Avro 1.7 storage and facing the issue below: *org.apache.avro.file.DataFileWriter$AppenderWriteException : java.lang.RuntimeException : Datum 23.0 is not in union ["null" , "int"]* My pig script does the following: a = load '/user/abhi/abc.txt' u

Re: Pig optimization rules

2012-10-16 Thread abhishek dodda
ects the join performance. How efficient is a Bloom filter compared to a replicated join? Can a Bloom filter be applied to an outer join? Regards Abhishek On Tue, Oct 16, 2012 at 10:04 PM, Thejas Nair wrote: > On 10/15/12 8:47 PM, abhishek dodda wrote: > >> hi all, >> >> I

Re: Pig storage and load functions and Cache

2012-10-15 Thread abhishek dodda
hi Dmitriy, Thanks for the information. Can you share your views on the query below? BinStorage() PigDump() PigStorage() TextLoader() Loading or storing in which of the above formats will optimize the queries, considering I have text files? Regards Abhi On Mon, Oct 8, 2012 at 12:10 AM, Dmitriy R

Re: Optimizations in pig

2012-10-04 Thread abhishek dodda
Thanks for the information Zhu. Regards abhishek On Thu, Oct 4, 2012 at 4:35 PM, TianYi Zhu wrote: > Hi Abhishek, > > http://archive.cloudera.com/cdh4/cdh/4/pig/perf.html > http://ofps.oreilly.com/titles/9781449302641/making_pig_fly.html > > On Fri, Oct 5, 2012 at 8:18 AM, Abhishek wrote: > >>

Re: Optimizations in pig

2012-10-04 Thread abhishek dodda
Thanks for your detailed explanation. I have some doubts, listed below; please clarify them. On Thu, Oct 4, 2012 at 4:59 PM, Dmitriy Ryaboy wrote: > bucketing and partitioning is just setting the files up right. you can > do that explicitly. -- How can I do buckets explicitly? I don't get your po