Re: Apache Pig map data structure

2014-09-11 Thread Abhishek Agarwal
od_id' is this correct way > to deference when I want to use raw_customer_data:: prod_id as the key in > the map. -- Regards, Abhishek Agarwal

Re: Storing time in Pig

2014-06-27 Thread Abhishek Agarwal
> -- Regards, Abhishek Agarwal

Re: Reading SequenceFile

2014-05-25 Thread abhishek dodda
rarray);* > > *STORE A into '/user/a/test/output' using PigStorage(',');* > > After I load into a variable and dump/store the variable, I see that the > fields are all concatenated and some records are truncated. > > Please let me know if this is the right way to read a sequencefile with > Gzip (created using HIVE) into Pig. > > Thanks!! > -- Thanks, Abhishek 2018509769

Re: Reading sequence file in pig

2014-05-21 Thread abhishek dodda
/elephant-bird-hadoop-compat-4.5.jar; A = load '/etl/table=04' using com.twitter.elephantbird.pig.load.SequenceFileLoader ('-c com.twitter.elephantbird.pig.util.NullWritableConverter','-c com.twitter.elephantbird.pig.util.TextConverter') AS (key,value:chararray); Thanks Ab

Re: Reading sequence file in pig

2014-05-21 Thread abhishek dodda
at 11:54 AM, Pradeep Gollakota wrote: > That is because null is not a datatype in Pig. > http://pig.apache.org/docs/r0.12.1/basic.html#data-types > > In fact, you don't need to specify a type at all for aliases. > > Try, (key, value: chararray). > > > On Wed, May 21, 2014

Re: Reading sequence file in pig

2014-05-21 Thread abhishek dodda
ed to build it. The > artifact exists in maven central. > > > http://search.maven.org/#artifactdetails%7Ccom.twitter.elephantbird%7Celephant-bird-pig%7C4.5%7Cjar > > Hope this helps. > > > On Tue, May 20, 2014 at 1:44 PM, abhishek dodda > wrote: > &

Re: Reading sequence file in pig

2014-05-20 Thread abhishek dodda
This File output from org.apache.hcatalog.pig.HCatStorer function On Tue, May 20, 2014 at 10:44 AM, abhishek dodda wrote: > Iam getting this error > > A = load '/a/part-m-' using > org.apache.pig.piggybank.storage.SequenceFileLoader(); > > org.apache.pig.backe

Re: Reading sequence file in pig

2014-05-20 Thread abhishek dodda
at java.security.AccessController.doPrivileged(Native Method) On Tue, May 20, 2014 at 5:41 AM, Pradeep Gollakota wrote: > You can use the SequenceFileLoader from the piggybank. > > > http://pig.apache.org/docs/r0.12.0/api/org/apache/pig/piggybank/storage/SequenceFileLoader.html > > > On Tue, May 20,

Reading sequence file in pig

2014-05-19 Thread abhishek dodda
Hi All, I am having trouble building the code for this project: https://github.com/kevinweil/elephant-bird Can someone tell me how to read sequence files in Pig? -- Thanks, Abhishek

Re: Pig Job Failure With More Number Of Input Files

2014-05-14 Thread Abhishek Agarwal
can you attach your job configuration here? That could help in narrowing down the problem. On Wed, May 7, 2014 at 4:05 AM, abhishek dodda wrote: > Hi all, > > There is a pig job which is failing. > > *Pig Script* > > Register > /opt/cloudera/parcels/CDH-4.2.1

Pig Job Failure With More Number Of Input Files

2014-05-06 Thread abhishek dodda
at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: java.lang.ClassNotFoundException: *Class hcfpgchfhdfpgjgefphcgfghgofpgdgefpgehchggeheaabkgdgngmhdfpgjhdhdhcfpgcgjgofphcgfghgofpgdgefpgehchggeheaabngdgng not found* at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493) at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:346) ... 7 more Any inputs to resolve this issue are appreciated. Thanks for your help -- Thanks, Abhishek 2018509769

Re: Getting current split in EvalFunc

2014-04-18 Thread Abhishek Agarwal
our LoadFunc. > > Take a look at PigStorage.prepareToRead(RecordReader reader, PigSplit > split) implementation. > > > On Sat, Apr 12, 2014 at 12:23 PM, Abhishek Agarwal >wrote: > > > How can I get the current file being processed by mapper in EvalFunc? In > > t

Re: Combiner with COGROUP

2014-04-16 Thread Abhishek Agarwal
/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/CombinerOptimizer.java#L120 > > > On Wed, Apr 16, 2014 at 8:25 AM, Abhishek Agarwal >wrote: > > > Does the pig uses combiner if cogroup operator is used for the > aggregation? > > I am currently usin

Combiner with COGROUP

2014-04-16 Thread Abhishek Agarwal
Does Pig use the combiner if the COGROUP operator is used for the aggregation? I am currently using version 0.8.1 and was wondering whether support has been added in later versions. Any help will be appreciated. -- Regards, Abhishek Agarwal

Getting current split in EvalFunc

2014-04-12 Thread Abhishek Agarwal
How can I get the current file being processed by the mapper in an EvalFunc? In a typical MR job, this is achieved through ((FileSplit) context.getInputSplit()).getPath() -- Regards, Abhishek Agarwal
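On the Pig Latin side, one possible workaround (a sketch, not taken from this thread) is to let the loader tag each record with its source file, so the path reaches the UDF as an ordinary field. This assumes Pig 0.12 or later, where PigStorage is understood to accept a '-tagPath' option; the input path and field names below are made up for illustration.

```pig
-- Hypothetical input directory and schema.
-- '-tagPath' asks PigStorage to prepend the source file path to each record (assumed: Pig 0.12+).
raw = LOAD '/data/input' USING PigStorage('\t', '-tagPath')
      AS (src_path:chararray, f1:chararray, f2:chararray);

-- The path is now a normal field and can be passed to any EvalFunc as an argument.
with_src = FOREACH raw GENERATE src_path, f1, f2;
```

On older versions, or with a custom loader, the split would instead have to be captured inside the LoadFunc itself.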

Re: Pass user configurations/arguments to UDF

2014-04-10 Thread Abhishek Agarwal
> Hi, > > I implemented a custom load function. How to pass some user settings to > this function? > > Any help is appreciated, > > Patcharee > -- Regards, Abhishek Agarwal

Logical Plan - New and Old

2014-04-08 Thread Abhishek Agarwal
What are the differences between the new and old logical plans in Pig? I am not able to understand when and why I should set pig.usenewlogicalplan to true. -- Regards, Abhishek Agarwal

Re: Issue with Extracting Fields from text to csv file

2014-04-08 Thread Abhishek Agarwal
ove query, I am ready to switch to that tool also but not > JAVA. > > Early help will be appreciated. > > -- Regards, Abhishek Agarwal

Re: java.lang.OutOfMemoryError: Java heap space

2014-02-06 Thread abhishek
Hi Praveenesh, Did you use "replicated join" in your pig script or is it a regular join ?? Regards Abhishek Sent from my iPhone > On Feb 6, 2014, at 11:25 AM, praveenesh kumar wrote: > > Hi all, > > I am running a Pig Script which is running fine for small data. Bu

jobsInProgress | NoSuchFieldException

2013-09-17 Thread Ambastha, Abhishek
run using /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/hadoop-common-2.0.0-cdh4.3.0.jar Regards, Abhishek

Create Table + Join + Max 'String' Date

2013-09-09 Thread Ambastha, Abhishek
2 and Table_3. Please suggest how to do this. Regards, Abhishek

RE: Dedupe Logic

2013-08-26 Thread Ambastha, Abhishek
Dear Jacob, Many thanks! It worked perfect. Regards, Abhishek -Original Message- From: Jacob Perkins [mailto:jacob.a.perk...@gmail.com] Sent: 24 August 2013 23:49 To: user@pig.apache.org Subject: Re: Dedupe Logic Abhishek, You should be able to do this by grouping by the three

Date Function in Pig

2013-08-26 Thread Ambastha, Abhishek
Hi, I have columns defined as string where dates are in the format ddMMM (e.g. 01JAN2009). I would like to sort this column in ascending order. For that, I need to convert string into date in the format 'ddMMM'. Please suggest how to do this. Regards, Abhishek
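A minimal sketch of one way to do this, assuming Pig 0.11 or later (where the datetime type and the built-in ToDate function are available). The pattern 'ddMMMyyyy' is inferred from the example value 01JAN2009 and is an assumption; the file name and field name are hypothetical.

```pig
-- Hypothetical input: one column of date strings such as 01JAN2009.
raw = LOAD 'dates.txt' AS (dt_str:chararray);

-- ToDate parses the string using a Joda-Time style pattern.
-- 'ddMMMyyyy' is assumed from the example value, not confirmed by the post.
parsed = FOREACH raw GENERATE dt_str, ToDate(dt_str, 'ddMMMyyyy') AS dt;

-- Sorting on the datetime field gives chronological order,
-- which sorting on the raw string would not.
sorted = ORDER parsed BY dt ASC;
```

Keeping the original string alongside the parsed value makes it easy to spot rows that fail to parse.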

Dedupe Logic

2013-08-24 Thread Ambastha, Abhishek
). Please suggest how to do this. Also, can this be done using rank function? Regards, Abhishek

Adding files to distributed cache

2013-06-25 Thread abhishek
> Hello, > How can I add multiple files to distributed cache in pig UDF. > > Regards > Abhi

Re: pig union with avro

2013-06-15 Thread abhishek
Cheolsoo, Thank you very much. I found the incompatible types in the union. Now it is working fine. Regards Abhishek Sent from my iPhone On Jun 15, 2013, at 11:52 PM, Cheolsoo Park wrote: > What's the output of describe A and B? > > If the schema of A and B are not identic

Re: pig union with avro

2013-06-15 Thread abhishek
te array Thanks, Cheolsoo > > On Sat, Jun 15, 2013 at 7:48 PM, abhishek dodda > wrote: > >> hello, >> >> I am doing this >> >> DEFINE AVRO_LOAD org.apache.pig.piggybank.strorage.avro.AvroStorage(); >> >> A = load '/user/abhi/a.txt

Re: pig union with avro

2013-06-15 Thread abhishek dodda
ig.tools.grunt.Grunt - ERROR 1051 : cannot cast to byte array* In the pig logs the error is *ERROR 1056 problem while casting inputs of union*. Script was running fine before, but it is failing now with the above error Regards abhishek On Sat, Jun 15, 2013 at 7:44 PM, abhishek dodda wrote: >

pig union with avro

2013-06-15 Thread abhishek dodda
hello, I am doing this DEFINE AVRO_LOAD org.apache.pig.piggybank.strorage.avro.AvroStorage(); A = load '/user/abhi/a.txt' using AVRO_LOAD; B = load '/user/abhi/b.txt' using AVRO_LOAD; C = UNION A , B; here script is failing with the following error ERROR org.apache.pig.tools.grunt.Grunt - ER

Re: Pig and Avro Error

2013-06-09 Thread abhishek dodda
"int"]. According to the error message, one record includes a > float (23.0) instead of integer, and thus, it fails. > > I would try to DESCRIBE and DUMP on final and find which column is causing > the mismatch. It's hard to tell what the exact problem is without seeing > your

Pig and Avro Error

2013-06-09 Thread abhishek dodda
ore final into '/user/abhi/output' using org.apache.pig.piggybank.storage.avro.AvroStorage(); Thanks abhishek

Re: Pig question

2013-05-07 Thread abhishek
> Your question doesn't make much sense...I think you may have left a piece off? > > > 2013/5/7 abhishek >> Hi all, >> >> In my script >> >> a = load 'data' using PigStorage(); >> >> b = foreach a generate >>

Pig question

2013-05-06 Thread abhishek
Hi all, In my script a = load 'data' using PigStorage(); b = foreach a generate 342 as col1, substring(x,0,4) as col2, ; I want to use col2 later in foreach statement. derived col2 should be used below. Regards Abhi
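If the goal is to reuse a derived column, one common pattern is to compute it in one FOREACH and reference it in a second one, since an alias defined in a GENERATE clause is not visible to other expressions in that same GENERATE. The sketch below assumes x is a chararray field; the schema and the CONCAT step are made up purely to show the reuse.

```pig
-- Hypothetical schema: a single chararray field x.
a = LOAD 'data' USING PigStorage() AS (x:chararray);

-- First pass: derive col2 from x.
b = FOREACH a GENERATE 342 AS col1, SUBSTRING(x, 0, 4) AS col2;

-- Second pass: col2 is now a regular field and can be used in further expressions.
c = FOREACH b GENERATE col1, col2, CONCAT(col2, '_tag') AS col3;
```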

Fwd: Pig error

2013-01-14 Thread abhishek
>> Hi all, >> >> When I am using the JOIN operator in Pig, I am getting the following error >> >> Pig joins inner plans can only have one output leaf? >> >> Can anyone tell me why this occurs? >> >> Regards >> Abhi

Re: Group by with count

2012-12-27 Thread abhishek
Thanks for the reply , I got it Sent from my iPhone On Dec 27, 2012, at 2:31 PM, Jonathan Coveney wrote: > a = load 'tab1' as (col1, col2, col3); > b = group a by (col1, col2, col3); > c = foreach b generate FLATTEN(group), COUNT_STAR(a); > > > 2012/12/26 abhish

Re: Group by with count

2012-12-27 Thread abhishek
= load 'tab1' as (col1, col2, col3); >> b = group a by (col1, col2, col3); >> c = foreach b generate FLATTEN(group), COUNT_STAR(a); >> >> >> 2012/12/26 abhishek >> >>> Hi all, >>> >>> How can I achieve above hive que

Group by with count

2012-12-26 Thread abhishek
Hi all, How can I express the following Hive query in Pig? Create table x as select y.col1,y.col2,y.col3,count(*) as count from tab1 y group by y.col1,y.col2,y.col3 Regards Abhishek

Re: Partitions in pig

2012-12-18 Thread abhishek
Jurney http://datasyndrome.com > > On Dec 18, 2012, at 4:27 PM, abhishek wrote: > >> Directory based partition in hive. >> >> Partition by date >> >> Thanks >> Abhi >> >> Sent from my iPhone >> >> On Dec 18, 2012, at 7:20 PM, Russell J

Re: Partitions in pig

2012-12-18 Thread abhishek
Hi Russell, I will try this and get back to you. Regards Abhishek Sent from my iPhone On Dec 18, 2012, at 7:43 PM, Russell Jurney wrote: > It will work like so: > http://stackoverflow.com/questions/3515481/pig-latin-load-multiple-files-from-a-date-range-part-of-the-directory-st

Re: Partitions in pig

2012-12-18 Thread abhishek
ssell Jurney http://datasyndrome.com > > On Dec 18, 2012, at 4:12 PM, abhishek wrote: > >> Hi Russell, >> >> Thanks for the reply.How RCFile loader is related to partitions? >> >> I did not get your point in this. >> >> Regards >> Abhi >>

Re: Partitions in pig

2012-12-18 Thread abhishek
ta > > However, there is an RCFile loader in Piggybank: > http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/HiveColumnarLoader.java?view=markup > > Russell Jurney http://datasyndrome.com > > On Dec 18, 2012, at 2:39 PM, a

Partitions in pig

2012-12-18 Thread abhishek
in pig? Regards Abhishek

Re: Pig optimization rules

2012-10-16 Thread abhishek dodda
ects the join performance, How efficient is a Bloom filter compared to a replicated join? Can a Bloom filter be applied to an outer join? Regards Abhishek On Tue, Oct 16, 2012 at 10:04 PM, Thejas Nair wrote: > On 10/15/12 8:47 PM, abhishek dodda wrote: > >> hi all, >> >> I

Re: Pig storage and load functions and Cache

2012-10-15 Thread abhishek dodda
yet made it into Pig yet; I believe a general compute > caching framework would be very useful, and look forward to someone > taking up that challenge.. > > D > > On Fri, Oct 5, 2012 at 2:51 PM, Abhishek wrote: >> BinStorage() >> PigDump() >> PigStorage() >> Tex

Re: Small question

2012-10-12 Thread Abhishek
n Pig you could do something like --- " (x is null ? y : z) This a > standard ternary if/else. Don't see if the 0.00 actually plays any > other role than the null check, right? > > On Fri, Oct 12, 2012 at 10:05 AM, Abhishek wrote: >> Hi all, >> >> How to

Small question

2012-10-12 Thread Abhishek
Hi all, How do I write a Hive COALESCE statement in Pig? Example: Case when Coalesce(x,0.00)=0.00 then y else z How do I write this in Pig? Regards Abhi
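One way to express this in Pig Latin is the ternary (bincond) operator with an explicit null check, since COALESCE(x, 0.00) = 0.00 is true exactly when x is null or x equals zero. This is a sketch only: the file name, delimiter and the double types are assumptions.

```pig
-- Hypothetical input with three numeric fields.
data = LOAD 'input' USING PigStorage(',') AS (x:double, y:double, z:double);

-- Pick y when x is null or zero (what COALESCE(x, 0.00) = 0.00 tests), otherwise z.
result = FOREACH data GENERATE ((x IS NULL OR x == 0.00) ? y : z) AS picked;
```

Dropping the zero comparison leaves a pure null check, which is only equivalent if x can never be 0.00.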

Pig config properties

2012-10-09 Thread Abhishek
Sent from my iPhone

Pig storage and load functions and Cache

2012-10-05 Thread Abhishek
BinStorage() PigDump() PigStorage() TextLoader() Does loading or storing in any of the above formats optimize queries? Can caching be used anywhere in Pig, and how can a cache be useful in Pig? Regards Abhi

Pig query vs netezza

2012-10-05 Thread Abhishek
Hi all, Has anyone compared the performance of queries written in Hadoop Pig vs Netezza? Are there any benchmark tests on a small data set comparing the performance of Hadoop Pig and Netezza? Regards Abhi Sent from my iPhone

Re: Optimizations in pig

2012-10-04 Thread abhishek dodda
Thanks for the information Zhu. Regards abhishek On Thu, Oct 4, 2012 at 4:35 PM, TianYi Zhu wrote: > Hi Abhishek, > > http://archive.cloudera.com/cdh4/cdh/4/pig/perf.html > http://ofps.oreilly.com/titles/9781449302641/making_pig_fly.html > > On Fri, Oct 5, 2012 at 8:18 A

Re: Optimizations in pig

2012-10-04 Thread abhishek dodda
> Setting any properties can be achieved via "set property.name value;" -- Generally, what kind of properties do you override in the Pig grunt shell? Which properties are important to override? Regards Abhi > > D > > > On Thu, Oct 4, 2012 at 4:35 PM, TianYi Zhu >

Optimizations in pig

2012-10-04 Thread Abhishek
Hi all, I am new to Pig. In Hive we can optimize the code by using indexing, bucketing, partitions, storing the file in different formats (such as RCFile or sequence file), and overriding some properties in the Hive shell by using Set property name = value; Override some default property in grunt shell
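For reference, properties in Pig are overridden with the set command, either in a script or in the grunt shell. The lines below are only a sketch: the values are arbitrary, the job name is made up, and the last property is a Hadoop MapReduce (MRv1-style) setting given as an example rather than a recommendation.

```pig
-- Default number of reducers for jobs launched by this script (value is arbitrary).
SET default_parallel 20;

-- Name shown for the job in the Hadoop UI (hypothetical name).
SET job.name 'example-job';

-- Hadoop properties can be passed through the same way; this one enables
-- compression of intermediate map output (assumes MRv1-style property names).
SET mapred.compress.map.output true;
```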

Small question in pig

2012-10-02 Thread Abhishek
Hi all, I am fairly new to Pig. Can anyone tell me how to write the Hive query below in Pig Latin? In this query I am using a Cartesian join to achieve the equivalent of instring or contains in Java. Example col1 -- 145678341212 col2 -- %67834% insert into table t1 select t2.col1, t3.col2 from table2 t2 join table