od_id'. Is this the correct way
> to dereference when I want to use raw_customer_data::prod_id as the key in
> the map?
--
Regards,
Abhishek Agarwal
--
Regards,
Abhishek Agarwal
rarray);
>
> STORE A into '/user/a/test/output' using PigStorage(',');
>
> After I load the data into an alias and DUMP/STORE that alias, I see that the
> fields are all concatenated and some records are truncated.
>
> Please let me know if this is the right way to read a SequenceFile with
> Gzip compression (created using Hive) into Pig.
>
> Thanks!!
>
--
Thanks,
Abhishek
2018509769
/elephant-bird-hadoop-compat-4.5.jar;
A = load '/etl/table=04' using com.twitter.elephantbird.pig.load.SequenceFileLoader(
    '-c com.twitter.elephantbird.pig.util.NullWritableConverter',
    '-c com.twitter.elephantbird.pig.util.TextConverter')
    AS (key, value:chararray);
Thanks
Ab
at 11:54 AM, Pradeep Gollakota wrote:
> That is because null is not a datatype in Pig.
> http://pig.apache.org/docs/r0.12.1/basic.html#data-types
>
> In fact, you don't need to specify a type at all for aliases.
>
> Try (key, value: chararray).
>
>
> On Wed, May 21, 2014
ed to build it. The
> artifact exists in Maven Central.
>
>
> http://search.maven.org/#artifactdetails%7Ccom.twitter.elephantbird%7Celephant-bird-pig%7C4.5%7Cjar
>
> Hope this helps.
>
>
> On Tue, May 20, 2014 at 1:44 PM, abhishek dodda wrote:
>
This file is the output of the org.apache.hcatalog.pig.HCatStorer function.
On Tue, May 20, 2014 at 10:44 AM, abhishek dodda
wrote:
> I am getting this error:
>
> A = load '/a/part-m-' using
> org.apache.pig.piggybank.storage.SequenceFileLoader();
>
> org.apache.pig.backe
at java.security.AccessController.doPrivileged(Native Method)
On Tue, May 20, 2014 at 5:41 AM, Pradeep Gollakota wrote:
> You can use the SequenceFileLoader from the piggybank.
>
>
> http://pig.apache.org/docs/r0.12.0/api/org/apache/pig/piggybank/storage/SequenceFileLoader.html
>
>
> On Tue, May 20,
Hi all,
I am having trouble building the code for this project:
https://github.com/kevinweil/elephant-bird
Can someone tell me how to read sequence files in Pig?
--
Thanks,
Abhishek
Can you attach your job configuration here? That could help in narrowing
down the problem.
On Wed, May 7, 2014 at 4:05 AM, abhishek dodda wrote:
> Hi all,
>
> There is a pig job which is failing.
>
> Pig Script:
>
> Register
> /opt/cloudera/parcels/CDH-4.2.1
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ClassNotFoundException: Class hcfpgchfhdfpgjgefphcgfghgofpgdgefpgehchggeheaabkgdgngmhdfpgjhdhdhcfpgcgjgofphcgfghgofpgdgefpgehchggeheaabngdgng not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)
at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:346)
... 7 more
Any input on resolving this issue is appreciated. Thanks for your help.
--
Thanks,
Abhishek
2018509769
our LoadFunc.
>
> Take a look at PigStorage.prepareToRead(RecordReader reader, PigSplit
> split) implementation.
>
>
> On Sat, Apr 12, 2014 at 12:23 PM, Abhishek Agarwal wrote:
>
> > How can I get the current file being processed by mapper in EvalFunc? In
> > t
/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/CombinerOptimizer.java#L120
>
>
> On Wed, Apr 16, 2014 at 8:25 AM, Abhishek Agarwal wrote:
>
> > Does Pig use the combiner if the COGROUP operator is used for the
> > aggregation?
> > I am currently usin
Does Pig use the combiner if the COGROUP operator is used for the aggregation?
I am currently using version 0.8.1. I was wondering whether support
has been added in later versions. Any help will be appreciated.
--
Regards,
Abhishek Agarwal
How can I get the current file being processed by the mapper in an EvalFunc? In a
typical MR job, this is achieved through ((FileSplit)
context.getInputSplit()).getPath().
--
Regards,
Abhishek Agarwal
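If the goal is only to know which file each record came from, one alternative that avoids reaching into the split from inside a UDF is to have the loader tag each record with its source. A minimal sketch, assuming Pig 0.12 or later, where PigStorage accepts a '-tagFile' option (earlier releases used '-tagsource'); the input path and field names are hypothetical. The resulting filename field can then be passed to an EvalFunc as an ordinary argument.

```pig
-- Hypothetical input: tab-delimited text files under /data/logs
-- '-tagFile' makes PigStorage prepend the source file name to every tuple
logs = LOAD '/data/logs/*' USING PigStorage('\t', '-tagFile')
       AS (filename:chararray, line:chararray);

-- The file name is now a normal field, usable in GROUP or as a UDF argument
by_file = GROUP logs BY filename;
counts  = FOREACH by_file GENERATE group AS filename, COUNT(logs) AS n;
DUMP counts;
```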
> Hi,
>
> I implemented a custom load function. How can I pass some user settings to
> this function?
>
> Any help is appreciated,
>
> Patcharee
>
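One common way to pass settings to a custom LoadFunc is through constructor arguments: Pig hands the string literals given in DEFINE (or directly after USING) to a public constructor of the loader that takes the same number of String parameters. A sketch, where the class com.example.pig.MyLoader, its jar path, and its two settings are all hypothetical:

```pig
-- Hypothetical jar and LoadFunc; MyLoader is assumed to have a public
-- constructor MyLoader(String delimiter, String skipHeader)
REGISTER /path/to/my-loader.jar;
DEFINE MyLoader com.example.pig.MyLoader('|', 'true');

raw = LOAD '/data/input' USING MyLoader;
DUMP raw;
```

Inside the loader, the constructor simply stores these strings in fields and uses them when reading records.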
--
Regards,
Abhishek Agarwal
What are the differences between the new and old logical plans in Pig? I do not
understand when and why I should set pig.usenewlogicalplan to true.
--
Regards,
Abhishek Agarwal
ove query, I am also ready to switch to that tool, but not
> to Java.
>
> Early help will be appreciated.
>
>
--
Regards,
Abhishek Agarwal
Hi Praveenesh,
Did you use a replicated join in your Pig script, or is it a regular join?
Regards
Abhishek
Sent from my iPhone
> On Feb 6, 2014, at 11:25 AM, praveenesh kumar wrote:
>
> Hi all,
>
> I am running a Pig Script which is running fine for small data. Bu
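For reference, a replicated (fragment-replicate) join is requested with USING 'replicated'. Pig streams the first relation through the mappers and loads every relation after it into memory on each map task, so the small side must fit in task memory. A sketch with hypothetical paths and fields:

```pig
-- 'big' is streamed; 'small' is held in memory on every map task
big   = LOAD '/data/big'   AS (id:int, payload:chararray);
small = LOAD '/data/small' AS (id:int, label:chararray);

-- Map-side join: no reduce phase is needed
joined = JOIN big BY id, small BY id USING 'replicated';
STORE joined INTO '/data/joined';
```

If the small relation grows beyond available memory, the job fails, and a regular join is the safer choice.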
run using
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/hadoop-common-2.0.0-cdh4.3.0.jar
Regards,
Abhishek
2 and Table_3.
Please suggest how to do this.
Regards,
Abhishek
Dear Jacob,
Many thanks! It worked perfectly.
Regards,
Abhishek
-----Original Message-----
From: Jacob Perkins [mailto:jacob.a.perk...@gmail.com]
Sent: 24 August 2013 23:49
To: user@pig.apache.org
Subject: Re: Dedupe Logic
Abhishek,
You should be able to do this by grouping by the three
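A hedged sketch of the grouping approach, assuming a comma-delimited input where c1, c2, c3 are the three key columns and c4 is any other column (all names and paths hypothetical); it keeps one arbitrary record per key:

```pig
data    = LOAD '/data/input' USING PigStorage(',')
          AS (c1:chararray, c2:chararray, c3:chararray, c4:chararray);
grouped = GROUP data BY (c1, c2, c3);
-- Nested LIMIT keeps a single (arbitrary) record from each group
deduped = FOREACH grouped {
    one = LIMIT data 1;
    GENERATE FLATTEN(one);
};
STORE deduped INTO '/data/deduped' USING PigStorage(',');
```

If whole records are exact duplicates, DISTINCT data gives the same result without the explicit grouping.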
Hi,
I have columns defined as strings where dates are in the format ddMMMyyyy (e.g.
01JAN2009). I would like to sort this column in ascending order. For that, I
need to convert the string into a date using the format 'ddMMMyyyy'.
Please suggest how to do this.
Regards,
Abhishek
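A minimal sketch, assuming Pig 0.11 or later (which added the DateTime type and the ToDate function); the path and field names are hypothetical. ToDate uses Joda-Time patterns, whose month-name parsing should accept JAN, but it is worth spot-checking a few rows on the real data.

```pig
raw    = LOAD '/data/input' USING PigStorage(',') AS (id:chararray, dt_str:chararray);
-- 'ddMMMyyyy': 01JAN2009 -> day 01, month JAN, year 2009
parsed = FOREACH raw GENERATE id, dt_str, ToDate(dt_str, 'ddMMMyyyy') AS dt;
-- Sorting on the DateTime field gives chronological rather than alphabetical order
sorted = ORDER parsed BY dt ASC;
DUMP sorted;
```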
). Please suggest how to do this.
Also, can this be done using rank function?
Regards,
Abhishek
> Hello,
> How can I add multiple files to the distributed cache in a Pig UDF?
>
> Regards
> Abhi
Cheolsoo,
Thank you very much, I found the incompatible types in the union.
Now it is working fine.
Regards
Abhishek
On Jun 15, 2013, at 11:52 PM, Cheolsoo Park wrote:
> What's the output of describe A and B?
>
> If the schema of A and B are not identic
te array
Thanks,
Cheolsoo
>
> On Sat, Jun 15, 2013 at 7:48 PM, abhishek dodda
> wrote:
>
>> hello,
>>
>> I am doing this
>>
>> DEFINE AVRO_LOAD org.apache.pig.piggybank.storage.avro.AvroStorage();
>>
>> A = load '/user/abhi/a.txt
ig.tools.grunt.Grunt - ERROR 1051 : cannot cast to bytearray
In the Pig logs the error is
ERROR 1056 problem while casting inputs of union.
The script was running fine before, but it is failing now with the above error.
Regards
abhishek
On Sat, Jun 15, 2013 at 7:44 PM, abhishek dodda
wrote:
>
hello,
I am doing this
DEFINE AVRO_LOAD org.apache.pig.piggybank.storage.avro.AvroStorage();
A = load '/user/abhi/a.txt' using AVRO_LOAD;
B = load '/user/abhi/b.txt' using AVRO_LOAD;
C = UNION A , B;
Here the script is failing with the following error:
ERROR org.apache.pig.tools.grunt.Grunt - ER
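Following the suggestion to compare the output of DESCRIBE for A and B, a sketch of the usual fix: UNION needs compatible types column by column, so cast the mismatched column on one side before the union. The column names and the int/chararray mismatch below are hypothetical.

```pig
-- Compare the two schemas first
DESCRIBE A;
DESCRIBE B;

-- Hypothetical mismatch: 'amount' is int in A but chararray in B
B_fixed = FOREACH B GENERATE id, (int) amount AS amount;
C = UNION A, B_fixed;
```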
t;int"]. According to the error message, one record includes a
> float (23.0) instead of an integer, and thus it fails.
>
> I would try to DESCRIBE and DUMP on final and find which column is causing
> the mismatch. It's hard to tell what the exact problem is without seeing
> your
ore final into '/user/abhi/output' using
org.apache.pig.piggybank.storage.avro.AvroStorage();
Thanks
abhishek
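Given the error above (a float such as 23.0 where the Avro schema expects int), one way to make the data match is an explicit cast before STORE. The column names id and qty are hypothetical; note that (int) drops any fractional part, and the piggybank and Avro jars still need to be registered as in the original script.

```pig
-- Hypothetical: 'qty' arrives as float (e.g. 23.0) but the Avro field is int
fixed = FOREACH final GENERATE id, (int) qty AS qty;
STORE fixed INTO '/user/abhi/output'
      USING org.apache.pig.piggybank.storage.avro.AvroStorage();
```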
;
> Your question doesn't make much sense...I think you may have left a piece off?
>
>
> 2013/5/7 abhishek
>> Hi all,
>>
>> In my script
>>
>> a = load 'data' using PigStorage();
>>
>> b = foreach a generate
>>
Hi all,
In my script
a = load 'data' using PigStorage();
b = foreach a generate
342 as col1,
SUBSTRING(x,0,4) as col2,
;
I want to use col2 later in the foreach statement: the
derived col2 should be used in the fields below it.
Regards
Abhi
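A straightforward way to reuse a derived field is to compute it in one FOREACH and refer to it by name in a second FOREACH. A sketch, assuming the input has a single chararray field x; the schema, the extra col3, and its '_suffix' value are hypothetical. Note that Pig's built-in is SUBSTRING (built-in function names are case-sensitive).

```pig
a = LOAD 'data' USING PigStorage() AS (x:chararray);
-- First FOREACH derives col2
b = FOREACH a GENERATE 342 AS col1, SUBSTRING(x, 0, 4) AS col2;
-- Second FOREACH can now use col2 by name
c = FOREACH b GENERATE col1, col2, CONCAT(col2, '_suffix') AS col3;
DUMP c;
```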
>> Hi all,
>>
>> When I am using the JOIN operator in Pig, I am getting the following error:
>>
>> Pig joins inner plans can only have one output leaf?
>>
>> Can anyone tell me why this occurs?
>> Regards
>> Abhi
Thanks for the reply, I got it.
On Dec 27, 2012, at 2:31 PM, Jonathan Coveney wrote:
> a = load 'tab1' as (col1, col2, col3);
> b = group a by (col1, col2, col3);
> c = foreach b generate FLATTEN(group), COUNT_STAR(a);
>
>
> 2012/12/26 abhish
= load 'tab1' as (col1, col2, col3);
>> b = group a by (col1, col2, col3);
>> c = foreach b generate FLATTEN(group), COUNT_STAR(a);
>>
>>
>> 2012/12/26 abhishek
>>
>>> Hi all,
>>>
>>> How can I achieve above hive que
Hi all,
How can I achieve the Hive query below in Pig?
Create table x as select y.col1,y.col2,y.col3,count(*) as count from tab1 y
group by y.col1,y.col2,y.col3
Regards
Abhishek
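Extending the GROUP/COUNT_STAR reply quoted earlier, a sketch that also names the output columns and writes the result, which plays the role of CREATE TABLE x AS. The output location 'x' and the name cnt are illustrative choices (cnt avoids confusion with the COUNT function).

```pig
a = LOAD 'tab1' AS (col1, col2, col3);
b = GROUP a BY (col1, col2, col3);
-- Name the flattened group fields and the count, like the Hive SELECT list
c = FOREACH b GENERATE FLATTEN(group) AS (col1, col2, col3), COUNT_STAR(a) AS cnt;
STORE c INTO 'x';
```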
Jurney http://datasyndrome.com
>
> On Dec 18, 2012, at 4:27 PM, abhishek wrote:
>
>> Directory based partition in hive.
>>
>> Partition by date
>>
>> Thanks
>> Abhi
>>
>> Sent from my iPhone
>>
>> On Dec 18, 2012, at 7:20 PM, Russell J
Hi Russell,
I will try this and get back to you.
Regards
Abhishek
On Dec 18, 2012, at 7:43 PM, Russell Jurney wrote:
> It will work like so:
> http://stackoverflow.com/questions/3515481/pig-latin-load-multiple-files-from-a-date-range-part-of-the-directory-st
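The approach in the linked answer relies on Hadoop glob patterns in the LOAD path. A sketch, assuming a hypothetical Hive table stored as text under /warehouse/logs/dt=YYYY-MM-DD/ with Hive's default Ctrl-A field delimiter; the partition value itself is not added as a field.

```pig
-- {a,b,c} selects specific date partitions; * matches a whole range
five_days = LOAD '/warehouse/logs/dt=2012-12-{10,11,12,13,14}' USING PigStorage('\u0001');
december  = LOAD '/warehouse/logs/dt=2012-12-*' USING PigStorage('\u0001');
```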
ssell Jurney http://datasyndrome.com
>
> On Dec 18, 2012, at 4:12 PM, abhishek wrote:
>
>> Hi Russell,
>>
>> Thanks for the reply. How is the RCFile loader related to partitions?
>>
>> I did not get your point in this.
>>
>> Regards
>> Abhi
>>
ta
>
> However, there is an RCFile loader in Piggybank:
> http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/HiveColumnarLoader.java?view=markup
>
> Russell Jurney http://datasyndrome.com
>
> On Dec 18, 2012, at 2:39 PM, a
in pig?
Regards
Abhishek
ects the join
performance. How efficient is a Bloom filter compared to a replicated join? Can a
Bloom filter be applied to an outer join?
Regards
Abhishek
On Tue, Oct 16, 2012 at 10:04 PM, Thejas Nair wrote:
> On 10/15/12 8:47 PM, abhishek dodda wrote:
>
>> hi all,
>>
>> I
yet made it into Pig yet; I believe a general compute
> caching framework would be very useful, and look forward to someone
> taking up that challenge..
>
> D
>
> On Fri, Oct 5, 2012 at 2:51 PM, Abhishek wrote:
>> BinStorage()
>> PigDump()
>> PigStorage()
>> Tex
n Pig you could do something like: (x is null ? y : z). This is a
> standard ternary if/else. I don't see the 0.00 actually playing any
> role other than the null check, right?
>
> On Fri, Oct 12, 2012 at 10:05 AM, Abhishek wrote:
>> Hi all,
>>
>> How to
Hi all,
How do I write a Hive COALESCE statement in Pig?
Example:
CASE WHEN
COALESCE(x, 0.00) = 0.00 THEN y ELSE z
How can I write this in Pig?
Regards
Abhi
Sent from my iPhone
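COALESCE(x, 0.00) = 0.00 is true both when x is null and when x equals 0.00, so a faithful translation tests both in the bincond condition; a null-only check differs for rows where x is exactly 0. A sketch, assuming a relation with a numeric x and fields y and z (path and names hypothetical):

```pig
a = LOAD '/data/input' USING PigStorage(',') AS (x:double, y:chararray, z:chararray);
-- Same logic as CASE WHEN COALESCE(x, 0.00) = 0.00 THEN y ELSE z END
b = FOREACH a GENERATE ((x IS NULL OR x == 0.0) ? y : z) AS result;
DUMP b;
```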
BinStorage()
PigDump()
PigStorage()
TextLoader()
Will loading or storing in any of the above formats optimize the queries?
Can a cache be used anywhere in Pig? How can a cache be useful in Pig?
Regards
Abhi
Hi all,
Has anyone compared the performance of queries written in
Hadoop/Pig vs. Netezza?
Are there any benchmark tests on a small data set comparing the performance of
Hadoop/Pig and Netezza?
Regards
Abhi
Thanks for the information Zhu.
Regards
abhishek
On Thu, Oct 4, 2012 at 4:35 PM, TianYi Zhu
wrote:
> Hi Abhishek,
>
> http://archive.cloudera.com/cdh4/cdh/4/pig/perf.html
> http://ofps.oreilly.com/titles/9781449302641/making_pig_fly.html
>
> On Fri, Oct 5, 2012 at 8:18 A
> Setting any properties can be achieved via "set property.name value;"
Generally, what kinds of properties do you override in the Pig Grunt shell?
Which are the important properties to override?
Regards
Abhi
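As a hedged starting point, a few settings that are commonly overridden with SET, either in Grunt or at the top of a script; the values are illustrative, and the right ones depend on the cluster and data size.

```pig
-- Default reducer count for reduce-side operators that lack a PARALLEL clause
SET default_parallel 20;
-- Job name shown in the Hadoop web UI
SET job.name 'daily aggregation';
-- Compress intermediate files written between MapReduce jobs
SET pig.tmpfilecompression true;
SET pig.tmpfilecompression.codec gz;
```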
>
> D
>
>
> On Thu, Oct 4, 2012 at 4:35 PM, TianYi Zhu
>
Hi all,
I am new to Pig.
In Hive we can optimize the code by using:
Indexing
Bucketing
Partitions
Storing the file in different formats, such as RCFile or SequenceFile
Overriding some properties in the Hive shell,
by using
SET property.name=value;
Override some default property in grunt shell
hi all,
I am fairly new to Pig. Can anyone tell me how to write the Hive
query below in Pig Latin?
In this query I am using a Cartesian join to achieve an INSTR/contains-style match, as in Java.
Example
col1 -- 145678341212
col2 -- %67834%
insert into table t1
select t2.col1, t3.col2
from table2 t2
join table
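A sketch of the same Cartesian-join-plus-contains logic in Pig Latin, assuming hypothetical inputs where table2 holds values like 145678341212 and table3 holds LIKE-style patterns like %67834%. It converts each pattern to a regex and filters with MATCHES, which must match the whole string. It assumes your Pig version accepts a non-literal regex on the right side of MATCHES, and that patterns contain no other regex metacharacters. CROSS output grows as the product of both inputs, so it is only practical when one side is small.

```pig
t2 = LOAD 'table2' AS (col1:chararray);
t3 = LOAD 'table3' AS (col2:chararray);

-- Cartesian product, as in the Hive query
pairs = CROSS t2, t3;

-- '%67834%' -> '.*67834.*', so MATCHES behaves like "contains"
matched = FILTER pairs BY t2::col1 MATCHES REPLACE(t3::col2, '%', '.*');
t1 = FOREACH matched GENERATE t2::col1 AS col1, t3::col2 AS col2;
STORE t1 INTO 't1';
```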