Re: Congratulations to Cheolsoo Park the new Apache Pig project chair

2014-03-20 Thread Aniket Mokashi
Woo!! Congrats Cheolsoo... On Thu, Mar 20, 2014 at 4:25 AM, Rohini Palaniswamy rohini.adi...@gmail.com wrote: Thanks Julien. Great job last year. Congratulations, Cheolsoo!!! Well deserved. Great job past 2 years with awesome number of commits and reviews. On Thu, Mar 20, 2014 at 2:07

Re: Welcome to the new Pig PMC member Aniket Mokashi

2014-01-25 Thread Aniket Mokashi
Congrats Aniket! On Tue, Jan 14, 2014 at 7:01 PM, Jarek Jarcec Cecho jar...@apache.org wrote: Congratulations Aniket, good work! Jarcec On Tue, Jan 14, 2014 at 06:52:10PM -0800, JULIEN LE DEM wrote: It's my pleasure to announce that Aniket Mokashi became

Re: ERROR: java.lang.Long cannot be cast to java.lang.String

2013-08-18 Thread Aniket Mokashi
Hi Sonia, try adding another pair of parentheses, e.g. ((int)(RegexMatch((chararray) genre_id, '\\d+')) == 1 ? (chararray)genre_id : '-1001') as genre_id On Thu, Aug 15, 2013 at 4:28 PM, sonia gehlot sonia.geh...@gmail.com wrote: Hi, I have a Pig script in which I am flattening it and assign

Re: Pig and Storm

2013-07-24 Thread Aniket Mokashi
The following projects might interest you: Pig and Spark: https://github.com/twitter/pig/tree/spork Storm and Hadoop: https://speakerdeck.com/sritchie/summingbird-streaming-mapreduce-at-twitter Thanks, Aniket On Tue, Jul 23, 2013 at 11:18 PM, Russell Jurney russell.jur...@gmail.com wrote: I think

Re: Replicated Join and OOM errors

2013-07-21 Thread Aniket Mokashi
Pig does not currently have a way to do this. Development of a feature like this is tracked at https://issues.apache.org/jira/browse/PIG-2784. Feel free to add a subtask and take a stab at it. ~Aniket On Fri, Jul 19, 2013 at 12:58 PM, Mehmet Tepedelenlioglu mehmets...@yahoo.com wrote:

Re: Single Output file from STORE command

2013-06-03 Thread Aniket Mokashi
You can do in a separate Pig script what hadoop fs -getmerge does. It will still go through a single reducer, though. On Tue, May 28, 2013 at 8:29 AM, Alan Gates ga...@hortonworks.com wrote: Nothing that uses MapReduce as an underlying execution engine creates a single file when running
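
For illustration only: the merge that hadoop fs -getmerge performs amounts to concatenating a job's output files into one local file. The Java sketch below does the same on a local directory. It assumes the outputs follow the part-* naming and that name order is the desired order; the class LocalGetMerge is hypothetical and is not Hadoop's implementation.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical local analogue of `hadoop fs -getmerge`: concatenate part files into one file. */
final class LocalGetMerge {
    private LocalGetMerge() {}

    /** Appends every regular file in dir named part-* to out, in lexicographic name order. */
    static void merge(Path dir, Path out) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "part-*")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    parts.add(p); // markers such as _SUCCESS are skipped by the glob
                }
            }
        }
        // Zero-padded names (part-00000, part-00001, ...) sort correctly as strings.
        parts.sort(Comparator.comparing((Path p) -> p.getFileName().toString()));
        try (OutputStream os = Files.newOutputStream(out)) { // creates or truncates out
            for (Path p : parts) {
                Files.copy(p, os);
            }
        }
    }
}
```

Like the single-reducer approach, this is inherently sequential: one writer produces the one output file.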

Re: Pig architecture explanation?

2013-03-21 Thread Aniket Mokashi
Also see https://cwiki.apache.org/confluence/display/PIG/Guide+for+new+contributors ~Aniket On Sun, Mar 17, 2013 at 4:37 PM, Prashant Kommireddi prash1...@gmail.com wrote: Hi Gardner, This paper would be a good starting point http://infolab.stanford.edu/~olston/publications/vldb09.pdf

Re: Unable to upload to S3

2013-03-04 Thread Aniket Mokashi
What's BCCKIAJV5KGMZVA:xmw5F7I4AWd6rDRA@? To work with S3: 1. your path should be s3n://bucket-name/key; 2. put your AWS keys in core-site.xml. On Mon, Mar 4, 2013 at 3:32 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I am trying to upload to S3 using pig but I get: grunt store A
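
A small hypothetical helper (S3nPath is not part of Hadoop) makes the expected path shape concrete: the bucket is the URI authority and the key is the path after it. An authority containing user:secret@ means credentials were embedded in the URL instead of being configured in core-site.xml, so the sketch rejects it.

```java
import java.net.URI;

/** Hypothetical: split an s3n:// path into bucket and key, rejecting inline credentials. */
final class S3nPath {
    final String bucket;
    final String key;

    private S3nPath(String bucket, String key) {
        this.bucket = bucket;
        this.key = key;
    }

    static S3nPath parse(String path) {
        URI uri = URI.create(path); // throws IllegalArgumentException on malformed input
        if (!"s3n".equals(uri.getScheme())) {
            throw new IllegalArgumentException("not an s3n:// path: " + path);
        }
        String authority = uri.getRawAuthority();
        if (authority == null || authority.isEmpty()) {
            throw new IllegalArgumentException("missing bucket: " + path);
        }
        if (authority.indexOf('@') >= 0) {
            // Keys belong in core-site.xml, not in the URL.
            throw new IllegalArgumentException("credentials embedded in path: move them to core-site.xml");
        }
        String p = uri.getPath();
        String key = (p == null || p.isEmpty()) ? "" : p.substring(1); // drop the leading '/'
        return new S3nPath(authority, key);
    }
}
```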

Fwd: Replicated join: is there a setting to make this better?

2013-02-21 Thread Aniket Mokashi
I think the email was filtered out. Resending. -- Forwarded message -- From: Aniket Mokashi aniket...@gmail.com Date: Wed, Feb 20, 2013 at 1:18 PM Subject: Replicated join: is there a setting to make this better? To: d...@pig.apache.org d...@pig.apache.org Hi devs, I

Re: [ANNOUNCE] Welcome Bill Graham to join Pig PMC

2013-02-20 Thread Aniket Mokashi
Congrats Bill !! On Wed, Feb 20, 2013 at 9:44 AM, Julien Le Dem jul...@twitter.com wrote: Congrats! On Wed, Feb 20, 2013 at 6:45 AM, Gianmarco De Francisci Morales g...@gdfm.me wrote: Congrats Bill! :) -- Gianmarco On Wed, Feb 20, 2013 at 10:00 AM, Jonathan Coveney

Re: [ANNOUNCE] Welcome new Apache Pig Committers Rohini Palaniswamy

2012-10-31 Thread Aniket Mokashi
Congrats Rohini... On Mon, Oct 29, 2012 at 11:31 AM, Julien Le Dem jul...@twitter.com wrote: Congrats Rohini ! On Sun, Oct 28, 2012 at 9:42 AM, Bill Graham billgra...@gmail.com wrote: Congrats Rohini! Great news indeed. On Saturday, October 27, 2012, Jon Coveney wrote: Wonderful

Re: Welcome our newest committer Cheolsoo Park

2012-10-31 Thread Aniket Mokashi
Congrats Cheolsoo... On Fri, Oct 26, 2012 at 4:26 PM, Santhosh M S santhosh_mut...@yahoo.com wrote: Congratulations Cheolsoo! Looking forward to more from you. Regards, Santhosh From: Julien Le Dem jul...@twitter.com To: d...@pig.apache.org;

Re: Reading BytesWritable in sequence file

2012-09-14 Thread Aniket Mokashi
For a simpler use case, something similar to the following should work: public class PigSequenceFileLoader extends PigStorage { @SuppressWarnings("rawtypes") @Override public InputFormat getInputFormat() { return new SequenceFileInputFormat<BytesWritable, Text>(); } } Thanks, Aniket On Thu, Sep

Re: Input and output path

2012-09-14 Thread Aniket Mokashi
You can do something similar to https://cwiki.apache.org/PIG/faq.html#FAQ-Q%253AIloaddatafromadirectorywhichcontainsdifferentfile.HowdoIfindoutwherethedatacomesfrom%253F Get the input path from Pig and then substitute the values for date, hour, etc. You also have to override the getSchema method so that
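
Once a loader has the input path (the FAQ approach linked above), the date and hour still have to be pulled out of it. A minimal sketch, assuming a hypothetical directory layout like .../yyyy-MM-dd/HH/part-*; PathPartitions is an illustrative name, and the pattern should be adjusted to the real layout.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical: extract date and hour from an input path laid out as .../yyyy-MM-dd/HH/... */
final class PathPartitions {
    // A date directory followed by a two-digit hour directory (or the end of the path).
    private static final Pattern DATE_HOUR =
            Pattern.compile("/(\\d{4}-\\d{2}-\\d{2})/(\\d{2})(?=/|$)");

    static final class DateHour {
        final String date;
        final String hour;

        DateHour(String date, String hour) {
            this.date = date;
            this.hour = hour;
        }
    }

    private PathPartitions() {}

    static Optional<DateHour> extract(String inputPath) {
        Matcher m = DATE_HOUR.matcher(inputPath);
        if (!m.find()) {
            return Optional.empty(); // path does not follow the assumed layout
        }
        return Optional.of(new DateHour(m.group(1), m.group(2)));
    }
}
```

The extracted values would then be appended to each tuple, which is why the loader's schema has to grow accordingly.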

Re: Counters from Python UDF

2012-08-24 Thread Aniket Mokashi
for the DEV list but ... Is it even possible / feasible. Could it be done by calling the Java classes from within Jython? I guess I would ask the same about algebraic and accumulator UDF which I know are available in Ruby. -Original Message- From: Aniket Mokashi [mailto:aniket

Re: https://cwiki.apache.org/PIG/how-to-set-up-eclipse-environment.html did not work for eclipse on windows

2012-08-14 Thread Aniket Mokashi
I remember debugging this earlier. It looks like Grunt gets an EOF on Windows machines; I am not sure why either. Thanks, Aniket On Fri, Aug 10, 2012 at 3:25 AM, lulynn_2008 lulynn_2...@163.com wrote: Hi, I can run pig main successfully in eclipse on linux. But I find I can not run pig main in

Re: Import libraries in Jython UDFs

2012-07-25 Thread Aniket Mokashi
/browse/PIG-2665 -Chun On 7/23/12 11:26 PM, Russell Jurney russell.jur...@gmail.com wrote: ls /me/jython2.5.2/Lib/ tons of class files... email/ This is in local mode, atm. I add this directory to my java classpath, check. On Mon, Jul 23, 2012 at 11:10 PM, Aniket Mokashi

Re: Pig out of memory error

2012-06-18 Thread Aniket Mokashi
Set HADOOP_HEAPSIZE to something larger than its current value: export HADOOP_HEAPSIZE=<new value> Thanks, Aniket On Sun, Jun 17, 2012 at 11:16 PM, Pankaj Gupta pan...@brightroll.com wrote: Hi, I am getting an out of memory error while running Pig. I am running a pretty big job with one master node and over 100 worker nodes.

Re: Number of reduce tasks

2012-06-18 Thread Aniket Mokashi
Pankaj, are you using HCatalog? On Fri, Jun 1, 2012 at 5:24 PM, Prashant Kommireddi prash1...@gmail.com wrote: Right. And the documentation provides a list of operations that can be parallelized. On Jun 1, 2012, at 4:50 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote: That being said, some

Re: Copying files to Amazon S3 using Pig is slow

2012-06-08 Thread Aniket Mokashi
http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_s3distcp.html On Fri, Jun 8, 2012 at 4:40 AM, James Newhaven james.newha...@gmail.com wrote: I want to copy 26,000 HDFS files generated by a pig script to Amazon S3. I am using the copyToLocal command, but I

Re: While/CROSS/FOREACH loop

2012-05-25 Thread Aniket Mokashi
This might be helpful for this use case: http://hortonworks.com/blog/new-apache-pig-features-part-2-embedding/ On Tue, May 22, 2012 at 11:31 PM, Russell Jurney russell.jur...@gmail.com wrote: I need to repeatedly CROSS a data set, then FOREACH it, reduce it with a filter, then group/test it to

Re: Load Pig metadata from file?

2012-05-15 Thread Aniket Mokashi
I think you need to play with the quotes; it's more likely a bash problem. One way to debug is to run bash -x pig -f script.pig -param md=$(cat metadata.dat) and check what hadoop jar receives in the end. Try md=$(cat metadata.dat) or -md='$(cat metadata.dat)' (single quote inside double quote and

Re: Exploding a Hive array<string> in Pig from an RCFile

2012-04-12 Thread Aniket Mokashi
Hi Malcolm, arrays are converted to tuples, and FLATTEN should work on them directly. I don't think you need to worry about the delimiter (assuming Hive knows how to deserialize it). Btw, does RCFile require a delimiter to store arrays? I am not sure about that. Thanks, Aniket On Wed, Apr 11, 2012 at

Re: Welcome Pig's newest committer, Bill Graham!

2012-04-05 Thread Aniket Mokashi
Congrats Bill... On Thu, Apr 5, 2012 at 3:04 PM, Prashant Kommireddi prash1...@gmail.com wrote: Congrats Bill. Sent from my iPhone On Apr 5, 2012, at 2:55 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote: Hi all, On behalf of the Pig PMC, I'm very happy to announce that Bill Graham has

Re: [ANNOUNCE] Welcome new Apache Pig Committers and PMC members

2012-03-20 Thread Aniket Mokashi
Congrats Jonathan and Julien! :) On Mon, Mar 19, 2012 at 6:36 PM, Russell Jurney russell.jur...@gmail.com wrote: congratulations! On Mon, Mar 19, 2012 at 5:03 PM, Daniel Dai da...@hortonworks.com wrote: Pig users and developers, The Apache Pig PMCs is pleased to announce the new

Re: python modules

2012-03-12 Thread Aniket Mokashi
I spent some time debugging this. The reason is that sys.path for Jython on the TaskTracker (TT) is ['__classpath__', '__pyclasspath__/'], while on the client it is ['', '/users/lib/Lib', '/users/lib/jython_simplejson.jar/Lib', '__classpath__', '__pyclasspath__/']. I am still figuring out why CLASSPATH (java.class.path

Re: python modules

2012-03-12 Thread Aniket Mokashi
This looks like a bug to me. Jython cuts the jython.jar location out of the classpath and appends Lib to it. But on the TT, jython.jar is generally not available, because Pig merges it into job.jar. Hence, imports will always fail. ~Aniket On Mon, Mar 12, 2012 at 12:54 AM, Aniket Mokashi aniket

Re: how to set one var equal to another

2012-03-10 Thread Aniket Mokashi
Hi Colleen, I'm not sure what your use case is, but you may want to watch https://issues.apache.org/jira/browse/PIG-438. Thanks, Aniket On Sat, Mar 10, 2012 at 11:33 AM, Jonathan Coveney jcove...@gmail.com wrote: It's important to remember that the aliases to the left of the equals are not

Re: View Map-Reduce payload

2012-03-06 Thread Aniket Mokashi
http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#EXPLAIN On Tue, Mar 6, 2012 at 5:28 AM, shan shan mysub...@gmail.com wrote: Hi Can I see the user-payload for the MapReduce job that is created by Pig. How? i.e. the Map and Reduce function code that is generated by Pig script.. Thanks,

Re: Scalars can only be used within projections

2012-03-01 Thread Aniket Mokashi
I think you are looking for: C = join FILTERED_A by key1, B by key1; C1 = filter C by some condition; If key1 equality is not your join condition, you may have to use a CROSS instead. Thanks, Aniket On Thu, Mar 1, 2012 at 4:26 AM, mete efk...@gmail.com wrote: Hello folks, i am new to pig-latin

Re: Jython UDF problem

2012-02-05 Thread Aniket Mokashi
This looks like a Jython bug. By the way, as far as I know, the return type of this function will be bytearray if no decorator is specified. Thanks, Aniket On Sat, Feb 4, 2012 at 9:39 PM, Russell Jurney russell.jur...@gmail.com wrote: Why am I having tuple objects in my python udfs? This isn't how the

Re: LOWER

2012-02-04 Thread Aniket Mokashi
I think Pig UDFs are just class names (case-sensitive; the built-in LOWER is all capitals). Are you suggesting adding something like a function registry to Pig? That would be a good idea. As a workaround (or solution), we have pigrc and pigbootup to rename functions. Thanks, Aniket On Sat, Feb 4,
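
To make the registry idea concrete, here is a hypothetical sketch (FunctionRegistry is not a Pig class) of a name lookup that ignores case, so that lower, Lower and LOWER all resolve to the same function. It only illustrates the suggestion; Pig itself resolves UDFs by class name.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical case-insensitive registry mapping function names to implementations. */
final class FunctionRegistry {
    // CASE_INSENSITIVE_ORDER makes "LOWER", "lower" and "Lower" the same key.
    private final Map<String, Function<String, String>> functions =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    void register(String name, Function<String, String> fn) {
        functions.put(name, fn);
    }

    Function<String, String> lookup(String name) {
        Function<String, String> fn = functions.get(name);
        if (fn == null) {
            throw new IllegalArgumentException("unknown function: " + name);
        }
        return fn;
    }
}
```

Usage: after register("LOWER", s -> s.toLowerCase(Locale.ROOT)), a call to lookup("lower") returns the same function.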

Re: Concat multiple strings

2012-01-22 Thread Aniket Mokashi
Alan, I just noticed it's Pig 0.8 and later. https://issues.apache.org/jira/browse/PIG-1420 Am I missing something? Thanks, Aniket On Thu, Jan 19, 2012 at 8:04 AM, Alan Gates ga...@hortonworks.com wrote: In Pig 0.9 and later CONCAT accepts more than two strings or bytearrays. Alan. On Jan
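
A multi-argument CONCAT behaves like a single left-to-right concatenation of all its arguments. The sketch below assumes that a null argument makes the whole result null (check the CONCAT documentation for your Pig version); Concat is a hypothetical helper, not Pig's builtin.

```java
/** Hypothetical n-ary concatenation with assumed null-in, null-out semantics. */
final class Concat {
    private Concat() {}

    static String concat(String... parts) {
        if (parts == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            if (p == null) {
                return null; // assumed: any null argument nulls the result
            }
            sb.append(p);
        }
        return sb.toString(); // no arguments yields "" (a choice of this sketch)
    }
}
```

On a version whose CONCAT accepts only two arguments, nesting calls, e.g. CONCAT(CONCAT(a, b), c), produces the same string.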

Re: Concat multiple strings

2012-01-22 Thread Aniket Mokashi
in 0.8 had an issue. Thanks, Prashant Sent from my iPhone On Jan 22, 2012, at 5:44 PM, Aniket Mokashi aniket...@gmail.com wrote: Alan, I just noticed it's Pig 0.8 and later. https://issues.apache.org/jira/browse/PIG-1420 Am I missing something? Thanks, Aniket On Thu, Jan 19, 2012

Re: getWrappedSplit() is incorrectly returning the first split

2012-01-09 Thread Aniket Mokashi
Thanks so much for finding this out. I was using @Override public void prepareToRead(@SuppressWarnings("rawtypes") RecordReader reader, PigSplit split) throws IOException { this.in = reader; partValues = ((DataovenSplit)split.getWrappedSplit()).getPartitionInfo().getPartitionValues(); in

Re: getWrappedSplit() is incorrectly returning the first split

2012-01-09 Thread Aniket Mokashi
on. 2012/1/9 Prashant Kommireddi prash1...@gmail.com Is this critical enough to make it back into 0.9.1? -Prashant On Mon, Jan 9, 2012 at 4:44 PM, Aniket Mokashi aniket...@gmail.com wrote: Thanks so much for finding this out. I was using @Override

Re: Choosing output directory based on field value

2012-01-09 Thread Aniket Mokashi
Pig has MultiStorage in piggybank: https://github.com/apache/pig/blob/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/MultiStorage.java I think it has some limitations; you can check its Javadoc and JIRAs. Thanks, Aniket On Mon, Jan 9, 2012 at 10:21 PM, IGZ Nick
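
The idea behind MultiStorage is to route each record to an output directory named after the value of one field. The sketch below shows only that routing step, in memory; it assumes tab-delimited records, and SplitByField is hypothetical rather than MultiStorage's actual code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical: group records by the output directory their split field would select. */
final class SplitByField {
    private SplitByField() {}

    /** Returns a map from base/fieldValue to the records that would be written under it. */
    static Map<String, List<String>> split(List<String> records, int fieldIndex, String base) {
        Map<String, List<String>> byDir = new LinkedHashMap<>();
        for (String record : records) {
            String[] fields = record.split("\t", -1); // -1 keeps trailing empty fields
            if (fieldIndex >= fields.length) {
                throw new IllegalArgumentException("record has no field " + fieldIndex + ": " + record);
            }
            String dir = base + "/" + fields[fieldIndex];
            byDir.computeIfAbsent(dir, d -> new ArrayList<>()).add(record);
        }
        return byDir;
    }
}
```

A real storer would also need to guard against field values such as .. or values containing / before using them as directory names.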

Re: macros and global variables

2011-12-27 Thread Aniket Mokashi
Thanks, Daniel On Thu, Dec 22, 2011 at 3:39 PM, Aniket Mokashi aniket...@gmail.com wrote: Hi, I was wondering if there is a place to store common macros and global parameters in pig (pigrc?). This should be available to all the users accessing pig via grunt or script. Please let

Re: Possible Pig 9.1 globing bug in parameter substitution

2011-12-27 Thread Aniket Mokashi
I tried pig --param input=s3n://bucket_path/*/ test.pig and it worked for me. I am on EMR Pig 0.9.1. Thanks, Aniket On Tue, Dec 27, 2011 at 3:35 PM, Corbin Hoenes cor...@tynt.com wrote: I am not sure Ayon doesn't have something here. I am seeing a similar problem with the 0.9.1 build of pig.

macros and global variables

2011-12-22 Thread Aniket Mokashi
Hi, I was wondering if there is a place to store common macros and global parameters in Pig (pigrc?). They should be available to all users accessing Pig via Grunt or a script. Please let me know if you have any pointers. Thanks, Aniket

Re: My notes for running Pig from EC2 to EMR

2011-12-16 Thread Aniket Mokashi
Amazon supports Pig 0.9.1 now. Take a look: http://aws.amazon.com/releasenotes/Elastic-MapReduce/1044996466833146 Also, I am not sure about copying EMR jars to EC2; you should check that with Amazon. Thanks, Aniket On Fri, Dec 16, 2011 at 12:02 PM, Ayon Sinha ayonsi...@yahoo.com wrote:

Re: Pig counters with PigServer strange behavior

2011-12-06 Thread Aniket Mokashi
There is a good blog article on this: http://squarecog.wordpress.com/2010/12/24/incrementing-hadoop-counters-in-apache-pig/ Thanks, Aniket On Tue, Dec 6, 2011 at 1:49 PM, Charles Menguy cmen...@proclivitysystems.com wrote: Hi All, I'm trying to play with counters with PigServer and have a

Re: Flatten a bag to a specific datatype

2011-06-22 Thread Aniket Mokashi
Hi, I think a BagToTuple UDF should do it for you. From an old email thread, I found the following (I think you will have to change getBagField to get, etc.): public class BagToTuple extends EvalFunc<Tuple> { @Override public void exec(Tuple input, Tuple output) throws IOException { DataBag bag =

Re: viewing current relationships loaded on the grunt shell

2011-06-13 Thread Aniket Mokashi
Hi Jeremy, If I understand correctly, you want a list of all the aliases loaded in Grunt. Is there a use case for this scenario/command? .pig_history can give you the last few commands run in Grunt. Also, running EXPLAIN on the most dependent alias would fetch you all the

Re: Filter on contents of other dataset

2011-04-14 Thread Aniket Mokashi
' as (hkey:chararray, hdata:chararray); filtered = FILTER huge BY my_udf(hkey, hdata); Where my_udf returns true if there exists some skey in smalldata for which F(hdata, skey) is true - as you defined. Regards, Mridul On Friday 15 April 2011 08:51 AM, Aniket Mokashi wrote: Hi, What
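
Conceptually, my_udf answers an "exists" question against the small dataset held in memory. A minimal sketch, where ExistsFilter is hypothetical and F is passed in as a predicate; hkey is left out because the condition described only involves hdata and skey.

```java
import java.util.List;
import java.util.function.BiPredicate;

/** Hypothetical: keep a record if F(hdata, skey) holds for at least one skey in the small dataset. */
final class ExistsFilter {
    private final List<String> smallKeys;        // the small relation, loaded once into memory
    private final BiPredicate<String, String> f; // stand-in for F(hdata, skey)

    ExistsFilter(List<String> smallKeys, BiPredicate<String, String> f) {
        this.smallKeys = List.copyOf(smallKeys);
        this.f = f;
    }

    /** Mirrors FILTER huge BY my_udf(...): true if any small key satisfies F. */
    boolean keep(String hdata) {
        return smallKeys.stream().anyMatch(skey -> f.test(hdata, skey));
    }
}
```

Each huge-side record costs one pass over the small keys, so this only works while smalldata stays small.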

Re: CDH3 fail python udf

2011-04-01 Thread Aniket Mokashi
and myudf.py in classpath and also register jython.jar in our pig script. It worked well before the upgrading, only failed after. Regards, Shawn On Thu, Mar 31, 2011 at 4:38 PM, Aniket Mokashi amoka...@andrew.cmu.edu wrote: I think this might be because when you start in hadoop mode, your

Re: CDH3 fail python udf

2011-03-31 Thread Aniket Mokashi
I think this might be because when you start in Hadoop mode, your classpath configuration does not have jython.jar. Can you add it to the classpath explicitly and check? Thanks, Aniket On Thu, March 31, 2011 6:07 pm, Xiaomeng Wan wrote: Hi, We recently updated our hadoop from CDH2 to

Re: UDF problem: Java Heap space

2011-02-24 Thread Aniket Mokashi
at 7:49 PM, Aniket Mokashi amoka...@andrew.cmu.edu wrote: I've written a simple UDF that parses a chararray (which looks like ...[a].[b]...[a]...) to capture the contents of the brackets and return them as a String like a=2;b=1; and so on. The input chararrays are rarely more than 1000 characters

Re: UDF problem: Java Heap space

2011-02-24 Thread Aniket Mokashi
and the loop would continue forever. Thanks again, Aniket On Thu, February 24, 2011 7:47 pm, Aniket Mokashi wrote: This is a map-side UDF. The Pig script loads a log file and grabs the contents inside angle brackets. a = load; b = foreach a generate F(a); dump b; I see the following on the tasktrackers: 2011

UDF problem: Java Heap space

2011-02-23 Thread Aniket Mokashi
I've written a simple UDF that parses a chararray (which looks like ...[a].[b]...[a]...) to capture the contents of the brackets and return them as a String like a=2;b=1; and so on. The input chararrays are rarely more than 1000 characters and are not more than 10 (I've added log.warn in my UDF to ensure
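
The follow-up in this thread mentions a loop that would continue forever, which would explain a heap error if each iteration keeps accumulating output. The sketch below is not the original UDF; it is one hypothetical way to write the described parsing (count each [token] and emit a=2;b=1;) so that every iteration moves past what it consumed and an unmatched [ ends the scan.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical re-creation of the described parsing: count [token] occurrences as a=2;b=1; */
final class BracketCounter {
    private BracketCounter() {}

    static String countBracketed(String s) {
        if (s == null) {
            return null; // Pig UDFs conventionally return null for null input
        }
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps first-seen order
        int from = 0;
        while (true) {
            int open = s.indexOf('[', from);
            if (open < 0) {
                break;
            }
            int close = s.indexOf(']', open + 1);
            if (close < 0) {
                break; // unmatched '[': stop instead of rescanning the same position
            }
            counts.merge(s.substring(open + 1, close), 1, Integer::sum);
            from = close + 1; // always advance past the consumed token, so the loop terminates
        }
        StringBuilder out = new StringBuilder();
        counts.forEach((k, v) -> out.append(k).append('=').append(v).append(';'));
        return out.toString();
    }
}
```

For the example input ...[a].[b]...[a]... this returns a=2;b=1;.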