Woo!! Congrats Cheolsoo...
On Thu, Mar 20, 2014 at 4:25 AM, Rohini Palaniswamy rohini.adi...@gmail.com
wrote:
Thanks Julien. Great job last year.
Congratulations, Cheolsoo!!! Well deserved. Great job over the past 2 years with
an awesome number of commits and reviews.
On Thu, Mar 20, 2014 at 2:07
:
Congrats Aniket!
On Tue, Jan 14, 2014 at 7:01 PM, Jarek Jarcec Cecho
jar...@apache.org
wrote:
Congratulations Aniket, good work!
Jarcec
On Tue, Jan 14, 2014 at 06:52:10PM -0800, JULIEN LE DEM wrote:
It's my pleasure to announce that Aniket Mokashi became
Hi Sonia,
Try adding another pair of parentheses, e.g.:
((int)(RegexMatch((chararray) genre_id, '\\d+')) == 1 ? (chararray)genre_id : '-1001') as genre_id
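For anyone who wants to check the intended logic outside Pig, here is a minimal plain-Java sketch of the expression above. It assumes RegexMatch returns 1 only when the whole value matches the pattern; GenreIdCleaner, clean and FALLBACK are made-up names for illustration, not part of Pig or piggybank.

```java
import java.util.regex.Pattern;

// Hypothetical helper mirroring the Pig ternary in plain Java (not Pig API).
final class GenreIdCleaner {
    // Same pattern as in the Pig expression: one or more digits.
    private static final Pattern DIGITS = Pattern.compile("\\d+");
    static final String FALLBACK = "-1001";

    // Returns the id unchanged when it is all digits, otherwise the fallback.
    static String clean(String genreId) {
        // matches() requires the entire input to match (assumed RegexMatch behaviour).
        if (genreId != null && DIGITS.matcher(genreId).matches()) {
            return genreId;
        }
        return FALLBACK;
    }
}
```

Calling GenreIdCleaner.clean on a numeric id should hand it back untouched, while anything else (including a null) falls through to the '-1001' default.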
On Thu, Aug 15, 2013 at 4:28 PM, sonia gehlot sonia.geh...@gmail.com wrote:
Hi,
I have pigscript in which I am flattening it and assign
The following projects might interest you:
Pig and Spark: https://github.com/twitter/pig/tree/spork
Storm and Hadoop:
https://speakerdeck.com/sritchie/summingbird-streaming-mapreduce-at-twitter
Thanks,
Aniket
On Tue, Jul 23, 2013 at 11:18 PM, Russell Jurney
russell.jur...@gmail.com wrote:
I think
Pig does not currently have a way to do this. Development of a feature
like this is tracked at https://issues.apache.org/jira/browse/PIG-2784.
Feel free to add a subtask and take a stab at it.
~Aniket
On Fri, Jul 19, 2013 at 12:58 PM, Mehmet Tepedelenlioglu
mehmets...@yahoo.com wrote:
You can use pig to do what hadoop fs -getmerge is doing in a separate pig
script. It will still be one reducer though.
On Tue, May 28, 2013 at 8:29 AM, Alan Gates ga...@hortonworks.com wrote:
Nothing that uses MapReduce as an underlying execution engine creates a
single file when running
Also-
https://cwiki.apache.org/confluence/display/PIG/Guide+for+new+contributors
~Aniket
On Sun, Mar 17, 2013 at 4:37 PM, Prashant Kommireddi prash1...@gmail.com wrote:
Hi Gardner,
This paper would be a good starting point
http://infolab.stanford.edu/~olston/publications/vldb09.pdf
What's BCCKIAJV5KGMZVA:xmw5F7I4AWd6rDRA@?
To work with S3-
1. Your path should be - s3n://bucket-name/key
2. Have your aws keys in core-site.xml
On Mon, Mar 4, 2013 at 3:32 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
I am trying to upload to S3 using pig but I get:
grunt> store A
I think the email was filtered out. Resending.
-- Forwarded message --
From: Aniket Mokashi aniket...@gmail.com
Date: Wed, Feb 20, 2013 at 1:18 PM
Subject: Replicated join: is there a setting to make this better?
To: d...@pig.apache.org d...@pig.apache.org
Hi devs,
I
Congrats Bill !!
On Wed, Feb 20, 2013 at 9:44 AM, Julien Le Dem jul...@twitter.com wrote:
Congrats!
On Wed, Feb 20, 2013 at 6:45 AM, Gianmarco De Francisci Morales
g...@gdfm.me wrote:
Congrats Bill! :)
--
Gianmarco
On Wed, Feb 20, 2013 at 10:00 AM, Jonathan Coveney
Congrats Rohini...
On Mon, Oct 29, 2012 at 11:31 AM, Julien Le Dem jul...@twitter.com wrote:
Congrats Rohini !
On Sun, Oct 28, 2012 at 9:42 AM, Bill Graham billgra...@gmail.com wrote:
Congrats Rohini! Great news indeed.
On Saturday, October 27, 2012, Jon Coveney wrote:
Wonderful
Congrats Cheolsoo...
On Fri, Oct 26, 2012 at 4:26 PM, Santhosh M S santhosh_mut...@yahoo.com wrote:
Congratulations Cheolsoo! Looking forward to more from you.
Regards,
Santhosh
From: Julien Le Dem jul...@twitter.com
To: d...@pig.apache.org;
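For a simpler use case, something similar to the following should work:
import org.apache.hadoop.io.ByteWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.pig.builtin.PigStorage;

public class PigSequenceFileLoader extends PigStorage {
    // Swap in a SequenceFile input format; everything else is inherited from PigStorage.
    @SuppressWarnings("rawtypes")
    @Override
    public InputFormat getInputFormat() {
        return new SequenceFileInputFormat<ByteWritable, Text>();
    }
}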
Thanks,
Aniket
On Thu, Sep
You can do something similar to -
https://cwiki.apache.org/PIG/faq.html#FAQ-Q%253AIloaddatafromadirectorywhichcontainsdifferentfile.HowdoIfindoutwherethedatacomesfrom%253F
Get input path from pig and then substitute the values for date, hour etc.
You have to also override getSchema method so that
for the DEV list but ... Is it even
possible
/ feasible. Could it be done by calling the Java classes from within
Jython?
I guess I would ask the same about algebraic and accumulator UDF which I
know are available in Ruby.
-Original Message-
From: Aniket Mokashi [mailto:aniket
I remember debugging this earlier. It looks like grunt gets an EOF on a Windows
machine. I am not sure why either.
Thanks,
Aniket
On Fri, Aug 10, 2012 at 3:25 AM, lulynn_2008 lulynn_2...@163.com wrote:
Hi,
I can run pig main successfully in eclipse on linux. But I find I can not
run pig main in
/browse/PIG-2665
-Chun
On 7/23/12 11:26 PM, Russell Jurney russell.jur...@gmail.com wrote:
ls /me/jython2.5.2/Lib/
tons of class files...
email/
This is in local mode, atm. I add this directory to my java classpath,
check.
On Mon, Jul 23, 2012 at 11:10 PM, Aniket Mokashi
export HADOOP_HEAPSIZE=something more than what it is right now
Thanks,
Aniket
On Sun, Jun 17, 2012 at 11:16 PM, Pankaj Gupta pan...@brightroll.com wrote:
Hi,
I am getting an out of memory error while running Pig. I am running a
pretty big job with one master node and over 100 worker nodes.
Pankaj, are you using hcatalog?
On Fri, Jun 1, 2012 at 5:24 PM, Prashant Kommireddi prash1...@gmail.com wrote:
Right. And the documentation provides a list of operations that can be
parallelized.
On Jun 1, 2012, at 4:50 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:
That being said, some
http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_s3distcp.html
On Fri, Jun 8, 2012 at 4:40 AM, James Newhaven james.newha...@gmail.com wrote:
I want to copy 26,000 HDFS files generated by a pig script to Amazon S3.
I am using the copyToLocal command, but I
This might be helpful for this use case -
http://hortonworks.com/blog/new-apache-pig-features-part-2-embedding/
On Tue, May 22, 2012 at 11:31 PM, Russell Jurney
russell.jur...@gmail.com wrote:
I need to repeatedly CROSS a data set, then FOREACH it, reduce it with
a filter, then group/test it to
I think you need to play with some quotes; it's more likely a bash problem.
One way to debug is bash -x pig -f script.pig -param md=$(cat
metadata.dat) and check what hadoop jar gets in the end.
Try md="$(cat metadata.dat)"
or md="'$(cat metadata.dat)'" (single quote inside double quote
and
Hi Malcolm,
arrays are converted to tuples and flatten should directly work on it. I
think you need not worry about the delimiter (assuming hive knows how to
deserialize it). Btw, does RCFile require delimiter to store arrays? I am
not sure about that.
Thanks,
Aniket
On Wed, Apr 11, 2012 at
Congrats Bill...
On Thu, Apr 5, 2012 at 3:04 PM, Prashant Kommireddi prash1...@gmail.com wrote:
Congrats Bill.
Sent from my iPhone
On Apr 5, 2012, at 2:55 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:
Hi all,
On behalf of the Pig PMC, I'm very happy to announce that Bill Graham
has
Congrats Jonathan and Julien! :)
On Mon, Mar 19, 2012 at 6:36 PM, Russell Jurney russell.jur...@gmail.com wrote:
congratulations!
On Mon, Mar 19, 2012 at 5:03 PM, Daniel Dai da...@hortonworks.com wrote:
Pig users and developers,
The Apache Pig PMC is pleased to announce the new
I spent some time debugging this. The reason is:
sys.path on the TT for Jython is ['__classpath__', '__pyclasspath__/'],
and on the client it is ['', '/users/lib/Lib',
'/users/lib/jython_simplejson.jar/Lib', '__classpath__', '__pyclasspath__/']
I am still figuring out why CLASSPATH (java.class.path
This looks like a bug to me. Jython takes the location of jython.jar from the
classpath and appends Lib to it. But in general, on the TT, jython.jar is not
available because Pig merges it into job.jar. Hence, imports will always
fail.
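To make the failure mode concrete, here is a rough plain-Java sketch of that lookup. It is only an approximation of the behaviour described above, not Jython's actual code; JythonLibGuess, libDir and the sample paths are hypothetical, and a ':' classpath separator is assumed.

```java
import java.util.Optional;

// Illustrative approximation of the lookup described above (not Jython source).
final class JythonLibGuess {
    // Finds the classpath entry for jython.jar and returns "<its directory>/Lib".
    // Returns empty when no such entry exists, e.g. when only job.jar is listed.
    static Optional<String> libDir(String classpath) {
        for (String entry : classpath.split(":")) {
            if (entry.equals("jython.jar") || entry.endsWith("/jython.jar")) {
                int slash = entry.lastIndexOf('/');
                // A bare "jython.jar" is treated as living in the current directory.
                String dir = slash < 0 ? "." : entry.substring(0, slash);
                return Optional.of(dir + "/Lib");
            }
        }
        return Optional.empty();
    }
}
```

With a client-style classpath containing /users/lib/jython.jar this yields /users/lib/Lib, whereas a classpath that only lists job.jar yields nothing, so there is no Lib directory to import from.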
~Aniket
On Mon, Mar 12, 2012 at 12:54 AM, Aniket Mokashi aniket
Hi Colleen,
I'm not sure what your use case is, but you may want to watch
https://issues.apache.org/jira/browse/PIG-438.
Thanks,
Aniket
On Sat, Mar 10, 2012 at 11:33 AM, Jonathan Coveney jcove...@gmail.com wrote:
It's important to remember that the aliases to the left of the equals are
not
http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#EXPLAIN
On Tue, Mar 6, 2012 at 5:28 AM, shan shan mysub...@gmail.com wrote:
Hi
Can I see the user-payload for the MapReduce job that is created by Pig.
How?
i.e. the Map and Reduce function code that is generated by Pig script..
Thanks,
I think you are looking for-
C = join FILTERED_A by key1, B by key1;
C1 = filter C by some condition;
If key1 equality is not your join condition, you may have to go for a CROSS.
Thanks,
Aniket
On Thu, Mar 1, 2012 at 4:26 AM, mete efk...@gmail.com wrote:
Hello folks,
i am new to pig-latin
Looks like this is a Jython bug.
Btw, afaik, the return type of this function would be a bytearray if
the decorator is not specified.
Thanks,
Aniket
On Sat, Feb 4, 2012 at 9:39 PM, Russell Jurney russell.jur...@gmail.com wrote:
Why am I having tuple objects in my python udfs? This isn't how the
I think Pig UDFs are just class names (case sensitive; LOWER is all capitals
in the built-ins). Are you suggesting adding something like a function registry to
Pig? That would be a good idea. As a workaround (or solution), we have
pigrc and pigbootup to rename functions.
Thanks,
Aniket
On Sat, Feb 4,
Alan, I just noticed it's Pig 0.8 and later.
https://issues.apache.org/jira/browse/PIG-1420
Am I missing something?
Thanks,
Aniket
On Thu, Jan 19, 2012 at 8:04 AM, Alan Gates ga...@hortonworks.com wrote:
In Pig 0.9 and later CONCAT accepts more than two strings or bytearrays.
Alan.
On Jan
in 0.8 had an issue.
Thanks,
Prashant
Sent from my iPhone
On Jan 22, 2012, at 5:44 PM, Aniket Mokashi aniket...@gmail.com wrote:
Alan, I just noticed it's Pig 0.8 and later.
https://issues.apache.org/jira/browse/PIG-1420
Am I missing something?
Thanks,
Aniket
On Thu, Jan 19, 2012
Thanks so much for finding this out.
I was using
@Override
public void prepareToRead(@SuppressWarnings("rawtypes")
RecordReader reader, PigSplit split)
throws IOException {
this.in = reader;
partValues =
((DataovenSplit)split.getWrappedSplit()).getPartitionInfo().getPartitionValues();
in
on.
2012/1/9 Prashant Kommireddi prash1...@gmail.com
Is this critical enough to make it back into 0.9.1?
-Prashant
On Mon, Jan 9, 2012 at 4:44 PM, Aniket Mokashi aniket...@gmail.com
wrote:
Thanks so much for finding this out.
I was using
@Override
Pig has MultiStorage in piggybank.
https://github.com/apache/pig/blob/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/MultiStorage.java
I think it has some limitations. You can check the javadoc/JIRAs for it.
Thanks,
Aniket
On Mon, Jan 9, 2012 at 10:21 PM, IGZ Nick
.
Thanks,
Daniel
On Thu, Dec 22, 2011 at 3:39 PM, Aniket Mokashi aniket...@gmail.com
wrote:
Hi,
I was wondering if there is a place to store common macros and global
parameters in pig (pigrc?). This should be available to all the users
accessing pig via grunt or script.
Please let
I tried
pig --param input=s3n://bucket_path/*/ test.pig
It worked for me. I am on EMR Pig 0.9.1.
Thanks,
Aniket
On Tue, Dec 27, 2011 at 3:35 PM, Corbin Hoenes cor...@tynt.com wrote:
I am not sure Ayon doesn't have something here. I am seeing a similar
problem with the 0.9.1 build of pig.
Hi,
I was wondering if there is a place to store common macros and global
parameters in pig (pigrc?). This should be available to all the users
accessing pig via grunt or script.
Please let me know if you have any pointers.
Thanks,
Aniket
Amazon supports pig 0.9.1 now. Take a look-
http://aws.amazon.com/releasenotes/Elastic-MapReduce/1044996466833146
Also, I am not very sure about copying EMR jars to EC2. You should check
that with Amazon.
Thanks,
Aniket
On Fri, Dec 16, 2011 at 12:02 PM, Ayon Sinha ayonsi...@yahoo.com wrote:
There is a good blog article on this-
http://squarecog.wordpress.com/2010/12/24/incrementing-hadoop-counters-in-apache-pig/
Thanks,
Aniket
On Tue, Dec 6, 2011 at 1:49 PM, Charles Menguy
cmen...@proclivitysystems.com wrote:
Hi All,
I'm trying to play with counters with PigServer and have a
Hi,
I think UDF BagToTuple should do it for you.
From some old email thread, I find (I think you will have to change
getBagField to get etc)--
public class BagToTuple extends EvalFunc<Tuple> {
@Override
public void exec(Tuple input, Tuple output) throws IOException{
DataBag bag =
Hi Jeremy,
If I understand it correctly, you would want to get a list of all the
aliases loaded in the grunt. Is there a use case for this scenario/command?
.pig_history can fetch you the last few commands fired on the grunt. Also,
explaining the most dependent alias would fetch you all the
' as (hkey:chararray, hdata:chararray); filtered =
FILTER huge BY my_udf(hkey, hdata);
Where my_udf returns true if there exists some skey in smalldata for
which F(hdata, skey) is true - as you defined.
Regards,
Mridul
On Friday 15 April 2011 08:51 AM, Aniket Mokashi wrote:
Hi,
What
and myudf.py in classpath and also register
jython.jar in our pig script. It worked well before the upgrading, only
failed after.
Regards,
Shawn
On Thu, Mar 31, 2011 at 4:38 PM, Aniket Mokashi amoka...@andrew.cmu.edu
wrote:
I think this might be because when you start in hadoop mode, your
I think this might be because when you start in hadoop mode, your
classpath configuration does not have jython.jar. Can you put that
explicitly in classpath and check it out?
Thanks,
Aniket
On Thu, March 31, 2011 6:07 pm, Xiaomeng Wan wrote:
Hi,
We recently updated our hadoop from CDH2 to
at 7:49 PM, Aniket Mokashi
amoka...@andrew.cmu.edu wrote:
I've written a simple UDF that parses a chararray (which looks like
...[a].[b]...[a]...) to capture stuff inside brackets and return
them as String a=2;b=1; and so on. The input chararray are rarely more
than 1000 characters
and the loop would continue always.
Thanks again,
Aniket
On Thu, February 24, 2011 7:47 pm, Aniket Mokashi wrote:
This is a map side udf.
pig script loads a log file and grabs contents inside angle brackets. a =
load; b = foreach a generate F(a); dump b;
I see following on tasktrackers-
2011
I've written a simple UDF that parses a chararray (which looks like
...[a].[b]...[a]...) to capture stuff inside brackets and return them
as String a=2;b=1; and so on. The input chararray are rarely more than
1000 characters and are not more than 10 (I've added log.warn in my
udf to ensure
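
The parsing described above can be sketched in plain Java as follows. This is only an illustration under assumptions, not the UDF from this thread: BracketCounter and summarize are made-up names, square brackets are assumed as in the sample input, and counts are reported in order of first appearance.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the UDF body; it has no Pig dependencies.
final class BracketCounter {
    // Captures whatever sits between '[' and the next ']'.
    private static final Pattern BRACKETED = Pattern.compile("\\[([^\\]]*)\\]");

    // Counts each bracketed token and renders the result as "a=2;b=1;".
    static String summarize(String input) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = BRACKETED.matcher(input);
        // find() resumes after the previous match, so the loop ends with the input.
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return out.toString();
    }
}
```

For the sample input ...[a].[b]...[a]... this produces a=2;b=1; and an input without brackets produces an empty string.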