Re: jython not working in cluster mode

2011-06-06 Thread Mridul Muralidharan
You might want to raise a JIRA on this - both abs and rel paths should be supported ... Regards, Mridul On Friday 03 June 2011 11:15 PM, Daniel Eklund wrote: Shawn... excellent!.. thank you. it worked. interestingly, I remember having to use the absolute path in local mode daniel On

Re: release strategy

2011-06-06 Thread Alan Gates
I like 0.9.0 over beta. The code has undergone a lot of testing, just not as much as previous x.0 releases. My other concern is that in the future we may end up with beta2 and beta3 releases, and with arguments about whether a given release is a beta or ga, and what makes a release beta vs ga (t

Pig meetup after the Hadoop summit

2011-06-06 Thread Alan Gates
I've created a meetup at http://www.meetup.com/PigUser/events/21215831/ for the Pig user meetup on 6/30, the day after the Hadoop summit. We already have some great discussions lined up on Elephant Bird, embedding Pig in Python, and integrating Pig and Cassandra. There will also be time f

Fwd: Travel Assistance applications now open for ApacheCon NA 2011

2011-06-06 Thread Alan Gates
Begin forwarded message: From: Gavin McDonald Date: June 6, 2011 1:03:24 AM PDT To: "p...@apache.org" Subject: Travel Assistance applications now open for ApacheCon NA 2011 Reply-To: "priv...@incubator.apache.org", "ga...@16degrees.com.au" > Hi PMC folks, could you please kindly redist

Re: Finding near duplicates in data set

2011-06-06 Thread Gianmarco
I can give you a (biased) reference on how to do it with Hadoop. *Document Similarity Self-Join with MapReduce*. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5694030 In theory it is doable also with Pig. You just need to implement the right partitioners and UDFs. Cheers, -- Gianmarco De Fr