Re: cross product of 2 data sets

2011-09-01 Thread Alan Gates
http://ofps.oreilly.com/titles/9781449302641/advanced_pig_latin.html search on cross matches Alan. On Sep 1, 2011, at 11:44 AM, Marc Sturlese wrote: Hey there, I would like to do the cross product of two data sets, any of them feeds in memory. I've seen pig has the cross operation. Can

Re: Reply: Can pig-0.8.1 can work with junit 4.3.1 or 4.8.1 or 4.8.2?

2011-08-22 Thread Alan Gates
When I download the Pig 0.8.1 tarball I don't find any JUnit class files, just a license file (which probably doesn't need to be there). If you build it, it will pull those in via Ivy, but they are not in the tarball. AFAIK it will work with any JUnit 4.x, but 4.5 is what we use in our testing.

Re: Research projects with Hadoop

2010-09-07 Thread Alan Gates
Luan, Pig keeps a list at http://wiki.apache.org/pig/PigJournal of all the Pig projects we know of. Many of these are more project based, but some could be turned into actual research. If you do choose one of these, please let us know (over on pig-...@hadoop.apache.org) so we can mark

Re: Why hadoop-u...@lucene.a.o ?

2010-06-18 Thread Alan Gates
Ancient history. Hadoop started as a subproject of Lucene. Alan. On Jun 17, 2010, at 10:22 PM, Otis Gospodnetic wrote: Hello, I've noticed people send emails to the following address: hadoop-u...@lucene.apache.org Why? Is this supposed to be related to common-user@hadoop.apache.org

Re: Bible Code and some input format ideas

2010-01-12 Thread Alan Gates
I'm guessing that you want to set the width of the text to avoid the issue where if you split by block, then all splits but the first will have an unknown offset. Most texts have natural divisions in them which I'm guessing you'll want to respect anyway. In the Bible this would be the

Re: map side Vs. Reduce side join

2009-07-14 Thread Alan Gates
Usually doing a join on the map side depends on exploiting some characteristic of the data (such as one input being small enough that it can fit in memory and be replicated to every map, or both inputs already being sorted on the same key, or both inputs already being partitioned into the same number of
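
To illustrate the first case mentioned above (one input small enough to fit in memory and be replicated to every map), here is a minimal sketch in plain Python. It is not Hadoop or Pig code; the function names `build_lookup` and `map_side_join` and the sample rows are hypothetical, chosen only for illustration. Each call to `map_side_join` stands in for one map task that holds its own copy of the small input.

```python
from collections import defaultdict


def build_lookup(small_rows, key_index=0):
    """Load the small input into an in-memory hash table keyed on the join key.

    Hypothetical helper: in a real job every map task would build this
    once from its replicated copy of the small input.
    """
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key_index]].append(row)
    return lookup


def map_side_join(large_rows, lookup, key_index=0):
    """Stream the large input and emit joined rows with no reduce phase.

    Rows with no matching key in the small input are dropped (inner join).
    """
    for row in large_rows:
        for match in lookup.get(row[key_index], ()):
            yield row + match


# Made-up sample data: the small side fits in memory, the large side is streamed.
small = [("u1", "Alice"), ("u2", "Bob")]
large = [("u1", "click"), ("u3", "view"), ("u2", "buy"), ("u1", "buy")]

joined = list(map_side_join(large, build_lookup(small)))
for row in joined:
    print(row)
```

The large input is read only once and never shuffled, which is the point of joining on the map side. If neither input fits in memory, this approach does not apply, and a reduce-side join or one of the sorted/partitioned variants is needed instead.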