Re: Sharing data in a mapper for all values

2011-10-31 Thread Anthony Urso
Arko: If you have keyed both the big blob and the input files similarly, and you can output both streams to HDFS sorted by key, then you can reformulate this whole process as a map-side join. It will be a lot simpler and more efficient than scanning the whole blob for each input. Also, do
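
For illustration, the merge step behind a map-side join can be sketched in plain Java, with no Hadoop on the classpath. This is only a sketch under assumptions: the `SortedMergeJoin` and `Rec` names and the tab-separated output are hypothetical, and both inputs are assumed to be already sorted by key. In a real job, something like Hadoop's `CompositeInputFormat` would normally feed the two sorted streams to the mapper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: joins two key-sorted lists in one pass.
final class SortedMergeJoin {
    // Hypothetical record type: a join key plus a payload.
    static final class Rec {
        final String key;
        final String value;
        Rec(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Both lists must already be sorted by key (ascending).
    static List<String> join(List<Rec> left, List<Rec> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int c = left.get(i).key.compareTo(right.get(j).key);
            if (c < 0) {
                i++;            // left key has no partner yet, advance left
            } else if (c > 0) {
                j++;            // right key has no partner yet, advance right
            } else {
                // Same key on both sides: emit every left/right combination.
                String k = left.get(i).key;
                int jStart = j;
                while (i < left.size() && left.get(i).key.equals(k)) {
                    j = jStart;
                    while (j < right.size() && right.get(j).key.equals(k)) {
                        out.add(k + "\t" + left.get(i).value + "\t" + right.get(j).value);
                        j++;
                    }
                    i++;
                }
            }
        }
        return out;
    }
}
```

Each side is read once, front to back, which is why this beats rescanning the whole blob for every input record.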

Re: Is multiple Emits from Single Map function possible ?

2011-10-15 Thread Anthony Urso
Moving to MR-user. You will need to make a composite value class that contains the real value and a tag field that indicates which behavior is intended for the given emit. Cheers, Anthony On Sat, Oct 15, 2011 at 5:32 PM, Nachiappan A nachi...@gmail.com wrote: Hi All, I am writing a small Hadoop
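
A composite value of that kind might look roughly like the sketch below. It uses only `java.io` so it runs without Hadoop; in a real job the class would declare `implements Writable`, whose `write(DataOutput)` and `readFields(DataInput)` methods have the same shape. The `TaggedValue` name, the two tag constants, and the `roundTrip` helper are hypothetical and not taken from the original thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical composite value: a tag saying how the reducer should treat
// the record, plus the real value.
final class TaggedValue {
    static final byte KIND_COUNT = 0; // hypothetical tag: value is a count
    static final byte KIND_SUM = 1;   // hypothetical tag: value is a partial sum

    byte tag;
    long value;

    TaggedValue() {}                  // no-arg constructor for deserialization

    TaggedValue(byte tag, long value) {
        this.tag = tag;
        this.value = value;
    }

    // Same shape as Writable.write(DataOutput).
    void write(DataOutput out) throws IOException {
        out.writeByte(tag);
        out.writeLong(value);
    }

    // Same shape as Writable.readFields(DataInput); reads fields in write order.
    void readFields(DataInput in) throws IOException {
        tag = in.readByte();
        value = in.readLong();
    }

    // Test helper: serialize and deserialize in memory.
    static TaggedValue roundTrip(TaggedValue v) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            v.write(new DataOutputStream(buf));
            TaggedValue copy = new TaggedValue();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The mapper can then emit several `TaggedValue` records per input, and the reducer switches on `tag` to decide what to do with each one.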

Re: Map output records/reducer input records mismatch

2011-08-16 Thread Anthony Urso
Are you looking at reduce input groups or reduce input records? Are you running a combiner? On Tue, Aug 16, 2011 at 11:37 AM, Vyacheslav Zholudev vyacheslav.zholu...@gmail.com wrote: Hi, I'm having multiple hadoop jobs that use the avro mapred API. Only in one of the jobs I have a visible

Re: Creating a custom composite key

2011-08-12 Thread Anthony Urso
This is fairly common. Just write your key as a Java class, implement WritableComparable, and keep your compareTo(), hashCode(), and equals() methods consistent with one another. SecondarySort.IntPair in the Hadoop examples is a useful model. On Fri, Aug 12, 2011 at 3:49 PM, Roger Chen rogc...@ucdavis.edu
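
As a rough sketch of such a key, written against `java.io` only so that it runs without Hadoop: in a real job the class would declare `implements WritableComparable<PairKey>` instead of `Comparable<PairKey>`. The `PairKey` name, its two `int` fields, and the `roundTrip` helper are hypothetical; this is not the code of `SecondarySort.IntPair`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical composite key made of two ints.
final class PairKey implements Comparable<PairKey> {
    private int first;
    private int second;

    PairKey() {}                      // no-arg constructor for deserialization

    PairKey(int first, int second) {
        this.first = first;
        this.second = second;
    }

    void write(DataOutput out) throws IOException {
        out.writeInt(first);
        out.writeInt(second);
    }

    void readFields(DataInput in) throws IOException {
        first = in.readInt();
        second = in.readInt();
    }

    // Sort by first, then by second.
    @Override
    public int compareTo(PairKey o) {
        int c = Integer.compare(first, o.first);
        return c != 0 ? c : Integer.compare(second, o.second);
    }

    // equals() agrees with compareTo() == 0.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PairKey)) {
            return false;
        }
        PairKey p = (PairKey) o;
        return first == p.first && second == p.second;
    }

    // Equal keys hash alike, so they land in the same partition.
    @Override
    public int hashCode() {
        return 31 * first + second;
    }

    // Test helper: serialize and deserialize in memory.
    static PairKey roundTrip(PairKey key) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            key.write(new DataOutputStream(buf));
            PairKey copy = new PairKey();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The hashCode() matters because the default partitioner uses it to pick a reducer; two keys that are equal must produce the same hash or they may be sent to different reducers.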

Re: mapred.tasktracker.map.tasks.maximum is not taking into effect

2011-07-01 Thread Anthony Urso
On Fri, Jul 1, 2011 at 1:03 PM, praveen.pe...@nokia.com wrote: Hi all, I am using hadoop 0.20.2. I am setting the property mapred.tasktracker.map.tasks.maximum = 4 (same for reduce also) on my job conf but I am still seeing max of only 2 map and reduce tasks on each node. I know my machine

Predicting how many values will I see in a call to reduce?

2010-11-07 Thread Anthony Urso
Is there any way to know how many values I will see in a call to reduce without first counting through them all with the iterator? Under 0.21? 0.20? 0.19? Thanks, Anthony

Announcing Sizzle, a compiler and runtime for the Sawzall language

2010-11-05 Thread Anthony Urso
I am pleased to announce the v0.0 release of Sizzle, a compiler and runtime for the Sawzall language. Sizzle targets Hadoop directly, by compiling Sawzall programs into Hadoop job jars that can be run anywhere Hadoop is installed, without requiring a Sawzall interpreter to also be present.

Re: ClassCastException

2010-10-06 Thread Anthony Urso
Hadoop is attempting to cast a Date object to WritableComparable, which Date does not implement; that is what causes the exception. Your keys must implement WritableComparable and your values must implement Writable. On Wed, Oct 6, 2010 at 8:02 PM, Johannes.Lichtenberger
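
One way around this is to wrap the date in a key class of your own. The following is a minimal sketch, assuming the date can be stored as its millisecond timestamp; it depends only on the JDK, and in a real job the class would declare `implements WritableComparable<DateKey>`. The `DateKey` name and the `roundTrip` helper are hypothetical. Emitting the timestamp as a `LongWritable` key would be a simpler alternative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Date;

// Hypothetical key wrapping java.util.Date as epoch milliseconds.
final class DateKey implements Comparable<DateKey> {
    private long millis;

    DateKey() {}                      // no-arg constructor for deserialization

    DateKey(Date date) {
        this.millis = date.getTime();
    }

    Date toDate() {
        return new Date(millis);
    }

    void write(DataOutput out) throws IOException {
        out.writeLong(millis);
    }

    void readFields(DataInput in) throws IOException {
        millis = in.readLong();
    }

    // Earlier dates sort first.
    @Override
    public int compareTo(DateKey o) {
        return Long.compare(millis, o.millis);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof DateKey && ((DateKey) o).millis == millis;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(millis);
    }

    // Test helper: serialize and deserialize in memory.
    static DateKey roundTrip(DateKey key) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            key.write(new DataOutputStream(buf));
            DateKey copy = new DateKey();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```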