Re: How do I sum by Key in the Reduce Phase AND keep the initial value

2010-01-12 Thread Amogh Vasekar
Hi Stephen, I'm pretty sure the re-iterable reducer works by storing values in memory and spilling to disk once a certain threshold is reached. I don't know how they decide the limit, though (probably a parameter like io.sort.mb?), but the patch will throw some light on this. The pattern you…
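
A minimal sketch of what the re-iterable values API looks like in 0.21, going by the JIRA discussion; the MarkableIterator name and its mark()/reset() calls are taken from that patch and are not available in 0.20.x, so treat them as assumptions until you check the code:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.MarkableIterator;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SumKeepingReducer
        extends Reducer<Text, IntWritable, Text, Text> {
      @Override
      protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
          throws IOException, InterruptedException {
        MarkableIterator<IntWritable> it =
            new MarkableIterator<IntWritable>(values.iterator());
        it.mark();                     // remember the start of the value stream
        int sum = 0;
        while (it.hasNext()) {
          sum += it.next().get();      // first pass: compute the sum
        }
        it.reset();                    // rewind to the mark
        while (it.hasNext()) {         // second pass: emit original value + sum
          ctx.write(key, new Text(it.next().get() + "," + sum));
        }
      }
    }

The framework decides whether the marked values stay in memory or spill to local disk, which is the threshold behavior described above.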

Re: java.io.FileNotFoundException

2010-01-12 Thread Rekha Joshi
I think maybe the tmp setting is not done correctly or there is a permission/access issue. Cheers, /R

On 1/12/10 6:31 PM, "ruslan usifov" wrote:
> Hello, I'm running on a compound cluster (some nodes win32, some freebsd)
> and on the win32 nodes I see the following errors (and failed tasks):
> java.io.FileNotFoundException…
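
A small diagnostic sketch that may help here: it just prints the tmp-related settings a task would actually see (the property names are the 0.20 ones; adjust if your cluster overrides them elsewhere):

    import org.apache.hadoop.conf.Configuration;

    public class TmpDirCheck {
      public static void main(String[] args) {
        // Loads core-site.xml / mapred-site.xml from the classpath.
        Configuration conf = new Configuration();
        System.out.println("hadoop.tmp.dir   = " + conf.get("hadoop.tmp.dir"));
        System.out.println("mapred.local.dir = " + conf.get("mapred.local.dir"));
      }
    }

If the printed directories don't exist on the win32 nodes, or aren't writable by the user running the TaskTracker, that would line up with the permission theory.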

Re: Should mapreduce.ReduceContext reuse same object in nextKeyValue?

2010-01-12 Thread Eric Sammer
On 1/12/10 6:53 PM, Wilkes, Chris wrote:
> I created my own Writable class to store 3 pieces of information. In my
> mapreduce.Reducer class I collect all of them and then process them as a
> group, i.e.:
>
> reduce(key, values, context) {
>   List<Foo> myFoos = new ArrayList<Foo>();
>   for (Foo value : values)…
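
For reference, one workaround that should hold on 0.20.x, assuming Foo is the poster's custom Writable: copy each value before buffering it, because the framework reuses the same instance across iterations.

    // Inside reduce(key, values, context); WritableUtils is
    // org.apache.hadoop.io.WritableUtils.
    List<Foo> myFoos = new ArrayList<Foo>();
    for (Foo value : values) {
      // clone() serializes and deserializes the Writable, yielding an
      // independent copy that is safe to keep past this iteration.
      myFoos.add(WritableUtils.clone(value, context.getConfiguration()));
    }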

Should mapreduce.ReduceContext reuse same object in nextKeyValue?

2010-01-12 Thread Wilkes, Chris
I created my own Writable class to store 3 pieces of information. In my mapreduce.Reducer class I collect all of them and then process them as a group, i.e.:

reduce(key, values, context) {
  List<Foo> myFoos = new ArrayList<Foo>();
  for (Foo value : values) {
    myFoos.add(value);
  }
}

I was perplexed when…

RE: how to load big files into Hbase without crashing?

2010-01-12 Thread Clements, Michael
It's true that my case is specific to HBase. But there is also a more general question of how to set the # of mappers for a particular job. There may be reasons other than HBase to do this. For example, a job may need to be a singleton per machine due to resources it uses, statics, etc. I…

Re: how to load big files into Hbase without crashing?

2010-01-12 Thread Jean-Daniel Cryans
Michael, this question should be addressed to the hbase-user mailing list, as it is strictly about HBase's usage of MapReduce; the framework itself doesn't have any knowledge of how the region servers are configured. I CC'd it. Uploading into an empty table is always a problem, as you saw, since there…
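
Separate from the region-split issue, one client-side pressure valve worth trying — a sketch against the 0.20-era HBase client API; the table, family, and column names here are made up:

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BufferedLoad {
      public static void main(String[] args) throws IOException {
        HTable table = new HTable(new HBaseConfiguration(), "mytable");
        table.setAutoFlush(false);                 // buffer puts client-side
        table.setWriteBufferSize(8 * 1024 * 1024); // flush in ~8MB batches
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("fam"), Bytes.toBytes("col"), Bytes.toBytes("v"));
        table.put(put);
        table.flushCommits();                      // push any buffered puts
        table.close();
      }
    }

This doesn't remove the single-region bottleneck of an empty table, but it cuts the RPC count while the table is splitting up.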

RE: how to load big files into Hbase without crashing?

2010-01-12 Thread Clements, Michael
P.S. I tried getting the config from the Job and setting max tasks to one per machine, but that didn't work. That is, this:

jobConfig = job.getConfiguration();
jobConfig.setInt("mapred.tasktracker.map.tasks.maximum", 1);

does not work. It compiles & runs, but ignores the setting and creates lots…
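
For what it's worth, that behavior is expected on 0.20: mapred.tasktracker.map.tasks.maximum is read by each TaskTracker daemon from its local mapred-site.xml at startup, so a value submitted with a job is silently ignored. Per job, about the only lever is the split size. A hedged sketch (the property name is the 0.20 one, and this applies to FileInputFormat-based jobs only):

    import org.apache.hadoop.conf.Configuration;

    // Raising the minimum split size shrinks the number of splits, and each
    // split becomes one map task. Here: at most ~1 map per GB of input.
    Configuration jobConfig = job.getConfiguration();
    jobConfig.setLong("mapred.min.split.size", 1024L * 1024L * 1024L);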

RE: how to load big files into Hbase without crashing?

2010-01-12 Thread Clements, Michael
This leads to one quick & easy question: how does one reduce the number of map tasks for a job? My goal is to limit the # of Map tasks so they don't overwhelm the HBase region servers. The docs point in several directions. There's a method job.setNumReduceTasks(), but no setNumMapTasks(). There…
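
One thing the docs bury: the old org.apache.hadoop.mapred API does have a setter, though it is only a hint, since the framework still creates at least one map per input split. A sketch (MyJob is a stand-in for your job class):

    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf(MyJob.class);
    conf.setNumMapTasks(15);  // a request, not a hard cap; splits can override

The reliable way to actually lower the map count is to make the splits bigger, as in the P.S. above.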

Re: How do I sum by Key in the Reduce Phase AND keep the initial value

2010-01-12 Thread Stephen Watt
Thanks for responding, Amogh. I'm using Hadoop 0.20.1 and see from the JIRA you mentioned that it's resolved in 0.21. Bummer... I've thought about the same thing you mentioned; however, it's my understanding that keeping those values or records in memory is dangerous, as you can run out of memory depending…

Re: How do I sum by Key in the Reduce Phase AND keep the initial value

2010-01-12 Thread Amogh Vasekar
Hi, I ran into a very similar situation quite some time back and encountered this: http://issues.apache.org/jira/browse/HADOOP-475 I spoke to a few Hadoop folks, and they said complete cloning was not a straightforward option for some optimization reasons. There were a few things…

how to load big files into Hbase without crashing?

2010-01-12 Thread Clements, Michael
I have a 15-node Hadoop cluster that works for most jobs. But every time I upload large data files into HBase, the job fails. I surmise that this file (15GB in size) is big enough, and spawns so many tasks (about 55 at once), that they swamp the region server processes. Each cluster node is also an…

How do I sum by Key in the Reduce Phase AND keep the initial value

2010-01-12 Thread Stephen Watt
The key/value pairs coming into my Reducer are as follows:

  KEY (Text)    VALUE (IntWritable)
  A             11
  A             9
  B             2
  B             3

I want my reducer to sum the values for each input key and then output the key with a Text value containing the original value and…
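
A minimal 0.20-style sketch of the two-pass idea (buffering in the reducer; fine for small value lists, but see the memory caveats discussed above). Copying each value into a plain int sidesteps the object-reuse trap, since the framework recycles the IntWritable instance:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SumAndKeepReducer
        extends Reducer<Text, IntWritable, Text, Text> {
      @Override
      protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
          throws IOException, InterruptedException {
        List<Integer> originals = new ArrayList<Integer>();
        int sum = 0;
        for (IntWritable v : values) {
          originals.add(v.get());  // copy the int, not the reused object
          sum += v.get();
        }
        for (int original : originals) {
          // e.g. key A with value 11 becomes (A, "11,20")
          ctx.write(key, new Text(original + "," + sum));
        }
      }
    }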

Re: How to use an alternative connector to SSH ?

2010-01-12 Thread Eric Sammer
On 1/12/10 9:19 AM, Emmanuel Jeanvoine wrote:
> Hello,
>
> I would like to use the Hadoop framework with MapReduce and I have a
> question concerning the use of SSH.
> Is it possible to use a connector other than SSH to launch remote
> commands?
>
> I quickly checked the code but I think this is hardcoded since only SSH
> options seem to be customizable.

How to use an alternative connector to SSH ?

2010-01-12 Thread Emmanuel Jeanvoine
Hello, I would like to use the Hadoop framework with MapReduce, and I have a question concerning the use of SSH. Is it possible to use a connector other than SSH to launch remote commands? I quickly checked the code, but I think this is hardcoded since only SSH options seem to be customizable. Regards…

java.io.FileNotFoundException

2010-01-12 Thread ruslan usifov
Hello, I'm running on a compound cluster (some nodes win32, some freebsd) and on the win32 nodes I see the following errors (and failed tasks):

java.io.FileNotFoundException: File D:/tmp/hadoop-sergeyn/mapred/local/taskTracker/jobcache/job_20100605_0043/attempt_20100605_0043_m_37_0/work/tmp does not exist…