Hi Stephen,
I'm pretty sure the re-iterable reducer works by storing values in memory and
spilling to disk once a certain threshold is reached. I don't know how
they decide the limit though (probably a parameter like io.sort.mb?), but the
patch should shed some light on this.
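To illustrate the buffer-then-spill idea, here is a minimal, self-contained sketch (plain Java, not the actual patch; the class name, the count-based threshold, and the use of Strings are my own simplifications, whereas the real reducer presumably tracks bytes with an io.sort.mb-style knob):

import java.io.BufferedReader;
import java.io.Closeable;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Illustrative only: keeps values in memory up to a threshold, then spills to a temp file. */
public class SpillingBuffer implements Iterable<String>, Closeable {
  private final int inMemoryLimit;                   // made-up count-based threshold
  private final List<String> inMemory = new ArrayList<String>();
  private File spillFile;
  private PrintWriter spillWriter;

  public SpillingBuffer(int inMemoryLimit) {
    this.inMemoryLimit = inMemoryLimit;
  }

  /** Buffer a value; once the in-memory limit is hit, append further values to disk. */
  public void add(String value) throws IOException {
    if (inMemory.size() < inMemoryLimit) {
      inMemory.add(value);
    } else {
      if (spillWriter == null) {
        spillFile = File.createTempFile("spill", ".txt");
        spillWriter = new PrintWriter(new FileWriter(spillFile));
      }
      spillWriter.println(value);
    }
  }

  /** Re-iteration is possible because nothing is discarded: replay memory, then the spill file. */
  public Iterator<String> iterator() {
    if (spillWriter != null) {
      spillWriter.flush();
    }
    final Iterator<String> memIter = inMemory.iterator();
    if (spillFile == null) {
      return memIter;
    }
    try {
      final BufferedReader reader = new BufferedReader(new FileReader(spillFile));
      return new Iterator<String>() {
        private String next = advance();

        private String advance() {
          if (memIter.hasNext()) {
            return memIter.next();
          }
          try {
            return reader.readLine();              // null at end of file ends iteration
          } catch (IOException e) {
            throw new RuntimeException(e);
          }
        }

        public boolean hasNext() { return next != null; }

        public String next() {
          String current = next;
          next = advance();
          return current;
        }

        public void remove() { throw new UnsupportedOperationException(); }
      };
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public void close() {
    if (spillWriter != null) {
      spillWriter.close();
    }
    if (spillFile != null) {
      spillFile.delete();
    }
  }
}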
The pattern you
I think either the tmp setting is not configured correctly or there is a
permission/access issue.
Cheers,
/R
On 1/12/10 6:31 PM, "ruslan usifov" wrote:
Hello
I'm running on a mixed cluster (some nodes win32, some FreeBSD), and on the win32
nodes I see the following errors (and failed tasks):
java.io.FileNotFound
On 1/12/10 6:53 PM, Wilkes, Chris wrote:
> I created my own Writable class to store 3 pieces of information. In my
> mapreduce.Reducer class I collect all of them and then process them as a
> group, i.e.:
>
> reduce(key, values, context) {
>   List<Foo> myFoos = new ArrayList<Foo>();
>   for (Foo value : values)
I created my own Writable class to store 3 pieces of information. In
my mapreduce.Reducer class I collect all of them and then process them as
a group, i.e.:

reduce(key, values, context) {
  List<Foo> myFoos = new ArrayList<Foo>();
  for (Foo value : values) {
    myFoos.add(value);
  }
}
I was perplexed whe
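One likely source of the confusion (hedged, since the message is cut off): Hadoop reuses the same Writable instance for every value in the iterator, so adding value directly to the list leaves you with N references to one object that ends up holding only the last value. A minimal sketch of the usual workaround, copying each value before buffering it; Foo is the poster's custom Writable, and the key type is assumed here to be Text:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableUtils;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch only: Foo is the poster's custom Writable; the key type is an assumption.
public class FooReducer extends Reducer<Text, Foo, Text, Foo> {
  @Override
  protected void reduce(Text key, Iterable<Foo> values, Context context)
      throws IOException, InterruptedException {
    List<Foo> myFoos = new ArrayList<Foo>();
    for (Foo value : values) {
      // Copy each value rather than storing the reused instance, otherwise
      // every list entry points at the same object.
      myFoos.add(WritableUtils.clone(value, context.getConfiguration()));
    }
    // ... process myFoos as a group ...
  }
}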
It's true that my particular case is specific to HBase. But there is also
a more general question of how to set the # of mappers for a particular
job. There may be reasons other than HBase to do this. For example, a
job may need to be a singleton per machine due to resources it uses,
statics, etc.
I
Michael,
This question should be addressed to the hbase-user mailing list as it
is strictly about HBase's usage of MapReduce; the framework itself
doesn't have any knowledge of how the region servers are configured. I
CC'd it.
Uploading into an empty table is always a problem as you saw since
ther
P.S.
I tried getting the config from the Job and setting max tasks to one per
machine, but that didn't work:

Configuration jobConfig = job.getConfiguration();
jobConfig.setInt("mapred.tasktracker.map.tasks.maximum", 1);

It compiles & runs, but ignores the setting and creates lots
This leads to one quick & easy question: how does one reduce the number
of map tasks for a job? My goal is to limit the # of Map tasks so they
don't overwhelm the HBase region servers.
The Docs point in several directions.
There's a method job.setNumReduceTasks(), but no setNumMapTasks().
There
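The number of map tasks is not set directly; it falls out of the number of input splits, so the usual lever is the split size (or a custom InputFormat). A minimal sketch against the 0.20-era "new" API; the 512 MB figure is purely an example:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "hbase-upload");
    // Fewer map tasks come from fewer input splits: a 512 MB minimum split
    // size turns a 15 GB input into ~30 map tasks instead of one per 64 MB block.
    FileInputFormat.setMinInputSplitSize(job, 512L * 1024 * 1024);
  }
}

By contrast, mapred.tasktracker.map.tasks.maximum is a TaskTracker (daemon-level) setting read at startup, which is why setting it on the job's Configuration is ignored.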
Thanks for responding, Amogh.
I'm using Hadoop 0.20.1 and see by the JIRA you mentioned that it's resolved in
0.21. Bummer... I've thought about the same thing you mentioned; however,
it's my understanding that keeping those values or records in memory is
dangerous as you can run out of memory dependin
Hi,
I ran into a very similar situation quite some time back and encountered
this: http://issues.apache.org/jira/browse/HADOOP-475
After speaking to a few Hadoop folks, I was told that complete cloning was not a
straightforward option for some optimization reasons.
There were a few things
I have a 15-node Hadoop cluster that works for most jobs. But every
time I upload large data files into HBase, the job fails.
I surmise that this file (15GB in size) is big enough that it generates so many
tasks (about 55 at once) that they swamp the region server processes.
Each cluster node is also an
The key/value pairs coming into my Reducer are as follows:

KEY (Text)    VALUE (IntWritable)
A             11
A             9
B             2
B             3

I want my reducer to sum the values for each input key and then output the
key with a Text value containing the original value and
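A minimal sketch of such a reducer (the class name and exact output format are assumptions, since the message above is cut off): it sums the IntWritable values for each key and writes the total out as a Text value:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch only: for the sample input, key A with values 11 and 9 would be
// emitted as (A, "20"); whatever else belongs in the output string is
// truncated in the original post.
public class SumReducer extends Reducer<Text, IntWritable, Text, Text> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable value : values) {
      sum += value.get();
    }
    context.write(key, new Text(Integer.toString(sum)));
  }
}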
On 1/12/10 9:19 AM, Emmanuel Jeanvoine wrote:
> Hello,
>
> I would like to use the Hadoop framework with MapReduce and I have a
> question concerning the use of SSH.
> Is it possible to use a connector other than SSH to launch remote
> commands?
>
> I quickly checked the code but I think this is ha
Hello,
I would like to use the Hadoop framework with MapReduce and I have a
question concerning the use of SSH.
Is it possible to use a connector other than SSH to launch remote
commands?
I quickly checked the code but I think this is hardcoded since only
SSH options seem to be customizable.
Rega
Hello
I'm running on a mixed cluster (some nodes win32, some FreeBSD), and on the win32
nodes I see the following errors (and failed tasks):
java.io.FileNotFoundException: File
D:/tmp/hadoop-sergeyn/mapred/local/taskTracker/jobcache/job_20100605_0043/attempt_20100605_0043_m_37_0/work/tmp
does not