I would like to merge some SequenceFiles as well, so any help would be great!
Although the solution with the single reducer works well, the files are small,
so I don't need distribution.
I think I will create a simple Java program that will read these files and
merge them.
> From: christoph.sc
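A minimal sketch of such a merge program, assuming every input is a
SequenceFile whose keys and values are Writables and all files share the
same key/value classes; the class name is made up, and the output path is
taken as the last command-line argument:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class SequenceFileMerger {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path(args[args.length - 1]);

    SequenceFile.Writer writer = null;
    try {
      for (int i = 0; i < args.length - 1; i++) {
        SequenceFile.Reader reader =
            new SequenceFile.Reader(fs, new Path(args[i]), conf);
        Writable key = (Writable)
            ReflectionUtils.newInstance(reader.getKeyClass(), conf);
        Writable value = (Writable)
            ReflectionUtils.newInstance(reader.getValueClass(), conf);
        if (writer == null) {
          // Create the output with the key/value classes of the first input.
          writer = SequenceFile.createWriter(fs, conf, out,
              reader.getKeyClass(), reader.getValueClass());
        }
        // Copy every record of this input into the merged output.
        while (reader.next(key, value)) {
          writer.append(key, value);
        }
        reader.close();
      }
    } finally {
      if (writer != null) writer.close();
    }
  }
}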
Hi,
I have a few input splits that are a few MB in size.
I want to submit 1 GB of input to every mapper. How can I do that?
Currently each mapper gets one input split, which results in many small
map-output files.
I tried setting -Dmapred.map.min.split.size=, but it still does not
take effect.
Thanks,
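A minimal sketch, assuming the old (org.apache.hadoop.mapred) API: the
FileInputFormat of that line reads mapred.min.split.size, so the extra
"map" in the name you tried may be why it has no effect. Also note that a
minimum split size only keeps a single large, splittable file from being
divided finer; it does not combine separate small files into one split.

import org.apache.hadoop.mapred.JobConf;

public class LargeSplitJob {
  public static void main(String[] args) {
    JobConf conf = new JobConf(LargeSplitJob.class);
    // Ask FileInputFormat for splits of at least 1 GB; equivalent to
    // passing -Dmapred.min.split.size=1073741824 on the command line.
    conf.setLong("mapred.min.split.size", 1024L * 1024L * 1024L);
    // ... set input/output paths and submit the job as usual ...
  }
}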
Dear Harsh,
my purpose is to build a Java MapReduce framework based on a p2p model
to provide dynamic failure recovery of the master, slaves, and tasks.
I've developed the p2p logic capable of tracking all the information needed
to allow this recovery,
but now I need to add a MapReduce module to execute
Francesco,
The system is already dynamic right now, unless a configuration limits
that. Slave nodes may enter/exit at any time. But you have to keep in
mind that the loss of a DataNode can lead to a lot of network traffic
for rebalancing/re-replicating the lost blocks. A TaskTracker loss is not
so expensive
I found the solution. The problem was that I had misspelled the
parameter "mapred.tasktracker.map.tasks.maximum".
On Tue, May 24, 2011 at 11:06 AM, Pedro Costa wrote:
> I think it's important to say that there are 2 CPUs per node and 12
> cores per CPU.
>
> On Tue, May 24, 2011 at 11:02 AM, Pedro Costa wrote:
Is it possible to dynamically specify the slave nodes instead of specifying
them statically through the configuration files?
I want to build a dynamic environment in which each node can enter and exit,
so I need to specify a newly
entered node as a new slave node for the Hadoop framework
Thanks
I need to track every state change in a Job and in all its tasks.
In particular, I also need to track Job and Task failures.
Does any API exist to do this? Something like a publish-subscribe paradigm?
Thanks
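As far as I know there is no publish-subscribe hook in the public API of
this era; the usual approach is to poll the JobTracker through JobClient.
A minimal sketch, assuming the old (org.apache.hadoop.mapred) API, with
the job ID passed as the first command-line argument:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.TaskCompletionEvent;

public class JobWatcher {
  public static void main(String[] args) throws Exception {
    JobClient client = new JobClient(new JobConf());
    RunningJob job = client.getJob(JobID.forName(args[0]));

    int from = 0;
    while (!job.isComplete()) {
      // Each event reports one finished task attempt, including FAILED
      // and KILLED ones, so task failures show up here as well.
      TaskCompletionEvent[] events = job.getTaskCompletionEvents(from);
      for (TaskCompletionEvent e : events) {
        System.out.println(e.getTaskAttemptId() + " -> " + e.getTaskStatus());
      }
      from += events.length;
      Thread.sleep(2000); // poll, since there is no push notification
    }
    System.out.println("succeeded: " + job.isSuccessful());
  }
}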
I think it's important to say that there are 2 CPUs per node and 12
cores per CPU.
On Tue, May 24, 2011 at 11:02 AM, Pedro Costa wrote:
> And all the nodes have the same configuration. A job has 5000 map tasks.
>
> On Tue, May 24, 2011 at 10:57 AM, Pedro Costa wrote:
>> The values are:
>> #map tasks: 8
And all the nodes have the same configuration. A job has 5000 map tasks.
On Tue, May 24, 2011 at 10:57 AM, Pedro Costa wrote:
> The values are:
> #map tasks: 8
> #reduce tasks: 10
> Map task capacity: 10
> Reduce task capacity: 10
>
>
> On Tue, May 24, 2011 at 8:01 AM, Harsh J wrote:
>> How many tasks are present in your job? Do all tasktrackers carry this
>> configuration? What is the total reported slot capacity on the JT UI?
The values are:
#map tasks: 8
#reduce tasks: 10
Map task capacity: 10
Reduce task capacity: 10
On Tue, May 24, 2011 at 8:01 AM, Harsh J wrote:
> How many tasks are present in your job? Do all tasktrackers carry this
> configuration? What is the total reported slot capacity on the JT UI?
>
> On Mon, May 23, 2011 at 10:28 PM, Pedro Costa wrote:
How many tasks are present in your job? Do all tasktrackers carry this
configuration? What is the total reported slot capacity on the JT UI?
On Mon, May 23, 2011 at 10:28 PM, Pedro Costa wrote:
> I think I have to rephrase the question.
>
> I set the "mapred.tasktracker.map.tasks.maximum" to 8, hoping