RE: AW: How to merge several SequenceFile into one?

2011-05-24 Thread Panayotis Antonopoulos
I would like to merge some SequenceFiles as well, so any help would be great! The single-reducer solution works, but my files are small, so I don't need distribution. I think I will write a simple Java program that reads these files and merges them.
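
A single-process merge along those lines might look like the sketch below. This is an untested outline against the 0.20-era `org.apache.hadoop.io.SequenceFile` API (the argument handling and paths are illustrative); it assumes all inputs share the same key/value classes, which it takes from the first file:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class SequenceFileMerger {
    // Usage: SequenceFileMerger <input1> <input2> ... <output>
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path(args[args.length - 1]); // last argument is the merged output

        SequenceFile.Writer writer = null;
        try {
            for (int i = 0; i < args.length - 1; i++) {
                SequenceFile.Reader reader = new SequenceFile.Reader(fs, new Path(args[i]), conf);
                try {
                    if (writer == null) {
                        // Take key/value classes from the first input file.
                        writer = SequenceFile.createWriter(fs, conf, out,
                                reader.getKeyClass(), reader.getValueClass());
                    }
                    Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
                    Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
                    while (reader.next(key, value)) {
                        writer.append(key, value);
                    }
                } finally {
                    reader.close();
                }
            }
        } finally {
            if (writer != null) writer.close();
        }
    }
}
```

Note that this produces an unsorted concatenation; if a globally sorted result matters, the single-reducer job mentioned above is still the simpler route.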

How to use the mapred.min.split.size option?

2011-05-24 Thread Mapred Learn
Hi, I have a few input splits that are only a few MB in size. I want to submit 1 GB of input to every mapper. How can I do that? Currently each mapper gets one input split, which results in many small map-output files. I tried setting -Dmapred.map.min.split.size= , but it does not take effect. Thanks,
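
Two things may be going on here. First, the property name in the subject line is `mapred.min.split.size`, not `mapred.map.min.split.size`. Second, in the old-API `FileInputFormat` the split size is computed as `max(minSize, min(goalSize, blockSize))`, and a split never spans files, so many small *files* stay as many small splits regardless of the minimum (combining them needs something like `CombineFileInputFormat`). A pure-Java sketch of that formula, showing how a 1 GB minimum raises the split size for a large file:

```java
public class SplitSizeDemo {
    // Mirrors the old-API FileInputFormat split computation (0.20.x):
    // splitSize = max(minSize, min(goalSize, blockSize)),
    // where goalSize is totalSize / requested number of splits.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long MB = 1024L * 1024L;
        long GB = 1024L * MB;
        long blockSize = 64 * MB; // common dfs.block.size default of the era

        // Without a minimum, a large file splits at the block size:
        System.out.println(computeSplitSize(10 * GB, 1, blockSize));      // 67108864
        // With mapred.min.split.size = 1 GB, each split grows to 1 GB:
        System.out.println(computeSplitSize(10 * GB, 1 * GB, blockSize)); // 1073741824
    }
}
```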

Re: Dynamically specify slave nodes

2011-05-24 Thread Francesco De Luca
Dear Harsh, my purpose is to build a Java MapReduce framework based on a p2p model that provides dynamic failure recovery of master, slaves and tasks. I've developed the p2p logic, which can track all the information needed to enable this recovery, but now I need to add a MapReduce module to execute

Re: Dynamically specify slave nodes

2011-05-24 Thread Harsh J
Francesco, The system is already dynamic, unless a configuration limits that. Slave nodes may enter/exit at any time. But you have to keep in mind that the loss of a DataNode can lead to a lot of network traffic for rebalancing/re-replicating the lost blocks. A TaskTracker loss is not so expensive

Re: How set number of map and reduce can run simultaneously

2011-05-24 Thread Pedro Costa
I found the solution. The problem was that I had misspelled the parameter "mapred.tasktracker.map.tasks.maximum". On Tue, May 24, 2011 at 11:06 AM, Pedro Costa wrote: > I think it's important to mention that there are 2 CPUs per node and 12 > cores per CPU. > > On Tue, May 24, 2011 at 11:02 AM, Pedr
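
This failure mode is easy to hit because Hadoop property lookups fail silently: `Configuration.getInt(name, defaultValue)` simply returns the default when the key is absent, and the default for `mapred.tasktracker.map.tasks.maximum` is 2. A pure-Java sketch of that lookup behavior (a plain `Map` stands in for the real `Configuration` class):

```java
import java.util.HashMap;
import java.util.Map;

public class PropertyLookupDemo {
    // Mimics Configuration.getInt(name, defaultValue): an unknown key is not
    // an error, it just yields the default.
    static int getInt(Map<String, String> conf, String name, int defaultValue) {
        String v = conf.get(name);
        return v == null ? defaultValue : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<String, String>();
        conf.put("mapred.tasktraker.map.tasks.maximum", "8"); // note the typo: "tasktraker"

        // The correctly spelled key is absent, so the tracker falls back to 2 slots:
        System.out.println(getInt(conf, "mapred.tasktracker.map.tasks.maximum", 2)); // prints 2
    }
}
```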

Dynamically specify slave nodes

2011-05-24 Thread Francesco De Luca
Is it possible to specify the slave nodes dynamically instead of statically through the configuration files? I want to build a dynamic environment in which each node can enter and exit, so I need to register a newly entered node as a new slave node for the Hadoop framework. Thanks

Tracking job and task evolving

2011-05-24 Thread Francesco De Luca
I need to track every state change in a Job and in all its tasks. In particular, I also need to track Job and Task failures. Does any API exist to do this? Something like the publish-subscribe paradigm? Thanks
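
As far as I know, the old MapReduce API offers no push-style callbacks; the usual workaround is to poll `RunningJob`/`JobClient` and turn observed state changes into events yourself. Below is a pure-Java sketch of that polling/notification core; the `JobStateSource` interface is a hypothetical stand-in for a call like `RunningJob.getJobState()`:

```java
import java.util.ArrayList;
import java.util.List;

public class JobStatePoller {
    interface JobStateSource { int getState(); }   // stand-in for RunningJob.getJobState()
    interface JobStateListener { void stateChanged(int oldState, int newState); }

    private final JobStateSource source;
    private final List<JobStateListener> listeners = new ArrayList<JobStateListener>();
    private int lastState;

    JobStatePoller(JobStateSource source) {
        this.source = source;
        this.lastState = source.getState();
    }

    void addListener(JobStateListener l) { listeners.add(l); }

    // Call periodically (e.g. from a timer thread); notifies listeners only
    // when the observed state differs from the last one seen.
    void poll() {
        int current = source.getState();
        if (current != lastState) {
            for (JobStateListener l : listeners) l.stateChanged(lastState, current);
            lastState = current;
        }
    }
}
```

The same pattern extends to task-level tracking by polling `RunningJob.getTaskCompletionEvents(fromEventId)`, which also reports failed task attempts.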

Re: How set number of map and reduce can run simultaneously

2011-05-24 Thread Pedro Costa
I think it's important to mention that there are 2 CPUs per node and 12 cores per CPU. On Tue, May 24, 2011 at 11:02 AM, Pedro Costa wrote: > And all the nodes have the same configuration. A job has 5000 map tasks. > > On Tue, May 24, 2011 at 10:57 AM, Pedro Costa wrote: >> The values are: >> #map

Re: How set number of map and reduce can run simultaneously

2011-05-24 Thread Pedro Costa
And all the nodes have the same configuration. A job has 5000 map tasks. On Tue, May 24, 2011 at 10:57 AM, Pedro Costa wrote: > The values are: > #map tasks: 8 > #reduce tasks: 10 > Map task capacity:10 > Reduce task capacity:10 > > > On Tue, May 24, 2011 at 8:01 AM, Harsh J wrote: >> How many t
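
With 5000 map tasks against a reported map slot capacity of 10, the job runs in scheduling "waves" of at most 10 concurrent maps. The arithmetic is just a ceiling division, sketched here (the numbers are the ones quoted in this thread):

```java
public class MapWaves {
    // Number of scheduling waves: ceil(numTasks / totalSlotCapacity).
    static int waves(int numTasks, int slotCapacity) {
        return (numTasks + slotCapacity - 1) / slotCapacity;
    }

    public static void main(String[] args) {
        System.out.println(waves(5000, 10)); // prints 500 -- 500 waves of 10 maps
        System.out.println(waves(8, 10));    // prints 1   -- 8 maps fit in a single wave
    }
}
```

So even with the configuration fixed, only the per-tracker slot maximum (times the number of TaskTrackers) limits concurrency, not the number of tasks in the job.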

Re: How set number of map and reduce can run simultaneously

2011-05-24 Thread Pedro Costa
The values are: #map tasks: 8 #reduce tasks: 10 Map task capacity:10 Reduce task capacity:10 On Tue, May 24, 2011 at 8:01 AM, Harsh J wrote: > How many tasks are present in your job? Do all tasktrackers carry this > configuration? What is the total reported slot capacity on the JT UI? > > On Mon

Re: How set number of map and reduce can run simultaneously

2011-05-24 Thread Harsh J
How many tasks are present in your job? Do all tasktrackers carry this configuration? What is the total reported slot capacity on the JT UI? On Mon, May 23, 2011 at 10:28 PM, Pedro Costa wrote: > I think I've to rephrase the question. > > I set the "mapred.tasktracker.map.tasks.maximum" to 8, hoping