object Writable and Serialization

2010-12-10 Thread psdc1978
Hi, I know that Hadoop MR doesn't use Java object serialization and uses Writable objects instead, and I understand why the Hadoop MR team chose that. I was making my own modifications to Hadoop MR, and I was trying to transfer my own object via an RPC method call between the

Scheduler in Hadoop MR

2010-12-07 Thread psdc1978
Hi, I'm trying to understand how the scheduler works in Hadoop MR, and I've got the following questions: 1 - When two JobTrackers are running simultaneously, is each JobTracker running in a separate process? 2 - The default scheduler used to assign Map and Reduce tasks in the Hadoop

FILE_BYTES_WRITTEN and HDFS_BYTES_WRITTEN

2010-11-30 Thread psdc1978
When a Hadoop MapReduce example is executed, a table is shown at the end with information about the execution, such as the number of Map and Reduce tasks executed and the number of bytes read and written. This information includes 2 fields, FILE_BYTES_WRITTEN and H

GridMix2 preparations

2010-11-30 Thread psdc1978
Hi, To run gridmix2 (rungridmix_2) at ${HADOOP_HOME}/src/benchmarks/gridmix2, do I need to run the generateGridmix2data.sh script first? Thanks, Pedro

Problems running GridMix2

2010-11-30 Thread psdc1978
Hi, 1 - I'm trying to run GridMix2 (rungridmix_2) in a cluster, but nothing happens. No job is created. It simply shows the message: [code] GridMix results: Total num of Jobs: 0 ExecutionTime: 0 [/code] Is there a way to know what is happening with gridmix? Here's the gridmix-env-2

Split files, index files and input files

2010-06-09 Thread psdc1978
Hi, I'm having difficulty understanding all the concepts in Hadoop MR. 1 - Input files in MR contain index files. What's the purpose of the index files in Hadoop? 2 - MR uses split files. Is a split file an input file? Regards, -- Pedro

TaskTracker vs TaskInProgress

2010-06-09 Thread psdc1978
Hi, What's the difference between a TaskTracker and a TaskInProgress? Regards, -- Pedro

hadoop tasktracker

2010-06-09 Thread psdc1978
Hi, If I set the property mapred.reduce.tasks to 1 in mapred-site.xml, how many reduce tasks will actually run? I think it will run 2 and I don't know why. In a log that I've added, both constructors of the ReduceTask.java class run (ReduceTask() and ReduceTask(with parameters)).

FetchOutputs method understanding.

2010-05-27 Thread psdc1978
Hi, I'm looking at the method fetchOutputs in ReduceTask.java, but there's a part of the method that I don't understand. Inside the method, in the synchronized (scheduledCopies) {...} block, I don't understand what's happening inside the curly brackets. What's the purpose of this part of the code? He

Not a host:port pair: local

2010-05-21 Thread psdc1978
Hi, I defined the following property in mapred-site.xml: mapred.job.tracker local (...) But when I start MapReduce I get the error "Not a host:port pair: local". How can I run MR with the local property? / STARTUP_MSG: Sta

Re: Trying to relate a split file to a input file

2010-05-18 Thread psdc1978
text.getInputSplit(), this will be a FileSplit > in your case. From there you can do a getPath() to see both the > directory structure and the split value. > > > On May 18, 2010, at 10:01 AM, psdc1978 wrote: > > Hi, >> >> I'm studying the MapReduce code, and

Trying to relate a split file to a input file

2010-05-18 Thread psdc1978
Hi, I'm studying the MapReduce code, and I have the following questions: 1 - I'm running the wordcount example. I have 3 txt files as input. Each txt file is about 120 MB. During the execution of the map tasks, a number of map tasks will read the txt files. Each file is divided into split files. I would l

Re: How to debug reducer thread?

2010-05-01 Thread psdc1978
I have another idea, but I don't know how to do it. Is it possible to set the Xdebug parameter for the ReduceTask that is instantiated by a MapRed JVM? If so, I could connect the debugger to that thread, right? On Sat, May 1, 2010 at 4:43 PM, psdc1978 wrote: > Hi, >

Re: How to debug reducer thread?

2010-05-01 Thread psdc1978
ebugger to a real task tracker is problematic > because user code is run in separate jvms, etc. It's almost never > worth it. Most debugging (with a real debugger) is better done using > MRUnit and the local job runner. > > Hope this helps and good luck. > > On Tue, Apr 27

How to debug reducer thread?

2010-04-27 Thread psdc1978
Hi, The reduce tasks are threads that are launched by the Reducer. The output below shows the stack trace of one reduce task. at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.fetchHashesOutputs(ReduceTask.java:2582) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:395) at org.apache.ha

Re: HDFS and MapReduce and /tmp directory

2010-04-05 Thread psdc1978
b has finished execution on cleanup/abort. > MR is a process which loads/stores data in HDFS. Most of your queries > relate to knowing your default hdfs location. You can find that by "hadoop > dfs -ls". The path preceding .Trash is your default hdfs location. > > HTH, > / > >

HDFS and MapReduce and /tmp directory

2010-04-05 Thread psdc1978
Hi, When I run a MapReduce example, I've noticed that some temporary directories are built in the /tmp directory. In my case, the following directory tree was created in the /tmp/hadoop directory during the execution of the wordcount example: job_201004041803_0002/ |-- attempt_201004041803_0002_m_00

What's the purpose of a setup and cleanup task?

2010-04-02 Thread psdc1978
Hi, I posted this last month, but I haven't got a response. Does anyone know the answer to this question? I would like to understand the purpose of a setup and cleanup task. During the start-up of the job tracker, 2 setup tasks and 2 cleanup tasks are assigned for the map and for the reduce.

job.jar

2010-03-15 Thread psdc1978
Hi, When I'm running a Hadoop example: $ bin/hadoop jar build/hadoop-0.20.2-dev-examples.jar wordcount gutenberg gutenberg-output I've noticed that a job.jar file is created with the classes of the hadoop-0.20.2-dev-examples jar. Why is job.jar created? In the "Hadoop: The Definitive Guide" book, it e

What's the purpose of a setup and cleanup task?

2010-03-14 Thread psdc1978
Hi, I would like to understand the purpose of a setup and cleanup task. During the start-up of the job tracker, 2 setup tasks and 2 cleanup tasks are assigned for the map and for the reduce. My questions are: - What's the purpose of a setup task? - The setup class runs on the jobtrack

Difference between some MR classes

2010-03-02 Thread psdc1978
Hi, I've looked at the hadoop-0.20.1 source and I have the following questions: 1 - As I understand from the source code, LocalJobRunner is a class used to run a map or reduce task. But an MR task is launched by the class JvmTask. What's the difference between the MR from the LocalJobRunner and from t

Where duplicated data is ignored?

2010-02-17 Thread psdc1978
Hi, In Hadoop MapRed, when I define the number of reduce tasks to run, mapred.reduce.tasks 3 I've noticed that during the execution of a MapRed example, the Reduce threads request the MapOutputServlet on the TaskTracker 9 times. The value 9 comes from the 3 reduce tasks time

MapRed ports

2010-02-09 Thread psdc1978
Hi, I have some questions about the MapRed ports and how a reduce knows where to fetch the map output. I know that MapRed uses Jetty as a web server. - Does the JobTracker send tasks to the TaskTracker to execute through port 50060? - Which port does the TaskTracker use to send status about the task that

Questions about JobTracker and TaskTracker

2010-01-11 Thread psdc1978
Hi, I have some questions about the Hadoop MapRed architecture: 1 - Does only one TaskTracker exist per JobTracker? 2 - Are the TaskTracker and the JobTracker two different instances that are started only through the start-mapred.sh script? [snippet of start-mapred.sh] "$bin"/hadoop-daemon.sh --co

Re: Only running hadoop Map tasks

2010-01-06 Thread psdc1978
See my question inline. On Tue, Jan 5, 2010 at 6:32 PM, Owen O'Malley wrote: > > On Jan 5, 2010, at 9:13 AM, psdc1978 wrote: > >> 1 - I would like to see the output that the Maps produce in my >> example. Is it possible to make Hadoop run only Map tasks, >

Questions about dfs and MapRed in the Hadoop.

2010-01-05 Thread psdc1978
Hi list, I downloaded Hadoop 0.20.1 and now I'm looking at the MapReduce source. I've got the following questions: 1 - What is the difference between the classes org.apache.hadoop.mapred.Reducer.java and org.apache.hadoop.mapreduce.Reducer.java? In which cases are the 2 reducers used?

Only running hadoop Map tasks

2010-01-05 Thread psdc1978
Hi, 1 - I would like to see the output that the Maps produce in my example. Is it possible to make Hadoop run only Map tasks, excluding the Reduce tasks? 2 - Is the output of the Maps written into a temporary file? 3 - How is the output of the maps passed to the reduce tasks? Is using a

Re: How compile hadoop-0.20.1

2009-11-24 Thread psdc1978
Hi all, 1 - I'm trying to compile hadoop-0.20.1 but I'm facing several errors. Hadoop includes several subprojects, but these subprojects are all compiled into one file called hadoop-0.20.1-core.jar. If I just want to update the code in the MapReduce component, do I have to compile all the projec