Hi,
I know that Hadoop MR doesn't use Java object serialization and uses
Writable objects instead, and I understand the reasons the Hadoop
MR team chose that.
I was making my own modifications to Hadoop MR, and I was trying to
transfer my own object via an RPC method call between the
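For context, here is a minimal sketch of the Writable-style contract using only java.io (PointWritable is a hypothetical stand-in, not a Hadoop class): the object writes its own fields with write(DataOutput) and repopulates them in place with readFields(DataInput), which is the shape Hadoop RPC expects from objects it transfers.

```java
import java.io.*;

// Hypothetical value type following the Writable-style contract:
// write() serializes the fields, readFields() repopulates them in place,
// so the framework can reuse one instance instead of allocating per record.
class PointWritable {
    int x;
    int y;

    void write(DataOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    void readFields(DataInput in) throws IOException {
        x = in.readInt();
        y = in.readInt();
    }

    public static void main(String[] args) throws IOException {
        PointWritable a = new PointWritable();
        a.x = 3;
        a.y = 7;

        // Round-trip through a byte buffer, much as RPC does over a socket.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        a.write(new DataOutputStream(buf));

        PointWritable b = new PointWritable();
        b.readFields(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(b.x + "," + b.y);
    }
}
```

A custom type sent over Hadoop RPC has to follow this same pattern (plus implement the real org.apache.hadoop.io.Writable interface).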
Hi,
I'm trying to understand how the scheduler works in the Hadoop MR, and I've
got the following questions:
1 - When we have two JobTrackers running simultaneously, is each JobTracker
running in a separate process?
2 - The default scheduler used to assign Map and Reduce tasks in the Hadoop
When a Hadoop MapReduce example is executed, a table is shown at the end
with all the information about the execution, like the number of Map and
Reduce tasks executed and the number of bytes read and written.
In this information there are 2 fields, FILE_BYTES_WRITTEN and
H
Hi,
To run gridmix2 (rungridmix_2) at ${HADOOP_HOME}/src/benchmarks/gridmix2,
do I need to run the generateGridmix2data.sh script first?
Thanks,
Pedro
Hi,
1 - I'm trying to run GridMix2 (rungridmix_2) in a cluster, but nothing
happens. A job isn't created. It simply shows the message:
[code]
GridMix results:
Total num of Jobs: 0
ExecutionTime: 0
[/code]
Is there a way to know what is happening with gridmix?
Here's the gridmix-env-2
Hi,
I'm having difficulty understanding all the concepts in Hadoop MR.
1 - Input files in MR contain index files. What's the purpose of the index
files in Hadoop?
2 - MR uses split files. Is a split file an input file?
Regards,
--
Pedro
Hi,
What's the difference between a TaskTracker and a TaskInProgress?
Regards,
--
Pedro
Hi,
If I define the property mapred.reduce.tasks to 1 in mapred-site.xml, how
many reduce tasks will actually run? I think 2 will run and I don't know
why. In a log that I've added, both constructors of the ReduceTask.java
class run ( ReduceTask() and ReduceTask(with parameters) ).
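A guess at what the log is showing, sketched with plain java.io and reflection (TaskLike is a hypothetical stand-in for ReduceTask): when a task object is shipped as a Writable, the receiving side reflectively creates an empty instance through the no-arg constructor and then calls readFields(), so both constructors fire for a single logical task.

```java
import java.io.*;

// Sketch of why a no-arg constructor fires during deserialization:
// the framework reflectively builds an empty instance, then fills it
// in with readFields(). TaskLike is a hypothetical stand-in.
class TaskLike {
    int partition;

    TaskLike() {                      // runs on the deserializing side
        System.out.println("no-arg ctor");
    }

    TaskLike(int partition) {         // runs when the task is first built
        System.out.println("ctor with parameters");
        this.partition = partition;
    }

    void write(DataOutput out) throws IOException {
        out.writeInt(partition);
    }

    void readFields(DataInput in) throws IOException {
        partition = in.readInt();
    }

    public static void main(String[] args) throws Exception {
        TaskLike original = new TaskLike(5);   // one logical task...

        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        original.write(new DataOutputStream(buf));

        // ...but both constructors appear in the log: the empty copy is
        // created reflectively before readFields() repopulates it.
        TaskLike copy = TaskLike.class.getDeclaredConstructor().newInstance();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));
        System.out.println("partition=" + copy.partition);
    }
}
```

So two constructor calls in the log need not mean two reduce tasks ran; checking the task attempt IDs would confirm it.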
Hi,
I'm looking at the method fetchOutputs in ReduceTask.java, but there's a
part of the method that I don't understand. Inside the
synchronized (scheduledCopies) {...} block, I don't understand what's
happening inside the curly brackets. What's the purpose of this part of the
code?
He
Hi,
I defined in the mapred-site.xml the following property:
mapred.job.tracker
local
(...)
But when I start MapReduce I get the error "Not a host:port pair: local".
How can I run MR with the local property?
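For reference, a minimal mapred-site.xml sketch of the local-runner setting. The "Not a host:port pair" error usually means some component read the value expecting a host:port address, so it may be worth checking for a stale or duplicate config file on the classpath (an assumption on my part):

```xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>local</value>
  </property>
</configuration>
```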
text.getInputSplit(), this will be a FileSplit
> in your case. From there you can do a getPath() to see both the
> directory structure and the split value.
>
>
> On May 18, 2010, at 10:01 AM, psdc1978 wrote:
>
> Hi,
>>
>> I'm study the MapReduce code, and
Hi,
I'm studying the MapReduce code, and I have the following questions:
1 - I'm running the wordcount example. I've 3 txt files as input. Each txt
file is about 120Mb.
During the execution of the map tasks, a number of map tasks will read the
txt files. Each file is divided into split files. I would l
I have another idea, but I don't know how to do it. Is it possible to set
the Xdebug parameter on the ReduceTask that is instantiated by a MapRed
JVM? If that's possible, I could connect the debugger to that thread, right?
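One way that idea is usually expressed (a hedged sketch; the port and flags are my choice): child task JVM flags can be set through mapred.child.java.opts, so a JDWP agent string there makes each child task JVM listen for a debugger. With suspend=y every child blocks until a debugger attaches, so this is only workable with a single task on a test cluster:

```xml
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx200m -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000</value>
</property>
```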
On Sat, May 1, 2010 at 4:43 PM, psdc1978 wrote:
> Hi,
>
ebugger to a real task tracker is problematic
> because user code is run in separate jvms, etc. It's almost never
> worth it. Most debugging (with a real debugger) is better done using
> MRUnit and the local job runner.
>
> Hope this helps and good luck.
>
> On Tue, Apr 27
Hi,
The reduce tasks are threads that are launched by the Reducer. The printout
below shows the stack trace of one reduce task.
at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier.fetchHashesOutputs(ReduceTask.java:2582)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:395)
at org.apache.ha
b has finished execution on cleanup/abort.
> MR is a process which loads/stores data in HDFS. Most of your queries
> relate to knowing your default hdfs location. You can find that by "hadoop
> dfs -ls". The path preceding .Trash is your default hdfs location.
>
> HTH,
> /
>
>
Hi,
When I run a MapReduce example, I've noticed that some temporary
directories are built in the /tmp directory.
In my case, the following directory tree was created in /tmp/hadoop during
the execution of the wordcount example:
job_201004041803_0002/
|-- attempt_201004041803_0002_m_00
Hi, I've posted this question last month, but I haven't got a response.
Does anyone know the answer?
I would like to understand what's the purpose of a setup and cleanup task.
During the start-up of the job tracker, 2 setup tasks and 2 cleanup tasks
will be assigned, for the map and for the reduce.
Hi,
When I'm running a Hadoop example:
$ bin/hadoop jar build/hadoop-0.20.2-dev-examples.jar wordcount gutenberg
gutenberg-output
I've noticed that a job.jar file is created with the classes of the
hadoop-0.20.2-dev-examples jar. Why is job.jar created?
In the "Hadoop-Definitive Guide" book, it e
Hi,
I would like to understand what's the purpose of a setup and cleanup task.
During the start-up of the job tracker, 2 setup tasks and 2 cleanup tasks
will be assigned, for the map and for the reduce. My questions are:
- What's the purpose of a setup task?
- The setup class runs on the jobtrack
Hi,
I've looked at the hadoop-0.20.1 source and I have the following questions:
1 - As I understand from the source code, LocalJobRunner is a class used to
run a map or reduce task. But an MR task is launched by the class JvmTask.
What's the difference between the MR from the LocalJobRunner and from t
Hi,
In Hadoop MapRed, when I define the number of reduce tasks to run,
mapred.reduce.tasks
3
I've noticed that during the execution of a MapRed example, the reduce
threads request the MapOutputServlet on the TaskTracker 9 times. The value 9
comes from the 3 reduce tasks time
Hi,
I have some questions about the MapRed ports and how a reduce knows where
to fetch the map output from.
I know that MapRed uses Jetty as a webserver.
- Does the JobTracker send tasks to the TaskTracker for execution through
port 50060?
- Which port does the TaskTracker use to send status about the task that
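As a partial pointer (my understanding, worth verifying in the source): 50060 is the TaskTracker's embedded Jetty HTTP port, the one reduces fetch map output from via MapOutputServlet, while task assignment and status travel over the JobTracker/TaskTracker heartbeat RPC instead. In 0.20 the HTTP port comes from:

```xml
<property>
  <name>mapred.task.tracker.http.address</name>
  <value>0.0.0.0:50060</value>
</property>
```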
Hi,
I have some questions about the Hadoop MapRed architecture:
1 - Is there only one TaskTracker per JobTracker?
2 - Are the TaskTracker and the JobTracker two different instances that are
started only through the start-mapred.sh script?
[snippet of start-mapred.sh]
"$bin"/hadoop-daemon.sh --co
See my question inline.
On Tue, Jan 5, 2010 at 6:32 PM, Owen O'Malley wrote:
>
> On Jan 5, 2010, at 9:13 AM, psdc1978 wrote:
>
>> 1 - I would like to see what is output that the Maps is doing on my
>> example. Is it possible to put hadoop only running Map tasks,
>
Hi list,
I downloaded Hadoop 0.20.1 and now I'm looking at the source of
the MapReduce. I've got the following questions:
1 - What is the difference between the classes
org.apache.hadoop.mapred.Reducer.java and
org.apache.hadoop.mapreduce.Reducer.java? In which cases are the 2 reducers
used?
Hi,
1 - I would like to see what output the Maps produce in my
example. Is it possible to have Hadoop run only Map tasks,
excluding the Reduce tasks?
2 - Is the output of the Maps written into a temporary file?
3 - How is the output of the maps passed to the reduce tasks? Is using
a
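On question 1, my understanding is that a map-only job is possible by setting the number of reduces to zero, in which case map output is written directly to the job's output path instead of being shuffled; as a config sketch:

```xml
<property>
  <name>mapred.reduce.tasks</name>
  <value>0</value>
</property>
```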
Hi all,
1 - I'm trying to compile hadoop-0.20.1, but I'm facing several errors.
Hadoop includes several subprojects, but these subprojects are all
compiled into one file called hadoop-0.20.1-core.jar. If I just want to
update the code in the MapReduce component, do I have to compile all
the projec