Hey, I tried the following:
- created a dir temp for user A with permission 733
- created a dir temp/test for user B with permission 722
- created a file temp/test/test.txt for user B with permission 722
Now in HDFS, user A can list as well as read the contents of
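A minimal sketch of that layout using the Java FileSystem API (the paths and
permission bits are taken from the list above; the message used two different
users, and running each call as the right user is assumed rather than shown):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
public class PermissionSetup {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // as user A
    fs.mkdirs(new Path("temp"), new FsPermission((short) 0733));
    // as user B
    fs.mkdirs(new Path("temp/test"), new FsPermission((short) 0722));
    FSDataOutputStream out = fs.create(new Path("temp/test/test.txt"));
    out.writeBytes("test");
    out.close();
    fs.setPermission(new Path("temp/test/test.txt"), new FsPermission((short) 0722));
  }
}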
Stefan Will wrote:
Hi,
I had a flaky machine the other day that was still accepting jobs and
sending heartbeats, but caused all reduce task attempts to fail. This in
turn caused the whole job to fail because the same reduce task was retried 3
times on that particular machine.
What is your
David Kellogg wrote:
I am running Hadoop streaming. After around 42 jobs on an 18-node
cluster, the jobtracker stops responding. This happens on
normally-working code. Here are the symptoms.
1. A job is running, but it pauses with reduce stuck at XX%
2. hadoop job -list hangs or takes a very
Amareshwari Sriramadasu wrote:
Set mapred.jobtracker.retirejob.interval
This is used to retire completed jobs.
and mapred.userlog.retain.hours to a higher value.
This is used to discard user logs.
By default, their values are 24 hours. These might be the reason for
failure, though I'm not
Amar Kamat wrote:
Amareshwari Sriramadasu wrote:
Set mapred.jobtracker.retirejob.interval
This is used to retire completed jobs.
and mapred.userlog.retain.hours to a higher value.
This is used to discard user logs.
As Amareshwari pointed out, this might be the cause. Can you increase
Looks like the reducer is able to fetch map output files from the local
box but fails to fetch them from the remote box. Can you check that there
is no firewall issue and that the /etc/hosts entries are correct?
Amar
Jagadesh_Doddi wrote:
Hi
I have changed the configuration to run Name node and job tracker
bjday wrote:
Good morning everyone,
I have a question about the correct setup for Hadoop. I have 14 Dell
computers in a lab. Each is connected to the internet and each is
independent of the others. All run CentOS. Logins are handled by
NIS. If userA logs into the master and starts the daemons
Stefan Will wrote:
Hi,
I'm using the new persistent job state feature in 0.19.0, and it's worked
really well so far. However, this morning my JobTracker died with an OOM
error (even though the heap size is set to 768M). So I killed it and all the
TaskTrackers.
Any specific reason why you
Mark Kerzner wrote:
Hi,
why is hadoop suddenly telling me
Retrying connect to server: localhost/127.0.0.1:8020
with this configuration
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
Ruyue Ma wrote:
Our hadoop version is checked out from the trunk version at 2008.10.22.
When this problem appeared, our cluster didn't have any job running. The
cluster was idle!
How many jobs (total) were submitted to the cluster? What is the average
size of each job? Have you changed any
Vadim Zaliva wrote:
Hi!
I am observing a strange situation in my Hadoop cluster. While running
tasks, it eventually gets into
this strange mode where:
1. JobTracker reports 0 task trackers.
2. Task tracker processes are alive but log file is full of repeating
messages like this:
2009-02-08
Andrew wrote:
For now, I use such code blocks in all my MR jobs:
try {
  JobClient.runJob(job);             // blocks until the job completes
  JobClient jc = new JobClient(job);
  jc.submitJob(job);                 // submits a job and comes out
} catch (IOException exc) {
  LOG.info("Job failed", exc);
}
System.exit(0);
But this code
Ryan,
From the MR (JobTracker) side we have a failover support.
If a large job is submitted and the JobTracker fails midway then you can start
the JobTracker on the same host and resume
the job. Look at https://issues.apache.org/jira/browse/HADOOP-3245 for more
details. Hope that helps.
Amar
Hey, nice work and nice writeup. Keep it up.
Comments inline.
Amar
-Original Message-
From: Ricky Ho [mailto:[EMAIL PROTECTED]
Sent: Fri 11/28/2008 9:45 AM
To: core-user@hadoop.apache.org
Subject: RE: Hadoop Internal Architecture writeup
Amar, thanks a lot. This is exactly the kind of
Ricky Ho wrote:
I put together an article describing the internal architecture of Hadoop (HDFS,
MapRed). I'd love to get some feedback if you see anything inaccurate or
missing ...
http://horicky.blogspot.com/2008/11/hadoop-mapreduce-implementation.html
A few comments on MR:
1) The
Ramya R wrote:
Hi all,
I want to retrieve the Rack ID of every datanode. How can I do this?
I tried using getNetworkLocation() in
org.apache.hadoop.hdfs.protocol.DatanodeInfo. I am getting /default-rack
as the output for all datanodes.
Have you set up the cluster to be rack-aware?
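A small sketch of one way to print each datanode's rack id (0.18-style HDFS
API; the cast to DistributedFileSystem and the use of getDataNodeStats() are
assumptions, not something from the thread). getNetworkLocation() keeps
returning /default-rack until the cluster is made rack-aware, e.g. via a
topology script.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
public class RackReport {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    DatanodeInfo[] nodes = ((DistributedFileSystem) fs).getDataNodeStats();
    for (DatanodeInfo node : nodes) {
      // prints e.g. "host:port -> /default-rack" on a non-rack-aware cluster
      System.out.println(node.getName() + " -> " + node.getNetworkLocation());
    }
  }
}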
the standard deviation offline.
So avg = B / N = 10/4 = 2.5
Hence the std deviation would be
sqrt((A - N * avg^2) / N) = sqrt((30 - 4*6.25)/4) = 1.11803399
Using the main formula the answer is 1.11803399
Amar
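A tiny self-contained version of the same arithmetic (the four sample values
are an assumption, chosen so that the sum B = 10 and the sum of squares A = 30,
as in the numbers above):
public class StdDev {
  public static void main(String[] args) {
    double[] x = {1, 2, 3, 4};
    double A = 0, B = 0;
    int N = x.length;
    for (double v : x) { A += v * v; B += v; }
    double avg = B / N;                               // 10 / 4 = 2.5
    double std = Math.sqrt((A - N * avg * avg) / N);  // sqrt((30 - 4*6.25) / 4) = 1.11803399
    System.out.println("avg = " + avg + ", std = " + std);
  }
}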
On Mon, Nov 10, 2008 at 4:22 AM, Amar Kamat [EMAIL PROTECTED] wrote:
Amar Kamat wrote
Goel, Ankur wrote:
Hi Folks,
I am looking for some advice on some of the ways/techniques
that people are using to get around namenode failures (both disk and
host).
We have a small cluster with several jobs scheduled for periodic
execution on the same host where the name server runs.
Elia Mazzawi wrote:
Hello,
I'm writing hadoop programs in Java,
I have 2 Hadoop map/reduce programs that have the same map, but
different reduce methods.
Look at how MultipleOutputFormat is used. This provides the facility to
write to multiple files.
Amar
can i run them in a way so that
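A minimal sketch of the MultipleOutputFormat idea mentioned above (old mapred
API; the Text key/value types and the key-derived directory name are
assumptions): the reducer routes each record to an output file chosen from the
key.
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;
public class TypedOutputFormat extends MultipleTextOutputFormat<Text, Text> {
  @Override
  protected String generateFileNameForKeyValue(Text key, Text value, String name) {
    // "name" is the default part-XXXXX file name; put each key's records
    // under a sub-directory named after the key
    return key.toString() + "/" + name;
  }
}
// In the driver: conf.setOutputFormat(TypedOutputFormat.class);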
some speed wrote:
I was wondering if it was possible to read the input for a map function from
2 different files:
1st file --- user-input file from a particular location (path)
2nd file --- a resultant file (has just one key, value pair) from a
previous MapReduce job. (I am implementing a
Prasad Pingali wrote:
I am using 0.18.1-dev, upgraded from 0.18.0. I am also using compression for
map outputs.
I think this is fixed in 0.19. Look here
https://issues.apache.org/jira/browse/HADOOP-3131. We see this with
compression turned ON.
Amar
- Prasad.
On Wednesday 17 September 2008
Edward J. Yoon wrote:
Hi communities,
Do you have any idea how to get the pairs of all row key combinations
w/o repetition on Map/Reduce as described below?
Input : (MapFile or Hbase Table)
Key1, Value or RowResult
Key2, Value or RowResult
Key3, Value or RowResult
Key4, Value or RowResult
Amar Kamat wrote:
Edward J. Yoon wrote:
Hi communities,
Do you have any idea how to get the pairs of all row key combinations
w/o repetition on Map/Reduce as described below?
Input : (MapFile or Hbase Table)
Key1, Value or RowResult
Key2, Value or RowResult
Key3, Value or RowResult
Key4
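The reply above is cut off, so here is only one possible sketch (old mapred
API; Text types, a single constant grouping key, and the assumption that all
row keys fit in one reducer's memory): every map emits its row key under the
same key, and the lone reducer writes each unordered pair once.
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
public class PairReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {
  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    List<String> keys = new ArrayList<String>();
    while (values.hasNext()) {
      keys.add(values.next().toString());
    }
    for (int i = 0; i < keys.size(); i++) {
      for (int j = i + 1; j < keys.size(); j++) {
        output.collect(new Text(keys.get(i)), new Text(keys.get(j))); // e.g. (Key1, Key2)
      }
    }
  }
}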
Pratyush Banerjee wrote:
Hi All,
I have been using hadoop on a 50 machine cluster for some time now and
just wondered why the mapred.tasktracker.expiry.interval defaulted to
10 minutes.
If I want to reduce it to 1 min, i.e. 60000 msec, should that cause any
problems?
I have seen the opposite case where the maps are shown as 100% done
while there are still some maps running. I have seen this on trunk and
there were some failed/killed tasks.
Amar
Andreas Kostyrka wrote:
On Wednesday 09 July 2008 05:56:28 Amar Kamat wrote:
Andreas Kostyrka wrote
I think the JobTracker can easily detect this: the case where a high
priority job is starved because there are no slots/resources. Preemption
should probably kick in, where tasks from a low priority job might get
scheduled even though the high priority job still has tasks to run.
Amar
Goel, Ankur
of the reducers
taking up the reduce slots.
@Ankur/Murli,
Please open a JIRA if you guys feel it's important.
Amar
Murali, can you try this and see if it works?
-Original Message-
From: Amar Kamat [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 16, 2008 8:01 PM
To: core-user@hadoop.apache.org
(mapred.max.map.failures.percent/mapred.max.reduce.failures.percent :
default is 0) then the job is considered failed.
Amar
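For reference, a driver-side sketch of those knobs (the 10% value and the class
name are just examples; with the default of 0, any permanently failed task
fails the whole job):
import org.apache.hadoop.mapred.JobConf;
public class FailureTolerantDriver {
  public static void main(String[] args) {
    JobConf conf = new JobConf(FailureTolerantDriver.class);
    // allow up to 10% of map/reduce tasks to fail without failing the job
    conf.setInt("mapred.max.map.failures.percent", 10);
    conf.setInt("mapred.max.reduce.failures.percent", 10);
    // ... set input/output paths and formats, then JobClient.runJob(conf)
  }
}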
On Jul 1, 2008, at 10:06 PM, Amar Kamat wrote:
Mori Bellamy wrote:
hey all,
i've got a mapreduce task that works on small (~1G) input. when i
try to run the same task on large
jerrro wrote:
Hello,
I was wondering - could someone tell me what are the reasons that I could
get failure with certain map tasks on a node?
Well, that depends on the kind of errors you are seeing. Could you please
post the logs/error messages?
Amar
Any idea that comes to mind
would work (it
Mori Bellamy wrote:
hey all,
i've got a mapreduce task that works on small (~1G) input. when i try
to run the same task on large (~100G) input, i get the following error
around when the map tasks are almost done (~98%)
2008-07-01 13:10:59,231 INFO org.apache.hadoop.mapred.ReduceTask:
=
http://'$tthost':'$port'/mapOutput?job='$jobid'&map='$mapid'&reduce='$reduce-partition-id'
'$var' is what you have to substitute.
Amar
Thanks,
Taran
On Thu, Jun 19, 2008 at 11:43 PM, Amar Kamat [EMAIL PROTECTED] wrote:
Yeah. With 2 nodes the reducers will go up to 16% because the reducer
Heshan Lin wrote:
Hi,
I'm trying to configure RandomWriter to generate less data than does
the default configuration.
bin/hadoop jar hadoop-*-examples.jar randomwriter
-Dtest.randomwrite.bytes_per_map=value
-Dtest.randomwrite.total_bytes=value
-Dtest.randomwriter.maps_per_host=value
Xuan Dzung Doan wrote:
Hi,
I'm a Hadoop newbie. My question is as follows:
The level of parallelism of a job, with respect to mappers, is largely the
number of map tasks spawned, which is equal to the number of InputSplits. But
within each InputSplit, there may be many records (many input
Yeah. With 2 nodes the reducers will go up to 16% because the reducers
are able to fetch maps from the same machine (locally) but fail to copy
them from the remote machine. A common reason in such cases is the
*restricted machine access* (firewall etc). The web-server on a
machine/node hosts map
Sayali Kulkarni wrote:
Hello,
I have been getting
Too many fetch failures (in the map operation)
and
shuffle error (in the reduce operation)
Can you post the reducer logs? How many nodes are there in the cluster?
Are you seeing this for all the maps and reducers? Are the reducers
Daniel Leffel wrote:
Why not just combine them? How do I do that?
Consider a case where the cluster (of n nodes) is configured to process
just one task per node. Let there be (n-1) reducers. Let's assume that
the map phase is complete and the reducers are shuffling. There will be
(n-1)
Looks like the reduce task is not able to fetch the map output from the
other machine. My guess is that the reduce task is able to pull data
from the same machine, making progress up to 16%, but fails to get the
data from the other machine. This could be a firewall issue. Is it
possible for
Fabrizio detto Mario wrote:
How does Hadoop manage the failure of the JobTracker (Master Node)?
For example, Google's Map/Reduce implementation aborts the MapReduce computation if
the master fails.
Currently there is no recovery/backup strategy in place to take care of
this. We are currently working
Fabrizio detto Mario wrote:
Hello Hadoop community,
I read about Hadoop framework (
http://hadoop.apache.org/core/docs/r0.16.3/mapred_tutorial.html) this
phrase:
The Map-Reduce framework consists of a single master JobTracker and one
slave TaskTracker per cluster-node...
Is The Job Tracker
Amar Kamat wrote:
Fabrizio detto Mario wrote:
Hello Hadoop community,
I read about Hadoop framework (
http://hadoop.apache.org/core/docs/r0.16.3/mapred_tutorial.html) this
phrase:
The Map-Reduce framework consists of a single master JobTracker and one
slave TaskTracker per cluster-node
hemal patel wrote:
Hello,
Can you help me solve this problem?
When I try to run this program it gives me an error like this.
bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
08/05/12 17:32:59 INFO mapred.FileInputFormat: Total input paths to process
: 12
From the logs it looks like the reducer is able to fetch the data from
the slave on the master node ('cse' machine) but is not able to fetch it
from the other node ('mtech' machine here). The 16% shown in the reducer
is fetched from the local machine. It seems like the jetty on the
'mtech'
Jeremy Chow wrote:
Hi list,
I want to output my reduced results into several files according to the
type each result belongs to. How can I implement this?
There was a similar query earlier. The reply is here
[http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200804.mbox/[EMAIL
Derek Shaw wrote:
Hey,
From the examples that I have seen thus far, all of the results from the reduce
function are being written to a file. Instead of writing results to a file, I
want to store them
What do you mean by store and inspect?
and inspect them after the job is completed. (I
It can be made a part of ClusterStatus.
Amar
Devaraj Das wrote:
No, currently, there is no way to get that from the JobClient. Yes, please
submit a patch.
-Original Message-
From: Pete Wyckoff [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 30, 2008 8:21 AM
To:
Sridhar Raman wrote:
I am trying to run K-Means using Hadoop. I first wanted to test it within a
single-node cluster. And this was the error I got. What could be the
problem?
$ bin/hadoop jar clustering.jar
com.company.analytics.clustering.mr.core.KMeansDriver
Iteration 0
You can override the configure() method in the map class to get the
output path. Use FileOutputFormat.getOutputPath(conf) to get it. This
will work for 0.17 and later. For earlier versions
you can use conf.getOutputPath().
Amar
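A minimal sketch of that approach (0.17-style API; the Text types and the
PathAwareMapper name are placeholders):
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
public class PathAwareMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {
  private Path outputPath;
  public void configure(JobConf conf) {
    // conf.getOutputPath() on releases before 0.17
    outputPath = FileOutputFormat.getOutputPath(conf);
  }
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // the job's output path is now available to the map logic
  }
}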
chaitanya krishna wrote:
Hi,
Is there any
Apurva Jadhav wrote:
Hi,
I have a 4 node hadoop 0.15.3 cluster. I am using the default config
files. I am running a map reduce job to process 40 GB of log data.
How many maps and reducers are there? Make sure that there is a
sufficient number of reducers. Look at conf/hadoop-default.xml (see
Vibhooti Verma wrote:
Has anyone tried setting the number of reduces to zero and getting the map's
output as the final output?
Look at the RandomWriter example
(src/examples/org/apache/hadoop/examples/RandomWriter.java).
Amar
I tried doing the same but my map output does not come to specified
Amar Kamat wrote:
Jiaqi Tan wrote:
Hi,
Will Hadoop ever interleave multiple maps/reduces from different jobs
on the same tasktracker?
No.
Suppose I have 2 jobs submitted to a jobtracker, one after the other.
Must all maps/reduces from the first submitted job be completed before
list).
On 4/16/08 9:04 PM, Amar Kamat [EMAIL PROTECTED] wrote:
Ted Dunning wrote:
The easiest solution is to not worry too much about running an extra MR
step.
So,
- run a first pass to get the counts. Use word count as the pattern. Store
the results in a file.
- run the second pass
Chaman Singh Verma wrote:
Hello,
I think the question was slightly misinterpreted. What I meant by 3-4
different tasks is that there are
3 different reduce functionalities (each reduce functionality could be
done by many task slaves, maybe
100). I want to reuse the output of the map for different
Earlier someone asked a similar question. See
http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200804.mbox/[EMAIL PROTECTED]
for the reply.
I don't think that the framework directly supports this.
Amar
Vibhooti Verma wrote:
Can we configure two (multiple) reduces for the same map, so
.
Thanks,
On Wed, Apr 16, 2008 at 8:33 AM, Amar Kamat [EMAIL PROTECTED]
wrote:
Aayush Garg wrote:
Hi,
Are you sure that another MR is required for eliminating some rows?
Can't I
just somehow eliminate from main() when I know the keys which are
needed
Shirley Cohen wrote:
Dear Hadoop Users,
I'm writing to find out what you think about being able to
incrementally re-execute a map reduce job. My understanding is that
the current framework doesn't support it and I'd like to know whether,
in your opinion, having this capability could help to
Natarajan, Senthil wrote:
Hi,
How do I read a configuration file in Hadoop?
I tried copying the file to HDFS and also placing it within the jar file.
Do you intend to read the job's config file or a separate file? In the case
of accessing the job-specific config, overload the configure(JobConf)
One way to do this is to write your own (file) input format. See
src/java/org/apache/hadoop/mapred/FileInputFormat.java. You need to
override listPaths() in order to have selectivity amongst the files in
the input folder.
Amar
Alfonso Olias Sanz wrote:
Hi
I have a general purpose input folder
A simpler way is to use FileInputFormat.setInputPathFilter(JobConf,
PathFilter). Look at org.apache.hadoop.fs.PathFilter for details on the
PathFilter interface.
Amar
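A small sketch of the PathFilter route (the ".log" extension and the class
names are just examples): only matching files in the input folder are handed
to the job.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;
public class LogOnlyFilter implements PathFilter {
  public boolean accept(Path path) {
    return path.getName().endsWith(".log");
  }
}
// In the driver:
//   JobConf conf = new JobConf(MyJob.class);
//   FileInputFormat.setInputPathFilter(conf, LogOnlyFilter.class);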
Alfonso Olias Sanz wrote:
Hi
I have a general purpose input folder that is used as input in a
Map/Reduce task. That folder
Andrey Pankov wrote:
Hi all,
Currently I'm able to run map-reduce jobs from the box where the NameNode and
JobTracker are running. But I'd like to run my jobs from a separate box,
from which I have access to HDFS. I have updated the params
fs.default.name and mapred.job.tracker in the local hadoop dir to
On Thu, 27 Mar 2008, Natarajan, Senthil wrote:
Hi,
I have a small Hadoop cluster, one master and three slaves.
When I try the example wordcount on one of our log files (size ~350 MB),
the map runs fine but the reduce always hangs (sometimes around 19%, 60%, ...);
after a very long time it finishes.
I am
On Wed, 26 Mar 2008, Aayush Garg wrote:
HI,
I am developing a simple inverted index program on Hadoop. My map
function has the output:
word, doc
and the reducer has:
word, list(docs)
Now I want to use one more mapreduce to remove stop and scrub words from
Use distributed cache as
On Tue, 25 Mar 2008, Nate Carlson wrote:
Is it possible to have a single slave process jobs for multiple masters?
There are two types of slaves and 2 corresponding masters in Hadoop. The 2
masters are the Namenode and the JobTracker, while the slaves are datanodes and
tasktrackers respectively. Each slave when
On Sun, 23 Mar 2008, Chaman Singh Verma wrote:
Hello,
I am exploring Hadoop and MapReduce and I have one very simple question.
I have a 500GB dataset on my local disk and I have written both Map-Reduce
functions. Now how should I start?
1. I copy the data from local disk to DFS. I have
On Mon, 10 Mar 2008, Naama Kraus wrote:
Hi,
In our system, we plan to upload data into Hadoop from external sources and
use it later on for analysis tasks. The interface to the external
repositories allows us to fetch pieces of data in chunks. E.g. get n records
at a time. Records are
What is the heap size you are using for your tasks? Check
'mapred.child.java.opts' in your hadoop-default.xml. Try increasing it.
This will happen if you try running the random-writer + sort examples with
default parameters. The maps are not able to spill the data to the disk.
Btw what version
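A driver-side sketch of bumping that heap for one job (512m is only an example
value; the class name is a placeholder):
import org.apache.hadoop.mapred.JobConf;
public class BiggerHeapDriver {
  public static void main(String[] args) {
    JobConf conf = new JobConf(BiggerHeapDriver.class);
    // child tasks (maps and reduces) will be launched with this heap setting
    conf.set("mapred.child.java.opts", "-Xmx512m");
    // ... set input/output paths and formats, then JobClient.runJob(conf)
  }
}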
The job file, i.e. job.jar, is copied to the DFS by the job client. When the
task tracker prepares for a new task it makes a local copy of the job.
On Sat, 8 Mar 2008, Ben Kucinich wrote:
I am interested to know the internal working of Hadoop regarding
distribution of jobs. How are the jobs copied
Look at WordCount.java in src/examples/org/apache/hadoop/examples. Whether
you need a new InputFormat depends on what you want to do.
Amar
On Fri, 7 Mar 2008, Prasan Ary wrote:
Hi All,
I am running a Map/Reduce on a text file.
Map takes Text, Text as the (key, value) input pair, and outputs
On Fri, 7 Mar 2008, Dan Tamowski wrote:
Hello,
First, I am currently subscribed to the digest, could you please cc me at
[EMAIL PROTECTED] with any replies. I really appreciate it.
I have a few questions regarding input formats. Specifically, I want to use
one complete text file per input
:05 PM, Amar Kamat wrote:
3) Lastly, it would seem beneficial for jobs that have significant
startup overhead and memory requirements to not be run in separate
JVMs for each task. Along these lines, it looks like someone
submitted a patch for JVM reuse a while back, but it wasn't
committed? https
work correctly under scale changes, but *fixed* delays are almost never
correct.
Delays may work as a band-aid in the short run, but eventually you have to
take the band-aid off.
On 3/3/08 8:46 AM, Amar Kamat [EMAIL PROTECTED] wrote:
Hadoop is not meant for real-time applications. It's more or less
See http://incubator.apache.org/pig/. Hope that helps. Not sure how joins
could be done in Hadoop.
Amar
On Fri, 22 Feb 2008, Chuck Lan wrote:
Hi,
I'm currently looking into how to better scale the performance of our
calculations involving large sets of financial data. It is currently using
a
Zhang, jian wrote:
Hi, All
I have a small question about configuration.
In Hadoop Documentation page, it says
Typically you choose one machine in the cluster to act as the NameNode
and one machine as to act as the JobTracker, exclusively. The rest of
the machines act as both a
The output of every MapReduce job in Hadoop gets stored in the DFS, i.e. made
visible. You can run back-to-back jobs (i.e. job chaining), but the intermediate
output won't be temporary. Look at Grep.java, as Hairong suggested, for more
details on job chaining. As of now there is no built-in support for job chaining
in Hadoop; you wire the jobs together yourself in the driver.
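A bare-bones sketch of chaining two jobs by hand (class names and paths are
placeholders): runJob() blocks, so the second job starts only after the first
has written its output to the DFS, and that intermediate output stays visible.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
public class ChainDriver {
  public static void main(String[] args) throws Exception {
    Path input = new Path(args[0]);
    Path intermediate = new Path(args[1]);  // visible in the DFS, not temporary
    Path output = new Path(args[2]);

    JobConf first = new JobConf(ChainDriver.class);
    first.setJobName("first-pass");
    FileInputFormat.setInputPaths(first, input);
    FileOutputFormat.setOutputPath(first, intermediate);
    // ... set mapper/reducer for the first pass
    JobClient.runJob(first);

    JobConf second = new JobConf(ChainDriver.class);
    second.setJobName("second-pass");
    FileInputFormat.setInputPaths(second, intermediate);
    FileOutputFormat.setOutputPath(second, output);
    // ... set mapper/reducer for the second pass
    JobClient.runJob(second);
  }
}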
Hi,
I totally missed what you wanted to convey. What you want is that the
maps (the tasks) should be able to share their caches across jobs. In
Hadoop each task is a separate JVM, so sharing caches across tasks means
sharing across JVMs, and that too over time (i.e. to make the cache a
separate higher
keyValBuffer = null;
+comparator.clearBuffer();
}
//A compare method that references the keyValBuffer through the indirect
//pointers
- Original Message
From: Amar Kamat [EMAIL PROTECTED]
To: core-user@hadoop.apache.org
Sent: Tuesday, February 5, 2008 12:08:48 AM
Subject: Re
Hi,
Yes, you are correct. The references to the old keyval buffers are still
there even after the buffers are re-initialized, but the references are
there just between consecutive spills. The scenario before
HADOOP-1965 was that the memory used for one sort-spill phase is
io.sort.mb, causing
Ben Kucinich wrote:
I am new to Hadoop. I want to know a few things.
I have a Hadoop cluster of 1 master node and N - 1 slave nodes. I am putting
files into the DFS. If one of the slave nodes goes down, the data is still
accessible due to proper replication.
There are 2 masters in hadoop,