Look into the JobHistory class!
On Wed, Mar 6, 2013 at 2:37 PM, Mirko Kämpf mirko.kae...@gmail.com wrote:
Hi,
please have a look at the Starfish project.
http://www.cs.duke.edu/starfish/
Best wishes
Mirko
2013/3/6 claytonly clayto...@163.com
Hello ,all
I was using hadoop-1.0.0 in
Not that I know of. If so, you should be able to identify each volume, and as of
now this isn't the case.
BUT it can be done without Hadoop knowing about it, at the OS level, using
different partitions/mounts for datanode and jobtracker stuff. That should
solve your problem.
Regards
Bertrand
On Mon,
1) check whether you can ssh to the other node from the namenode
set your configuration carefully
<property>
  <name>fs.default.name</name>
  <value>localhost:9000</value>
</property>
replace localhost with the host name of the node where the namenode is running, and it should be
resolvable (try pinging that node from the other
got it
Thanx Mahesh.
On Tue, Mar 5, 2013 at 1:35 PM, Mahesh Balija balijamahesh@gmail.comwrote:
What Harsh means by that is that you should create a custom partitioner which
takes care of partitioning the records based on the input record data
(Key, Value), i.e., if you have multiple
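For illustration, here is a minimal sketch of such a custom partitioner (new mapreduce API; the class name and the rule of splitting on the first field of the value are only assumptions for the example):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Sketch: route records to reducers based on the value, not only the key.
public class ValueBasedPartitioner extends Partitioner<Text, Text> {
  @Override
  public int getPartition(Text key, Text value, int numPartitions) {
    // Hypothetical rule: partition by the first comma-separated field of the value.
    String firstField = value.toString().split(",", 2)[0];
    return (firstField.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}

It would then be registered on the job with job.setPartitionerClass(ValueBasedPartitioner.class).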
I am writing an application in C++, which uses the API provided by libhdfs
to manipulate Hadoop DFS.
I could run the application with 1.0.4 and 1.1.1; setting classpath
equal to
$(hadoop classpath).
For Hadoop 2.0.3, setting CLASSPATH=$(hadoop classpath) does not load
necessary classes required
Hi,
I have created a hadoop cluster with two nodes (A and B). 'A' acts both as
namenode and datanode, and 'B' acts as datanode only. With this setup, I could
store and read files. Now, I added one more datanode 'C' and relieved 'A' from
datanode duty. This means 'A' acts only as namenode, and both
Just do it
$ hadoop dfsadmin -safemode leave
From: AMARNATH, Balachandar [mailto:balachandar.amarn...@airbus.com]
Sent: 06 March 2013 15:21
To: user@hadoop.apache.org
Subject: Issue: Namenode is in safe mode
Hi,
I have created a hadoop cluster with two nodes (A and B). 'A' acts both as
Now I came out of safe mode through the admin command. I tried to put a file
into hdfs and encountered this error.
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/user/hadoop/hosts could only be replicated to 0 nodes, instead of 1
Any hint to fix this,
This happens when the
RTFM?
Yes you can do this. See Oozie.
When you have a cryptic name, you get a cryptic answer.
Sent from a remote device. Please excuse any typos...
Mike Segel
On Mar 5, 2013, at 5:35 PM, Public Network Services
publicnetworkservi...@gmail.com wrote:
Hi...
I have an application that
Hi,
we are writing our fsimage and edits files on the namenode and secondary
namenode, and additionally on an NFS share.
In these folders we found a lot of fsimage.ckpt_0... files; the oldest is from
9 Aug 2012.
As far as I know these files should be deleted after the secondary
what is your replication factor?
when you removed node A as a datanode, did you first mark it for
retirement? if you just removed it from service then the blocks from that
datanode are missing, and when the namenode starts up it checks for the blocks.
Unless it reaches its threshold value it will not
How was node A relieved of its datanode duty, and what was the default
replication factor?
If the replication factor was 1 and the datanode A was unplugged without
any care, then you lost half of your files and your namenode is not really
happy about it (and is waiting for you to correct the mistake
Hi All,
I am writing an application in C++, which uses the API provided by libhdfs
to manipulate Hadoop DFS.
I could run the application with 1.0.4 and 1.1.1; setting classpath
equal to
$(hadoop classpath).
For Hadoop 2.0.3, setting CLASSPATH=$(hadoop classpath) does not load
necessary classes
Sandy,
Remember KISS.
Don't try to read it in as anything but just a text line.
It's really a 3x3 matrix that looks to be grouped by columns.
Your output will drop the initial key, and you then parse the lines and
output them.
Without further explanation, it looks like each tuple is
Have you tried using distcp?
Sent from a remote device. Please excuse any typos...
Mike Segel
On Mar 5, 2013, at 8:37 AM, Subroto ssan...@datameer.com wrote:
Hi,
It's not because there are too many recursive folders in the S3 bucket; in fact
there are no recursive folders in the source.
If I
The replication factor was 1 when I removed the entry of A in the slaves file. I did
not mark it for retirement; I do not know yet how to mark a node for retirement. I
waited for a few minutes and then I could see the namenode running again
From: Nitin Pawar [mailto:nitinpawar...@gmail.com]
Sent: 06
Hi Mike,
I have tried distcp as well and it ended up with an exception:
13/03/06 05:41:13 INFO tools.DistCp:
srcPaths=[s3n://acessKey:acesssec...@dm.test.bucket/srcData]
13/03/06 05:41:13 INFO tools.DistCp: destPath=/test/srcData
13/03/06 05:41:18 INFO tools.DistCp: /test/srcData does not exist.
Hi,
I could successfully install hadoop cluster with three nodes (2 datanodes and 1
namenode). However, when I tried to store a file, I get the following error.
13/03/06 16:45:56 WARN hdfs.DFSClient: Error Recovery for block null bad
datanode[0] nodes == null
13/03/06 16:45:56 WARN
Hi all,
I thought the issue below was occurring because of a lack of available space.
Hence, I replaced the datanodes with other nodes that have more space, and it
worked.
Now, I have a working HDFS cluster. I am thinking of my application where I
need to execute 'a set of similar instructions'
in hadoop you don't have to worry about data locality. The Hadoop job tracker
will by default try to schedule tasks where the data is located, provided the
node has enough compute capacity. Also note that datanodes just store the
blocks of a file, and multiple datanodes will hold different blocks of the
file.
Unsubscribe me
How many more times, I have to mail u
Hi Ashish,
It's operation you have to do on your side
Have you tried google?
https://www.google.ca/search?q=unsubscribe+hadoop.apache.org&aq=f&oq=unsubscribe+hadoop.apache.org&aqs=chrome.0.57.2271&sourceid=chrome&ie=UTF-8
JM
2013/3/6 ashish_kumar_gu...@students.iitmandi.ac.in:
Unsubscribe
In my opinion, another 2782829 times, give or take a few.
Or try reading and understanding http://hadoop.apache.org/mailing_lists.html,
which tells you to send an email to user-unsubscr...@hadoop.apache.org
Cheers
Kai
Am 06.03.2013 um 14:03 schrieb
lol...
as long as u dnt mail to
user-unsubscr...@hadoop.apache.org
noobs...
On Wed, Mar 6, 2013 at 2:03 PM,
ashish_kumar_gu...@students.iitmandi.ac.inwrote:
Unsubscribe me
How many more times, I have to mail u
--
Regards,
Ouch Whisper
010101010101
The question is, how much more of this must we endure before the mailing
list server gets smarter? How about making it respond to any short
message that includes the word unsubscribe with a message reminding the
noob how to manage his subscription and how to send an email with the word
Hello,
I have a file of size 9 GB with approximately 109.5 million records.
I execute a pig script on this file that is doing:
1. Group by on a field of the file
2. Count number of records in every group
3. Store the result in a CSV file using normal PigStorage(,)
The job is completed
You can also try the following two commands:
1, hadoop job -status job-id
For example:
hadoop job -status job_201303021057_0004
I will get the following output:
Job: job_201303021057_0004
file:
hdfs://master:54310/user/ec2-user/.staging/job_201303021057_0004/job.xml
tracking URL:
Nitin is right. The hadoop Job tracker will schedule a job based on the
data block location and the computing power of the node.
Based on the number of data blocks, the job tracker will split a job into
map tasks. Optimally, map tasks should be scheduled on nodes with local
data. And also because
I used to have a similar problem. Looks like there is a recursive folder
creation bug. How about you try removing the srcData from the dst? For
example, use the following command:
hadoop fs -cp s3n://acessKey:acesssec...@bucket.name/srcData /test/
Or with distcp:
hadoop distcp
Hi guys: I'm getting an odd error involving a file called toBeDeleted.
I've never seen this - somehow it's blocking my task trackers from starting.
2013-03-06 16:19:24,657 ERROR org.apache.hadoop.mapred.TaskTracker: Can not
start task tracker because java.lang.RuntimeException: Cannot find root
Thanks for the reply. This sounds like it has potential, but it also seems to be a
rather duct-tape type of workaround. It would be nice if there were a modification to
dfs.datanode.du.reserved that worked within Hadoop, which would imply that
Hadoop was a little more certain to adhere to it.
I
To decommission a live datanode from the cluster, you can do the following
steps:
1, edit configuration file $HADOOP_HOME/conf/hdfs-site.xml, and add the
following property:
<property>
  <name>dfs.hosts.exclude</name>
  <value>$HADOOP_HOME/conf/dfs-exclude.txt</value>
</property>
2, put the host name of the
You can always print out the hadoop classpath before running the hadoop
command, for example by editing the $HADOOP_HOME/bin/hadoop file.
HTH.
On Wed, Mar 6, 2013 at 5:01 AM, shubhangi shubhangi.g...@oracle.com wrote:
Hi All,
I am writing an application in c++, which uses API provided by
you cannot directly remove a datanode from a cluster; that is not the proper way.
you need to decommission nodes and wait until the data from the datanodes to
be removed has been copied to other nodes.
just read the documentation for proper decommissioning of nodes
When you constructed the classpath with the full path, did you also add
slf4j-log4j12-*.jar (http://www.slf4j.org/codes.html#StaticLoggerBinder) to the
classpath? The jar should be in HADOOP_HOME/lib. This should help with the SLF4J
issue.
13/03/04 11:17:23 WARN util.NativeCodeLoader: Unable to
Oozie for mapreduce job flow management can be a good choice, though it can be too
heavyweight for your problem.
Based on your description, I am simply assuming that you are processing
some static data files, for example, the files will not change during
processing, and there are no
Interesting: the solution was simply to delete the toBeDeleted directory
manually:
rm -rf /usr/lib/hadoop/mapred/toBeDeleted/
I guess maybe somehow I changed the privileges of the
/usr/lib/hadoop/mapred directory so that it was unreadable or something.
Nevertheless, it was a cryptic error
Hello,
I was wondering if the Hadoop job tracker has an API, such as a web service
or XML feed? I'm trying to track Hadoop jobs as they progress. Right now,
I'm parsing the HTML of the Running Jobs section at
http://hadoop:50030/jobtracker.jsp, but this is definitely not desired if
there is a
Dear Hadoop Users,
I recently noticed there is a difference between the File System Counter
HDFS_BYTES_READ and the actual size of input files in map-reduce jobs.
And the difference seems to increase as the size of each key/value pair
increases.
For example, I'm running the same job on two
Jeff,
Probably because records are split across blocks, so some of the data has to be
read twice. Assuming you have a 64 MB block size and 128 GB of data, I'd
estimate the overhead at 1 GB for 1 MB record size, and 32 GB for 32 MB record
size. Your overhead is about 75% of that, maybe my
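(Roughly how such an estimate can be reconstructed, assuming about one partial re-read per block boundary: 128 GB / 64 MB = 2048 blocks; with 1 MB records roughly half a record, ~0.5 MB, is re-read per boundary, so 2048 x 0.5 MB ≈ 1 GB; with 32 MB records roughly 16 MB per boundary, so 2048 x 16 MB ≈ 32 GB.)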
Hi Kyle,
Maybe this can help you
http://stackoverflow.com/questions/2616524/tracking-hadoop-job-status-via-web-interface-exposing-hadoop-to-internal-clien/4156387#4156387
Regards,
Dino
On Mar 6, 2013 7:43 PM, Kyle B kbi...@gmail.com wrote:
Hello,
I was wondering if the Hadoop job tracker
I will go with the first case because if the data size is large then it will
distribute the data across multiple nodes.
On Tue, Mar 5, 2013 at 10:57 AM, samir das mohapatra
samir.help...@gmail.com wrote:
Hi All,
I have one scenario where our organization is trying to implement
hadoop.
Scenario
I would suggest Hive in these cases because it is easy to manage multiple
data sources, it uses SQL-like syntax, it scales because of Hadoop, and it
has joins implemented and optimized
Regards
Dino
On Mar 6, 2013 8:46 PM, Vikas Jadhav vikascjadha...@gmail.com wrote:
I will go with first case
We have a 5-node cluster (Hadoop 1.0.4). It hung a couple of times while
running big jobs. Basically all the nodes are dead; from the
tasktracker's log it looks like it went into some kind of loop forever.
All the log entries look like this when the problem happened.
Any idea how to debug the issue?
Thanks in
Hi,
I want the jobtracker to prioritize the assignment of tasks to certain
tasktrackers.
e.g.: If a tasktracker meets certain criteria better than other ones, I want
to assign tasks to that tasktracker first (ideally I want the jobtracker to
sort tasktrackers based on certain criteria (e.g. cpu
Hi Dino,
Thanks for the response. I've seen this before, but was hoping to avoid
getting locked into the Java road. Do you happen to know if there is an
open API for the job tracker included with Hadoop? Something I could call
from a variety of languages, like a web service?
-Kyle
On Wed, Mar 6,
Hi Kyle,
There is only the JobTracker servlet, which you can use as a web service, but you
need to parse the HTML response, or you can build a small Java web service using
the code from stackoverflow.
Regards
Dino
On Mar 6, 2013 10:06 PM, Kyle B kbi...@gmail.com wrote:
Hi Dino,
Thanks for the response. I've seen this
If you've not changed any configs, look under /tmp/hadoop-${user.name}/ perhaps.
On Thu, Mar 7, 2013 at 3:19 AM, Sayan Kole sayank...@gmail.com wrote:
Hi,
I cannot find the log files for the wordcount job: job_local_0001 when I
run it through eclipse. I am getting the standard output on
The Java API of JobClient class lets you query all jobs and provides
some task-level info as a public API.
In YARN (2.x onwards), the MRv2's AM publishes a REST API that lets
you query it (the RM lets you get a list of such AMs as well, as a
first step). This sounds more like what you need.
A
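For the 1.x JobClient route, a minimal sketch (the class name is made up; it lists jobs that are still running and prints their progress):

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobStatus;
import org.apache.hadoop.mapred.RunningJob;

public class RunningJobLister {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf();              // picks up mapred-site.xml from the classpath
    JobClient client = new JobClient(conf);    // connects to the configured jobtracker
    for (JobStatus status : client.jobsToComplete()) {
      RunningJob job = client.getJob(status.getJobID());
      if (job == null) continue;               // job may have finished in the meantime
      System.out.println(job.getID() + "\t" + job.getJobName()
          + "\tmap " + (int) (job.mapProgress() * 100) + "%"
          + "\treduce " + (int) (job.reduceProgress() * 100) + "%");
    }
  }
}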
Hey John,
Ideas, comments and patches are welcome on
https://issues.apache.org/jira/browse/HDFS-1564 for achieving this!
On Wed, Mar 6, 2013 at 9:56 PM, John Meza j_meza...@hotmail.com wrote:
Thanks for the reply. This sounds like it has potential, but also seems to
be a rather duct-tape
Hi,
my Hadoop cluster needs help: some tasks have to be done by a Windows server
with specialized closed-source software. How do I add them to the mix? For
example, I can run Tomcat, and the mapper would be calling a servlet there.
Is there anything better, which would be closer to the
Hello,
I'm using hadoop 1.1.1 and ran into an unexpected complication with a partitioned
file. The file itself is the result of a map-reduce task.
Here is code I'm using to read the file:
try (SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf)) {
    // skipped
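For reference, a self-contained sketch of the usual 1.x read loop (the class name and path are placeholders); note that each reducer writes its own part-r-NNNNN file, and a SequenceFile.Reader opens one such file, not the output directory as a whole:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class SeqFileDump {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path(args[0]); // e.g. one part file such as /output/part-r-00000
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    try {
      Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
      Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
      while (reader.next(key, value)) {
        System.out.println(key + "\t" + value);
      }
    } finally {
      reader.close();
    }
  }
}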
Hi All,
I am working on converting a sequence file to mapfile and just discovered
something I wasn't aware of.
For example, suppose I am working on a 2-node cluster, one
master/namenode/datanode, one slave/datanode. If I do hadoop dfs -cp
/data/file1 /data/file2 (a 1G file) from the master, and
Yes, the simple copy is a client operation. Client reads bytes from
source and writes to the destination, thereby being in control of
failures, etc.. However, if you want your cluster to do the copy (and
if the copy is a big set), consider using the DistCp
(distributed-copy) MR job to do it.
On
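To make the "client operation" point concrete, a rough sketch of such a copy driven entirely by the client (class name and argument handling are made up):

import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ClientSideCopy {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // The client reads every byte from the source and writes it to the destination.
    InputStream in = fs.open(new Path(args[0]));      // e.g. /data/file1
    OutputStream out = fs.create(new Path(args[1]));  // e.g. /data/file2
    IOUtils.copyBytes(in, out, conf, true);           // true = close both streams when done
  }
}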
Can the mapper not directly talk to whatever application server the
Windows server runs? Does the work need to be done in the map step
(i.e. per record)? If not, you can perhaps also consider the SSH
action of Oozie (although I've never tried it with a Windows machine)
under a workflow.
On Thu,
Okay,
then there is nothing wrong with the mapper directly talking to the server,
and failing the map task if the service does not work out.
Thank you,
Mark
On Wed, Mar 6, 2013 at 11:21 PM, Harsh J ha...@cloudera.com wrote:
Can the mapper not directly talk to whatever application server the
I am wondering what the correct behaviour of this parameter is. If it's set
to 4, does it mean the job should fail if it has more than 4 failures?
Hi All,
Try giving the full path for the file, such as:
/users/ivanovd/1.2a8b1a9c-47de-4631-8013-f0dd3e096036.cvsp/part-r-0
if the job is producing lots of files, and if there is a need to set the number
of mappers to more than one, a file crusher utility may be the best option here to
The only thing wrong would be what is said for the DB-talking jobs as
well: Distributed mappers talking to a single point of service can
bring it down.
On Thu, Mar 7, 2013 at 10:59 AM, Mark Kerzner mark.kerz...@shmsoft.com wrote:
Okay,
then there is nothing wrong with the mapper directly
Hi Mohit,
This is the number of failed tasks in a given job after which the job will
not run on that task tracker; the job's tasks will no longer be assigned to
it. However, if the same task fails more than 4 times, the job will
fail regardless.
Hope this helps.
Thanks
It is a per-job config which controls the automatic job-level
blacklist: if, for a single job, a specific tracker has failed 4 (or
X) total tasks, then we prevent scheduling any more of the job's tasks
to that tracker (but we don't eliminate more than 25% of the available
trackers this way, as for
No, it's the number of task failures in a job after which that
particular tasktracker can be blacklisted *for that job*! Note that it
can take tasks from other jobs!
On Thu, Mar 7, 2013 at 11:21 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
I am wondering what the correct behaviour is of this
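If it helps, a small sketch (assuming the Hadoop 1.x mapred API; the class name is made up) of reading and setting that per-job threshold programmatically:

import org.apache.hadoop.mapred.JobConf;

public class BlacklistThresholdExample {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Backs mapred.max.tracker.failures (default 4): after this many of the
    // job's task failures on one tracker, that tracker is blacklisted for
    // this job only.
    System.out.println("current threshold: " + conf.getMaxTaskFailuresPerTracker());
    conf.setMaxTaskFailuresPerTracker(8);
  }
}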
Subroto and Shumin
Try adding a slash to the s3n source:
- hadoop fs -cp s3n://acessKey:acesssec...@bucket.name/srcData
/test/srcData
+ hadoop fs -cp s3n://acessKey:acesssec...@bucket.name/srcData/
/test/srcData
Without the slash, it will keep listing srcData each time it is
listed,