hadoop directory can't add and remove

2014-07-02 Thread EdwardKing
I want to remove a hadoop directory, so I use hadoop fs -rmr, but it can't remove it. Why? [hdfs@localhost hadoop-2.2.0]$ hadoop fs -ls Found 2 items drwxr-xr-x - hdfs supergroup 0 2014-07-01 17:52 QuasiMonteCarlo_1404262305436_855154103 drwxr-xr-x - hdfs supergroup 0 2014-07-01

Re: hadoop directory can't add and remove

2014-07-02 Thread unmesha sreeveni
http://www.unmeshasreeveni.blogspot.in/2014/04/name-node-is-in-safe-mode-how-to-leave.html hadoop fs -rm -r QuasiMonteCarlo_1404262305436_855154103 On Wed, Jul 2, 2014 at 11:56 AM, EdwardKing zhan...@neusoft.com wrote: I want to remove a hadoop directory, so I use hadoop fs -rmr, but it can't
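As the linked post explains, the NameNode is in safe mode, so deletes fail until you leave it. A sketch of the sequence on a Hadoop 2.x box (the directory name is the one from the original post; note that -rmr is deprecated in favor of -rm -r):

```
[hdfs@localhost hadoop-2.2.0]$ hdfs dfsadmin -safemode get
Safe mode is ON
[hdfs@localhost hadoop-2.2.0]$ hdfs dfsadmin -safemode leave
Safe mode is OFF
[hdfs@localhost hadoop-2.2.0]$ hadoop fs -rm -r QuasiMonteCarlo_1404262305436_855154103
```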

why hadoop-daemon.sh stop itself

2014-07-02 Thread EdwardKing
I use hadoop 2.2.0. I start the hadoop-daemon services as follows: [hdfs@localhost logs]$ hadoop-daemon.sh start namenode [hdfs@localhost logs]$ hadoop-daemon.sh start secondarynamenode [hdfs@localhost logs]$ hadoop-daemon.sh start datanode [hdfs@localhost logs]$ jps 4135 NameNode 4270

Re: why hadoop-daemon.sh stop itself

2014-07-02 Thread Nitin Pawar
Pull out the logs from the datanode log file; it will tell you why it stopped. On Wed, Jul 2, 2014 at 2:05 PM, EdwardKing zhan...@neusoft.com wrote: I use hadoop 2.2.0. I start the hadoop-daemon services as follows: [hdfs@localhost logs]$ hadoop-daemon.sh start namenode [hdfs@localhost logs]$

Re: why hadoop-daemon.sh stop itself

2014-07-02 Thread EdwardKing
I found the logs, but I don't know what to do about it. Thanks. 2014-07-02 01:07:38,473 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-279671289-127.0.0.1-1404285849267 (storage id DS-601761441-127.0.0.1-50010-1404205370190) service to

Re: why hadoop-daemon.sh stop itself

2014-07-02 Thread Nitin Pawar
See this error: java.io.IOException: Incompatible clusterIDs in /home/yarn/hadoop-2.2.0/hdfs/dn: namenode clusterID = CID-c91ccd10-8ea0-4fb3-9037-d5f57694674e; datanode clusterID = CID-89e2e0b8-2d61-4d6a-9424-ab46e4f83cab. Did you format your namenode? After formatting the namenode, did you delete

Re: why hadoop-daemon.sh stop itself

2014-07-02 Thread Nitin Pawar
Just use the rm -rf command inside the datanode directory. On Wed, Jul 2, 2014 at 2:33 PM, EdwardKing zhan...@neusoft.com wrote: I only formatted the namenode; I didn't delete the contents of the datanode directory, because I don't know which command deletes them. How do I do it? Thanks. [hdfs@localhost
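Putting the two replies together: the datanode's stored clusterID no longer matches the reformatted namenode's, so it must be cleared before the datanode can rejoin. A sketch, using the data directory from the error message in this thread; note this deletes the blocks stored on that datanode:

```
[hdfs@localhost ~]$ hadoop-daemon.sh stop datanode
[hdfs@localhost ~]$ rm -rf /home/yarn/hadoop-2.2.0/hdfs/dn/*
[hdfs@localhost ~]$ hadoop-daemon.sh start datanode
```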

Editlog recoverUnclosedStreams not getting called when cluster restarted and its not an UPGRADE

2014-07-02 Thread Nitin Goyal
Hi All, I would like to know the reason behind not calling editLog.recoverUnclosedStreams() in the initEditLog method of FSImage.java. public void initEditLog(StartupOption startOpt) throws IOException { Preconditions.checkState(getNamespaceID() != 0, "Must know namespace ID before

Pydoop 0.12.0

2014-07-02 Thread Simone Leo
Hello everyone, we're happy to announce that we have just released Pydoop 0.12.0 (http://pydoop.sourceforge.net). The main changes with respect to the previous version are: * support for YARN in CDH * explicit support for CDH 4.4 and 4.5 Mauro Del Rio has done all the work for this

Re: hadoop directory can't add and remove

2014-07-02 Thread hadoop hive
Use hadoop dfsadmin -safemode leave. Then you can delete. On Jul 2, 2014 6:37 PM, Chris Mawata chris.maw...@gmail.com wrote: The NameNode is in safe mode, so it is read-only. On Jul 2, 2014 2:28 AM, EdwardKing zhan...@neusoft.com wrote: I want to remove a hadoop directory, so I use hadoop fs

Provide feature to limit MRJob's stdout/stderr size

2014-07-02 Thread huozhanf...@gmail.com
Hi, friends: I have submitted a JIRA issue on how to limit an MRJob's stdout/stderr size, and I have done part of the work. @https://issues.apache.org/jira/browse/YARN-223 This is a very useful feature, but now I have run into difficulties, so I need help. Thanks

Re: hadoop directory can't add and remove

2014-07-02 Thread Chris Mawata
Also, investigate why it is happening. Usually it is a block replication issue like a replication factor greater than the number of DataNodes. Chris On Jul 2, 2014 9:32 AM, hadoop hive hadooph...@gmail.com wrote: Use Hadoop dfsadmin -safemode leave Then you can delete On Jul 2, 2014 6:37 PM,

Re: Downloading a jar to hadoop's lib folder (classpath)

2014-07-02 Thread Rajat Jain
I tried adding the dependency in hadoop-project/pom.xml (because that's where many other jars are defined too), but it didn't work for me. Any other ideas? Thanks, Rajat On Tue, Jul 1, 2014 at 10:11 PM, Tsuyoshi OZAWA ozawa.tsuyo...@gmail.com wrote: I added it in the pom.xml file (inside

namenode doesn't receive datanode deactivate event

2014-07-02 Thread MrAsanjar .
Hi all, I have a small hadoop 2.2.0 development cluster consisting of a master node (namenode+resourcemanager) and 4 slave nodes (datanodes+nodemanagers). It is configured so that I can dynamically add slave nodes by executing: .../sbin/hadoop-daemons.sh start datanode

Re: Downloading a jar to hadoop's lib folder (classpath)

2014-07-02 Thread Rajat Jain
Hmm, it's interesting. When I added it in hadoop-hdfs-project/hadoop-hdfs/pom.xml it downloaded the jar locally. Any ideas why it doesn't work inside the yarn project? On Wed, Jul 2, 2014 at 10:09 AM, Rajat Jain rajat...@gmail.com wrote: I tried adding the dependency in hadoop-project/pom.xml

Re: Spark vs. Storm

2014-07-02 Thread Shahab Yunus
Not exactly. There are of course major implementation differences, and then some subtle and high-level ones too. My 2 cents: Spark is in-memory M/R, and it simulates streaming, or real-time distributed processing, for large datasets by micro-batching. The gain in speed and performance as opposed to

Re: Spark vs. Storm

2014-07-02 Thread Stephen Boesch
Spark Streaming discretizes the stream into configurable intervals of no less than 500 milliseconds, so it is not appropriate for true real-time processing. If you need to capture events in the low 100s of milliseconds range or less, then stick with Storm (at least for now). If you can

Re: The future of MapReduce

2014-07-02 Thread Shahab Yunus
My personal thoughts on this: I approach this problem in a different way. Map/Reduce is not a framework or a technology. It is a paradigm for distributed and parallel processing which can be implemented in different frameworks and styles. So given that, I don't think there is as such any harm in

A Datanode shutdown question?

2014-07-02 Thread MrAsanjar .
If a datanode is shut down by calling hadoop-daemons.sh stop datanode, how does the namenode get notified that the datanode is no longer active? Does the datanode send a SHUTDOWN_MSG to the namenode, or does the namenode have to wait for a heartbeat timeout?

Re: Big Data tech stack (was Spark vs. Storm)

2014-07-02 Thread Stephen Boesch
You will not arrive at a generic stack without oversimplifying to the point of serious deficiencies. There are, as you say, a multitude of options. You are attempting to boil them down to "A vs. B" as opposed to "A may work better under the following conditions..." 2014-07-02 13:25 GMT-07:00

Re: YARN creates only 1 container

2014-07-02 Thread Adam Kawa
You also might want to increase the values of mapreduce.{map,reduce}.memory.mb to 1280 or 1536 or so (assuming that mapreduce.{map,reduce}.java.opts = -Xmx1024m). mapreduce.{map,reduce}.memory.mb is the logical size of the container, and it should be larger than mapreduce.{map,reduce}.java.opts that
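In mapred-site.xml, that advice could look like the following (illustrative values, not a recommendation for any particular cluster; the point is that each memory.mb container size comfortably exceeds the corresponding -Xmx heap):

```xml
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1536</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1024m</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1536</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```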

Re: Big Data tech stack (was Spark vs. Storm)

2014-07-02 Thread Adaryl Bob Wakefield, MBA
From a high level that makes sense, but I have no idea how you'd implement that. You can't have expertise in everything. For example, Mongo and Cassandra are both complex databases that require a decent amount of knowledge to run properly. But they are entirely different in the way they

Re: A Datanode shutdown question?

2014-07-02 Thread varun kumar
Generally a datanode sends its heartbeat to the namenode every 3 seconds. If the datanode stops sending heartbeats, the namenode eventually marks it as dead. On Wed, Jul 2, 2014 at 5:03 PM, MrAsanjar . afsan...@gmail.com wrote: If a datanode is shut down by calling
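A caveat on the timing: the 3 seconds is the heartbeat interval, not the dead-node timeout. With the commonly documented 2.x defaults, the NameNode only marks a silent DataNode dead after roughly ten and a half minutes. A sketch of the arithmetic, assuming the usual formula 2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval:

```shell
# Assumed Hadoop 2.x defaults (from commonly documented values)
recheck_interval_ms=300000   # dfs.namenode.heartbeat.recheck-interval (5 minutes)
heartbeat_interval_s=3       # dfs.heartbeat.interval

# Dead-node timeout = 2 * recheck interval + 10 * heartbeat interval
timeout_s=$(( 2 * recheck_interval_ms / 1000 + 10 * heartbeat_interval_s ))
echo "datanode marked dead after ${timeout_s}s"
```

So a cleanly stopped datanode disappears from the live list only after this timeout, unless it is explicitly decommissioned first.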

How to limit MRJob's stdout/stderr size(yarn2.3)

2014-07-02 Thread huozhanf...@gmail.com
Hi, friends: When an MRJob prints too much to stdout or stderr, the disk fills up. This is now affecting our platform management. I have modified org.apache.hadoop.mapred.MapReduceChildJVM.java (based on org.apache.hadoop.mapred.TaskLog) to generate the exec cmd as follows:
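The general shape of the fix (roughly what YARN-223 discusses for the generated child command) is to size-limit the child's stdout/stderr instead of redirecting them straight to files. A minimal shell sketch of the truncation idea; the cap and file names are illustrative, not actual Hadoop configuration:

```shell
limit_bytes=$((4 * 1024))   # illustrative per-stream cap: 4 KB

# A stand-in for a chatty child process writing unbounded output:
seq 1 100000 > full_stdout.log

# Keep only the first limit_bytes (in a real generated command this would
# be applied to the child's stdout/stderr streams):
head -c "$limit_bytes" full_stdout.log > stdout.log

size=$(wc -c < stdout.log)
echo "kept $size bytes of stdout"
```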

RE: namenode doesn't receive datanode deactivate event

2014-07-02 Thread Brahma Reddy Battula
It's the heartbeat mechanism. A Hadoop cluster runs in master/slave mode: the master includes the namenode + resourcemanager, and the slaves include the datanodes + nodemanagers. When the master starts, it opens an IPC server and waits for slave heartbeats. When a slave starts, it connects to the master, and every 3 seconds to the master

What is the correct way to get a string back from a mapper or reducer

2014-07-02 Thread Chris MacKenzie
Hi, I have the following code and am using hadoop 2.4. In my driver: Configuration conf = new Configuration(); conf.set("sub", "help"); ... String s = conf.get("sub"); In my reducer: Configuration conf = context.getConfiguration(); conf.set("sub",

Re: What is the correct way to get a string back from a mapper or reducer

2014-07-02 Thread Bertrand Dechoux
From an architectural point of view, Configuration is immutable once the job is started, even though the API does not reflect that explicitly. I would say: in a record (i.e., as part of your job's output). But the question is: what do you want to achieve? Regards Bertrand Dechoux On Thu, Jul 3, 2014 at 7:37 AM, Chris MacKenzie

Re: Big Data tech stack (was Spark vs. Storm)

2014-07-02 Thread Bertrand Dechoux
I will second Stephen. At best you will arrive at a point where you can say, "I don't care about your problems; here is the solution." Even though that sounds attractive if you are paid to set up the solution, that's really not the position a 'client' would want you to hold. Bertrand Dechoux On Thu,