Hi
I am Karthik from India. We have been working on a temperature-aware YARN
scheduler, where we want to stop the scheduler from assigning new jobs to a
node if its temperature crosses a certain threshold. We figured out a simple
way to do so, where we set the health checker script, and set it in yar
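For context, YARN's NodeManager supports a pluggable node health-check script (configured via yarn.nodemanager.health-checker.script.path); the node is reported unhealthy when the script prints a line beginning with "ERROR". A minimal sketch of such a script in Python — the sysfs path and the 75 °C threshold are assumptions for illustration, not part of the original post:

```python
#!/usr/bin/env python
# Hypothetical YARN node health-check script: prints a line starting with
# "ERROR" when the CPU temperature exceeds a threshold, which makes the
# NodeManager report the node as unhealthy so it stops receiving new
# containers. Threshold and temperature source are illustrative only.

THRESHOLD_MILLI_C = 75000  # 75 degrees Celsius, in millidegrees


def health_message(temp_milli_c, threshold=THRESHOLD_MILLI_C):
    """Return the line the health-check script should print."""
    if temp_milli_c > threshold:
        return "ERROR node too hot: %d mC" % temp_milli_c
    return "OK temperature %d mC" % temp_milli_c


if __name__ == "__main__":
    # In a real deployment this would read a sensor, e.g.
    # /sys/class/thermal/thermal_zone0/temp on Linux (path is an
    # assumption); here we just demonstrate with a sample reading.
    print(health_message(68000))
```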
You can run a series of map-reduce jobs on your data. If one log line is
related to another line, e.g. based on sessionId, you can emit the
sessionId as the key of your mapper output, with the value being the rows
associated with that sessionId, so on the reducer side data from different
blocks w
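As a sketch of that pattern, here is a Hadoop Streaming-style mapper and reducer in Python; the log format (tab-separated, sessionId in the first field) is an assumption for illustration:

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit (sessionId, rest-of-line) pairs; assumes tab-separated logs
    with the sessionId in the first field (an assumed format)."""
    for line in lines:
        session_id, _, rest = line.rstrip("\n").partition("\t")
        yield session_id, rest


def reducer(pairs):
    """Group values by sessionId. In a real job the shuffle delivers all
    records for a key to the same reducer, regardless of which HDFS block
    they came from; sorted() stands in for that here."""
    for session_id, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield session_id, [v for _, v in group]


if __name__ == "__main__":
    # With Hadoop Streaming, the framework sorts by key between the steps.
    for sid, rows in reducer(mapper(sys.stdin)):
        print("%s\t%s" % (sid, ",".join(rows)))
```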
Ceph and GlusterFS are NOT centralized file systems. GlusterFS can be
used with Hadoop map-reduce, but it requires a special plug-in, and HDFS 2
can be HA, so it's probably not worth switching. YMMV.
On Dec 31, 2013 4:01 PM, "Jiayu Ji" wrote:
> I am not very familiar with Ceph and GlusterFS, b
* Name node and secondary name node on different machines
* Kerberos was just enabled
* Cloudera CDH 4.5 on CentOS
*Secondary name node log (HOST2) shows the following:*
2013-12-31 22:00:11,728 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:hdfs/<2NN-host>@ (aut
As expected, it's failing during the shuffle.
It seems like HDFS could not resolve the DNS names for the slave nodes.
Have you configured your slaves' host names correctly?
2013-12-31 14:27:54,207 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from attempt_201312311107_0003_r_00_0: Shuffle Error: E
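For reference, name resolution for the workers usually comes down to consistent entries in the conf/slaves file and in /etc/hosts (or DNS) on every node; the hostnames and addresses below are made up for illustration:

```
# conf/slaves (one worker hostname per line)
slave1
slave2
slave3

# /etc/hosts on every node (example addresses only)
192.168.1.10  master
192.168.1.11  slave1
192.168.1.12  slave2
192.168.1.13  slave3
```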
Hi
My hdfs-site is configured for 4 nodes (one master and 3 slaves):
<property>
  <name>dfs.replication</name>
  <value>4</value>
</property>
start-dfs.sh and stop-mapred.sh don't solve the problem.
I also tried to run the program after formatting the namenode (master), which
also fails.
My jobtracker log on the master (name node) is give be
<property>
  <name>dfs.heartbeat.interval</name>
  <value>3</value>
  <description>Determines datanode heartbeat interval in seconds.</description>
</property>
and maybe you are looking for:
<property>
  <name>dfs.namenode.stale.datanode.interval</name>
  <value>3</value>
  <description>Default time interval for marking a datanode as "stale", i.e., if
  the namenode has not received heartbeat msg fro</description>
</property>
I am not very familiar with Ceph and GlusterFS, but I know they are
centralized file systems. In these kinds of file systems, the compute nodes
and the storage nodes are separated. If the size of your data increases, the
network may eventually become the bottleneck.
Hadoop is a framework that includes storage (HDFS)
What does your job log say? Is your hdfs-site configured properly to find
the 3 data nodes? This could very well be getting stuck in the shuffle phase.
Last thing to try: do stop-all and start-all help? Failing that, as a last
resort, try formatting the namenode.
On Tue, Dec 31, 2013 at 11:40 AM, navaz wrote:
> Hi
>
>
> I am
Hi
I am running Hadoop cluster with 1 name node and 3 data nodes.
My HDFS looks like this.
hduser@nm:/usr/local/hadoop$ hadoop fs -ls /user/hduser/getty/gutenberg
Warning: $HADOOP_HOME is deprecated.
Found 7 items
-rw-r--r-- 4 hduser supergroup 343691 2013-12-30 19:12
/user/hduser/getty/
Is there a way to store the same object?
On Mon, Dec 30, 2013 at 7:05 PM, Chris Mawata wrote:
> Not unique to hdfs. The same thing would happen on your local file system
> or anywhere and any way you store the state of the object outside of the
> JVM. That is why singletons should not be serial
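The point about serialization generalizes beyond the JVM. As a sketch in Python (the class here is made up for illustration), round-tripping an object through serialization constructs a fresh instance, so a "singleton" restored elsewhere is no longer the shared one:

```python
import pickle


class Config(object):
    """Illustrative 'singleton' managed through a class-level instance."""
    _instance = None

    @classmethod
    def instance(cls):
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance


original = Config.instance()
restored = pickle.loads(pickle.dumps(original))

# Deserialization builds a new object rather than going through
# instance(), so identity-based singleton guarantees are silently broken.
assert restored is not original
```

The same identity break happens whether the bytes travel to disk, to HDFS, or across the wire to another process.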
Is it happening in the map or reduce phase, and are you allocating anything in
your mappers/reducers? For example, if you are building up a collection in one
of them, this might be causing the heap error. Also, what are the specs of your
nodes? How many concurrent map and reduce tasks per tasktracker,... We
ca
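To illustrate the kind of allocation that can exhaust the heap, a sketch in Python (the sum-of-values job is made up for illustration): buffering every value for a key in a collection grows with the data, while folding values into a running aggregate stays constant in memory:

```python
def reduce_buffered(values):
    """Anti-pattern: materializes every value for a key in memory before
    summing, so memory grows linearly with the records per key."""
    buffered = list(values)  # this collection is what can blow the heap
    return sum(buffered)


def reduce_streaming(values):
    """Streams over the iterator, keeping only a running total."""
    total = 0
    for v in values:
        total += v
    return total
```

Both return the same answer; only the memory profile differs, which matters once a single hot key carries millions of records.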