No, it's not appearing from the other name node.
Here is the procedure I followed. On NameNode1 I ran the following commands:
bin/hdfs dfs -mkdir test
bin/hdfs dfs -put dummy.txt test
When I run bin/hdfs dfs -ls test from NameNode1 it lists the file in HDFS, but
if I run the same command from NameNode2 the output
We are using the default replication factor of 3. When new files are put
on HDFS we never override the replication factor. When there is more data
involved it fails on a larger split size.
On Wed, Sep 18, 2013 at 6:34 PM, Harsh J ha...@cloudera.com wrote:
Do your input files carry a
Hi Folks,
I want to use HBase for my data storage on top of HDFS. Please help me
find the best version I should use, e.g. CDH4.
My data size would be around 500 GB to 5 TB.
My operations would be write intensive.
Thanks
Are there any rough numbers one can give me regarding the latency of creating,
writing, and closing a small HDFS-based file? Does replication have a big
impact? I am trying to decide whether to communicate some modestly-sized
(~200KB) information via HDFS files or go to the trouble of
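As a rough way to get such numbers yourself, a small timing harness can separate the create, write, and close phases. A minimal sketch follows; it runs against the local filesystem purely to illustrate the breakdown, and you would point it at an HDFS mount or adapt it to an HDFS client library (a hypothetical adaptation, not something from this thread) to measure real HDFS latency, where close in particular waits on the replication pipeline:

```python
import os
import tempfile
import time

def time_small_file(directory, payload=b"x" * 200 * 1024):
    """Time the create, write, and close phases for one ~200KB file."""
    path = os.path.join(directory, "probe.bin")
    t0 = time.perf_counter()
    f = open(path, "wb")          # create
    t1 = time.perf_counter()
    f.write(payload)              # write
    t2 = time.perf_counter()
    f.close()                     # close (on HDFS this waits for the pipeline ack)
    t3 = time.perf_counter()
    os.remove(path)
    return {"create": t1 - t0, "write": t2 - t1, "close": t3 - t2}

with tempfile.TemporaryDirectory() as d:
    print({k: round(v, 6) for k, v in time_small_file(d).items()})
```

Running it repeatedly and averaging would give a baseline to compare against the HDFS numbers.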
Hi all,
I am working with Hadoop 2.0.5 (I plan to migrate to 2.1.0 soon).
When I am starting a Job, I notice that some nodes are not used or partially
used.
For example, if my nodes can hold 2 containers, I notice that some nodes are
not running any or just 1 while others are running 2.
All my
We have this job submit property buried in hive that defaults to 10. We
should make that configurable.
On Wed, Sep 18, 2013 at 9:34 PM, Harsh J ha...@cloudera.com wrote:
Do your input files carry a replication factor of 10+? That could be
one cause behind this.
On Thu, Sep 19, 2013 at 6:20
It used to throw a warning in 1.0.3 and now has become an IOException. I was
more trying to figure out why it is exceeding the limit even though the
replication factor is 3. Also Hive may use CombineInputSplit or some
version of it, are we saying it will always exceed the limit of 10?
On Thu, Sep
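On the question of why the limit is exceeded at replication 3: a combined split packs many blocks, and its location list is the union of every packed block's replica hosts. A small simulation (hypothetical host names, not from the job in question) shows how that union easily passes 10:

```python
import random

def combined_split_locations(num_blocks, datanodes, replication=3, seed=42):
    """Union of replica hosts across all blocks packed into one combined split."""
    rng = random.Random(seed)
    locations = set()
    for _ in range(num_blocks):
        # each block keeps only `replication` replicas...
        locations.update(rng.sample(datanodes, replication))
    # ...but the combined split advertises every distinct host seen
    return locations

nodes = [f"dn{i:02d}" for i in range(40)]   # a 40-node cluster (hypothetical)
locs = combined_split_locations(num_blocks=20, datanodes=nodes)
print(len(locs))  # far more than 10, though each block has only 3 replicas
```

So the 10-location cap is about the split's aggregate location list, not any single file's replication factor.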
I am going through the concepts of resource manager, application master and
node manager. As I understand, resource manager receives the job submission
and launches application master. It also launches node manager to monitor
application master. My questions are:
1. Is Node manager long lived and
Have you looked at -
http://hadoop.apache.org/docs/r2.1.0-beta/hadoop-project-dist/hadoop-hdfs/Federation.html
Let me know if the document is not clear or needs improvements.
Regards,
Suresh
On Thu, Sep 19, 2013 at 12:01 PM, Manickam P manicka...@outlook.com wrote:
Guys,
I need some
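Beyond the guide linked above, the client-side piece of a federated setup is the ViewFs mount table in core-site.xml. A minimal sketch, with hypothetical namenode host names nn1 and nn2:

```xml
<!-- core-site.xml: ViewFs mount table (hypothetical hosts nn1/nn2) -->
<property>
  <name>fs.defaultFS</name>
  <value>viewfs://clusterX</value>
</property>
<property>
  <name>fs.viewfs.mounttable.clusterX.link./user</name>
  <value>hdfs://nn1:8020/user</value>
</property>
<property>
  <name>fs.viewfs.mounttable.clusterX.link./data</name>
  <value>hdfs://nn2:8020/data</value>
</property>
```

With this in place, paths under /user resolve to one namespace and paths under /data to the other, which is why a file created through one NameNode is not visible when listing through the other.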
Guys,
I need some tutorials to configure fedration. Can you pls suggest me some?
Thanks,
Manickam P
How does a YARN application master typically query ongoing status (like
percentage completion) of its tasks?
I would like to be able to ultimately relay information to the user like:
100 tasks are scheduled
10 tasks are complete
4 tasks are running and they are (4%, 10%, 50%, 70%) complete
But,
Please do not cross-post these emails to hdfs-user. The relevant email list
is only cdh-user.
On Thu, Sep 19, 2013 at 1:44 AM, Pavan Kumar Polineni
smartsunny...@gmail.com wrote:
Hi all,
Is Name Node High Availability and Job Tracker High Availability available in
Cloudera 4.1.1?
If not,
I am assuming you have looked at this already:
https://issues.apache.org/jira/browse/MAPREDUCE-5186
You do have a workaround here to increase the mapreduce.job.max.split.locations
value in the hive configuration, or do we need more than that here?
-Rahul
On Thu, Sep 19, 2013 at 11:00 AM, Murtaza
Hey Harsh,
Here is the more complete stacktrace. I had truncated it earlier since it
was application specific.
Let me know if this helps.
Thrown: java.lang.RuntimeException java.io.IOException
Thrown-StackTrace:
at HadoopJobUpdaterProcess.execute(HadoopJobUpdaterProcess.java:37)
at
What are the ramifications of setting a hard-coded value in our scripts and
then changing parameters which influence the input data size? I.e., I want
to run across 1 day's worth of data, then on a different day I want to run
against 30 days.
On Thu, Sep 19, 2013 at 3:11 PM, Rahul Jain
Matt,
It would be better for you to do a global config update: set
mapreduce.job.max.split.locations to at least the number of datanodes in your
cluster, either in hive-site.xml or mapred-site.xml. In either case, this is a
sensible
configuration update if you are going to use
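A minimal sketch of that override for mapred-site.xml or hive-site.xml (the value 50 is a placeholder; set it to at least your datanode count):

```xml
<property>
  <name>mapreduce.job.max.split.locations</name>
  <!-- placeholder value: use at least the number of datanodes -->
  <value>50</value>
</property>
```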
Hi,
Let me clarify a few things.
1) you are making container requests which are not explicitly looking for
certain nodes. (No white listing).
2) All nodes are identical in terms of resources (memory/cores) and every
container requires same amount of resources.
3) All nodes have capacity to run say
Thanks, that makes sense.
john
-Original Message-
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Sunday, September 15, 2013 12:39 PM
To: user@hadoop.apache.org
Subject: Re: HDFS performance with and without replication
Write performance improves with fewer replicas (as a result of
Hi Mohit,
answers inline
On Fri, Sep 20, 2013 at 1:33 AM, Mohit Anchlia mohitanch...@gmail.comwrote:
I am going through the concepts of resource manager, application master
and node manager. As I understand, resource manager receives the job
submission and launches application master. It also
Thanks Rahul. Our ops people have implemented the config change.
On Thursday, September 19, 2013, Rahul Jain wrote:
Hi John,
YARN tasks can be more than simple executables. In case of MR, for
example, tasks talk to the AM and report their individual progress and
counters back to it, via a specific protocol (over the network),
giving the AM more data to compute a near-accurate global progress.
On Fri, Sep 20,
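The roll-up the original poster described (scheduled / complete / running with per-task percentages) can be sketched as a toy aggregation over the progress fractions the tasks report back; this is illustrative only, not an actual YARN or MR API:

```python
def summarize(task_progress, total_scheduled):
    """Aggregate per-task progress fractions (0.0-1.0) into a global summary.

    task_progress holds only tasks that have started; the rest are pending.
    """
    complete = sum(1 for p in task_progress if p >= 1.0)
    running = sorted(p for p in task_progress if 0.0 <= p < 1.0)
    overall = sum(min(p, 1.0) for p in task_progress) / total_scheduled
    return {
        "scheduled": total_scheduled,
        "complete": complete,
        "running": running,
        "overall": overall,
    }

s = summarize([1.0] * 10 + [0.04, 0.10, 0.50, 0.70], total_scheduled=100)
print(s)  # 10 complete, 4 running at (4%, 10%, 50%, 70%), ~11.3% overall
```

In a real AM the `task_progress` values would come in over the task-to-AM protocol mentioned above, and the summary would be exposed through the AM's own status reporting.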