Re: Allocation of containers to tasks in Hadoop

2019-01-09 Thread Aaron Eng
The settings are very relevant to having an equal number of containers running on each node if you have an idle cluster and want to distribute containers for a single job. An application master submits requests for container allocations to the ResourceManager. The MRAppMaster will request all

Re: Allocation of containers to tasks in Hadoop

2019-01-09 Thread Aaron Eng
Have you checked the yarn.scheduler.fair.assignmultiple and yarn.scheduler.fair.max.assign parameters for the ResourceManager configuration? On Wed, Jan 9, 2019 at 9:49 AM Or Raz wrote: > How can I change/suggest a different allocation of containers to tasks in > Hadoop? Regarding a native
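For reference, a minimal sketch of how those two Fair Scheduler knobs could be set in yarn-site.xml. The values shown are illustrative assumptions, not recommendations from the thread:

```xml
<!-- yarn-site.xml (illustrative values) -->
<property>
  <!-- Allow the scheduler to assign more than one container per node heartbeat -->
  <name>yarn.scheduler.fair.assignmultiple</name>
  <value>true</value>
</property>
<property>
  <!-- Cap containers assigned per heartbeat; -1 means no fixed limit -->
  <name>yarn.scheduler.fair.max.assign</name>
  <value>2</value>
</property>
```

Turning assignmultiple on while keeping max.assign small is one way to spread a single job's containers across more nodes instead of packing them onto whichever node heartbeats first.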

Re: Physical memory (bytes) snapshot counter question - how to get maximum memory used in reduce task

2017-04-06 Thread Aaron Eng
An important consideration is the difference between the RSS of the JVM process vs. the used heap size. Which of those are you looking for? And also, importantly, why/what do you plan to do with that info? A second important consideration is the length of time you are at/around your max

Re: I/O time when reading from HDFS in Hadoop

2016-06-13 Thread Aaron Eng
If you want to measure the effect of turning compression on and off, the most directly observable metric would be the number of bytes written. The actual time it takes to write data is dependent upon many factors. On Sat, Jun 11, 2016 at 10:28 AM, Alexandru Calin < alexandrucali...@gmail.com>

Re: HDFS2 vs MaprFS

2016-06-06 Thread Aaron Eng
…cks. On Mon, Jun 6, 2016 at 9:35 AM, Ascot Moss <ascot.m...@gmail.com> wrote: > Hi Aaron, from MapR site, [now HDSF2] "Limit to 50-200 million files", is > it really true? > > On Tue, Jun 7, 2016 at 12:09 AM, Aaron Eng <a...@maprtech.com> wrote: > >>

Re: HDFS2 vs MaprFS

2016-06-06 Thread Aaron Eng
…> where are these features in Mapr-FS? > > On Mon, Jun 6, 2016 at 11:43 PM, Aaron Eng <a...@maprtech.com> wrote: > >> >Since MapR is proprietary, I find that it has many compatibility >> issues in Apache open source projects >> >> This is faulty logic. And rat

Re: HDFS2 vs MaprFS

2016-06-06 Thread Aaron Eng
>Since MapR is proprietary, I find that it has many compatibility issues in Apache open source projects This is faulty logic. And rather than saying it has "many compatibility issues", perhaps you can describe one. Both MapRFS and HDFS are accessible through the same API. The backend

Re: Started learning Hadoop. Which distribution is best for native install in pseudo distributed mode?

2014-08-12 Thread Aaron Eng
On that note, 2 is also misleading/incomplete. You might want to explain which specific features you are referencing so the original poster can figure out if those features are relevant. The inverse of 2 is also true, things like consistent snapshots and full random read/write over NFS are in

Re: Cloudera Vs Hortonworks Vs MapR

2013-09-13 Thread Aaron Eng
…will get back to you with more information. Best Regards, Aaron Eng On Thu, Sep 12, 2013 at 10:19 AM, Hadoop Raj hadoop...@yahoo.com wrote: Hi, We are trying to evaluate different implementations of Hadoop for our big data enterprise project. Can the forum members advise on what

Re: rack awareness in hadoop

2013-04-20 Thread Aaron Eng
The problem is probably not related to the JVM memory so much as the Linux memory manager. The exception is in java.lang.UNIXProcess.init(UNIXProcess.java:148) which would imply this is happening when trying to create a new process. The initial malloc for the new process space is being denied by

Re: Hadoop throughput question

2013-01-03 Thread Aaron Eng
If from the same machine, you can read the raw data of the file at 70MB/s and when reading it using SequenceFile you get 26MB/sec, I would presume that the speed difference comes down to the read pattern as well as the Isilon file system implementation. For the 70MB/s, if you are doing something

Re: Hadoop failing jobs non zero exit status 7

2012-09-13 Thread Aaron Eng
…this comes up when you spawn it directly from the shell vs. being spawned via TaskTracker is a useful bit of info. If you can't identify the cause, feel free to post in answers.mapr.com or send an email to supp...@mapr.com for some more assistance. Best Regards, Aaron Eng On Thu, Sep 13, 2012 at 5:38 AM

Re: Hadoop on EC2 Managing Internal/External IPs

2012-08-23 Thread Aaron Eng
…pass around their external FQDNs, since those will properly resolve to the internal or external IP depending on what machine is asking? Is there no way to just do that? On Aug 23, 2012, at 8:02 PM, Aaron Eng wrote: Hi Igor, Amazon offers a service where you can have a VPN gateway on your

Re: Regarding design of HDFS

2011-09-12 Thread Aaron Eng
The only way to avoid this is to make the data much more cacheable and to have a viable cache coherency strategy. Cache coherency at the meta-data level is difficult. Cache coherency at the block level is also difficult (but not as difficult) because many blocks get moved for balance

Re: HDFS Corruption: How to Troubleshoot or Determine Root Cause?

2011-05-18 Thread Aaron Eng
Hey Tim, Hope everything is good with you. Looks like you're having some fun with Hadoop. Can anyone enlighten me? Why is dfs.*.dir default to /tmp a good idea? It's not a good idea, it's just how it defaults. You'll find hundreds or probably thousands of these quirks as you work with

Re: HDFS Corruption: How to Troubleshoot or Determine Root Cause?

2011-05-18 Thread Aaron Eng
… On Wed, May 18, 2011 at 4:54 PM, Aaron Eng a...@maprtech.com wrote: Hey Tim, Hope everything is good with you. Looks like you're having some fun with Hadoop. Can anyone enlighten me? Why is dfs.*.dir default to /tmp a good idea? It's not a good idea, it's just how it defaults. You'll find

Re: Unable to access the HDFS hadoop .21 please help

2011-02-04 Thread Aaron Eng
I think it wants you to type a capital Y, as silly as that may sound... On Feb 4, 2011, at 7:38 AM, ahmednagy ahmed_said_n...@hotmail.com wrote: I have a cluster with a master and 7 nodes when i try to start hadoop it starts the mapreduce processes and the hdfs processes on all the nodes.

Re: Benchmarking performance in Amazon EC2/EMR environment

2011-02-01 Thread Aaron Eng
…in the us-west and us-east regions and the experience has been the same. For cc1.4xlarge instances I've only tested in us-east. On Tue, Feb 1, 2011 at 7:48 AM, Steve Loughran ste...@apache.org wrote: On 31/01/11 23:22, Aaron Eng wrote: Hi all, I was wondering if any of you have had a similar

Benchmarking performance in Amazon EC2/EMR environment

2011-01-31 Thread Aaron Eng
Hi all, I was wondering if any of you have had a similar experience working with Hadoop in Amazon's environment. I've been running a few jobs over the last few months and have noticed them taking more and more time. For instance, I was running teragen/terasort/teravalidate as a benchmark and

Re: hadoop single user setup

2011-01-10 Thread Aaron Eng
When you run the Hadoop CLI command it spawns a java process which in turn tries to connect to the namenode service. In this case, your client is trying to reach the namenode at localhost on TCP port 9000. That connection is failing. The likely reason is that your namenode service is not running

Re: Hadoop/Elastic MR on AWS

2010-12-09 Thread Aaron Eng
Pros: - Easier to build out and tear down clusters vs. using physical machines in a lab - Easier to scale up and scale down a cluster as needed Cons: - Reliability. In my experience I've had machines die, had machines fail to start up, had network outages between Amazon instances, etc. These

Re: wordcount example using local file system instead of distributed one?

2010-12-08 Thread Aaron Eng
Hi Dean, Try removing the fs.default.name parameter from hdfs-site.xml and put it in core-site.xml On Wed, Dec 8, 2010 at 2:46 PM, Hiller, Dean (Contractor) dean.hil...@broadridge.com wrote: I run the following wordcount example(my hadoop shell seems to always hit the local file system
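The property being moved would look roughly like this in core-site.xml; the hostname and port are placeholders for Dean's own namenode, and fs.default.name is the property name used by Hadoop releases of that era:

```xml
<!-- core-site.xml (placeholder URI) -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
```

If this property is absent from core-site.xml it falls back to file:///, which is why the shell silently operates on the local file system.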

Re: wordcount example using local file system instead of distributed one?

2010-12-08 Thread Aaron Eng
You will also need to restart services after that, in case that wasn't obvious. On Wed, Dec 8, 2010 at 2:56 PM, Aaron Eng a...@maprtech.com wrote: Hi Dean, Try removing the fs.default.name parameter from hdfs-site.xml and put it in core-site.xml On Wed, Dec 8, 2010 at 2:46 PM, Hiller

Re: wordcount example using local file system instead of distributed one?

2010-12-08 Thread Aaron Eng
…if done from wrong node). Thanks, Dean *From:* Aaron Eng [mailto:a...@maprtech.com] *Sent:* Wednesday, December 08, 2010 3:57 PM *To:* hdfs-user@hadoop.apache.org *Subject:* Re: wordcount example using local file system instead of distributed one? You will also need to restart

Re: Not a host:port pair: local

2010-11-24 Thread Aaron Eng
Can you send the mapred-site.xml config for reference? It could be a formatting issue. I've seen that problem when there was a typo in the XML after hand-editing. On Tue, Nov 23, 2010 at 10:35 AM, Skye Berghel sberg...@cs.hmc.edu wrote: On 11/19/2010 10:07 PM, Harsh J wrote: How are you
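The error in the subject line typically means mapred.job.tracker resolved to the default value "local" instead of a host:port pair. A well-formed entry would look something like the sketch below; the hostname and port are placeholders, not values from the thread:

```xml
<!-- mapred-site.xml (placeholder host:port) -->
<property>
  <name>mapred.job.tracker</name>
  <value>myserver:9001</value>
</property>
```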

Re: Not a host:port pair: local

2010-11-19 Thread Aaron Eng
Maybe try doing a grep -R local <hadoop dir> to see if it's picking it up from somewhere in there. Also, maybe try specifying an actual IP instead of myserver as a test to see if name resolution is an issue. On Fri, Nov 19, 2010 at 5:56 PM, Skye Berghel sberg...@cs.hmc.edu wrote: I'm trying to

Re: Cluster setup

2010-11-09 Thread Aaron Eng
Hi Fabio, I found this site extremely helpful in explaining how to do a one node setup for a first time user: http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29 On Tue, Nov 9, 2010 at 10:54 AM, Fabio A. Miranda fabio.a.mira...@gmail.com wrote: Hello,

Re: fs.defaultFS value

2010-11-09 Thread Aaron Eng
Did you set the namenode URI? 2010-11-09 15:38:38,255 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): file:/// has no authority. You should have some config defined in the core-site.xml file similar
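A minimal sketch of the core-site.xml entry that gives the NameNode address an authority, which is what the "file:/// has no authority" error is complaining about. The URI shown is a placeholder assumption:

```xml
<!-- core-site.xml (placeholder URI) -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode.example.com:8020</value>
</property>
```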

Re: Single setup documentation error

2010-11-09 Thread Aaron Eng
bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z]+' Have you tried specifying the actual file name instead of using the '*' wildcard? On Tue, Nov 9, 2010 at 2:10 PM, Fabio A. Miranda fabio.a.mira...@gmail.com wrote: Given a fresh installation, I followed the Single Node Setup