Honestly, this is going to depend on the jobs you are targeting and the
local market. For example, in the New York market I'm not aware of any job
listings that mention certification as even a nice-to-have.
For someone who wants to become a Hadoop admin, it might be at least mildly
valuable.
> the Hadoop (or some other) infrastructure in a
> simple way to prevent us having to write a scheduler, database schema
> etc.? We can do that but it seems to be solving a problem that has already
> been solved many times.
>
> Thanks again,
>
> Richard
>
>
> *From:* Ravi
I think you're confused as to what these things are.
The fundamental question is: do you want to run one job on sub-parts of the
data and then stitch their results together (in which case
Hive/MapReduce/Spark will be for you), or do you essentially already have
the splitting into computer-sized chunks figured out?
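If it's the former, even plain Hadoop Streaming gives you the
split-then-stitch pattern without writing any scheduler. A minimal sketch
(word count; the input and output paths are placeholders):

  # One map task per input split handles the "sub-parts"; the sorted
  # shuffle into the reducer does the "stitching".
  hadoop jar "$HADOOP_HOME"/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -input /data/in \
    -output /data/out \
    -mapper 'tr -s " " "\n"' \
    -reducer 'uniq -c'

The scheduling, retry, and data-movement problems you mention are exactly
the parts the framework has already solved.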
The namenode architecture is a source of fragility in HDFS. While a high
availability deployment (with two namenodes and a failover mechanism) means
you're unlikely to see a service interruption, it is still possible to lose
the filesystem metadata completely with the loss of two machines.
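One mitigation is to keep redundant copies of the metadata itself:
dfs.namenode.name.dir accepts a comma-separated list, and the namenode
writes its fsimage and edits to every directory listed. A sketch for
hdfs-site.xml (both paths, including the NFS mount, are placeholders):

  <!-- Write namenode metadata to a local disk and an NFS mount. -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/hadoop/hdfs/namenode,/mnt/nfs/hdfs/namenode</value>
  </property>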
The biggest win I've seen for stability of Hadoop components is to give
them their own hard disks, or alternatively their own hosts.
Obviously, you'll also want to check the usual suspects of resource and
processor contention.
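For example, a quick first pass on a worker node (standard sysstat/procps
tools; what counts as "too busy" is a judgment call):

  # Per-disk utilization and wait times; a device shared by HDFS and
  # shuffle traffic that sits near 100% util is a red flag.
  iostat -x 5 3

  # Load average plus the busiest processes; look for co-located
  # daemons (datanode, nodemanager, ...) fighting for cores.
  uptime
  top -b -n 1 | head -20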
On Wed, May 4, 2016 at 3:59 PM, Anandha L Ranganathan wrote:
> The R
This is what the capacity scheduler is for.
See my article on working around a particular bug in the current version of
the capacity scheduler:
https://medium.com/handy-tech/practical-capacity-scheduling-with-yarn-28548ae4fb88
Note the part on user limits.
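For reference, the user-limit knobs live in capacity-scheduler.xml. A
minimal sketch for a single queue (the queue name "analytics" and the
numbers are invented for illustration):

  <property>
    <name>yarn.scheduler.capacity.root.analytics.minimum-user-limit-percent</name>
    <!-- each active user is guaranteed at least 25% of the queue -->
    <value>25</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.analytics.user-limit-factor</name>
    <!-- a single user may take up to 2x the queue's configured capacity -->
    <value>2</value>
  </property>

Changes are picked up with "yarn rmadmin -refreshQueues"; no restart needed.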
On Friday, April 15, 2016, Todd wrote:
Many people will be aware of YARN-3216 (
https://issues.apache.org/jira/browse/YARN-3216). In short, the capacity
scheduler calculates all values only off the default label and queue.
My colleague Dave and I (and really mostly Dave!) configured our cluster to
work pretty much as desired even with this bug; a sketch of the kind of
queue/label configuration involved follows the quoted reply below.
"yarn" hence 2 if not for a given partition only one would have run (if the
> AM resource is less than the minimum size, CS allows at least 1 AM to run)
>
> Given that you have already identified the issue, what more are you
> expecting?
>
> Regards,
> + Naga
> --
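Here's a sketch of the kind of queue/label configuration I mean (the queue
"batch" and the label "gpu" are invented for the example; sibling
capacities must sum to 100 in each partition):

  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,batch</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.batch.capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.batch.accessible-node-labels</name>
    <value>gpu</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.batch.accessible-node-labels.gpu.capacity</name>
    <value>100</value>
  </property>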
Hi All,
We're hitting this issue. If you're a consultant with capacity today (2
March 2016 EST in New York), please feel free to contact me on or off list.
In terms of stack, we're using YARN 2.7.1.2.3 from the latest Hortonworks
distribution. It's possible we're hitting this bug:
https://issues.a
> https://issues.apache.org/jira/browse/AMBARI-13946
>
> If your hdfs-site.xml contains the "non-HA properties" described in that
> issue, then a viable workaround would be to remove those properties.
>
> --Chris Nauroth
>
> From: Anu Engineer
> Date: Monday, January
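For reference on that workaround: in an HA configuration the namenode
addresses should only appear in their suffixed forms
(dfs.namenode.rpc-address.<nameservice>.<nn-id> and friends), so the bare
spellings are the ones to hunt for. A quick check (the property names are
the standard HDFS ones, but confirm the exact list against the JIRA):

  # Bare (non-suffixed) namenode address properties in an HA cluster
  # are candidates for removal per AMBARI-13946.
  grep -E 'dfs\.namenode\.(rpc|http|https)-address<' hdfs-site.xml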
Hi,
I'm running HDFS in HA mode. I can't run the rebalancer because I get this
message:
java.io.IOException: Another Balancer is running.. Exiting ...
Dec 31, 2015 12:22:09 AM Balancing took 1.159 seconds
Any suggestions as to why that might be; and, if the obvious explanation is
that it's running elsewhere, how to track it down?
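One thing I plan to check (an assumption about the cause, not a
diagnosis): the balancer takes its lock by writing /system/balancer.id in
HDFS, and if an earlier run died uncleanly the file can be left behind.

  # Is a stale lock file present?
  hdfs dfs -ls /system/balancer.id

  # Only after confirming no balancer is genuinely running anywhere:
  hdfs dfs -rm /system/balancer.id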
Hi All,
We have just switched over to HA namenodes with ZK failover, using
HDP-2.3.0.0-2557
(HDFS 2.7.1.2.3). I'm looking for suggestions as to what to investigate to
make this more stable.
Before we went to HA, our namenode was reasonably stable. Now the namenodes
are crashing multiple times a day.
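The first things we plan to look at (generic HA troubleshooting; the
service IDs and log paths below are placeholders for our HDP layout):

  # Which namenodes does the cluster know about, and who is active?
  hdfs getconf -namenodes
  hdfs haadmin -getServiceState nn1
  hdfs haadmin -getServiceState nn2

  # Crashes around failover are often ZKFC health checks or GC pauses;
  # check both logs for the minutes before each crash.
  grep -i "transition\|fencing" /var/log/hadoop/hdfs/*zkfc*.log | tail
  grep -i "pause" /var/log/hadoop/hdfs/*namenode*.log | tail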