Hadoop / HBase hotspotting / overloading specific nodes

2014-10-09 Thread SF Hadoop
I'm not sure if this is an HBase issue or a Hadoop issue, so please forgive me if this is off-topic. I am having a problem with Hadoop maxing out drive space on a select few nodes when I am running an HBase job. The scenario is this: - The job is a data import using Map/Reduce / HBase - The data
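The preview cuts off before the details, but one classic cause of HBase import hotspotting is sequential row keys landing in a single region. A common mitigation (not necessarily the poster's fix, and the function below is purely an illustrative sketch) is salting the row key with a deterministic bucket prefix:

```python
import hashlib

def salted_key(row_key: str, buckets: int = 8) -> bytes:
    """Prefix a row key with a deterministic salt bucket so sequential
    keys spread across `buckets` regions instead of hammering one."""
    salt = int(hashlib.md5(row_key.encode()).hexdigest(), 16) % buckets
    return f"{salt:02d}-{row_key}".encode()

# Sequential keys end up scattered across salt buckets:
keys = [salted_key(f"event-{i:08d}") for i in range(4)]
```

Readers must then prepend the same salt on lookup, so this trades point-get simplicity for write distribution.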

Re: Hadoop / HBase hotspotting / overloading specific nodes

2014-10-09 Thread Bing Jiang
Could you set some room reserved for non-DFS usage, just to avoid the disk getting full?

<property>
  <name>dfs.datanode.du.reserved</name>
  <value></value>
  <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.</description>
</property>
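For concreteness, a hypothetical setting of `dfs.datanode.du.reserved` reserving 10 GiB per volume might look like this (the value is an illustrative assumption, not from the thread):

```xml
<!-- hdfs-site.xml: reserve ~10 GiB per volume for non-DFS use -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>10737418240</value>
</property>
```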

Re: Hadoop / HBase hotspotting / overloading specific nodes

2014-10-09 Thread Ted Yu
Looks like the number of regions is lower than the number of nodes in the cluster. Can you split the table such that, after the HBase balancer is run, there is a region hosted by every node? Cheers On Oct 8, 2014, at 11:01 PM, SF Hadoop sfhad...@gmail.com wrote: I'm not sure if this is an HBase
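Ted's suggestion can be carried out from the HBase shell; a rough sketch (the table name is a placeholder, and splits may need repeating until there are at least as many regions as nodes):

```
$ hbase shell
> split 'my_table'       # split the table's region(s); repeat as needed
> balance_switch true    # make sure the balancer is enabled
> balancer               # ask the master to redistribute regions
```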

Re: Hadoop / HBase hotspotting / overloading specific nodes

2014-10-09 Thread SF Hadoop
This doesn't help because the space is simply reserved for the OS; Hadoop still maxes out its quota and spits out out-of-space errors. Thanks On Wednesday, October 8, 2014, Bing Jiang jiangbinglo...@gmail.com wrote: Could you set a reserved room for non-dfs usage? Just to avoid the disk gets

Re: Hadoop / HBase hotspotting / overloading specific nodes

2014-10-09 Thread SF Hadoop
Haven't tried this. I'll give it a shot. Thanks On Thursday, October 9, 2014, Ted Yu yuzhih...@gmail.com wrote: Looks like the number of regions is lower than the number of nodes in the cluster. Can you split the table such that, after hbase balancer is run, there is region hosted by every

Standby Namenode and Datanode coexistence

2014-10-09 Thread oc tsdb
Hi, We have a cluster with 3 nodes (1 namenode + 2 datanodes), running Hadoop 2.4.0. We would like to add High Availability (HA) to the NameNode using the Quorum Journal Manager. As per the link below, we need two NN machines with the same configuration.
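For orientation, the core hdfs-site.xml entries for a QJM-based HA setup look roughly like this (the nameservice, host names, and ports below are placeholders, not from the thread):

```xml
<property><name>dfs.nameservices</name><value>mycluster</value></property>
<property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>nn1-host:8020</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>nn2-host:8020</value></property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
</property>
```

Note the three JournalNode hosts in the shared edits URI: the journal quorum needs at least three members, which is exactly the constraint raised later in this thread.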

RE: ETL using Hadoop

2014-10-09 Thread Andrew Machtolff
The closest thing I can think of to a .NET API would be to set up Hive external tables, and use a vendor’s (Cloudera, et al.) ODBC driver. You could connect from your .NET app using ODBC to the Hive tables, and SELECT/INSERT to read/write. If you’re desperate. ☺ As far as ETL, I’d recommend
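Andrew's external-table approach might look like this on the Hive side (table name, columns, and location are hypothetical); the .NET application then SELECTs from it over the vendor's ODBC driver:

```sql
-- Hypothetical external table over tab-delimited files in HDFS
CREATE EXTERNAL TABLE app_events (
  event_time STRING,
  user_id    STRING,
  payload    STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/app_events';
```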

Re: ETL using Hadoop

2014-10-09 Thread Alex Kamil
The fastest way to do ETL on Hadoop is via HBase + the Phoenix JDBC driver (http://phoenix.apache.org/); as for ODBC mapping, you could use Thrift or one of the ODBC-JDBC bridges (http://stackoverflow.com/questions/5352956/odbc-jdbc-bridge-that-maps-its-own-calls-to-jdbc-driver) On Thu, Oct 9, 2014 at 8:16

Re: Fwd: ETL using Hadoop

2014-10-09 Thread daemeon reiydelle
Hadoop is in effect a massively fast ETL with high latency as the tradeoff. Other solutions allow different tradeoffs, and some of those occur in the Map phase, some in the Reduce phase (e.g. streaming or columnar stores). On Oct 7, 2014 11:32 PM, Dattatrya Moin dattatryam...@gmail.com wrote: Hi , We

Re: Hadoop configuration for cluster machines with different memory capacity / # of cores etc.

2014-10-09 Thread Manoj Samel
So, in that case, the ResourceManager will allocate containers of different capacity based on node capacity? Thanks, On Wed, Oct 8, 2014 at 9:42 PM, Nitin Pawar nitinpawar...@gmail.com wrote: you can have different values on different nodes On Thu, Oct 9, 2014 at 4:15 AM, Manoj Samel

Re: Standby Namenode and Datanode coexistence

2014-10-09 Thread Manoj Samel
Quorum services like the JournalNode (and ZooKeeper) need to have at least 3 instances running. On Thu, Oct 9, 2014 at 4:19 AM, oc tsdb oc.t...@gmail.com wrote: Hi, We have cluster with 3 nodes (1 namenode + 2 datanodes). Cluster is running with hadoop 2.4.0 version. We would like to add High
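The "at least 3" rule follows from majority quorums: n voters need floor(n/2) + 1 members to agree, so they tolerate floor((n-1)/2) failures. A quick sketch of the arithmetic:

```python
def quorum_size(n: int) -> int:
    """Smallest majority of n voters."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """How many voters can fail while a majority remains."""
    return (n - 1) // 2

# With 3 JournalNodes a majority of 2 survives 1 failure;
# 2 nodes tolerate zero failures, which is why 3 is the practical minimum.
```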

Re: Hadoop configuration for cluster machines with different memory capacity / # of cores etc.

2014-10-09 Thread Nitin Pawar
yes On 9 Oct 2014 22:11, Manoj Samel manojsamelt...@gmail.com wrote: So, in that case, the resource manager will allocate containers of different capacity based on node capacity ? Thanks, On Wed, Oct 8, 2014 at 9:42 PM, Nitin Pawar nitinpawar...@gmail.com wrote: you can have different

Re: Hadoop configuration for cluster machines with different memory capacity / # of cores etc.

2014-10-09 Thread SF Hadoop
Yes, you are correct. Just keep in mind that for every spec-X machine you have to have version X of the Hadoop configs (residing only on spec-X machines), version Y configs only on spec-Y machines, and so on. But yes, it is possible. On Thu, Oct 9, 2014 at 9:40 AM, Manoj Samel
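Concretely, the per-spec config sets differ mainly in the NodeManager resource limits in yarn-site.xml; e.g. on a larger node (the numbers below are illustrative assumptions, not from the thread):

```xml
<!-- yarn-site.xml on a 64 GB "spec X" node: leave headroom for OS/daemons -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>57344</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>16</value>
</property>
```

Smaller nodes ship the same file with smaller values; the ResourceManager then sizes container allocations per node accordingly.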

Re: Standby Namenode and Datanode coexistence

2014-10-09 Thread SF Hadoop
You can run any of the daemons on any machine you want; you just have to be aware of the trade-offs you are making with RAM allocation. I am hoping this is a dev cluster: this is definitely not a configuration you would want to use in production. If you are asking in regards to a production

Re: MapReduce jobs start only on the PC they are typed on

2014-10-09 Thread SF Hadoop
What is in /etc/hadoop/conf/slaves? Something tells me it just says 'localhost'. You need to specify your slaves in that file. On Thu, Oct 9, 2014 at 2:24 PM, Piotr Kubaj pku...@riseup.net wrote: Hi. I'm trying to run Hadoop on a 2-PC cluster (I need to do some benchmarks for my bachelor

Re: MapReduce jobs start only on the PC they are typed on

2014-10-09 Thread Piotr Kubaj
On 10/09/2014 23:44, SF Hadoop wrote: What is in /etc/hadoop/conf/slaves? Something tells me it just says 'localhost'. You need to specify your slaves in that file. Nope, my slaves file is as following: 10.0.0.1 10.0.0.2

RE: about long time balance stop

2014-10-09 Thread sunww
Maybe it is related to https://issues.apache.org/jira/browse/HDFS-5806 From: spe...@outlook.com To: user@hadoop.apache.org Subject: about long time balance stop Date: Thu, 9 Oct 2014 02:49:56 + Hi I'm using Hadoop 2.2.0. After I add some new nodes to the cluster, I run the balancer. After several
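For reference, the balancer run in question is typically started like this (the threshold, the maximum per-node deviation from mean utilization in percent, is an illustrative value):

```
$ hdfs balancer -threshold 10
```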

Re: Standby Namenode and Datanode coexistence

2014-10-09 Thread oc tsdb
Thank you. We understood. Thanks oc.tsdb On Thu, Oct 9, 2014 at 11:22 PM, SF Hadoop sfhad...@gmail.com wrote: You can run any of the daemons on any machine you want, you just have to be aware of the trade offs you are making with RAM allocation. I am hoping this is a DEV cluster. This is