Re: Hadoop / HBase hotspotting / overloading specific nodes

2014-10-09 Thread Ted Yu
Looks like the number of regions is lower than the number of nodes in the cluster. Can you split the table such that, after the HBase balancer is run, there is a region hosted by every node? Cheers On Oct 8, 2014, at 11:01 PM, SF Hadoop wrote: > I'm not sure if this is an HBase issue or an Hadoo
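For reference, a minimal sketch of doing this from the HBase shell; the table name and split key are hypothetical, and the balancer command asks HBase to redistribute regions across region servers:

  # inside the HBase shell (table name and split point are examples)
  hbase shell
  split 'mytable', 'row-midpoint-key'
  # ask HBase to rebalance regions across region servers
  balancer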

Re: Hadoop / HBase hotspotting / overloading specific nodes

2014-10-09 Thread SF Hadoop
This doesn't help because the space is simply reserved for the OS. Hadoop still maxes out its quota and spits out "out of space" errors. Thanks On Wednesday, October 8, 2014, Bing Jiang wrote: > Could you set some reserved room for non-DFS usage? Just to avoid the disk > getting full. > > > > dfs.
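The property cut off above is presumably dfs.datanode.du.reserved; a sketch of setting it in hdfs-site.xml (the value is an example, in bytes, reserved per volume):

  <property>
    <name>dfs.datanode.du.reserved</name>
    <!-- 10 GB in bytes, reserved per volume for non-DFS use -->
    <value>10737418240</value>
  </property>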

Re: Hadoop / HBase hotspotting / overloading specific nodes

2014-10-09 Thread SF Hadoop
Haven't tried this. I'll give it a shot. Thanks On Thursday, October 9, 2014, Ted Yu wrote: > Looks like the number of regions is lower than the number of nodes in the > cluster. > > Can you split the table such that, after the HBase balancer is run, there is a > region hosted by every node? > > Che

Standby Namenode and Datanode coexistence

2014-10-09 Thread oc tsdb
Hi, We have a cluster with 3 nodes (1 namenode + 2 datanodes). The cluster is running Hadoop 2.4.0. We would like to add High Availability (HA) to the Namenode using the Quorum Journal Manager. As per the link below, we need two NN machines with the same configuration. http://hadoop.apache.org/do
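For context, a minimal QJM HA sketch for hdfs-site.xml; the nameservice ID and host names here are hypothetical:

  <!-- logical nameservice and its two namenodes -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <!-- quorum of three JournalNodes holding the shared edit log -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://node1:8485;node2:8485;node3:8485/mycluster</value>
  </property>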

RE: ETL using Hadoop

2014-10-09 Thread Andrew Machtolff
The closest thing I can think of to a .NET API would be to set up Hive external tables, and use a vendor’s (Cloudera, et al.) ODBC driver. You could connect from your .NET app using ODBC to the Hive tables, and SELECT/INSERT to read/write. If you’re desperate. ☺ As far as ETL, I’d recommend you
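As an illustration of the external-table half of this suggestion, a HiveQL sketch (table name, columns, and HDFS path are hypothetical); the .NET side would then SELECT/INSERT against this table over ODBC:

  -- external table over delimited files already sitting in HDFS
  CREATE EXTERNAL TABLE sales (id INT, amount DOUBLE)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION '/data/sales';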

Re: ETL using Hadoop

2014-10-09 Thread Alex Kamil
The fastest way to do ETL on Hadoop is via the HBase+Phoenix JDBC driver; as for ODBC mapping, you could use Thrift or one of the ODBC-JDBC bridges On Thu, Oct 9, 2014 at 8
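For the Phoenix route, a small sketch using Phoenix's bundled sqlline client (the ZooKeeper host and table are examples); note that Phoenix uses UPSERT rather than INSERT:

  # connect Phoenix's SQL shell to the cluster's ZooKeeper quorum
  ./sqlline.py localhost
  -- then, at the SQL prompt:
  CREATE TABLE IF NOT EXISTS events (id BIGINT PRIMARY KEY, payload VARCHAR);
  UPSERT INTO events VALUES (1, 'loaded');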

Re: Fwd: ETL using Hadoop

2014-10-09 Thread daemeon reiydelle
Hadoop is in effect a massively fast ETL with high latency as the tradeoff. Other solutions allow different tradeoffs, and some of those occur in the Map phase, some in a Reduce phase (e.g. stream or columnar stores). On Oct 7, 2014 11:32 PM, "Dattatrya Moin" wrote: > > Hi , > > We have our own ETL

Re: Hadoop configuration for cluster machines with different memory capacity / # of cores etc.

2014-10-09 Thread Manoj Samel
So, in that case, the resource manager will allocate containers of different capacity based on node capacity? Thanks, On Wed, Oct 8, 2014 at 9:42 PM, Nitin Pawar wrote: > you can have different values on different nodes > > On Thu, Oct 9, 2014 at 4:15 AM, Manoj Samel > wrote: > >> In a hadoop

Re: Standby Namenode and Datanode coexistence

2014-10-09 Thread Manoj Samel
Quorum services like the journal node (and ZooKeeper) need to have at least 3 instances running. On Thu, Oct 9, 2014 at 4:19 AM, oc tsdb wrote: > Hi, > > We have a cluster with 3 nodes (1 namenode + 2 datanodes). > The cluster is running Hadoop 2.4.0. > > We would like to add High Availability
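Concretely, with three JournalNodes the quorum tolerates the loss of one; a sketch of starting the daemon on each quorum host, assuming a Hadoop 2.x layout:

  # run on each of the three JournalNode hosts
  $HADOOP_PREFIX/sbin/hadoop-daemon.sh start journalnode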

Re: Hadoop configuration for cluster machines with different memory capacity / # of cores etc.

2014-10-09 Thread Nitin Pawar
yes On 9 Oct 2014 22:11, "Manoj Samel" wrote: > So, in that case, the resource manager will allocate containers of > different capacity based on node capacity? > > Thanks, > > On Wed, Oct 8, 2014 at 9:42 PM, Nitin Pawar > wrote: > >> you can have different values on different nodes >> >> On Thu

Re: Hadoop configuration for cluster machines with different memory capacity / # of cores etc.

2014-10-09 Thread SF Hadoop
Yes. You are correct. Just keep in mind, for every spec-X machine you have to have version X of the Hadoop configs (which reside only on spec-X machines). Version Y configs reside only on version Y machines, and so on. But yes, it is possible. On Thu, Oct 9, 2014 at 9:40 AM, Manoj Samel wrote: >
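A sketch of what "different values on different nodes" means in practice: yarn-site.xml on a bigger machine might advertise more resources (the values below are examples):

  <!-- yarn-site.xml on a large node only -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>49152</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>16</value>
  </property>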

Re: Standby Namenode and Datanode coexistence

2014-10-09 Thread SF Hadoop
You can run any of the daemons on any machine you want; you just have to be aware of the trade-offs you are making with RAM allocation. I am hoping this is a DEV cluster. This is definitely not a configuration you would want to use in production. If you are asking with regard to a production clus

MapReduce jobs start only on the PC they are typed on

2014-10-09 Thread Piotr Kubaj
Hi. I'm trying to run Hadoop on a 2-PC cluster (I need to do some benchmarks for my bachelor thesis) and it works, but jobs start only on the PC I typed the command on (it doesn't matter whether it has better specs or not, or where the data physically is, since I'm computing Pi). My mapred-site.xml is: mapred.j

Re: MapReduce jobs start only on the PC they are typed on

2014-10-09 Thread SF Hadoop
What is in /etc/hadoop/conf/slaves? Something tells me it just says 'localhost'. You need to specify your slaves in that file. On Thu, Oct 9, 2014 at 2:24 PM, Piotr Kubaj wrote: > Hi. I'm trying to run Hadoop on a 2-PC cluster (I need to do some > benchmarks for my bachelor thesis) and it work
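For reference, the slaves file is just one worker hostname or IP per line; a sketch with hypothetical hosts (the daemons are typically restarted after editing it):

  # /etc/hadoop/conf/slaves
  worker1.example.com
  worker2.example.com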

Re: MapReduce jobs start only on the PC they are typed on

2014-10-09 Thread Piotr Kubaj
On 10/09/2014 23:44, SF Hadoop wrote: > What is in /etc/hadoop/conf/slaves? > > Something tells me it just says 'localhost'. You need to specify your > slaves in that file. Nope, my slaves file is as follows: 10.0.0.1 10.0.0.2

RE: about long time balance stop

2014-10-09 Thread sunww
Maybe it is related to https://issues.apache.org/jira/browse/HDFS-5806 From: spe...@outlook.com To: user@hadoop.apache.org Subject: about long time balance stop Date: Thu, 9 Oct 2014 02:49:56 + Hi I'm using Hadoop 2.2.0. After I added some new nodes to the cluster, I ran the balancer. After several day
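For context, the balancer is typically invoked as below; -threshold is the allowed deviation from mean disk utilization, in percent (10 is the default):

  hdfs balancer -threshold 10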

Re: Standby Namenode and Datanode coexistence

2014-10-09 Thread oc tsdb
Thank you. We understood. Thanks oc.tsdb On Thu, Oct 9, 2014 at 11:22 PM, SF Hadoop wrote: > You can run any of the daemons on any machine you want; you just have to > be aware of the trade-offs you are making with RAM allocation. > > I am hoping this is a DEV cluster. This is definitely not a

Bug??? Under-Replicated Blocks.

2014-10-09 Thread cho ju il
Hadoop 2.4.1, datanode disk failure. 'Number of Under-Replicated Blocks' is zero. After two disk failures, files will be lost (CORRUPT). How do I fix it? 1. dfshealth.html Configured Capacity: 42.91 TB DFS Used: 1.86 GB Non DFS Used: 29.63 TB DFS Remaining: 13.28 TB
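A sketch of the usual diagnosis steps with fsck (the path here is an example):

  # list files with corrupt (unrecoverable) blocks
  hdfs fsck / -list-corruptFileBlocks
  # fuller per-file report of blocks and replication
  hdfs fsck / -files -blocks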

Re: Standby Namenode and Datanode coexistence

2014-10-09 Thread oc tsdb
One more query we have - should the standby namenode run all the services that run on the active namenode, or is hadoop-hdfs-namenode alone sufficient initially? Thanks oc.tsdb On Fri, Oct 10, 2014 at 10:06 AM, oc tsdb wrote: > Thank you. We understood. > > Thanks > oc.tsdb > > On
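For context, in a QJM HA setup the standby is usually brought up with just the namenode daemon after syncing the namespace; a sketch, assuming the HA configs are already in place on the standby machine:

  # on the standby machine: copy the namespace from the active NN
  hdfs namenode -bootstrapStandby
  # then start only the namenode daemon there
  $HADOOP_PREFIX/sbin/hadoop-daemon.sh start namenode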