Thank you, Clay,

So you first loaded a small set of data into the Sandbox, and then installed
full-blown HDP 2.0 on CentOS machines using Ambari.

So when you installed HDP 2.0 using Ambari, how did you configure your
master namenode and slave datanodes?

Also, where did you install Hive? On a master node, or outside the Hadoop
cluster?

And how does Hive connect to the Hadoop cluster to run the queries?

Also, where is Hue installed (on which node), and what happens behind the
scenes to a query written in Hue? Is it converted into Hive scripts that are
then run against the data in HDFS?

My understanding is that to interact with HDFS we need different clients
like Pig, Hive, or MapReduce programs, but my confusion is where these
clients need to be installed. Do they go inside the Hadoop cluster, and if
so, on which node - the namenode or a datanode?

If we do not need to install these clients on the Hadoop cluster, then where
are they installed, and how do they interact with HDFS? My assumption is
that Hive, Pig, Hue, and MapReduce programs are all client-side pieces that
interact with HDFS on the server side.
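
For example, what I have in mind is something like the following, run from
some client or edge machine (the paths, file names, and table name here are
just placeholders):

    # copy a local file into HDFS; the client finds the NameNode through
    # its Hadoop configuration (core-site.xml)
    hdfs dfs -put /local/data/sample.csv /user/demo/sample/

    # run a Hive query from the same machine; Hive compiles it into
    # MapReduce jobs that read the files stored in HDFS
    hive -e "SELECT COUNT(*) FROM sample_table;"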

Please point out if my understanding is not correct here?

Thanks in advance Clay !!!




On Fri, Mar 14, 2014 at 1:01 PM, Clay McDonald <
stuart.mcdon...@bateswhite.com> wrote:

>  Also, I too created all my processes and SQL in Hortonworks' sandbox
> with small sample data. Then, we created 7 VMs and attached enough storage
> to handle the full dataset test. I installed and configured CentOS and
> installed Hortonworks HDP 2.0 using Ambari. The cluster is 4 datanodes and 3
> master nodes. Also, Hue comes with the sandbox, but when you install with
> Ambari, Hue is not included, so you have to install it manually. Now I'm
> running queries on the full dataset. Clay
>
>
>
>
>
> *From:* Clay McDonald [mailto:stuart.mcdon...@bateswhite.com]
> *Sent:* Friday, March 14, 2014 12:52 PM
>
> *To:* 'user@hadoop.apache.org'
> *Subject:* RE: NodeManager health Question
>
>
>
> What do you want to know? Here is how it goes:
>
>
>
> 1. We receive 6TB from an outside client and need to analyze the data
> quickly and report on our findings. I'm using an analysis that was done in
> our current environment with the same data.
>
> 2. Upload the data to HDFS with -put.
>
> 3. Create external tables in Hive that point to the data in HDFS with
> STORED AS TEXTFILE LOCATION; see the sketch after this list. (SQL is
> required for our analyst.)
>
> 4. Convert the current SQL to HiveQL and run the analysis.
>
> 5. Test ODBC connections to Hive for pulling data.
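>
> As a rough sketch only (the paths, field list, and table name below are
> placeholders, not the actual ones), steps 2 and 3 look something like:
>
>     # step 2: upload the raw files into HDFS
>     hdfs dfs -mkdir -p /data/client_drop
>     hdfs dfs -put /staging/client_files/*.txt /data/client_drop/
>
>     -- step 3: external Hive table over the uploaded files (run in Hive)
>     CREATE EXTERNAL TABLE client_data (
>       id     INT,
>       amount DOUBLE,
>       txn_dt STRING
>     )
>     ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
>     STORED AS TEXTFILE
>     LOCATION '/data/client_drop';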
>
>
>
> Clay
>
>
>
>
>
> *From:* ados1...@gmail.com [mailto:ados1...@gmail.com]
>
> *Sent:* Friday, March 14, 2014 11:40 AM
> *To:* user
> *Subject:* Re: NodeManager health Question
>
>
>
> Hey Clay,
>
>
>
> How have you loaded 6TB of data into HDP? I am in a similar situation and
> wanted to understand your use case.
>
>
>
> On Thu, Mar 13, 2014 at 3:59 PM, Clay McDonald <
> stuart.mcdon...@bateswhite.com> wrote:
>
> Hello all, I have laid out my POC in a project plan and have HDP 2.0
> installed. HDFS is running fine and I have loaded up about 6TB of data to
> run my test on. I have a series of SQL queries that I will run in Hive ver.
> 0.12.0. I had to manually install Hue and still have a few issues I'm
> working on there. But at the moment, my most pressing issue is with Hive
> jobs not running. In YARN, my Hive queries are "Accepted" but are
> "Unassigned" and do not run. See attached.
>
>
>
> In Ambari, the datanodes all show the following error: NodeManager health
> CRIT for 20 days CRITICAL: NodeManager unhealthy
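>
> The stock YARN CLI is one way to cross-check this (generic Hadoop 2.x
> commands, nothing specific to this cluster):
>
>     # list NodeManagers and their state; -all includes unhealthy nodes
>     yarn node -list -all
>
>     # show applications sitting in the ACCEPTED state
>     yarn application -list -appStates ACCEPTED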
>
>
>
> From the datanode logs I found the following:
>
>
>
> ERROR datanode.DataNode (DataXceiver.java:run(225)) -
> dc-bigdata1.bateswhite.com:50010:DataXceiver error processing READ_BLOCK
> operation  src: /172.20.5.147:51299 dest: /172.20.5.141:50010
>
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for
> channel to be ready for write. ch :
> java.nio.channels.SocketChannel[connected local=/172.20.5.141:50010
> remote=/172.20.5.147:51299]
>
>             at
> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>
>             at
> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
>
>             at
> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
>
>             at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:546)
>
>             at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:710)
>
>             at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:340)
>
>             at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:101)
>
>             at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
>
>             at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
>
>             at java.lang.Thread.run(Thread.java:662)
>
>
>
> Also, in the namenode log I see the following:
>
>
>
> 2014-03-13 13:50:57,204 WARN  security.UserGroupInformation
> (UserGroupInformation.java:getGroupNames(1355)) - No groups available for
> user dr.who
>
>
>
>
>
> If anyone can point me in the right direction to troubleshoot this, I
> would really appreciate it!
>
>
>
> Thanks! Clay
>
>
>
