RE: One information about the Hive

Subramanian, Sanjay (HQP) Sun, 12 Jan 2014 23:32:14 -0800

Hi Vikas

Welcome to the world of Hive !


The first book you should read is Programming Hive by Capriolo, Wampler and Rutherglen:
http://www.amazon.com/Programming-Hive-Edward-Capriolo/dp/1449319335

This is a must read. I have benefited immensely from this book and the Hive 
user group (the group is kickass).

If you are not sure of the details of HDFS/Hadoop, then Hadoop: The Definitive 
Guide (Tom White) is also a must read.
My view is that you should eventually know both very well.

I have set up Hadoop and Hive clusters in three ways:
[1] Manually through tarballs (lightweight, but you need to know what you are 
installing and where)
[2] CDH & Cloudera Manager (heavyweight, but it does things in the 
background; easy to install and quick to set up on a sandbox and learn). Plus 
Beeswax is a great starter UI for Hive queries
[3] Using Amazon EMR Hive (I realize this is the easiest and the fastest to 
set up to learn Hive)

My suggestion: don't go for option [1]. You learn a lot there, but it could 
take time and you might feel frustrated as well.

If you use option [2] above, then I suggest:
- 1 or 2 boxes: i7 quad core (or you can use an 8-core AMD FX 8300) with 
16-32GB RAM
- Download and install Cloudera Manager

If you don't have access to box(es) to install Hadoop/Hive, then the cheapest 
way to learn is by using Amazon EMR:
- First create an S3 bucket and a folder to store a data file called songs.txt

  1,2,lennon,john,nowhere man
  1,3,lennon,john,strawberry fields forever
  2,1,mccartney,paul,penny lane
  2,2,mccartney,paul,michelle
  2,3,mccartney,paul,yesterday
  3,1,harrison,george,while my guitar gently weeps
  3,2,harrison,george,i want to tell you
  3,3,harrison,george,think for yourself
  3,4,harrison,george,something
  4,1,starr,ringo,octopus's garden
  4,2,starr,ringo,with a little help from my friends

- Create a key pair from the AWS console and save the private key on your local 
desktop

- Create an EMR cluster with Hive installed

- ssh -i /path/on/your/desktop/to/amazonkeypair.pem hadoop@<some-ec2-instance-name>.compute.amazonaws.com

- On the Linux prompt (replace the LOCATION placeholder with the S3 folder that 
holds songs.txt; without a LOCATION the external table will not see the file)
  --> hive -e "CREATE EXTERNAL TABLE IF NOT EXISTS songs(id INT, seqid INT, lastname STRING, firstname STRING, songname STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 's3://<your-bucket>/<your-folder>/'"
  --> hive -e "select songname from songs where lastname='lennon' OR lastname = 'harrison'"
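
If you want to check what the query should return before you pay for a cluster, the steps above can be roughly approximated on any Linux box. This is only a sketch: it recreates the sample rows of songs.txt with a heredoc and uses awk as a stand-in for the Hive SELECT (awk is not what Hive runs; it just applies the same filter to the same comma-delimited columns). The commented-out upload line assumes the AWS CLI is installed and configured, and the bucket and folder names are placeholders.

```shell
#!/bin/sh
# Recreate the sample data file locally.
cat > songs.txt <<'EOF'
1,2,lennon,john,nowhere man
1,3,lennon,john,strawberry fields forever
2,1,mccartney,paul,penny lane
2,2,mccartney,paul,michelle
2,3,mccartney,paul,yesterday
3,1,harrison,george,while my guitar gently weeps
3,2,harrison,george,i want to tell you
3,3,harrison,george,think for yourself
3,4,harrison,george,something
4,1,starr,ringo,octopus's garden
4,2,starr,ringo,with a little help from my friends
EOF

# Assumption: AWS CLI is set up; bucket and folder are placeholders.
# aws s3 cp songs.txt s3://<your-bucket>/<your-folder>/songs.txt

# Local stand-in for:
#   select songname from songs where lastname='lennon' OR lastname='harrison'
# Columns: $1=id, $2=seqid, $3=lastname, $4=firstname, $5=songname
awk -F',' '$3 == "lennon" || $3 == "harrison" { print $5 }' songs.txt > result.txt

cat result.txt
```

The Hive query on EMR should list the same song names (the two Lennon rows and the four Harrison rows), although Hive does not guarantee the row order.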

Hope this helps

Hive on !!!

sanjay






(Columns in songs.txt: id,seq,lastname,firstname,songname)




________________________________
From: Vikas Parashar [para.vi...@gmail.com]
Sent: Sunday, January 12, 2014 10:50 PM
To: Prashant Kumar - ERS, HCL Tech
Cc: user@hive.apache.org
Subject: Re: One information about the Hive

Prashant,


Actually I just started reading about and understanding Hive. Could you please 
tell me how you learnt Hive? Did you do any training? Is there any institute 
which is reliable specifically for Hive training? I have read a lot of 
tutorials on the net, but I am still not able to correlate the file which is 
stored on the Hadoop cluster with how Hive actually works: the complete 
end-to-end transaction and its storage. Can you take some classes on a paid 
basis and clear up my questions? Please help me.

I have learnt from the community and my personal experience. What I can do is 
forward your request to a known member of the Big Data community.


Note: One important thing: can I post questions directly to you, if you do not 
mind and if I am not disturbing you?

Please put all questions on the community list only.


Thanks
Prashant

From: Vikas Parashar [para.vi...@gmail.com]
Sent: Monday, January 13, 2014 11:07 AM
To: user@hive.apache.org
Subject: Re: One information about the Hive

Prashant,


I am new to Hive. I am reading the docs available on the Apache site and 
trying to understand the correlation between Hadoop and Hive, so please help 
me to understand this:
As per my understanding, all the files containing unstructured data are stored 
in HDFS across the Hadoop cluster. When we have to analyze that data, we use 
Hive.
I have some questions which I am not able to answer:

1. When an engineer/business user wants to analyze data which is available in 
a file on the HDFS cluster, what are the steps to get the desired file and 
analyze it using Hive?

You need to map it to HDFS. Initially, you need to create some metadata in 
HCatalog; the analysis itself runs with the help of MapReduce.

Maybe this will help you: 
http://hortonworks.com/use-cases/sentiment-analysis-hadoop-example/

2. Does Hive store all the data in its tables permanently after the analysis?

Hive never stores any data itself; the data stays in files on HDFS.

3. Is Hive itself a database?

No. It is just a data-access framework.
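
The last two answers can be illustrated without Hive at all. Below is a rough sketch, assuming a hypothetical two-row file (raw.txt) and using awk in place of Hive: the "table" is only a mapping of column positions applied when the file is read, and the underlying file is identical before and after the query. The file names and the checksum comparison are illustrative assumptions, not part of Hive.

```shell
#!/bin/sh
# Hypothetical raw file, standing in for a file that already sits in HDFS.
printf '%s\n' \
  '1,2,lennon,john,nowhere man' \
  '2,1,mccartney,paul,penny lane' > raw.txt

# Record a checksum of the file before "querying" it.
cksum raw.txt > before.txt

# The "table definition" is only metadata: column 3 is lastname and
# column 5 is songname. No copy of the data is made.
awk -F',' '{ print $3 ": " $5 }' raw.txt > query_out.txt

# Record the checksum again; the source file is untouched by the query.
cksum raw.txt > after.txt
cmp -s before.txt after.txt && echo "raw.txt unchanged"

cat query_out.txt
```

Hive does the same kind of thing at cluster scale: the table definition lives in the metastore, and the rows are read from the files in HDFS each time a query runs.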




Thanks
Prashant

