Re: One information about the Hive

Peyman Mohajerian Mon, 13 Jan 2014 08:30:08 -0800

I don't work for IBM, but found their training material helpful:
http://bigdatauniversity.com


There is a bit of bias toward IBM's stack, but they do a good job of
teaching Hive in general.


On Mon, Jan 13, 2014 at 3:01 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

> The best way to answer your questions is:
>
> 1) Set up a single-node Hadoop VM (there are readily available images from
> Hortonworks and Cloudera).
> 2) Try to load data and see where it is stored. (Hive is a data access
> framework; it does not store any data itself. Information about the data,
> such as schemas and locations, is stored in the metastore, which HCatalog
> builds on.)
> 3) With Hive it's just writing queries and crunching numbers. There are many
> file formats, each suited to different kinds of workloads.
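To make point 2 concrete, here is a toy model in Python (an illustration under stated assumptions, not Hive's code): a dictionary stands in for the metastore, a single local file stands in for a table's HDFS directory, and the helpers `create_external_table` and `drop_external_table` are hypothetical. It shows the split between metadata and data: dropping an EXTERNAL table removes only the metadata entry, and the data file stays where it was.

```python
# Toy model, not Hive's implementation: a dict plays the metastore and a
# single local file plays the table's HDFS directory.
import os
import tempfile

metastore = {}  # hypothetical: table name -> metadata (no rows are stored here)


def create_external_table(name, location, columns, delimiter=","):
    """Record metadata only; the data file is neither copied nor moved."""
    metastore[name] = {"location": location, "columns": columns, "delimiter": delimiter}


def drop_external_table(name):
    """Forget the metadata; for an EXTERNAL table the data file is left alone."""
    del metastore[name]


# The data exists before, and independently of, any table definition.
path = os.path.join(tempfile.mkdtemp(), "songs.txt")
with open(path, "w") as f:
    f.write("1,2,lennon,john,nowhere man\n")

create_external_table("songs", path, ["id", "seqid", "lastname", "firstname", "songname"])
print(metastore["songs"]["location"] == path)  # True: only a pointer to the data
drop_external_table("songs")
print("songs" in metastore, os.path.exists(path))  # False True
```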
>
> Once you have a basic understanding of Hive and have tried a few queries, you
> will find that Hive is not a standalone system (for now). At the bottom are
> Hadoop MapReduce (MRv1) and HDFS, then the metastore, and then the Hive
> framework on top.
>
> You will need to understand a bit more of HDFS as well.
>
> To answer your questions:
>
> How will Hive connect to the Hadoop cluster?
>
> .. When you set up Hive, you point it at a Hadoop cluster, and you can also
> change some of these properties (such as where the data lives) at the table
> level.
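The cluster Hive talks to usually comes from the Hadoop configuration it starts with (for example `fs.default.name` on Hadoop 1 or `fs.defaultFS` on Hadoop 2, in core-site.xml), while the location of an individual table's files is decided per table. A minimal Python sketch of that per-table rule, assuming the stock warehouse directory `/user/hive/warehouse` (`hive.metastore.warehouse.dir`) and a database created without its own LOCATION; `table_location` is a hypothetical helper, not a Hive API, and the S3 path is a placeholder.

```python
# Sketch of where a table's files live. Assumes Hive's default warehouse
# directory; table_location is a hypothetical helper, not part of Hive.
from typing import Optional

DEFAULT_WAREHOUSE = "/user/hive/warehouse"


def table_location(table: str, db: str = "default",
                   location: Optional[str] = None,
                   warehouse_dir: str = DEFAULT_WAREHOUSE) -> str:
    """Return the directory Hive would read for a table.

    An explicit LOCATION clause wins; otherwise tables in the "default"
    database sit directly under the warehouse directory, and tables in any
    other database sit under <warehouse>/<db>.db/.
    """
    if location is not None:
        return location
    if db == "default":
        return f"{warehouse_dir}/{table}"
    return f"{warehouse_dir}/{db}.db/{table}"


print(table_location("songs"))              # /user/hive/warehouse/songs
print(table_location("songs", db="music"))  # /user/hive/warehouse/music.db/songs
print(table_location("songs", location="s3://<your-bucket>/<your-folder>/"))
```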
>
>
> How will Hive get the request?
>
> .. Not sure what you mean by request. If you mean the query, there are
> several ways: the Hive CLI (as far as I am aware, development on it is
> slowing down), clients like Beeline, and JDBC connections, etc.
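The Hive CLI compiles queries in its own process, while Beeline and JDBC clients connect to HiveServer2 over the network using a URL of the form `jdbc:hive2://<host>:<port>/<database>`. A small Python sketch that builds that URL; it assumes HiveServer2's default port 10000, and both `hive2_jdbc_url` and the host name are placeholders for illustration.

```python
# Builds a HiveServer2 JDBC URL of the form jdbc:hive2://<host>:<port>/<db>.
# 10000 is HiveServer2's default port; the host name is a placeholder.


def hive2_jdbc_url(host: str, port: int = 10000, database: str = "default") -> str:
    """Return the connection string a JDBC client (or Beeline's -u flag) expects."""
    return f"jdbc:hive2://{host}:{port}/{database}"


url = hive2_jdbc_url("localhost")
print(url)                  # jdbc:hive2://localhost:10000/default
print(f"beeline -u {url}")  # command line for the Beeline client
```

A Java program would pass the same URL to `DriverManager.getConnection` with the HiveServer2 driver class `org.apache.hive.jdbc.HiveDriver` on its classpath.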
>
>
> How will Hive process the request?
> .. Hive converts your query into an optimized plan of one or more MapReduce
> jobs and processes the data using them. To see how a SQL query can be
> converted into a MapReduce program, look at the YSmart framework from Ohio
> State University.
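As a rough illustration of that translation, the sketch below hand-simulates a GROUP BY query as a single MapReduce job in plain Python. Hive's real plans are built from operator trees and may add map-side aggregation or chain several jobs, and a simple SELECT ... WHERE with no aggregation typically needs only a map phase; the function names here are hypothetical.

```python
# Toy simulation (not Hive's actual execution engine) of how
#   SELECT lastname, COUNT(*) FROM songs GROUP BY lastname
# can run as one MapReduce job: map -> shuffle/sort -> reduce.
from collections import defaultdict

rows = [
    "1,2,lennon,john,nowhere man",
    "2,1,mccartney,paul,penny lane",
    "2,2,mccartney,paul,michelle",
    "3,1,harrison,george,while my guitar gently weeps",
]


def map_phase(line):
    """Map: parse one input line and emit (group-by key, 1)."""
    fields = line.split(",")
    yield fields[2], 1  # column index 2 is lastname


def shuffle(pairs):
    """Shuffle/sort: group all emitted values by key, in sorted key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce: aggregate the values for one key (here, COUNT(*))."""
    return key, sum(values)


mapped = [pair for line in rows for pair in map_phase(line)]
result = [reduce_phase(k, vs) for k, vs in shuffle(mapped)]
print(result)  # [('harrison', 1), ('lennon', 1), ('mccartney', 2)]
```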
>
> After analysis, where will the analyzed data be stored for further
> decision making?
> .. Hive does not store any results automatically. You have to specify
> where you want to save the data: a table, a file, or something like that.
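In HiveQL that usually means something like `INSERT OVERWRITE TABLE ... SELECT ...`, `CREATE TABLE ... AS SELECT ...`, or `INSERT OVERWRITE DIRECTORY '<path>' SELECT ...`. As a rough Python illustration of the directory case, assuming Hive's default text output (one row per line, fields separated by Ctrl-A, `\x01`) and a temporary directory standing in for the real target; the file name `000000_0` mirrors the names Hive typically gives its output files.

```python
# Illustration only: writes rows the way Hive's default text output lays
# them out (one row per line, fields joined by Ctrl-A). The directory is
# a temporary stand-in for the real target path.
import os
import tempfile

results = [("lennon", "nowhere man"), ("harrison", "something")]  # sample query result

out_dir = tempfile.mkdtemp()                  # hypothetical target directory
out_file = os.path.join(out_dir, "000000_0")  # Hive-style output file name

with open(out_file, "w") as f:
    for row in results:
        f.write("\x01".join(row) + "\n")

# Anyone reading the file later must know the delimiter (i.e. the schema).
with open(out_file) as f:
    print([line.rstrip("\n").split("\x01") for line in f])
# [['lennon', 'nowhere man'], ['harrison', 'something']]
```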
>
>
> On Mon, Jan 13, 2014 at 4:14 PM, Vikas Parashar <para.vi...@gmail.com> wrote:
>
>> Thanks Prashant, definitely I shall go through that if needed. But from
>> my experience, what I have seen is that users will have some integration
>> problems with Hadoop 2.
>>
>>
>>>  Hi Vikas
>>>
>>>  Welcome to the world of Hive!
>>>
>>>  The first book you should read is Programming Hive by Capriolo, Wampler,
>>> and Rutherglen:
>>> http://www.amazon.com/Programming-Hive-Edward-Capriolo/dp/1449319335
>>>
>>>  This is a must-read. I have benefited immensely from this book and the
>>> Hive user group (the group is kickass).
>>>
>>>  If you are not sure of the details of HDFS/Hadoop, then Hadoop: The
>>> Definitive Guide (Tom White) is a must-read.
>>> My view is that you should eventually know both very well...
>>>
>>>  I have set up Hadoop and Hive clusters in three ways:
>>> [1] Manually through tarballs (lightweight, but you need to know what you
>>> are installing and where)
>>> [2] CDH & Cloudera Manager (heavyweight, but it does things in the
>>> background; easy to install and quick to set up on a sandbox to learn).
>>> Plus, Beeswax is a great starter UI for Hive queries.
>>> [3] Amazon EMR Hive (I realize this is the easiest and fastest to set up
>>> for learning Hive)
>>>
>>>  My suggestion: don't go for option [1]. You learn a lot there, but it
>>> could take time and you might feel frustrated as well.
>>>
>>> If you use option [2] above, then I suggest:
>>> - 1 or 2 boxes: i7 quad core (or you can use an 8-core AMD FX 8300) with
>>> 16-32 GB RAM
>>> - download and install Cloudera Manager
>>>
>>>  If you don't have access to box(es) to install Hadoop/Hive, then the
>>> cheapest way to learn is Amazon EMR:
>>> - First create an S3 bucket and a folder to store a data file called
>>> songs.txt:
>>>
>>>    1,2,lennon,john,nowhere man
>>>    1,3,lennon,john,strawberry fields forever
>>>    2,1,mccartney,paul,penny lane
>>>    2,2,mccartney,paul,michelle
>>>    2,3,mccartney,paul,yesterday
>>>    3,1,harrison,george,while my guitar gently weeps
>>>    3,2,harrison,george,i want to tell you
>>>    3,3,harrison,george,think for yourself
>>>    3,4,harrison,george,something
>>>    4,1,starr,ringo,octopus's garden
>>>    4,2,starr,ringo,with a little help from my friends
>>>
>>>  - Create a key pair from the AWS console and save the private key on
>>> your local desktop
>>>
>>>  - Create an EMR cluster with Hive installed
>>>
>>>  - ssh -i /path/on/your/desktop/to/amazonkeypair.pem hadoop@<some-ec2-instance-name>.compute.amazonaws.com
>>>
>>>  - On the Linux prompt:
>>>    --> hive -e "CREATE EXTERNAL TABLE IF NOT EXISTS songs (id INT,
>>> SEQID INT, LASTNAME STRING, FIRSTNAME STRING, SONGNAME STRING) ROW FORMAT
>>> DELIMITED FIELDS TERMINATED BY ',' LOCATION 's3://<your-bucket>/<your-folder>/'"
>>>    --> hive -e "SELECT songname FROM songs WHERE lastname = 'lennon' OR
>>> lastname = 'harrison'"
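What those two commands do with songs.txt can be sketched in plain Python (an illustration only, not Hive's execution path): the CREATE EXTERNAL TABLE statement just attaches a column list to the comma-delimited lines, and the SELECT applies that list at read time, filters on lastname, and keeps songname. The sample rows are a subset of songs.txt, and `read_table` is a hypothetical helper.

```python
# Plain-Python sketch of the two Hive commands above: apply the declared
# columns to each comma-delimited line (schema on read), then filter and
# project. Not how Hive executes it internally.
SONGS_TXT = """1,2,lennon,john,nowhere man
1,3,lennon,john,strawberry fields forever
2,1,mccartney,paul,penny lane
3,1,harrison,george,while my guitar gently weeps
3,4,harrison,george,something
"""

COLUMNS = ["id", "seqid", "lastname", "firstname", "songname"]


def read_table(text, columns, delimiter=","):
    """Map each delimited line onto the declared columns (hypothetical helper)."""
    for line in text.splitlines():
        yield dict(zip(columns, line.split(delimiter)))


# SELECT songname FROM songs WHERE lastname = 'lennon' OR lastname = 'harrison'
songnames = [row["songname"] for row in read_table(SONGS_TXT, COLUMNS)
             if row["lastname"] in ("lennon", "harrison")]
print(songnames)
```

The sketch splits with `str.split` rather than the `csv` module because Hive's plain delimited format has no quoting: a comma inside a song name would be treated as a field separator.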
>>>
>>>  Hope this helps
>>>
>>>  Hive on !!!
>>>
>>>  sanjay
>>>
>>>
>>>  (Columns in songs.txt: id,seq,lastname,firstname,songname)
>>>
>>>   ------------------------------
>>> *From:* Vikas Parashar [para.vi...@gmail.com]
>>> *Sent:* Sunday, January 12, 2014 10:50 PM
>>> *To:* Prashant Kumar - ERS, HCL Tech
>>> *Cc:* user@hive.apache.org
>>>
>>> *Subject:* Re: One information about the Hive
>>>
>>>   Prashant,
>>>
>>>
>>>> Actually, I just started reading about and understanding Hive. Could
>>>> you please tell me how you learned Hive? Did you do any training? Is there
>>>> any reliable institute specifically for Hive training? I have read
>>>> a lot of tutorials on the net, but I am still not able to correlate the files
>>>> stored on the Hadoop cluster with how Hive actually works: the complete
>>>> end-to-end transaction and its storage. Could you take some classes on a paid
>>>> basis and clear up my questions? Please help me.
>>>>
>>>
>>> I have learned from the community and my personal experience. What I can
>>> do is forward your request to some known members of the Big Data community.
>>>
>>>
>>>>
>>>>
>>>> Note: one important thing: can I post questions directly to you, if you do
>>>> not mind and if I am not disturbing you?
>>>>
>>>
>>>  Please post all questions to the community only.
>>>
>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>> Prashant
>>>>
>>>>
>>>>
>>>> *From:* Vikas Parashar [mailto:para.vi...@gmail.com]
>>>> *Sent:* Monday, January 13, 2014 11:07 AM
>>>> *To:* user@hive.apache.org
>>>> *Subject:* Re: One information about the Hive
>>>>
>>>>
>>>>
>>>> Prashant,
>>>>
>>>> I am new to Hive. I am reading the documentation available on the Apache
>>>> site and trying to correlate Hadoop and Hive, so please
>>>> help me understand this:
>>>>
>>>> As per my understanding, all the files containing unstructured data are
>>>> stored in HDFS across the Hadoop cluster. When we have to
>>>> analyze that data, we use Hive.
>>>>
>>>> Now I have some questions that I am not able to figure out:
>>>>
>>>>
>>>>
>>>> 1. When an engineer/business user wants to analyze data that is
>>>> available in a file on the HDFS cluster, what are the steps to get
>>>> the desired file and analyze it using Hive?
>>>>
>>>>
>>>>
>>>> You need to map it to HDFS: initially, you need to create some metadata
>>>> in HCatalog (a table definition over the files); the analysis then runs
>>>> with the help of MapReduce.
>>>>
>>>>
>>>>
>>>> Maybe this will help you:
>>>> http://hortonworks.com/use-cases/sentiment-analysis-hadoop-example/
>>>>
>>>>
>>>>
>>>>  2. Does Hive store all the data in its tables permanently after the
>>>> analysis?
>>>>
>>>>
>>>>
>>>> Hive itself never stores any data; the data stays in files on HDFS.
>>>>
>>>>
>>>>
>>>>  3. Is Hive itself a database?
>>>>
>>>>
>>>>
>>>> It is just a data-access framework.
>>>>
>>>> Thanks
>>>>
>>>> Prashant
>>>>
>>>
>>>
>>
>
>
> --
> Nitin Pawar
>
