I don't work for IBM, but I found their training material helpful: http://bigdatauniversity.com
There is a bit of bias toward IBM's stack, but they do a good job of teaching Hive in general.

On Mon, Jan 13, 2014 at 3:01 AM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

> The best way to answer your queries is to:
>
> 1) Set up a single-node Hadoop VM (there are readily available images from
> Hortonworks and Cloudera).
> 2) Try to load data and see where it is stored. (Hive is a data access
> framework; it does not store any data. Information related to the data is
> stored in the metastore, mainly HCatalog.)
> 3) With Hive it's just writing queries and crunching numbers. There are lots of
> file formats which do better with different kinds of workloads.
>
> If you have a basic understanding of Hive and have tried a few queries, you will
> find that Hive is not a standalone system (for now). It has Hadoop
> MapReduce1 and HDFS, then it has the metastore, then it has the Hive framework.
>
> You will need to understand a bit more of HDFS as well.
>
> To answer your queries:
>
> > how Hive will connect with the Hadoop cluster,
>
> When you set up Hive you can point it to a Hadoop cluster, or you can
> change these properties at the table level.
>
> > how Hive will get the request,
>
> Not sure what you mean by request. If you mean the query, then there
> are ways like the Hive CLI (as far as I am aware, development on it is winding down),
> then there are clients like Beeline, and then you have options like JDBC
> connections, etc.
>
> > how Hive will process the request,
>
> Hive converts your query into an optimal MapReduce program and processes
> the data using that MapReduce program. For how a SQL query is converted to a
> MapReduce program, you can look at the YSmart framework from Ohio State University.
>
> > after analysis, where the analyzed data will be stored for further
> > decision making
>
> Hive does not store any data automatically. You have to specifically
> mention where you want to save the data: a table, a file or something
> like that.
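Nitin's point that Hive turns a query into a MapReduce program can be made concrete with a toy model. This Python sketch is not Hive's compiler or query plan. The function names are hypothetical. It hand-translates one query, `SELECT lastname, COUNT(*) FROM songs GROUP BY lastname`, into three steps: a map phase that parses each line and emits the grouping key, a shuffle that groups values by key, and a reduce phase that applies the aggregate. The sample rows are taken from the songs.txt data in Sanjay's message.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# A few rows of the comma-delimited songs.txt file from Sanjay's message.
SONGS = [
    "1,2,lennon,john,nowhere man",
    "1,3,lennon,john,strawberry fields forever",
    "2,1,mccartney,paul,penny lane",
    "3,1,harrison,george,while my guitar gently weeps",
    "3,2,harrison,george,i want to tell you",
]


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: parse each line and emit (group key, 1) for GROUP BY lastname."""
    for line in lines:
        fields = line.split(",")
        yield fields[2], 1  # third column is lastname


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: collect every value emitted for the same key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: apply the aggregate (COUNT(*) here) to each key's values."""
    return {key: sum(values) for key, values in grouped.items()}


def run_query(lines: Iterable[str]) -> Dict[str, int]:
    # Hand-translated: SELECT lastname, COUNT(*) FROM songs GROUP BY lastname
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    for lastname, count in run_query(SONGS).items():
        print(lastname, count)
```

A plain filter with no aggregation, like Sanjay's `select songname ... where lastname='lennon'`, typically needs no reduce step at all. In this model it would be the map phase alone, with the WHERE test applied to each parsed line.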
>
> On Mon, Jan 13, 2014 at 4:14 PM, Vikas Parashar <para.vi...@gmail.com> wrote:
>
>> Thanks Prashant, I shall definitely go through that if needed. But from
>> my experience, what I have found is that users will have some integration
>> problems with Hadoop 2.
>>
>>> Hi Vikas
>>>
>>> Welcome to the world of Hive!
>>>
>>> The first book you should read is Programming Hive by Capriolo, Wampler
>>> and Rutherglen:
>>> http://www.amazon.com/Programming-Hive-Edward-Capriolo/dp/1449319335
>>>
>>> This is a must-read. I have benefited immensely from this book and the
>>> Hive user group (the group is kickass).
>>>
>>> If you are not sure of the details of HDFS/Hadoop, then Hadoop: The
>>> Definitive Guide (Tom White) is a must-read.
>>> My view would be that you should know both very well eventually...
>>>
>>> I have set up Hadoop and Hive clusters in three ways:
>>> [1] manually through tarballs (lightweight, but you need to know what you are
>>> installing and where)
>>> [2] CDH & Cloudera Manager (heavyweight, but it does things in the
>>> background... easy to install and quick to set up on a sandbox and
>>> learn)... Plus Beeswax is a great starter UI for Hive queries
>>> [3] Amazon EMR Hive (I realize this is the easiest and the fastest
>>> to set up to learn Hive)
>>>
>>> My suggestion: don't go for option [1]. You learn a lot there, but it
>>> could take time and you might feel frustrated as well.
>>>
>>> If you use option [2] above, then I suggest:
>>> - 1 or 2 boxes: i7 quad core (or you can use an 8-core AMD FX 8300) with
>>> 16-32GB RAM
>>> - download and install Cloudera Manager
>>>
>>> If you don't have access to box(es) to install Hadoop/Hive, then the
>>> cheapest way to learn is by using Amazon EMR:
>>> - First create an S3 bucket and a folder to store a data file called
>>> songs.txt
>>>
>>> 1,2,lennon,john,nowhere man
>>> 1,3,lennon,john,strawberry fields forever
>>> 2,1,mccartney,paul,penny lane
>>> 2,2,mccartney,paul,michelle
>>> 2,3,mccartney,paul,yesterday
>>> 3,1,harrison,george,while my guitar gently weeps
>>> 3,2,harrison,george,i want to tell you
>>> 3,3,harrison,george,think for yourself
>>> 3,4,harrison,george,something
>>> 4,1,starr,ringo,octopus's garden
>>> 4,2,starr,ringo,with a little help from my friends
>>>
>>> - Create a key pair from the AWS console and save the private key on
>>> your local desktop
>>>
>>> - Create an EMR cluster with Hive installed
>>>
>>> - ssh -i /path/on/your/desktop/to/amazonkeypair.pem hadoop@<some-ec2-instance-name>.compute.amazonaws.com
>>>
>>> - At the Linux prompt (set LOCATION to the S3 folder that holds songs.txt):
>>> --> hive -e "CREATE EXTERNAL TABLE IF NOT EXISTS songs(id INT,
>>> SEQID INT, LASTNAME STRING, FIRSTNAME STRING, SONGNAME STRING) ROW FORMAT
>>> DELIMITED FIELDS TERMINATED BY ',' LOCATION 's3://<your-bucket>/<your-folder>/'"
>>> --> hive -e "select songname from songs where lastname='lennon' OR
>>> lastname = 'harrison'"
>>>
>>> Hope this helps
>>>
>>> Hive on !!!
>>>
>>> sanjay
>>>
>>> id,seq,lastname,firstname,songname
>>>
>>> ------------------------------
>>> *From:* Vikas Parashar [para.vi...@gmail.com]
>>> *Sent:* Sunday, January 12, 2014 10:50 PM
>>> *To:* Prashant Kumar - ERS, HCL Tech
>>> *Cc:* user@hive.apache.org
>>> *Subject:* Re: One information about the Hive
>>>
>>> Prashant,
>>>
>>>> Actually, I just started reading about and understanding Hive. Could
>>>> you please tell me how you learned Hive? Did you do any training? Is there
>>>> any reliable institute specifically for Hive training? I have read
>>>> lots of tutorials on the net, but I am still not able to correlate the files
>>>> stored on the Hadoop cluster with how Hive actually works: the complete
>>>> end-to-end transaction and its storage. Could you take some classes on a paid
>>>> basis and clear up my questions? Please help me.
>>>
>>> I learned from the community and my personal experience. What I can do
>>> is forward your request to some known members of the Big Data community.
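The CREATE EXTERNAL TABLE step in Sanjay's walkthrough copies nothing. Hive applies the table's schema only when it reads the files, which is often called "schema on read". The Python sketch below imitates what the two `hive -e` commands do to songs.txt. It is an illustration only, and the function names are hypothetical. Two behaviours are this sketch's simplified approximation of Hive's default delimited-text handling: an INT field that fails to parse becomes NULL (`None`), and missing trailing columns also become NULL.

```python
from typing import Dict, List, Optional, Set, Union

# A field value after parsing: INT -> int, STRING -> str, NULL -> None.
Value = Union[int, str, None]

# Column names/types from the CREATE EXTERNAL TABLE songs statement
# (Hive column names are case-insensitive, so they are lowercased here).
SCHEMA = [("id", "INT"), ("seqid", "INT"), ("lastname", "STRING"),
          ("firstname", "STRING"), ("songname", "STRING")]


def parse_field(raw: Optional[str], hive_type: str) -> Value:
    """Convert one text field to its declared type; unparseable INTs become None."""
    if raw is None:
        return None
    if hive_type == "INT":
        try:
            return int(raw)
        except ValueError:
            return None
    return raw


def read_row(line: str) -> Dict[str, Value]:
    """Apply the schema at read time, splitting on ',' (FIELDS TERMINATED BY ',')."""
    fields = line.rstrip("\n").split(",")
    row: Dict[str, Value] = {}
    for i, (name, hive_type) in enumerate(SCHEMA):
        # Missing trailing fields are read as NULL rather than raising an error.
        raw = fields[i] if i < len(fields) else None
        row[name] = parse_field(raw, hive_type)
    return row


def select_songnames(lines: List[str], lastnames: Set[str]) -> List[Value]:
    # SELECT songname FROM songs WHERE lastname = 'lennon' OR lastname = 'harrison'
    return [r["songname"] for r in map(read_row, lines) if r["lastname"] in lastnames]


if __name__ == "__main__":
    data = [
        "1,2,lennon,john,nowhere man",
        "2,1,mccartney,paul,penny lane",
        "3,4,harrison,george,something",
    ]
    print(select_songnames(data, {"lennon", "harrison"}))  # ['nowhere man', 'something']
```

In this sketch, if the `id,seq,lastname,firstname,songname` header line from Sanjay's message ended up inside songs.txt, it would be read as one more row whose INT columns are NULL. It would not be treated as a header.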
>>>
>>>> Note: One important thing. Can I post questions directly to you, if you do
>>>> not mind and if I am not disturbing you?
>>>
>>> Please post all questions on the community list only.
>>>
>>>> Thanks
>>>>
>>>> Prashant
>>>>
>>>> *From:* Vikas Parashar [mailto:para.vi...@gmail.com]
>>>> *Sent:* Monday, January 13, 2014 11:07 AM
>>>> *To:* user@hive.apache.org
>>>> *Subject:* Re: One information about the Hive
>>>>
>>>> Prashant,
>>>>
>>>> I am new to Hive. I am reading the documentation available on the Apache
>>>> site and trying to work out the relationship between Hadoop and Hive, so please
>>>> help me understand this:
>>>>
>>>> As per my understanding, all the files containing unstructured data are
>>>> stored in HDFS across the Hadoop cluster. Now when we have to
>>>> analyze that data, we use Hive.
>>>>
>>>> Now I have some questions I am not able to figure out:
>>>>
>>>> 1. When an engineer/business user wants to analyze data which is
>>>> available in one of the files on the HDFS cluster, what are the steps to get
>>>> the desired file and analyze it using Hive?
>>>>
>>>> You need to map it to HDFS. Initially you need to create some metadata in
>>>> HCatalog; the analysis then runs with the help of MapReduce.
>>>>
>>>> Maybe this will help you:
>>>> http://hortonworks.com/use-cases/sentiment-analysis-hadoop-example/
>>>>
>>>> 2. Does Hive store all the data in its tables permanently after the analysis?
>>>>
>>>> Hive never stores any data.
>>>>
>>>> 3. Is Hive itself a database?
>>>>
>>>> It is just a data-access framework.
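The answers to questions 2 and 3 describe Hive as a data-access framework over files it does not own. That split can be illustrated with a toy model. Everything in it is hypothetical and invented for the sketch: `TableMeta`, `Metastore`, the `/data/songs/` path, and the in-memory `FILES` dict standing in for HDFS or S3. None of these are the real metastore or HCatalog APIs. The model shows one property. Creating or dropping an external table touches only metadata, and a query first looks up the table's columns and location, then reads whatever files sit there.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical stand-in for HDFS/S3: file path -> file contents.
# The data lives here; the "metastore" below never copies it.
FILES: Dict[str, str] = {
    "/data/songs/songs.txt": "1,2,lennon,john,nowhere man\n2,1,mccartney,paul,penny lane\n",
}


@dataclass
class TableMeta:
    """What a metastore records for a table: its columns, location and delimiter."""
    columns: List[Tuple[str, str]]
    location: str
    delimiter: str = ","


@dataclass
class Metastore:
    tables: Dict[str, TableMeta] = field(default_factory=dict)

    def create_external_table(self, name: str, meta: TableMeta) -> None:
        # Only metadata is written; no file is read, moved or copied.
        self.tables[name] = meta

    def drop_table(self, name: str) -> None:
        # External table: forget the metadata, leave the files at meta.location alone.
        del self.tables[name]

    def scan(self, name: str) -> List[Dict[str, str]]:
        """Look up the table's metadata, then read every file under its location."""
        meta = self.tables[name]
        names = [col for col, _type in meta.columns]
        rows: List[Dict[str, str]] = []
        for path, text in FILES.items():
            if path.startswith(meta.location):
                for line in text.splitlines():
                    if line:
                        rows.append(dict(zip(names, line.split(meta.delimiter))))
        return rows


if __name__ == "__main__":
    ms = Metastore()
    ms.create_external_table("songs", TableMeta(
        columns=[("id", "INT"), ("seqid", "INT"), ("lastname", "STRING"),
                 ("firstname", "STRING"), ("songname", "STRING")],
        location="/data/songs/"))
    print([row["songname"] for row in ms.scan("songs")])  # ['nowhere man', 'penny lane']
    ms.drop_table("songs")
    print("/data/songs/songs.txt" in FILES)  # True: dropping the table kept the data
```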
>>>>
>>>> Thanks
>>>>
>>>> Prashant
>
> --
> Nitin Pawar