Hi Vikas,

Welcome to the world of Hive!
The first book you should read is Programming Hive by Capriolo, Wampler and Rutherglen: http://www.amazon.com/Programming-Hive-Edward-Capriolo/dp/1449319335

This is a must read. I have immensely benefited from this book and from the Hive user group (the group is kickass). If you are not sure of the details of HDFS/Hadoop, then Hadoop: The Definitive Guide (Tom White) is a must read too. My view is that you should eventually know both very well...

I have set up Hadoop and Hive clusters in three ways:
[1] Manually through tarballs (lightweight, but you need to know what you are installing and where)
[2] CDH & Cloudera Manager (heavyweight, but it does things in the background... easy to install and quick to set up on a sandbox to learn). Plus, Beeswax is a great starter UI for Hive queries.
[3] Using Amazon EMR Hive (I realize this is the easiest and the fastest to set up to learn Hive)

My suggestion: don't go for option [1]. You learn a lot there, but it could take time and you might feel frustrated as well.

If you go with option [2] above, then I suggest:
- 1 or 2 boxes
- i7 quad core (or you can use an 8-core AMD FX 8300) with 16-32GB RAM
- download and install Cloudera Manager

If you don't have access to box(es) to install Hadoop/Hive, then the cheapest way to learn is by using Amazon EMR:
- First create an S3 bucket and a folder to store a data file called songs.txt (columns: id,seq,lastname,firstname,songname)

    1,2,lennon,john,nowhere man
    1,3,lennon,john,strawberry fields forever
    2,1,mccartney,paul,penny lane
    2,2,mccartney,paul,michelle
    2,3,mccartney,paul,yesterday
    3,1,harrison,george,while my guitar gently weeps
    3,2,harrison,george,i want to tell you
    3,3,harrison,george,think for yourself
    3,4,harrison,george,something
    4,1,starr,ringo,octopus's garden
    4,2,starr,ringo,with a little help from my friends

- Create a key pair from the AWS console and save the private key on your local desktop
- Create an EMR cluster with Hive installed
- ssh -i /path/on/your/desktop/to/amazonkeypair.pem hadoop@<some-ec2-instance-name>.compute.amazonaws.com
- On the Linux prompt (replace <your-bucket>/<your-folder> with the S3 folder that holds songs.txt, so the external table reads that file):

    --> hive -e "CREATE EXTERNAL TABLE IF NOT EXISTS songs(id INT, SEQID INT, LASTNAME STRING, FIRSTNAME STRING, SONGNAME STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 's3://<your-bucket>/<your-folder>/'"
    --> hive -e "select songname from songs where lastname='lennon' OR lastname = 'harrison'"

Hope this helps
Hive on !!!
sanjay

________________________________
From: Vikas Parashar [para.vi...@gmail.com]
Sent: Sunday, January 12, 2014 10:50 PM
To: Prashant Kumar - ERS, HCL Tech
Cc: user@hive.apache.org
Subject: Re: One information about the Hive

Prashant,

Actually, I have just started reading about and understanding Hive. Could you please tell me how you learnt Hive? Did you do any training? Is there any reliable institute specifically for Hive training? I have read lots of tutorials on the net, but I am still not able to correlate the files stored on the Hadoop cluster with how Hive actually works: the complete end-to-end transaction and its storage. Could you take some classes on a paid basis and clear up my questions? Please help me.

I have learnt from the community and from my personal experience. What I can do is forward your request to some known members of the Big Data community.

Note: one important thing: can I post my questions directly to you, if you do not mind and if I am not disturbing you?

Please post all questions to the community only.

Thanks
Prashant

From: Vikas Parashar [mailto:para.vi...@gmail.com]
Sent: Monday, January 13, 2014 11:07 AM
To: user@hive.apache.org
Subject: Re: One information about the Hive

Prashant,

I am new to Hive. I am reading the docs available on the Apache site and trying to understand how Hadoop and Hive relate, so please help me understand this:

As per my understanding, all the files containing unstructured data are stored in HDFS across the Hadoop cluster. Now, when we have to analyze that data, we use Hive.
Now I have some questions that I am not able to figure out:

1. When an engineer/business user wants to analyze data that is available in some file on the HDFS cluster, what are the steps to get the desired file and analyze it using Hive?

You need to map it to HDFS: initially you need to create some metadata (a table definition) in HCatalog, and Hive then processes the file with the help of MapReduce. Maybe this will help you: http://hortonworks.com/use-cases/sentiment-analysis-hadoop-example/

2. Does Hive store all the data in its tables permanently after the analysis?

Hive never stores any data itself; the data stays in files on HDFS, and Hive keeps only the table metadata in its metastore.

3. Is Hive itself a database?

It is just a data-access framework.

Thanks
Prashant
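Vikas's main confusion in this thread is how a file sitting in HDFS (or S3) relates to a Hive table. The following is a toy Python sketch, not Hive's actual implementation, that mimics what the `songs` table definition does: the table is only metadata (column names, types and the ',' delimiter) applied to the unchanged file at read time, and the query then filters the parsed rows. The in-memory copy of songs.txt and the names `SCHEMA`, `read_table` and `query` are hypothetical, chosen only for illustration.

```python
"""Toy illustration of Hive's schema-on-read idea (NOT how Hive is implemented).

Assumption: a small in-memory copy of songs.txt stands in for the file in S3/HDFS.
"""
from typing import Callable

# The raw "file". Reading it through the table definition does not change it.
SONGS_TXT = """\
1,2,lennon,john,nowhere man
1,3,lennon,john,strawberry fields forever
2,1,mccartney,paul,penny lane
2,2,mccartney,paul,michelle
2,3,mccartney,paul,yesterday
3,1,harrison,george,while my guitar gently weeps
3,2,harrison,george,i want to tell you
3,3,harrison,george,think for yourself
3,4,harrison,george,something
4,1,starr,ringo,octopus's garden
4,2,starr,ringo,with a little help from my friends
"""

# Hypothetical stand-in for the table metadata that CREATE EXTERNAL TABLE records:
# column names and types, applied in order to the delimited fields of each line.
SCHEMA: list[tuple[str, Callable[[str], object]]] = [
    ("id", int),
    ("seqid", int),
    ("lastname", str),
    ("firstname", str),
    ("songname", str),
]


def read_table(raw: str, delimiter: str = ",") -> list[dict[str, object]]:
    """Apply the schema at read time: split each line on the delimiter."""
    rows = []
    for line in raw.splitlines():
        if not line:
            continue  # skip blank lines
        fields = line.split(delimiter)
        rows.append({name: cast(value) for (name, cast), value in zip(SCHEMA, fields)})
    return rows


def query(raw: str) -> list[str]:
    """Rough equivalent of:
    SELECT songname FROM songs WHERE lastname='lennon' OR lastname='harrison'
    """
    return [row["songname"] for row in read_table(raw)
            if row["lastname"] in ("lennon", "harrison")]


if __name__ == "__main__":
    for name in query(SONGS_TXT):
        print(name)
```

Running the script prints the six Lennon and Harrison song titles, the same set of rows the `hive -e "select songname ..."` command in this thread should return for this data (Hive itself does not guarantee row order without ORDER BY). The point of the sketch is that dropping the "table" would leave SONGS_TXT untouched, which is what happens to the S3 file when an external table is dropped.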