Hi,

I downloaded this book some days back and started reading it. The book teaches the programming style used with Hive. But I am still trying to visualize how Hive connects to the Hadoop cluster, how Hive receives a request, how it processes the request, and, after the analysis, where the analyzed data is stored for further decision making. So I need to understand the end-to-end cycle: from the moment a request comes from a business user, to how this architecture serves the request, to how the aggregated/structured data is stored for future reference. I also want to know how (and where) unstructured data gets stored in a structured fashion.

So how can I get these answers? Please suggest. I am new to the Hive world, so I may be asking childish questions, but for me it is important to get the concepts clear. I have fair knowledge of SQL (10 years of experience in Oracle), so HiveQL itself should not be tough for me to understand.

Thanks
Prashant

From: Subramanian, Sanjay (HQP) [mailto:sanjay.subraman...@roberthalf.com]
Sent: Monday, January 13, 2014 1:01 PM
To: user@hive.apache.org; Prashant Kumar - ERS, HCL Tech
Subject: RE: One information about the Hive

Hi Vikas

Welcome to the world of Hive!

The first book you should read is Programming Hive by Capriolo, Wampler, Rutherglen
http://www.amazon.com/Programming-Hive-Edward-Capriolo/dp/1449319335
This is a must read. I have benefited immensely from this book and from the hive user group (the group is kickass).

If you are not sure of the details of HDFS/Hadoop, then Hadoop: The Definitive Guide (Tom White) is a must read. My view would be that you should know both very well eventually...

I have set up Hadoop and Hive clusters in three ways
[1] manually thru tarballs (lightweight, but you need to know what you are installing and where)
[2] CDH & Cloudera Manager (heavyweight, but it does things in the background... easy to install and quick to set up on a sandbox and learn). Plus Beeswax is a great starter UI for Hive queries
[3] Using Amazon EMR Hive (I realize this is the easiest and the fastest to set up to learn Hive)

My suggestion: don't go for option [1] - you learn a lot there, but it could take time and you might feel frustrated as well.

Using option [2] above, I suggest
- 1 or 2 boxes - i7 quad core (or you can use an 8 core AMD FX 8300) with 16-32GB RAM
- download and install Cloudera Manager

If you don't have access to box(es) to install hadoop/hive, then the cheapest way to learn is by using Amazon EMR
- First create an S3 bucket and a folder to store a data file called songs.txt

    1,2,lennon,john,nowhere man
    1,3,lennon,john,strawberry fields forever
    2,1,mccartney,paul,penny lane
    2,2,mccartney,paul,michelle
    2,3,mccartney,paul,yesterday
    3,1,harrison,george,while my guitar gently weeps
    3,2,harrison,george,i want to tell you
    3,3,harrison,george,think for yourself
    3,4,harrison,george,something
    4,1,starr,ringo,octopuss garden
    4,2,starr,ringo,with a liitle help from my friends

- Create a key pair from the AWS console and save the private key on your local desktop
- Create an EMR cluster with Hive installed
- ssh -i /path/on/your/desktop/to/amazonkeypair.pem hadoop@<some-ec2-instance-name>.compute.amazonaws.com
- On the linux prompt
  --> hive -e "CREATE EXTERNAL TABLE IF NOT EXISTS songs(id INT, SEQID INT, LASTNAME STRING, FIRSTNAME STRING, SONGNAME STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
  --> hive -e "select songname from songs where lastname='lennon' OR lastname = 'harrison'"

Hope this helps
Hive on !!!

sanjay

id,seq,lastname,firstname,songname

________________________________
From: Vikas Parashar [para.vi...@gmail.com]
Sent: Sunday, January 12, 2014 10:50 PM
To: Prashant Kumar - ERS, HCL Tech
Cc: user@hive.apache.org
Subject: Re: One information about the Hive

Prashant,

Actually I have just started reading about and understanding Hive. Could you please tell me how you learnt Hive? Did you take any training? Is there any institute which is reliable specifically for Hive training? I have read a lot of tutorials on the net, but I am still not able to correlate the file stored on the Hadoop cluster with how Hive actually works: the complete end-to-end transaction and its storage. Can you take some classes on a paid basis and clear my questions? Please help me.

I have learnt from the community and my personal experience. What I can do is forward your request to a known member of the Big Data community.

Note: One important thing: can I post questions directly to you, if you do not mind and if I am not disturbing you?

Please put all questions on the community list only.

Thanks
Prashant

From: Vikas Parashar [mailto:para.vi...@gmail.com]
Sent: Monday, January 13, 2014 11:07 AM
To: user@hive.apache.org
Subject: Re: One information about the Hive

Prashant,

I am new to Hive. I am reading the docs available on the Apache site and trying to work out the correlation between Hadoop and Hive, so please help me understand this.

As per my understanding, all the files holding unstructured data are stored in HDFS across the Hadoop cluster. When we have to analyze that data, we use Hive. Now I have some questions I have not been able to answer:

1. When an engineer/business user wants to analyze data that is available in a file on the HDFS cluster, what are the steps to get the desired file and analyze it using Hive?

You need to map it with HDFS. With the help of MapReduce, you initially need to create some metadata in HCatalog. Maybe this will help you: http://hortonworks.com/use-cases/sentiment-analysis-hadoop-example/

2. Does Hive store all the data in its tables permanently after the analysis?

Hive never stores any data.

3. Is Hive itself a database?

It is just a data-access framework.

Thanks
Prashant
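
The two short answers in the thread ("Hive never stores any data" and "It is just a data-access framework") can be illustrated with a minimal Python sketch. This is not Hive code and says nothing about how Hive is implemented; it only imitates the idea with a few rows of songs.txt from Sanjay's reply. The raw file stays where it is, a separate table definition names and types the columns, and that definition is applied only when a query reads the file. The names SONGS_TXT, SCHEMA, read_rows and select_songnames are made up for this illustration.

```python
import csv
import io

# Raw file contents as they might sit in HDFS or S3
# (a subset of the songs.txt rows from the thread, copied verbatim).
SONGS_TXT = """\
1,2,lennon,john,nowhere man
1,3,lennon,john,strawberry fields forever
2,1,mccartney,paul,penny lane
3,1,harrison,george,while my guitar gently weeps
4,1,starr,ringo,octopuss garden
"""

# The "table definition": column names and types only, kept apart from the data.
# It plays the role of the metadata recorded by CREATE EXTERNAL TABLE (assumption:
# a simplified stand-in, not the real metastore format).
SCHEMA = [
    ("id", int),
    ("seqid", int),
    ("lastname", str),
    ("firstname", str),
    ("songname", str),
]


def read_rows(raw_text, schema):
    """Apply the schema at read time: split each line on ',' and cast each field."""
    for fields in csv.reader(io.StringIO(raw_text)):
        if not fields:  # skip blank lines
            continue
        yield {name: cast(value) for (name, cast), value in zip(schema, fields)}


def select_songnames(raw_text, schema, lastnames):
    """Rough analogue of: SELECT songname FROM songs WHERE lastname = ... OR lastname = ..."""
    return [
        row["songname"]
        for row in read_rows(raw_text, schema)
        if row["lastname"] in lastnames
    ]


result = select_songnames(SONGS_TXT, SCHEMA, {"lennon", "harrison"})
print(result)
```

With the five sample rows, the query picks out the two lennon songs and the one harrison song, and SONGS_TXT itself is never modified. In a real setup the table definition lives in the Hive metastore and the files live in HDFS or S3. As far as I know, an external table is normally created with a LOCATION clause pointing at the directory that holds songs.txt, so the CREATE EXTERNAL TABLE statement in the thread would likely need one before the select returns any rows.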