You can use the Hortonworks Data Platform, which already integrates HDFS, MapReduce, and Hive well. http://hortonworks.com/products/hortonworksdataplatform/
I came across this new solution recently. They claim to be a Hadoop-based standard SQL solution for data analytics.
http://queryio.com/hadoop-big-data-product/hadoop-hive.html
I have not tried it yet, but you can explore it.

-Richard

On Tue, Feb 5, 2013 at 10:07 AM, Preethi Vinayak Ponangi <vinayakpona...@gmail.com> wrote:

> From: Preethi Vinayak Ponangi <vinayakpona...@gmail.com>
> Subject: Re: Application of Cloudera Hadoop for Dataset analysis
> Date: February 5, 2013 8:07:47 AM PST
> To: user@hadoop.apache.org
> Reply-To: user@hadoop.apache.org
>
> It depends on which component of the Hadoop ecosystem you would like to use.
>
> You can do it in several ways:
>
> 1) You could write a basic MapReduce job to do joins. This link could help, or a basic search on Google would give you several links.
>
> http://chamibuddhika.wordpress.com/2012/02/26/joins-with-map-reduce/
>
> 2) You could use a higher-level language like Pig to do these joins using simple Pig scripts.
> http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html
>
> 3) The simplest of all: you could write SQL-like queries to do this join using Hive.
> http://hive.apache.org/
>
> Hope this helps.
>
> Regards,
> Vinayak.
>
> On Tue, Feb 5, 2013 at 10:00 AM, Suresh Srinivas <sur...@hortonworks.com> wrote:
>
>> Please take this thread to the CDH mailing list.
>>
>> On Tue, Feb 5, 2013 at 2:43 AM, Sharath Chandra Guntuku <sharathchandr...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am Sharath Chandra, an undergraduate student at BITS-Pilani, India. I would like to get the following clarifications regarding the Cloudera Hadoop distribution. I am using a CDH4 Demo VM for now.
>>>
>>> 1. After I upload the files into the file browser, if I have to link two or three datasets using a key in those files, what should I do? Do I have to run a query over them?
>>>
>>> 2. My objective is this: I have some data collected over a few years, and now I would like to link all of it, as in a database, using keys, and then run queries over it to find particular patterns. Later I would like to implement some machine learning algorithms on it for predictive analysis. Will this be possible on the demo VM?
>>>
>>> I am totally new to this. Can I get some help on this? I would be very grateful for the same.
>>>
>>> ------------------------------------------------------------
>>> Thanks and Regards,
>>> Sharath Chandra Guntuku
>>> Undergraduate Student (Final Year)
>>> Computer Science Department
>>> Email: f2009...@hyderabad.bits-pilani.ac.in
>>>
>>> BITS-Pilani, Hyderabad Campus
>>> Jawahar Nagar, Shameerpet, RR Dist,
>>> Hyderabad - 500078, Andhra Pradesh
>>
>> --
>> http://hortonworks.com/download/
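
For the "link datasets using a key" question in this thread, here is a rough sketch of what a reduce-side join does. It is plain Python that only imitates the map, shuffle, and reduce steps in memory; it is not a Hadoop job and not the code from the linked article. The sample records, the `station_id`-style key, and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sample data: two small datasets that share a key in column 0.
readings = [
    ("s1", "2011-01-01", 12.5),
    ("s2", "2011-01-01", 9.0),
    ("s1", "2012-01-01", 13.1),
]
stations = [
    ("s1", "Hyderabad"),
    ("s2", "Pilani"),
]


def map_phase(records, tag):
    """Emit (join_key, (tag, remaining_fields)) pairs, as a mapper would."""
    for record in records:
        yield record[0], (tag, record[1:])


def shuffle(pairs):
    """Group values by key; stands in for the framework's shuffle/sort step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Inner join: pair every left-tagged value with every right-tagged one."""
    left = [fields for tag, fields in values if tag == "L"]
    right = [fields for tag, fields in values if tag == "R"]
    for left_fields, right_fields in product(left, right):
        yield (key,) + left_fields + right_fields


def join(left_records, right_records):
    """Run the three simulated phases and return the joined rows."""
    pairs = list(map_phase(left_records, "L")) + list(map_phase(right_records, "R"))
    joined = []
    for key, values in sorted(shuffle(pairs).items()):
        joined.extend(reduce_phase(key, values))
    return joined


if __name__ == "__main__":
    for row in join(readings, stations):
        print(row)
```

Tagging each record with its source ("L" or "R") is what lets the reducer tell the two datasets apart once they are grouped under the same key. With Pig or Hive, the same inner join would normally be a single JOIN statement rather than hand-written phases.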