Re: help in hive

2012-10-22 Thread MiaoMiao
Try SELECT client, receive_day, receive_hour as start_time, receive_hour+1 as end_time FROM some_table WHERE client='xyz' AND receive_day=7 ORDER BY start_time; On Mon, Oct 22, 2012 at 4:41 PM, dyuti a hadoop.hiv...@gmail.com wrote: Hi all, I have a hive table with 235 million records.
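The suggested query can be tried on a toy table before running it against the real one. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for Hive (an assumption, since SQLite is not Hive and behaves differently at scale), and the sample rows and the helper name `hourly_windows` are hypothetical. The table and column names are the ones from the reply.

```python
import sqlite3


def hourly_windows(rows, client, day):
    """Run the suggested SELECT on an in-memory SQLite table (a stand-in for Hive)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE some_table ("
        "client TEXT, receive_day INTEGER, receive_hour INTEGER)"
    )
    conn.executemany("INSERT INTO some_table VALUES (?, ?, ?)", rows)
    # Same shape as the query in the reply: each hour becomes a
    # [start_time, end_time) window, ordered by its start.
    result = conn.execute(
        "SELECT client, receive_day, "
        "receive_hour AS start_time, receive_hour + 1 AS end_time "
        "FROM some_table "
        "WHERE client = ? AND receive_day = ? "
        "ORDER BY start_time",
        (client, day),
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample data, not taken from the thread.
sample = [("xyz", 7, 13), ("xyz", 7, 9), ("abc", 7, 9), ("xyz", 8, 1)]
windows = hourly_windows(sample, "xyz", 7)
for row in windows:
    print(row)
```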

Implementing a star schema (facts dimension model)

2012-10-22 Thread Austin Chungath
Hi, I am new to data warehousing in Hadoop. This might be a trivial question, but I was unable to find any answers in the mailing list. My questions are: A person has an existing data warehouse that uses a star schema (implemented in a MySQL database). How to migrate it to Hadoop? I can use sqoop

Re: Implementing a star schema (facts dimension model)

2012-10-22 Thread Bejoy KS
Hi Austin, you can import the existing tables into Hive as they are using Sqoop. Hive is a wrapper over MapReduce that gives you the flexibility to create optimized MapReduce jobs using SQL-like syntax. There is no relational style maintained in Hive, and don't treat Hive as a typical

Re: Implementing a star schema (facts dimension model)

2012-10-22 Thread Manish Bhoge
Austin, you have asked some great questions, and asked them simply, in your email. Data warehousing and the Hadoop ecosystem go hand in hand. I don't think you need to move all data from your warehouse to Hive and HBase. This is the key :) you need to understand where you should use Hive and where you can

How to run multiple Hive queries in parallel

2012-10-22 Thread Chunky Gupta
Hi, I have one name node machine, and under it there are 4 slave machines to run the jobs. The way users run queries is: - They ssh into the name node machine - They start Hive and submit their queries. Currently multiple users log in with the same credentials and submit queries. Whenever 2

Re: How to run multiple Hive queries in parallel

2012-10-22 Thread Bejoy KS
Hi, are your Hive queries in waiting mode even though there are task slots available on your cluster? If task slots are getting exhausted and you need parallelism here, then you may need to look at approaches such as using the fair scheduler and different user accounts for each user so that each

Re: help in hive

2012-10-22 Thread dyuti a
Hi, Thank you so much for your help. It works great. Regards, dti On Mon, Oct 22, 2012 at 2:18 PM, MiaoMiao liy...@gmail.com wrote: Try SELECT client, receive_day, receive_hour as start_time, receive_hour+1 as end_time FROM some_table WHERE client='xyz' AND receive_day=7 ORDER BY

Re: How to run multiple Hive queries in parallel

2012-10-22 Thread Bertrand Dechoux
Bejoy is right. I just want to say explicitly that the scheduler configuration is orthogonal to the use of Hive (i.e. the same problem arises with Pig or standard MapReduce jobs). Regards Bertrand PS: There is also the capacity scheduler. On Mon, Oct 22, 2012 at 2:18 PM, Bejoy KS

Re: How to run multiple Hive queries in parallel

2012-10-22 Thread Chunky Gupta
Hi Bejoy and Bertrand, thanks for the quick reply. I think task slots are not available in my cluster because I have only 4 slave machines. Actually, I am a beginner with Hive, so could you let me know how I can check whether task slots are available or not? I have different users credentials to log in

Re: How to run multiple Hive queries in parallel

2012-10-22 Thread Bejoy KS
Hi, from the JobTracker web UI you can get the total number of map and reduce slots. From the same web UI you can also get the number of running map/reduce tasks. The second value subtracted from the first gives you the available slots. The fair scheduler is a property of MapReduce and not of Hive.
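The subtraction described in this reply can be sketched in a few lines of Python. This is only an illustration: the function name `available_slots` and the sample numbers are hypothetical, and the real values would be read manually from the JobTracker web UI.

```python
def available_slots(total_slots, running_tasks):
    """Free slots = total slots minus currently running tasks (never below zero)."""
    return max(total_slots - running_tasks, 0)


# Hypothetical readings from the JobTracker web UI (not from the thread).
# Map and reduce slots are counted separately.
free_map = available_slots(total_slots=8, running_tasks=8)
free_reduce = available_slots(total_slots=8, running_tasks=3)

print("free map slots:", free_map)
print("free reduce slots:", free_reduce)
```

If either count stays at zero while queries are waiting, the slots are exhausted, and the scheduler setup mentioned in the replies is the place to look.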

JOIN comparison PIG V/S HIVE

2012-10-22 Thread yogesh dhari
Hi All, is it true that Pig's JOIN operation is not as efficient as Hive's? I have just tried both and found differences in the results of a JOIN query. Hive returned the same result as MySQL, but Pig returned somewhat lower counts than the Hive join. Please shed some light on JOINs in Pig and Hive. Regards Yogesh

Query

2012-10-22 Thread Venugopal Krishnan
Hi, we have a requirement to print the column headers in the file generated by executing a query. We are using the Hive JDBC client to execute the query. Regards, Venugopal http://www.mindtree.com/email/disclaimer.html