Block Sampling

2012-06-15 Thread Ladda, Anand
Has the block sampling feature been added to one of the latest (Hive 0.8 or Hive 0.9) releases? The wiki has the blurb below on block sampling. Block Sampling: It is a feature that is still on trunk and is not yet in any release version. block_sample: TABLESAMPLE (n PERCENT). This will allow Hive to

RE: Block Sampling

2012-06-15 Thread Ladda, Anand
Hi Anand, This feature was implemented in HIVE-2121 and appeared in Hive 0.8.0. Ref: https://issues.apache.org/jira/browse/HIVE-2121 Thanks. Carl On Fri, Jun 15, 2012 at 11:59 AM, Ladda, Anand lan...@microstrategy.com wrote: Has the block sampling feature been

Block Sampling Impact

2012-06-15 Thread Ladda, Anand
Hi, I was trying block sampling on a 6 million (~400 MB sized) table and can see that if I sample about 1 percent of the data, I get about 3x faster response on the queries (I can also see a difference in the data returned). The input format though is 'org.apache.hadoop.mapred.TextInputFormat' and not
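For reference, a 1 percent block sample like the one described in this thread might look like the sketch below. The table name order_fact is hypothetical. The hive.input.format line is an assumption based on the wiki's note that block sampling relies on CombineHiveInputFormat; check the value your own session uses before relying on it.

```sql
-- Hypothetical table name; TABLESAMPLE (n PERCENT) is the syntax quoted from the wiki.
-- Assumption: block sampling is applied when the session uses CombineHiveInputFormat.
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Read roughly 1 percent of the input by data size (not by row count).
SELECT COUNT(*)
FROM order_fact TABLESAMPLE (1 PERCENT) s;
```

Because the sample is taken by data size at the block level, the rows returned and the speedup can vary from run to run and from table to table.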

RE: Front end visualization tool with Hive (when using as a warehouse)

2012-06-04 Thread Ladda, Anand
I agree with Bejoy's assessment - Hive is good for processing large volumes of data in a batch manner. But for real-time or any complex SQL-based analysis you would typically want to have some type of RDBMS in the mix along with Hadoop/Hive. In terms of what's missing in Hive today - On the

RE: Filtering on TIMESTAMP data type

2012-06-04 Thread Ladda, Anand
this to work or is there something else I should be using From: Ladda, Anand Sent: Monday, May 28, 2012 11:00 AM To: user@hive.apache.org Subject: RE: FW: Filtering on TIMESTAMP data type Debarshi Didn't quite follow your first comment. I get the write-your-own UDF part but was wondering how others have

Edit Rights to Hive Wiki

2012-05-29 Thread Ladda, Anand
Can someone grant me edit rights to the Hive Wiki? Thanks Anand

RE: FW: Filtering on TIMESTAMP data type

2012-05-28 Thread Ladda, Anand
Ladda, Anand wrote: To: user@hive.apache.org

FW: Filtering on TIMESTAMP data type

2012-05-26 Thread Ladda, Anand
How do I set up a filter constant for the TIMESTAMP datatype? In Hive 0.7, since timestamps were represented as strings, a query like this would return data: select * from LU_day where day_date = '2010-01-01 00:00:00'; But now with day_date as a TIMESTAMP column it doesn't. Is there some type of a
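One approach that is sometimes suggested for this kind of filter is to cast the string constant to TIMESTAMP explicitly, so both sides of the comparison have the same type. The sketch below reuses the table and column from the question; whether it resolves the behavior described here on a given Hive version is an assumption, not something confirmed in this thread.

```sql
-- LU_day and day_date come from the question above.
-- Assumption: an explicit cast makes the comparison TIMESTAMP-to-TIMESTAMP.
SELECT *
FROM LU_day
WHERE day_date = CAST('2010-01-01 00:00:00' AS TIMESTAMP);
```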

RE: [Marketing Mail] Doubts related to Amazon EMR

2012-04-23 Thread Ladda, Anand
Once you have a Hive job flow running on Amazon EMR, you'll have access to the file system on the underlying EC2 machines (you'll get the machine name, etc. once the cluster is running). You can then move your data files onto the EC2 machine file system and load them into HDFS/Hive. I am not sure

Row Group Size of RCFile

2012-04-18 Thread Ladda, Anand
How do I set the Row Group Size of RCFile in Hive? CREATE TABLE OrderFactPartClustRcFile( order_id INT, emp_id INT, order_amt FLOAT, order_cost FLOAT, qty_sold FLOAT, freight FLOAT, gross_dollar_sales FLOAT, ship_date STRING, rush_order STRING, customer_id INT, pymt_type INT,
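As a hedged pointer: Hive exposes a session property, hive.io.rcfile.record.buffer.size, that is generally understood to control how many bytes the RCFile writer buffers before flushing a row group. The value below is only an illustration, and the assumption is that the property is set before the statement that writes the table's data.

```sql
-- Assumption: this property governs the RCFile row group (record buffer) size in bytes.
-- 8388608 (8 MB) is an arbitrary illustrative value, not a recommendation.
SET hive.io.rcfile.record.buffer.size=8388608;

-- The setting takes effect for data written afterwards in the same session,
-- for example by an INSERT OVERWRITE into the RCFile table.
```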

RE: Hive Meta Information

2012-04-03 Thread Ladda, Anand
to process hadoop logs and based on that you can figure out who accessed the data and how Thanks, Nitin On Sat, Mar 31, 2012 at 3:36 AM, Ladda, Anand lan...@microstrategy.com wrote: How do I get the following meta information about a table 1. recent users of table, 2. top

RE: Hive Queries Performance Tuning - Map side joins, Map side aggregations, Partitioning/Clustering

2012-04-03 Thread Ladda, Anand
Bejoy KS From: Ladda, Anand lan...@microstrategy.com To: user@hive.apache.org Sent: Sunday, April 1, 2012 11:59 PM Subject: Hive Queries Performance Tuning - Map

Table Statistics In Hive

2012-04-02 Thread Ladda, Anand
I've tried to collect statistics on an existing table in Hive using the commands mentioned in this wiki page - https://cwiki.apache.org/confluence/display/Hive/StatsDev: ANALYZE TABLE [TABLENAME] PARTITION(partcol1=..., partcol2=) COMPUTE STATISTICS. But when I do a DESCRIBE EXTENDED
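A concrete instance of the command pattern quoted above might look like the following sketch. The table name order_fact and the partition column ship_year are hypothetical, chosen only to show where the values go.

```sql
-- Hypothetical table and partition column, used only for illustration.
ANALYZE TABLE order_fact PARTITION (ship_year=2012) COMPUTE STATISTICS;

-- Inspect the partition afterwards; gathered statistics, if any,
-- are expected to appear among the partition parameters.
DESCRIBE EXTENDED order_fact PARTITION (ship_year=2012);
```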

Hive Queries Performance Tuning - Map side joins, Map side aggregations, Partitioning/Clustering

2012-04-01 Thread Ladda, Anand
I am trying to understand which options/settings are available to tune the performance of Hive queries. I have seen the benefits of map-side joins and partitioning/clustering. However, I have yet to see the impact map-side aggregation has on query performance. I tried running this
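For readers comparing the two techniques named here, a minimal sketch of the relevant session settings follows. The table and column names (order_fact, lu_customer, customer_id, order_amt) are hypothetical, and the effect of each setting depends on data size and cluster configuration.

```sql
-- Map-side (partial) aggregation in the mappers before the shuffle.
SET hive.map.aggr=true;

-- Let Hive convert joins against small tables into map-side joins automatically.
SET hive.auto.convert.join=true;

-- Hypothetical query: the MAPJOIN hint names the small table to load into memory,
-- and the GROUP BY is the part that map-side aggregation applies to.
SELECT /*+ MAPJOIN(c) */ c.customer_id, SUM(f.order_amt) AS total_amt
FROM order_fact f
JOIN lu_customer c ON (f.customer_id = c.customer_id)
GROUP BY c.customer_id;
```

Running the same GROUP BY with hive.map.aggr set to false and comparing the job counters is one way to see whether map-side aggregation makes a difference for a particular query.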

Hive Meta Information

2012-03-30 Thread Ladda, Anand
How do I get the following meta information about a table: 1. recent users of table, 2. top users of table, 3. recent queries/jobs/reports, 4. number of rows in a table? I don't see anything in either the DESCRIBE FORMATTED or SHOW TABLE EXTENDED LIKE commands. Thanks