Re: Hive/HBase integration issue.

2010-11-18 Thread afancy
Hi, Does the INSERT clause have to include OVERWRITE, meaning that the new data will overwrite the previous data? How can I perform a true INSERT (append) instead of an OVERWRITE? BTW: how can I implement a DELETE operation? Thanks, afancy --

Re: Hive/HBase integration issue.

2010-11-18 Thread John Sichi
This is unrelated to Hive/HBase integration; it looks like a Hadoop version issue. JVS On Nov 17, 2010, at 9:56 PM, Vivek Mishra wrote: > Hi, > Currently I am facing an issue with Hive/HBase integration. > > Exception in thread "main" java.lang.NoSuchMethodError: > org.apache.hadoop.util.She

Re: Hive/HBase integration issue.

2010-11-18 Thread John Sichi
As noted here, when writing to HBase, existing rows are overwritten, but old rows are not deleted. http://wiki.apache.org/hadoop/Hive/HBaseIntegration#Overwrite There is not yet any deletion support. JVS On Nov 18, 2010, at 1:00 AM, afancy wrote: > Hi, > > Does the INSERT clause have to in
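
A minimal sketch of the overwrite behaviour described in this reply, assuming a keyed table can be modelled as a plain Python dict. The function name insert_overwrite_keyed is hypothetical and is not part of Hive or HBase; it only illustrates that rows with an existing key are replaced while rows absent from the new data stay in place.

```python
def insert_overwrite_keyed(table, new_rows):
    """Write new_rows into a keyed table (modelled as a dict).

    Hypothetical illustration only: rows whose key already exists are
    overwritten, and rows that are not in new_rows are left untouched.
    """
    for row_key, value in new_rows.items():
        table[row_key] = value
    return table


# Existing table contents before the write.
table = {"r1": "old-a", "r2": "old-b"}

# "r2" is overwritten, "r3" is added, and "r1" survives because nothing deletes it.
insert_overwrite_keyed(table, {"r2": "new-b", "r3": "new-c"})
```

In this model, removing "r1" would need a separate delete step, which matches the note that there is no deletion support yet.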

Re: Running and Sliding on Aggregates

2010-11-18 Thread Michael Roessler
Thanks Jeff. I'll add something to the JIRA case. I'm after what I think are fairly common use cases, such as: create table corporate_expenses(cost_center INT, fiscal_date STRING, expense_amount DOUBLE); given tab-delimited file data.txt: SALES 20101118 3939

Re: Hive produces very small files despite hive.merge...=true settings

2010-11-18 Thread Ning Zhang
The settings look good. The parameter hive.merge.size.smallfiles.avgsize is used at run time to determine whether a merge should be triggered: if the average size of the files in the partition is SMALLER than the parameter and there is more than one file, the merge should be scheduled. Can you try to
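
The run-time check described in this reply can be sketched roughly as follows. This is an illustration, not Hive's actual code: the function name should_merge and the argument avg_size_threshold (standing in for the parameter named in the message) are assumptions, and the threshold value used below is arbitrary.

```python
def should_merge(file_sizes, avg_size_threshold):
    """Return True if a merge would be scheduled under the stated rule.

    file_sizes: sizes in bytes of the output files in one partition.
    avg_size_threshold: hypothetical stand-in for the configured average-size limit.
    """
    # A merge only makes sense when there is more than one file.
    if len(file_sizes) <= 1:
        return False
    average_size = sum(file_sizes) / len(file_sizes)
    # Merge only if the average file size is smaller than the threshold.
    return average_size < avg_size_threshold


# Arbitrary illustrative threshold, in bytes.
threshold = 16_000_000

many_small_files = should_merge([2_000, 3_000, 1_500], threshold)
single_small_file = should_merge([2_000], threshold)
large_files = should_merge([32_000_000, 64_000_000], threshold)
```

Under this reading, a partition holding several files of a few KB each qualifies for a merge, while a single file or a set of large files does not.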

Re: Hive produces very small files despite hive.merge...=true settings

2010-11-18 Thread Leo Alekseyev
Hi Ning, For the dataset I'm experimenting with, the total size of the output is 2 MB, and the files are at most a few KB in size. My hive.input.format was set to the default, HiveInputFormat; however, when I set it to CombineHiveInputFormat, it only made the first stage of the job use fewer mappers. T

Re: Hive produces very small files despite hive.merge...=true settings

2010-11-18 Thread Ted Yu
Leo: You may find this helpful: http://indoos.wordpress.com/2010/06/24/hive-remote-debugging/ On Thu, Nov 18, 2010 at 2:57 PM, Leo Alekseyev wrote: > Hi Ning, > For the dataset I'm experimenting with, the total size of the output > is 2mb, and the files are at most a few kb in size. My > hive.i

Re: Hive produces very small files despite hive.merge...=true settings

2010-11-18 Thread Ning Zhang
I see. If you are using dynamic partitions, HIVE-1307 and HIVE-1622 need to be there for merging to take place. HIVE-1307 was committed to trunk on 08/25 and HIVE-1622 was committed on 09/13. The simplest way is to update your Hive trunk and rerun the query. If it still doesn't work maybe you ca

Using jdbc in embedded mode - Can't find warehouse directory

2010-11-18 Thread Stuart Smith
Hello, I'm trying to connect to hive using the JDBC driver in embedded mode. I can load the driver successfully & connect to it via: hiveConnection = DriverManager.getConnection( "jdbc:hive://", "", "" ) But when I query a table that I know exists - I can query it via a hive command line ru

Re: Hive produces very small files despite hive.merge...=true settings

2010-11-18 Thread Leo Alekseyev
I thought I was running Hive with those changes merged in, but to make sure, I built the latest trunk version. The behavior changed somewhat (as in, it runs 2 stages instead of 1), but it still generates the same number of files (# of files generated is equal to the number of the original mappers,

RE: Hive/HBase integration issue.

2010-11-18 Thread Vivek Mishra
Hi, I just found that it is related to the HIVE-1264 JIRA. Thanks for all the help. Vivek -Original Message- From: John Sichi [mailto:jsi...@fb.com] Sent: Friday, November 19, 2010 1:02 AM To: Subject: Re: Hive/HBase integration issue. This is unrelated to Hive/HBase integration; it looks like