date:20101015

Re: Merging small files with dynamic partitions

2010-10-15 Thread Sammy Yu

Hi guys, Thanks for the response. I tried running without hive.mergejob.maponly and got the same result. I've attached the explain extended output. I am running this query on EC2 boxes, but not on EMR. Hive is running on top of a Hadoop 0.20.2 setup. Thanks, Sammy On Fri, O

Re: Merging small files with dynamic partitions

2010-10-15 Thread Ning Zhang

The output file shows it only has 2 jobs (the MapReduce job and the move task). This indicates that the plan does not have merge enabled. Merge should consist of a ConditionalTask and 2 sub-tasks (an MR task and a move task). Can you send the plan of the query? One thing I noticed is that you
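The ConditionalTask mentioned above decides at run time whether the extra merge job is worth running. A simplified model of that decision, assuming it compares the average size of the job's output files with a threshold (Hive exposes settings such as hive.merge.mapfiles, hive.merge.mapredfiles and hive.merge.smallfiles.avgsize, depending on the version). This is a sketch of the idea, not Hive's source; the default value used below is an assumption.

```python
# Simplified model of the merge decision (an assumption, not Hive's source):
# after a job finishes, look at the sizes of the files it wrote and schedule an
# extra merge job only when they are, on average, smaller than a threshold.

# Stand-in for hive.merge.smallfiles.avgsize; the default shown here is an
# assumption, so check the value in your own HiveConf.
SMALLFILES_AVGSIZE = 16_000_000  # bytes


def needs_merge(file_sizes, avg_threshold=SMALLFILES_AVGSIZE):
    """Return True when more than one file was written and their average
    size falls below avg_threshold."""
    if len(file_sizes) <= 1:
        return False  # nothing to combine
    average = sum(file_sizes) / len(file_sizes)
    return average < avg_threshold
```

If the plan has no ConditionalTask at all, as in Sammy's output, this check never runs, which is why the plan itself is the first thing to look at.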

Re: Merging small files with dynamic partitions

2010-10-15 Thread Edward Capriolo

Sammy, This is not the exact remedy you were looking for, but my company open sourced our file crusher utility. http://www.jointhegrid.com/hadoop_filecrush/index.jsp We use it to good effect to turn many small files into one. Works with text and sequence files, and custom Writables. Edward On
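Any small-file compactor first has to decide which files to combine. A minimal sketch of one possible grouping step, assuming a simple next-fit strategy over (path, size) pairs with a target output size; this is an illustration only and is not how the file crusher linked above is implemented. The function name plan_merge_batches is hypothetical.

```python
from typing import List, Tuple


def plan_merge_batches(files: List[Tuple[str, int]], target_bytes: int) -> List[List[str]]:
    """Hypothetical helper: walk (path, size) pairs in order and start a new
    batch whenever adding the next file would exceed target_bytes.
    A file larger than the target ends up alone in its own batch."""
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for path, size in files:
        if current and current_size + size > target_bytes:
            batches.append(current)  # close the batch before it overflows
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Keeping the input order (rather than sorting by size) means files from the same directory or time range stay together, at the cost of slightly less even batches.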

Re: Help with last 30 day unique user query

2010-10-15 Thread Vijay

Thanks Alex! That is exactly what I thought was the limitation but wanted to make sure I'm not missing anything. On Fri, Oct 15, 2010 at 10:51 AM, Alex Boisvert wrote: > As far as I know, Hive has no built-in support for sliding-window > analytics. There is an enhancement request here: > https:

Re: Multiple insert statement and levels of aggregation

2010-10-15 Thread Alex Boisvert

Cool, I hadn't come across lateral views yet. I'll see if I can use that. thanks!! alex On Fri, Oct 15, 2010 at 11:17 AM, Ning Zhang wrote: > In the multi-insert statement, you cannot put another FROM clause. What you > can do is to put both UDTF in the FROM clause: > > FROM foo lateral view

UDAF modes

2010-10-15 Thread Alex Boisvert

Hi, I'm writing a UDAF and I'm a little unclear about the PARTIAL1, PARTIAL2, FINAL and COMPLETE modes. I've read the extent of the Javadoc ;) and looked at some of the built-in UDAFs in the Hive source tree and I'm still unclear about the properties of the input data in each aggregation step. C
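For reference, the GenericUDAFEvaluator javadoc describes the modes by which methods are called: PARTIAL1 goes from original rows to a partial aggregate (iterate, then terminatePartial), PARTIAL2 from partials to a partial (merge, then terminatePartial), FINAL from partials to the full result (merge, then terminate), and COMPLETE from original rows straight to the result (iterate, then terminate). A Python model of an average UDAF under those four modes, with snake_case names standing in for the Java methods; this is a sketch of the contract, not Hive code.

```python
from enum import Enum


class Mode(Enum):
    PARTIAL1 = "raw rows -> partial aggregate"   # iterate() + terminatePartial()
    PARTIAL2 = "partials -> partial aggregate"   # merge() + terminatePartial()
    FINAL = "partials -> final result"           # merge() + terminate()
    COMPLETE = "raw rows -> final result"        # iterate() + terminate()


class AvgEvaluator:
    """Average kept as a (sum, count) partial so partials can be merged."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def iterate(self, value):
        if value is not None:  # SQL aggregates skip NULLs
            self.total += value
            self.count += 1

    def terminate_partial(self):
        return (self.total, self.count)

    def merge(self, partial):
        total, count = partial
        self.total += total
        self.count += count

    def terminate(self):
        return None if self.count == 0 else self.total / self.count


def run(mode, inputs):
    """Feed inputs to a fresh evaluator the way the given mode would."""
    ev = AvgEvaluator()
    consume = ev.iterate if mode in (Mode.PARTIAL1, Mode.COMPLETE) else ev.merge
    for item in inputs:
        consume(item)
    if mode in (Mode.PARTIAL1, Mode.PARTIAL2):
        return ev.terminate_partial()
    return ev.terminate()
```

The practical consequence: iterate only ever sees original column values, merge only ever sees whatever terminatePartial returned, and the partial must carry enough state (here sum and count, not the running average) to be combined in any order.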

Re: Multiple insert statement and levels of aggregation

2010-10-15 Thread Ning Zhang

In the multi-insert statement, you cannot put another FROM clause. What you can do is to put both UDTFs in the FROM clause: FROM foo lateral view someUDTF(foo.a) as t1_a lateral view anotherUDTF(foo.a) as t2_a INSERT ... SELECT a,b,c,count(1), t1_a .. SELECT a,b,c,count(1), t2_a .. On Oct 15, 2
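To see what rows each INSERT branch receives, here is a Python model of the query shape above. Chained lateral views join each input row with the cross product of the UDTF outputs, and both GROUP BY branches then read that one expanded row set. some_udtf and another_udtf are hypothetical stand-ins for someUDTF and anotherUDTF; this is an illustration, not Hive's execution engine.

```python
from collections import Counter
from itertools import product


# Hypothetical stand-ins: like a UDTF, each turns one input value into
# zero or more output values.
def some_udtf(a):
    return [a.lower(), a.upper()]


def another_udtf(a):
    return [len(a)]


def lateral_views(rows):
    """Model of FROM foo LATERAL VIEW someUDTF(foo.a) ... LATERAL VIEW
    anotherUDTF(foo.a) ...: every input row is paired with each combination
    of the two UDTF outputs (a cross product per row)."""
    for row in rows:
        for t1_a, t2_a in product(some_udtf(row["a"]), another_udtf(row["a"])):
            yield {**row, "t1_a": t1_a, "t2_a": t2_a}


def multi_insert(rows):
    """One scan of the expanded rows feeds both GROUP BY branches,
    each computing count(1) per (a, b, c, udtf output)."""
    by_t1, by_t2 = Counter(), Counter()
    for r in lateral_views(rows):
        by_t1[(r["a"], r["b"], r["c"], r["t1_a"])] += 1
        by_t2[(r["a"], r["b"], r["c"], r["t2_a"])] += 1
    return by_t1, by_t2
```

With a single input row whose first UDTF emits two values, the t2_a branch reports count(1) = 2, because it counts expanded rows rather than original ones. When one UDTF can emit several values per row, count(DISTINCT ...) on a row key may be the number you actually want.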

Multiple insert statement and levels of aggregation

2010-10-15 Thread Alex Boisvert

Hi, I'd like to write a multiple-insert select statement where I need to call different UDTFs and perform several levels of aggregation based on the result of the initial table, e.g., FROM (SELECT * from TABLE foo) foo INSERT OVERWRITE TABLE bar SELECT a, b, c, count(1) FROM (SELECT someUDTF(fo

Re: Help with last 30 day unique user query

2010-10-15 Thread Alex Boisvert

As far as I know, Hive has no built-in support for sliding-window analytics. There is an enhancement request here: https://issues.apache.org/jira/browse/HIVE-896 Without such support, the brute force way of doing things is, SELECT COUNT(DISTINCT us
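The brute-force approach amounts to recomputing a distinct count over a trailing window for every day in the range. A Python sketch of that computation over (day, user) events, assuming the window is the 30 days ending on and including each day; that window definition is an assumption, and the function name is hypothetical.

```python
from datetime import date, timedelta


def rolling_unique_users(events, start, end, window_days=30):
    """For each day d in [start, end], count distinct users whose events
    fall in the window_days days ending on d (inclusive).
    Brute force: every day rescans all events, like one
    COUNT(DISTINCT user) query per day."""
    result = {}
    day = start
    while day <= end:
        lo = day - timedelta(days=window_days - 1)
        result[day] = len({user for d, user in events if lo <= d <= day})
        day += timedelta(days=1)
    return result
```

The cost grows with (number of days) times (events per window), which is why a built-in sliding-window feature such as the one requested in HIVE-896 would help.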

Re: Help with last 30 day unique user query

2010-10-15 Thread Ning Zhang

Sorry, I don't understand your question. I thought you were referring to the lack of a DATE type in Hive. HiveQL has syntax similar to SQL, such as count(distinct col). Your regular SQL query should work with the help of the UDFs I mentioned. On Oct 15, 2010, at 9:43 AM, Vijay wrote: Thank

Re: Help with last 30 day unique user query

2010-10-15 Thread Vijay

Thanks, Ning! Finding the date which is 30 days before/later was easy enough but my problem is beyond that. I need to find unique users based on these last 30 days for a range of days. Does that make sense? On Fri, Oct 15, 2010 at 12:10 AM, Ning Zhang wrote: > There are some UDFs that convert a

Need help to ignore corrupted gzipped files while doing a query

2010-10-15 Thread Parag Arora

Hello, I have a small question and need a little help with it. I have a Hive table which loads its data from files partitioned by timestamp (every 15 minutes) and placed there in gzipped format. There may be some gzip files which are corrupted (while transferring files, network error etc. may have r
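One workaround outside Hive is to test each gzip file before it lands in the table directory and set aside the ones that fail. A Python sketch of that integrity check, assuming the files are readable on a local filesystem (checking files already in HDFS would need them copied or streamed first, which is not shown). The helper names are hypothetical.

```python
import gzip
import zlib


def is_readable_gzip(path, chunk_size=1 << 20):
    """Decompress the whole file; any header, CRC, or truncation error
    marks it as corrupted."""
    try:
        with gzip.open(path, "rb") as f:
            while f.read(chunk_size):
                pass  # discard data; we only care whether it decodes
        return True
    except (OSError, EOFError, zlib.error):
        # gzip.BadGzipFile is a subclass of OSError; a truncated stream
        # raises EOFError; corrupt deflate data can raise zlib.error.
        return False


def split_by_integrity(paths):
    """Return (good, bad) lists so bad files can be moved out of the
    table's directory before the query runs."""
    good, bad = [], []
    for path in paths:
        (good if is_readable_gzip(path) else bad).append(path)
    return good, bad
```

Reading the full stream matters: a file cut off during transfer usually has a valid header and only fails at the end, when the CRC and length trailer are missing.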

Re: Got question after deploy hadoop-0.21.0

2010-10-15 Thread SingoWong

Hi, The first issue was solved, but the second warning message still appears... On Thu, Oct 14, 2010 at 5:03 PM, SingoWong wrote: > Hi, > > I have some questions after deploying hadoop-0.21.0 and need help. > This is a new deployment, not an upgrade, and I executed start-hdfs.sh, > start-mapred.sh, got the messa

Re: Help with last 30 day unique user query

2010-10-15 Thread Ning Zhang

There are some UDFs that convert a string to epoch time and back to a string, e.g., select from_unixtime(unix_timestamp('2010-10-10', 'yyyy-MM-dd') + 60*60*24*30, 'yyyy-MM-dd') from src limit 1; will give you the date which is 30 days later than 2010-10-10. On Oct 14, 2010, at 11:36 PM, Vij
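The same arithmetic in Python, for checking results outside Hive: parse the date to epoch seconds, add days * 60*60*24, and format it back. This sketch deliberately works in UTC. As far as I know, Hive's unix_timestamp and from_unixtime use the server's local time zone, so adding a fixed number of seconds could come out an hour short across a daylight-saving change (2010-10-10 to 2010-11-09 spans the US change on 2010-11-07) and print the previous day. The function name is hypothetical.

```python
from datetime import datetime, timezone


def shift_date(day_str, days, fmt="%Y-%m-%d"):
    """Mirror from_unixtime(unix_timestamp(day, fmt) + days*86400, fmt),
    but in UTC so every day is exactly 86400 seconds long."""
    epoch = datetime.strptime(day_str, fmt).replace(tzinfo=timezone.utc).timestamp()
    shifted = epoch + days * 60 * 60 * 24
    return datetime.fromtimestamp(shifted, tz=timezone.utc).strftime(fmt)
```

For example, shift_date("2010-10-10", 30) gives "2010-11-09", and a negative days value gives the date 30 days earlier.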
