Re: What's the advised way to do groupby 2 attributes from a table with 1000 columns?

2016-03-27 Thread Gopal Vijayaraghavan
> I only need to query 3 columns, ...
> The source table is about 1PB.

Format of this table is extremely critical. A columnar data format like ORC is recommended to avoid reading any other columns when reading 3 out of 1000.

> Will it be advised to do a subquery first, and then send it to the
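To illustrate why the storage format matters here, the following is a toy Python sketch. It is not ORC or Hive. The table contents and the helpers `to_columnar`, `scan_row_store` and `scan_column_store` are hypothetical, and the sketch only counts how many stored values each layout has to touch to extract 3 of 1000 columns.

```python
from collections import defaultdict

# Hypothetical toy table: 1000 columns, 5 rows (stand-in for the 1PB table).
NUM_COLS = 1000
rows = [
    {f"col{c}": (r * c) % 7 for c in range(1, NUM_COLS + 1)}
    for r in range(1, 6)
]

def to_columnar(rows):
    """Pivot row dicts into one list per column (a toy stand-in for a columnar layout)."""
    columns = defaultdict(list)
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return dict(columns)

def scan_row_store(rows, wanted):
    """Row layout: every value of every row is read to extract the wanted columns."""
    values_read = 0
    out = []
    for row in rows:
        values_read += len(row)  # the whole row is deserialized
        out.append(tuple(row[c] for c in wanted))
    return out, values_read

def scan_column_store(columns, wanted):
    """Columnar layout: only the wanted column streams are read."""
    values_read = sum(len(columns[c]) for c in wanted)
    out = list(zip(*(columns[c] for c in wanted)))
    return out, values_read

wanted = ["col1", "col2", "col3"]
row_result, row_cost = scan_row_store(rows, wanted)
col_result, col_cost = scan_column_store(to_columnar(rows), wanted)
assert row_result == col_result
print(row_cost, col_cost)  # 5000 15
```

Both layouts return the same tuples; the difference is that the row layout touches all 1000 values per row while the columnar one touches only the 3 requested columns.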

Re: Automatic Update statistics on ORC tables in Hive

2016-03-27 Thread Gopal Vijayaraghavan
> This might be a bit far-fetched, but is there any plan for background
> ANALYZE STATISTICS to be performed on ORC tables

https://issues.apache.org/jira/browse/HIVE-12669

Cheers,
Gopal

Automatic Update statistics on ORC tables in Hive

2016-03-27 Thread Mich Talebzadeh
This might be a bit far-fetched, but is there any plan for background ANALYZE STATISTICS to be performed on ORC tables, for example when compaction runs? Also, I notice that "desc formatted" does not show details of the statistics run time. Could that be added in future releases, as I think it wil

What's the advised way to do groupby 2 attributes from a table with 1000 columns?

2016-03-27 Thread Rex X
Given a table with 1000 columns:

col1, col2, ..., col1000

The source table is about 1PB. I only need to query 3 columns:

  select col1, col2, sum(col3) as col3
  from myTable
  group by col1, col2

Is it advisable to do a subquery first, and then send it to the aggregation of group by, so that
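As a small Python sketch of the question above: projecting the three columns in a subquery first and then grouping gives the same result as grouping directly, so any benefit would come from reduced I/O (which depends on the engine and storage format), not from different semantics. The sample rows and the helpers `group_sum` and `project` are hypothetical and only model the SQL logic in memory.

```python
from collections import defaultdict

def group_sum(rows, keys, measure):
    """Model of: SELECT k1, k2, SUM(measure) ... GROUP BY k1, k2 over row dicts."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[measure]
    return dict(totals)

def project(rows, cols):
    """Model of the subquery step: SELECT col1, col2, col3 FROM myTable."""
    for row in rows:
        yield {c: row[c] for c in cols}

# Hypothetical sample rows; col4 stands in for col4..col1000.
my_table = [
    {"col1": "a", "col2": "x", "col3": 1, "col4": 0},
    {"col1": "a", "col2": "x", "col3": 2, "col4": 0},
    {"col1": "a", "col2": "y", "col3": 5, "col4": 0},
    {"col1": "b", "col2": "x", "col3": 7, "col4": 0},
]

direct = group_sum(my_table, ["col1", "col2"], "col3")
via_subquery = group_sum(
    project(my_table, ["col1", "col2", "col3"]), ["col1", "col2"], "col3"
)
assert direct == via_subquery
print(direct)  # {('a', 'x'): 3, ('a', 'y'): 5, ('b', 'x'): 7}
```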