> I only need to query 3 columns,
...
> The source table is about 1PB.
The format of this table is extremely critical.
A columnar data format like ORC is recommended, so that reading 3 columns
out of 1000 does not require reading any of the others.
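
As a minimal sketch of the point above, assuming a hypothetical ORC-backed copy of the table called myTable_orc (the name and the CTAS step are illustrative only and do not come from this thread), the three-column query could look like this in HiveQL:

```sql
-- Hypothetical: create an ORC-backed copy of the source table.
-- myTable_orc is an assumed name used only for illustration.
CREATE TABLE myTable_orc STORED AS ORC AS
SELECT * FROM myTable;

-- Only col1, col2 and col3 are referenced, so a columnar reader
-- can skip the data for the remaining 997 columns.
SELECT col1, col2, SUM(col3) AS col3
FROM myTable_orc
GROUP BY col1, col2;
```

For a table of about 1PB the one-off conversion is itself a large job, so in practice it would more likely be done per partition or at ingestion time.
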
> Will it be advised to do a subquery first, and then send it to the
>
> This might be a bit far fetched but is there any plan for background
> ANALYZE STATISTICS to be performed on ORC tables
https://issues.apache.org/jira/browse/HIVE-12669
Cheers,
Gopal
This might be a bit far fetched but is there any plan for background
ANALYZE STATISTICS to be performed on ORC tables for example when it does
compaction etc.
Also, I notice that "desc formatted" does not show details of
statistics run time. Could that be added in future releases as I think it
wil
Given a table with 1000 columns:
col1, col2, ..., col1000
The source table is about 1PB.
I only need to query 3 columns,
select col1, col2, sum(col3) as col3
from myTable
group by
col1, col2
Will it be advised to do a subquery first, and then send it to the
aggregation of group by, so that