Re: Format dilemma

2017-06-22 Thread Gopal Vijayaraghavan
> I kept hearing about vectorization, but later found out it was going to work
> if i used ORC.
Yes, it's a tautology - if you cared about performance, you'd use ORC, because ORC is the fastest format. And doing performance work to support folks who don't quite care about it, is not exactly

Re: Hive query on ORC table is really slow compared to Presto

2017-06-22 Thread Gopal Vijayaraghavan
> 1711647 -1032220119
Ok, so this is the hashCode skew issue, probably the one we already know about.
https://github.com/apache/hive/commit/fcc737f729e60bba5a241cf0f607d44f7eac7ca4
String hashcode distribution is much better in master after that. Hopefully that fixes the distinct speed issue

User already granted INSERT privilege, but hdfs permission denied

2017-06-22 Thread wuchang
The admin user of my hive is named appuser. I have created a database named wuchang_test and a table named abtestmsg. When I describe the database, the OWNER NAME of this database is appuser and the OWNER TYPE is USER, just like below: 0: jdbc:hive2://hive.data.ms.netease.com:1000> describe database

hive authorization problem

2017-06-22 Thread Namas Amitabha
Hi all, I ran into a problem with Hive Default Authorization - Legacy Mode. I tried to enable authorization on hiveserver2, and this is my hive-site.xml in the hiveserver2 conf:

Controlling Number of small files while inserting into Hive table

2017-06-22 Thread Arpan Rajani
Hello everyone, I am sure many of you have faced a similar issue. We run "insert into 'target_table' select a,b,c from x where .." queries for a nightly load. Each insert goes into a new partition of the target_table. Now the concern is: *these inserts load hardly any data* ( I would