Re: Indexes in Hive

2016-01-06 Thread Alan Gates
The issue with this is that HDFS lacks the ability to co-locate blocks. So if you break your columns into one file per column (the more traditional column route) you end up in a situation where 2/3 of the time only one of your columns is being locally read, which results in a significant

Re: Indexes in Hive

2016-01-06 Thread Jörn Franke
I am not sure how much performance one could gain in comparison to ORC or Parquet. They work pretty well once you know how to use them. However, there are still ways to optimize them. For instance, sorting of data is a key factor for these formats to be efficient. Nevertheless, if you have a lot of

RE: Indexes in Hive

2016-01-06 Thread Mich Talebzadeh
From: Alan Gates [mailto:alanfga...@gmail.com]
Sent: 06 January 2016 18:19
To: user@hive.apache.org
Subject: Re: Indexes in Hive

The issue with

Re: Indexes in Hive

2016-01-05 Thread Jörn Franke
If I understand you correctly this could be just another Hive storage format.

> On 06 Jan 2016, at 07:24, Mich Talebzadeh wrote:
>
> Hi,
>
> Thinking loudly.
>
> Ideally we should consider a totally columnar storage offering in which each
> column of table is stored as

RE: Indexes in Hive

2016-01-05 Thread Mich Talebzadeh
ache.org Subject: Re: Indexes in Hive

If I understand you correctly this could be just another Hive storage format.

> On 06 Jan 2016, at 07:24, Mich Talebzadeh <m...@peridale.co.uk> wrote:
>
> Hi,
>
> Thinking loudly.
>
> Ideally we should consider a totally columnar st