Re: ORC files and statistics

2016-01-19 Thread Jörn Franke
On Tue, Jan 19, 2016 at 9:45 AM, Ashok Kumar wrote:
> Thank you both.
> So if I have a Hive table of ORC type and it contains 100K rows, there will be 10 row groups of 10K

RE: ORC files and statistics

2016-01-19 Thread Mich Talebzadeh
On 19 January 2016 at 20:36, Ashok Kumar wrote:
> Thanks Owen, I got a bit confused comparing ORC with what I know about indexes in relational databases. Still need to

Re: ORC files and statistics

2016-01-19 Thread Ashok Kumar
On Tue, Jan 19, 2016 at 9:45 AM, Ashok Kumar wrote:
Thank you both. So if I have a Hive table of ORC type and it contains 100K rows, there will be 10 row groups of 10K rows each.
Yes
within each row group there will be min, max, count(distinct_value) and sum

Re: ORC files and statistics

2016-01-19 Thread Owen O'Malley
On Tue, Jan 19, 2016 at 9:45 AM, Ashok Kumar wrote:
> Thank you both.
> So if I have a Hive table of ORC type and it contains 100K rows, there will be 10 row groups of 10K rows each.
Yes
> within each row group there will be min, max, count(distinct_value) and sum for each column withi

Re: ORC files and statistics

2016-01-19 Thread Ashok Kumar
Thank you both. So if I have a Hive table of ORC type and it contains 100K rows, there will be 10 row groups of 10K rows each. Within each row group there will be min, max, count(distinct_value) and sum for each column within that row group. Does count mean the count of distinct values including null oc

Re: ORC files and statistics

2016-01-19 Thread Jörn Franke
Just be aware that you should insert the data sorted at least on the most discriminating column of your WHERE clause.
On 19 Jan 2016, at 17:27, Owen O'Malley wrote:
> It has both. Each index has statistics of min, max, count, and sum for each column in the row group of 10,000 rows. It also
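To illustrate why sorting on the filter column matters, here is a minimal Python sketch. It is not ORC or Hive code: the helper names `row_group_ranges` and `groups_to_read` are hypothetical, the data is synthetic, and the only things taken from the thread are the 10,000-row stride and the idea that a row group can be skipped when its min/max range cannot satisfy the predicate.

```python
import random

STRIDE = 10_000  # row-group size quoted in the thread


def row_group_ranges(values, stride=STRIDE):
    """Return one (min, max) pair per row group of `stride` consecutive rows."""
    ranges = []
    for start in range(0, len(values), stride):
        chunk = values[start:start + stride]
        ranges.append((min(chunk), max(chunk)))
    return ranges


def groups_to_read(ranges, threshold):
    """Count row groups that may contain a value > threshold.

    A group can be skipped only if its maximum is <= threshold.
    """
    return sum(1 for _low, high in ranges if high > threshold)


# Synthetic column: 100K distinct values, first shuffled, then sorted.
random.seed(0)
values = list(range(100_000))
random.shuffle(values)

unsorted_reads = groups_to_read(row_group_ranges(values), 95_000)
sorted_reads = groups_to_read(row_group_ranges(sorted(values)), 95_000)

print("row groups read, unsorted insert:", unsorted_reads)
print("row groups read, sorted insert:  ", sorted_reads)
```

With shuffled data almost every row group spans the whole value range, so few or none can be skipped; with sorted data the matching values are confined to the last row group.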

Re: ORC files and statistics

2016-01-19 Thread Owen O'Malley
It has both. Each index has statistics of min, max, count, and sum for each column in the row group of 10,000 rows. It also has the location of the start of each row group, so that the reader can jump straight to the beginning of the row group. The reader takes a SearchArgument (e.g., age > 100) tha
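The mechanism described above can be sketched in a few lines of Python. This is an illustration only, not the ORC reader or its SearchArgument API: `RowGroupStats`, `build_index` and `groups_for_greater_than` are hypothetical names, `start_row` stands in for the stored start position of a row group, and `count` is assumed here to be the number of values in the group (not a distinct count).

```python
from dataclasses import dataclass
from typing import List

ROW_GROUP_SIZE = 10_000  # rows per row group, as stated in the thread


@dataclass
class RowGroupStats:
    start_row: int  # stand-in for the recorded start position of the group
    minimum: int
    maximum: int
    count: int      # assumed: number of values in the group
    total: int      # sum of the values


def build_index(values: List[int], stride: int = ROW_GROUP_SIZE) -> List[RowGroupStats]:
    """Collect min, max, count and sum for each row group of one column."""
    index = []
    for start in range(0, len(values), stride):
        chunk = values[start:start + stride]
        index.append(RowGroupStats(start, min(chunk), max(chunk), len(chunk), sum(chunk)))
    return index


def groups_for_greater_than(index: List[RowGroupStats], threshold: int) -> List[RowGroupStats]:
    """Keep only row groups whose maximum allows `column > threshold` to match."""
    return [group for group in index if group.maximum > threshold]


# 100K synthetic ages below 90, with a single outlier of 120.
ages = [i % 90 for i in range(100_000)]
ages[45_000] = 120

index = build_index(ages)
candidates = groups_for_greater_than(index, 100)

print(len(index))                         # 10
print([g.start_row for g in candidates])  # [40000]
```

For the predicate age > 100, nine of the ten row groups are ruled out by their maximum alone, and a reader could jump straight to the group starting at row 40,000.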

ORC files and statistics

2016-01-19 Thread Ashok Kumar
Hi, I have read some notes on ORC files in Hive and indexes. The document describes the indexes but makes reference to statistics. Indexes: ORC provides three levels of indexes within each file: file level - statistics about the values in each c