The build-in indexes in ORC file does not work.

2016-03-16 Thread Joseph
Hi all, I understand that ORC provides three levels of indexes within each file: file level, stripe level, and row level. The file and stripe level statistics are in the file footer so that they are easy to access to determine if the rest of the file needs to be read at all. Row level indexes i

Re: The build-in indexes in ORC file does not work.

2016-03-16 Thread Mich Talebzadeh
Hi, The parameters that control the stripe and row group are configurable via the ORC creation script: CREATE TABLE dummy ( ID INT , CLUSTERED INT , SCATTERED INT , RANDOMISED INT , RANDOM_STRING VARCHAR(50) , SMALL_VC VARCHAR(10) , PADDING VARCHAR(10) ) CLUSTERED BY (ID) INT
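As a hedged illustration of the table-level parameters mentioned above, the sketch below sets the ORC stripe size, row-index stride, and a Bloom filter through TBLPROPERTIES. The table name `dummy_orc_sketch`, the reduced column list, and every value are assumptions chosen for illustration; they are not the settings from the original message, which is truncated here.

```sql
-- Hypothetical table: name, columns, and values are illustrative assumptions.
CREATE TABLE dummy_orc_sketch (
  id            INT,
  clustered     INT,
  random_string VARCHAR(50)
)
STORED AS ORC
TBLPROPERTIES (
  "orc.create.index"         = "true",       -- write row-level indexes
  "orc.stripe.size"          = "268435456",  -- stripe size in bytes (256 MB)
  "orc.row.index.stride"     = "10000",      -- rows covered by each index entry
  "orc.bloom.filter.columns" = "id",         -- columns that get a Bloom filter
  "orc.bloom.filter.fpp"     = "0.05"        -- target false-positive rate
);
```

A smaller stride gives finer-grained row-group skipping at the cost of a larger index, so the right value depends on how the filtered column is distributed.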

Re: The build-in indexes in ORC file does not work.

2016-03-18 Thread Gopal Vijayaraghavan
> I have tried bloom filter, but it makes no improvement. I know about > tez, but never use, I will try it later. ... >select count(*) from gprs where terminal_type=25080; > will not scan data > Time taken: 353.345 seconds CombineInputFormat does not do any split-elimination, so MapRed
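The point about split elimination can be tried out with session settings along the following lines. This is a sketch and not necessarily the configuration recommended in the truncated reply: `hive.optimize.index.filter` allows the ORC reader to use its statistics and indexes for pushed-down predicates, and switching `hive.input.format` away from the default combining input format is an assumption about how to avoid the behaviour described.

```sql
-- Push filter predicates down to the ORC reader so it can consult
-- file, stripe, and row-group statistics.
SET hive.optimize.ppd=true;
SET hive.optimize.index.filter=true;

-- Assumption: use the non-combining input format for this session.
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

-- Query from the thread.
SELECT COUNT(*) FROM gprs WHERE terminal_type = 25080;
```

With many small files, giving up the combining input format can increase the number of map tasks, so the elapsed time should be compared before and after rather than assumed to improve.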

Re: The build-in indexes in ORC file does not work.

2016-03-18 Thread Mich Talebzadeh
I love to see these ORC table optimizations help, but it is not obvious to me under what circumstances they bear fruit. Case in point: I have an ORC table with 100 million rows created as follows: CREATE TABLE `dummy`( `id` int, `clustered` int, `scattered` int, `randomised` int, `random_

Re: The build-in indexes in ORC file does not work.

2016-03-19 Thread Mich Talebzadeh
Hi Gopal, I am using Hive 2 on Spark 1.3.1 engine. OK, this is only a test table. What would be the best way to create this table in Hive as ORC format? Thanks

Re: The build-in indexes in ORC file does not work.

2016-03-19 Thread Mich Talebzadeh
> *From:* Joseph > *Sent:* Wednesday, March 16, 2016 9:46:25 AM > *To:* user > *Cc:* user; user > *Subject:* Re: Re: The build-in indexes in ORC file does not work. > > > terminal_type = 0, 260,000,000 rows, almost cov

Re: The build-in indexes in ORC file does not work.

2016-03-19 Thread Gopal Vijayaraghavan
> I love to see these ORC table optimization help but it is not obvious to >me under what circumstances they bare fruit. Are you using Tez or LLAP? Your explain plans are clearly missing the optimizations I've added as part of Stinger.next. https://github.com/apache/hive/blob/master/ql/src/test/

Re: The build-in indexes in ORC file does not work.

2016-03-19 Thread Jörn Franke
How much data are you querying? What is the query? How selective is it supposed to be? What is the block size? > On 16 Mar 2016, at 11:23, Joseph wrote: > > Hi all, > > I have known that ORC provides three level of indexes within each file, file > level, stripe level, and row level. > The fi

Re: The build-in indexes in ORC file does not work.

2016-03-19 Thread Jörn Franke
minal_type = 25080; > select * from gprs where terminal_type = 25080; > > In the gprs table, the "terminal_type" column's value is in [0, 25066] > > Joseph > > From: Jörn Franke > Date: 2016-03-16 19:26 > To: Joseph > CC: user; user > Subject: Re

Re: Re: The build-in indexes in ORC file does not work.

2016-03-19 Thread Joseph
y way to check the use of stats? Joseph From: Gopal Vijayaraghavan Date: 2016-03-16 22:18 To: user@hive.apache.org CC: Joseph Subject: Re: The build-in indexes in ORC file does not work. > I have tried bloom filter, but it makes no improvement. I know about > tez, but never use, I will try it
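One rough way to approach the question about checking whether the statistics are used is to inspect the query plan. The sketch below assumes `hive.optimize.index.filter` is enabled for the session; a `filterExpr` entry on the TableScan operator suggests the predicate was handed to the ORC reader, although it does not by itself prove that any stripes or row groups were skipped.

```sql
-- Assumption: predicate pushdown to the ORC reader is switched on.
SET hive.optimize.index.filter=true;

-- Look for a filterExpr line under the TableScan operator in the output.
EXPLAIN
SELECT COUNT(*) FROM gprs WHERE terminal_type = 25080;
```

Comparing the bytes read reported by the job with and without the setting gives a more direct indication of whether data was actually skipped.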

Re: Re: The build-in indexes in ORC file does not work.

2016-03-20 Thread Joseph
le number is 800, each of them is about 51M. My query statement is: select count(*) from gprs where terminal_type = 25080; select * from gprs where terminal_type = 25080; In the gprs table, the "terminal_type" column's value is in [0, 25066] Joseph From: Jörn Franke Date: 2016-0