[ https://issues.apache.org/jira/browse/PHOENIX-4544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ankit Singhal reassigned PHOENIX-4544: -------------------------------------- Assignee: Ankit Singhal > Update statistics inconsistent behavior > ---------------------------------------- > > Key: PHOENIX-4544 > URL: https://issues.apache.org/jira/browse/PHOENIX-4544 > Project: Phoenix > Issue Type: Bug > Affects Versions: 5.0.0 > Reporter: Romil Choksi > Assignee: Ankit Singhal > Priority: Major > Attachments: PHOENIX-4544.patch > > > Update statistics may not generate the stats information for all dependent > indexes. And this behavior may depend on whether the command executed > synchronously or asynchronously. > I have a table GIGANTIC_TABLE with ~500k rows with global index I1 and local > index I2. > If async is turned on (the default value): > {noformat} > 0: jdbc:phoenix:> update statistics GIGANTIC_TABLE ALL; > No rows affected (0.081 seconds) > 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE > PHYSICAL_NAME='I1' AND COLUMN_FAMILY='0'; > +-------------------------------+ > | COUNT(GUIDE_POSTS_ROW_COUNT) | > +-------------------------------+ > | 5 | > +-------------------------------+ > 1 row selected (0.009 seconds) > 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE > PHYSICAL_NAME='GIGANTIC_TABLE' AND COLUMN_FAMILY='0'; > +-------------------------------+ > | COUNT(GUIDE_POSTS_ROW_COUNT) | > +-------------------------------+ > | 520 | > +-------------------------------+ > 1 row selected (0.014 seconds) > 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE > PHYSICAL_NAME='GIGANTIC_TABLE' AND COLUMN_FAMILY='L#0'; > +-------------------------------+ > | COUNT(GUIDE_POSTS_ROW_COUNT) | > +-------------------------------+ > | 0 | > +-------------------------------+ > 1 row selected (0.008 seconds) > 0: jdbc:phoenix:> > {noformat} > As we can see there is no records for local index I2. But if we run > statistics for indexes: > {noformat} > 0: jdbc:phoenix:> update statistics GIGANTIC_TABLE INDEX; > No rows affected (0.036 seconds) > 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE > PHYSICAL_NAME='GIGANTIC_TABLE' AND COLUMN_FAMILY='L#0'; > +-------------------------------+ > | COUNT(GUIDE_POSTS_ROW_COUNT) | > +-------------------------------+ > | 20 | > +-------------------------------+ > 1 row selected (0.007 seconds) > {noformat} > the statistic for local index is generated correctly. > Now we turn async off: > {noformat} > 0: jdbc:phoenix:> delete from SYSTEM.STATS; > 547 rows affected (0.079 seconds) > 0: jdbc:phoenix:> update statistics GIGANTIC_TABLE ALL; > 999,998 rows affected (4.671 seconds) > 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE > PHYSICAL_NAME='GIGANTIC_TABLE' AND COLUMN_FAMILY='0'; > +-------------------------------+ > | COUNT(GUIDE_POSTS_ROW_COUNT) | > +-------------------------------+ > | 520 | > +-------------------------------+ > 1 row selected (0.04 seconds) > 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE > PHYSICAL_NAME='GIGANTIC_TABLE' AND COLUMN_FAMILY='L#0'; > +-------------------------------+ > | COUNT(GUIDE_POSTS_ROW_COUNT) | > +-------------------------------+ > | 20 | > +-------------------------------+ > 1 row selected (0.012 seconds) > 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE > PHYSICAL_NAME='I1' AND COLUMN_FAMILY='0'; > +-------------------------------+ > | COUNT(GUIDE_POSTS_ROW_COUNT) | > +-------------------------------+ > | 0 | > +-------------------------------+ > 1 row selected (0.011 seconds) > {noformat} > As we can see we got statistics for the table itself and local index. But not > for the global index. > Moreover, if we try to update statistics for indexes: > {noformat} > 0: jdbc:phoenix:> update statistics GIGANTIC_TABLE INDEX; > 499,999 rows affected (0.332 seconds) > 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE > PHYSICAL_NAME='I1' AND COLUMN_FAMILY='0'; > +-------------------------------+ > | COUNT(GUIDE_POSTS_ROW_COUNT) | > +-------------------------------+ > | 0 | > +-------------------------------+ > 1 row selected (0.009 seconds) > {noformat} > So, still no records for global index. > But if we delete statistics first and run update for indexes: > {noformat} > 0: jdbc:phoenix:> delete from SYSTEM.STATS; > 541 rows affected (0.024 seconds) > 0: jdbc:phoenix:> update statistics GIGANTIC_TABLE INDEX; > 999,998 rows affected (0.41 seconds) > 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE > PHYSICAL_NAME='I1' AND COLUMN_FAMILY='0'; > +-------------------------------+ > | COUNT(GUIDE_POSTS_ROW_COUNT) | > +-------------------------------+ > | 5 | > +-------------------------------+ > 1 row selected (0.01 seconds) > 0: jdbc:phoenix:> select count(GUIDE_POSTS_ROW_COUNT) from SYSTEM.STATS WHERE > PHYSICAL_NAME='GIGANTIC_TABLE' AND COLUMN_FAMILY='L#0'; > +-------------------------------+ > | COUNT(GUIDE_POSTS_ROW_COUNT) | > +-------------------------------+ > | 20 | > +-------------------------------+ > 1 row selected (0.01 seconds) > {noformat} > than we got statistics for both local and global indexes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)