Thanks for the reply. We will attempt to update to Phoenix 4.14.X and re-try adding secondary indexes.
Can you help clarify “local indexes are stored in the same table as the data”? When a local index is created in Phoenix, I observe that a new table, _LOCAL_IDX_TABLE_NAME, is created in HBase. My assumption was that this is where the index columns are stored, along with the PK values. Moreover, using EXPLAIN in Phoenix I can see that it will attempt to SCAN OVER _LOCAL_IDX_TABLE_NAME when my query uses the index.

On 2019/07/25 14:00:25, Josh Elser <e...@apache.org> wrote:
> Local indexes are stored in the same table as the data. They are "local"
> to the data.
>
> I would not be surprised if you are running into issues because you are
> using such an old version of Phoenix.
>
> On 7/24/19 10:35 PM, Alexander Lytchier wrote:
> > Hi,
> >
> > We are currently using Cloudera as a package manager for our Hadoop
> > Cluster with Phoenix 4.7.0 (CLABS_PHOENIX) and HBase 1.2.0-cdh5.7.6.
> > Phoenix 4.7.0 appears to be the latest version supported
> > (http://archive.cloudera.com/cloudera-labs/phoenix/parcels/latest/) even
> > though it’s old.
> >
> > The table in question has a binary row-key: pk BINARY(30): 1 Byte for
> > salting, 8 Bytes - timestamp (Long), 20 Bytes - hash result of other
> > record fields. + 1 extra byte for unknown issue about updating schema in
> > future (not sure if relevant). We are currently facing performance
> > issues and are attempting to mitigate them by adding secondary indexes.
> >
> > When generating a local index synchronously with the following command:
> >
> > CREATE LOCAL INDEX INDEX_TABLE ON "MyTable" ("cf"."type");
> >
> > I can see that the resulting index table in Phoenix is populated, in
> > HBase I can see the row-key of the index table and queries work as
> > expected:
> >
> > \x00\x171545413\x00 column=cf:cf:type, timestamp=1563954319353, value=1545413
> > \x00\x00\x00\x01b\xB2s\xDB
> > @\x1B\x94\xFA\xD4\x14c\x0B
> > d$\x82\xAD\xE6\xB3\xDF\x06
> > \xC9\x07@\xB9\xAE\x00
> >
> > However, for the case where the index is created asynchronously, and
> > then populated using the IndexTool, with the following commands:
> >
> > CREATE LOCAL INDEX INDEX_TABLE ON "MyTable" ("cf"."type") ASYNC;
> >
> > sudo -u hdfs HADOOP_CLASSPATH=`hbase classpath` hadoop jar
> > /opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hbase/bin/../lib/hbase-client-1.2.0-cdh5.7.1.jar
> > org.apache.phoenix.mapreduce.index.IndexTool --data-table "MyTable"
> > --index-table INDEX_TABLE --output-path hdfs://nameservice1/
> >
> > I get the following row-key in HBase:
> >
> > \x00\x00\x00\x00\x00\x00\x column=cf:cf:type, timestamp=1563954000238, value=1545413
> > 00\x00\x00\x00\x00\x00\x00
> > \x00\x00\x00\x00\x00\x00\x
> > 00\x00\x00\x00\x00\x00\x00
> > \x00\x00\x00\x00\x00\x00\x
> > 151545413\x00\x00\x
> > 00\x00\x01b\xB2s\xDB@\x1B\
> > x94\xFA\xD4\x14c\x0Bd$\x82
> > \xAD\xE6\xB3\xDF\x06\xC9\x
> > 07@\xB9\xAE\x00
> >
> > It has 32 additional 0-bytes (\x00). Why is there a difference? Is
> > one expected? What’s more, the index table in Phoenix is empty (I guess
> > it’s not able to read the underlying HBase index table with that key?),
> > so any queries that use the local index in Phoenix return no value.
> >
> > Do you have any suggestions? We must use the async method to populate
> > the index table on production because of the massive amounts of data,
> > but if Phoenix is not able to read the index table it cannot be used for
> > queries.
> >
> > Is it possible this issue has been fixed in a newer version?
> >
> > Thanks
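
In case it helps anyone comparing the two row keys quoted above, here is a minimal Python sketch that decodes the HBase shell output and compares the keys byte by byte. It assumes the shell prints printable ASCII as-is and every other byte as \xNN. The helper names (decode_shell_key, leading_zero_bytes, common_suffix_len) are made up for this illustration and are not part of Phoenix or HBase. Treating the trailing 30 bytes as the BINARY(30) PK of the data table is also an assumption.

```python
import re

# One escaped byte as printed by the HBase shell, e.g. \x1B
_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def decode_shell_key(text: str) -> bytes:
    """Turn a (possibly line-wrapped) HBase shell row key into raw bytes."""
    flat = "".join(text.split())  # undo the shell's column wrapping
    out = bytearray()
    pos = 0
    while pos < len(flat):
        match = _ESCAPE.match(flat, pos)
        if match:
            out.append(int(match.group(1), 16))
            pos = match.end()
        else:
            out.append(ord(flat[pos]))  # printable ASCII is shown as-is
            pos += 1
    return bytes(out)


def leading_zero_bytes(key: bytes) -> int:
    """Count the \\x00 bytes at the start of the key."""
    return len(key) - len(key.lstrip(b"\x00"))


def common_suffix_len(a: bytes, b: bytes) -> int:
    """Length of the longest shared tail of two keys."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


# Row key after the synchronous CREATE LOCAL INDEX (copied from the thread).
SYNC_KEY_TEXT = r"""
\x00\x171545413\x00
\x00\x00\x00\x01b\xB2s\xDB
@\x1B\x94\xFA\xD4\x14c\x0B
d$\x82\xAD\xE6\xB3\xDF\x06
\xC9\x07@\xB9\xAE\x00
"""

# Row key after ASYNC + IndexTool (copied from the thread).
ASYNC_KEY_TEXT = r"""
\x00\x00\x00\x00\x00\x00\x
00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x
00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x
151545413\x00\x00\x
00\x00\x01b\xB2s\xDB@\x1B\
x94\xFA\xD4\x14c\x0Bd$\x82
\xAD\xE6\xB3\xDF\x06\xC9\x
07@\xB9\xAE\x00
"""

sync_key = decode_shell_key(SYNC_KEY_TEXT)
async_key = decode_shell_key(ASYNC_KEY_TEXT)

print("sync  key length:", len(sync_key), "leading zeros:", leading_zero_bytes(sync_key))
print("async key length:", len(async_key), "leading zeros:", leading_zero_bytes(async_key))
print("shared tail bytes:", common_suffix_len(sync_key, async_key))
# Assumption: the last 30 bytes are the data table's BINARY(30) PK.
print("same trailing PK:", sync_key[-30:] == async_key[-30:])
```

Running this should show whether the two keys share the same indexed value and PK at the end and differ only in the prefix, which is the part worth checking against the Phoenix version in use.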