Re: Secondary Indexes - Missing Data in Phoenix

Alexander Lytchier Thu, 25 Jul 2019 19:08:41 -0700

Thanks for the reply.

We will attempt to update to Phoenix 4.14.X and re-try adding secondary indexes.


Could you clarify what “local indexes are stored in the same table as the
data” means? When a local index is created in Phoenix, I observe that a new
table, _LOCAL_IDX_TABLE_NAME, is created in HBase. My assumption was that this
is where the columns for the index are stored, along with the PK values.
Moreover, using EXPLAIN in Phoenix I can see that it will SCAN OVER
_LOCAL_IDX_TABLE_NAME when my query uses the index.

On 2019/07/25 14:00:25, Josh Elser <[email protected]> wrote:
> Local indexes are stored in the same table as the data. They are "local"
> to the data.
>
> I would not be surprised if you are running into issues because you are
> using such an old version of Phoenix.
>
> On 7/24/19 10:35 PM, Alexander Lytchier wrote:
> > Hi,
> >
> > We are currently using Cloudera as a package manager for our Hadoop
> > cluster, with Phoenix 4.7.0 (CLABS_PHOENIX) and HBase 1.2.0-cdh5.7.6.
> > Phoenix 4.7.0 appears to be the latest version supported
> > (http://archive.cloudera.com/cloudera-labs/phoenix/parcels/latest/) even
> > though it’s old.
> >
> > The table in question has a binary row key, pk BINARY(30): 1 byte for
> > salting, 8 bytes for a timestamp (Long), 20 bytes for a hash of other
> > record fields, plus 1 extra byte for an unknown issue about updating the
> > schema in future (not sure if relevant). We are currently facing
> > performance issues and are attempting to mitigate them by adding
> > secondary indexes.
> >
> > When generating a local index synchronously with the following command:
> >
> > CREATE LOCAL INDEX INDEX_TABLE ON "MyTable" ("cf"."type");
> >
> > I can see that the resulting index table in Phoenix is populated. In
> > HBase I can see the row key of the index table, and queries work as
> > expected:
> >
> > \x00\x171545413\x00\x00\x00\x00\x01b\xB2s\xDB@\x1B\x94\xFA\xD4\x14c\x0Bd$\x82\xAD\xE6\xB3\xDF\x06\xC9\x07@\xB9\xAE\x00
> >   column=cf:cf:type, timestamp=1563954319353, value=1545413
> >
> > However, when the index is created asynchronously and then populated
> > using the IndexTool, with the following commands:
> >
> > CREATE LOCAL INDEX INDEX_TABLE ON "MyTable" ("cf"."type") ASYNC;
> >
> > sudo -u hdfs HADOOP_CLASSPATH=`hbase classpath` hadoop jar \
> >   /opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hbase/bin/../lib/hbase-client-1.2.0-cdh5.7.1.jar \
> >   org.apache.phoenix.mapreduce.index.IndexTool --data-table "MyTable" \
> >   --index-table INDEX_TABLE --output-path hdfs://nameservice1/
> >
> > I get the following row key in HBase:
> >
> > \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x151545413\x00\x00\x00\x00\x01b\xB2s\xDB@\x1B\x94\xFA\xD4\x14c\x0Bd$\x82\xAD\xE6\xB3\xDF\x06\xC9\x07@\xB9\xAE\x00
> >   column=cf:cf:type, timestamp=1563954000238, value=1545413
> >
> > It has 32 additional 0-bytes (\x00). Why is there a difference? Is
> > one expected? What’s more, the index table in Phoenix is empty (I guess
> > it’s not able to read the underlying HBase index table with that key?),
> > so any queries that use the local index in Phoenix return no values.
> >
> > Do you have any suggestions? We must use the async method to populate
> > the index table in production because of the massive amounts of data,
> > but if Phoenix is not able to read the index table, it cannot be used for
> > queries.
> >
> > Is it possible this issue has been fixed in a newer version?
> >
> > Thanks
> >
>
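
For anyone who wants to compare the two index row keys quoted in this thread
byte by byte, here is a minimal Python sketch. It assumes the HBase shell's
usual escaped form (printable ASCII shown as-is, every other byte as \xHH).
The helper names (parse_hbase_key, leading_zeros, common_suffix_len) are made
up for illustration and are not part of Phoenix or HBase. It only measures
where the keys differ; it does not explain why.

```python
import re

# Row-key text as quoted from the HBase shell in this thread. The shell wraps
# long keys across lines, so all whitespace is dropped before parsing.
SYNC_KEY_TEXT = r"""
\x00\x171545413\x00
\x00\x00\x00\x01b\xB2s\xDB
@\x1B\x94\xFA\xD4\x14c\x0B
d$\x82\xAD\xE6\xB3\xDF\x06
\xC9\x07@\xB9\xAE\x00
"""

ASYNC_KEY_TEXT = r"""
\x00\x00\x00\x00\x00\x00\x
00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x
00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x
151545413\x00\x00\x
00\x00\x01b\xB2s\xDB@\x1B\
x94\xFA\xD4\x14c\x0Bd$\x82
\xAD\xE6\xB3\xDF\x06\xC9\x
07@\xB9\xAE\x00
"""

# Either a \xHH escape (group 1) or any single literal character (group 2).
_TOKEN = re.compile(r"\\x([0-9A-Fa-f]{2})|(.)", re.DOTALL)


def parse_hbase_key(text: str) -> bytes:
    """Turn shell-escaped key text into raw bytes (hypothetical helper)."""
    joined = "".join(text.split())  # undo the shell's line wrapping
    out = bytearray()
    for match in _TOKEN.finditer(joined):
        if match.group(1) is not None:
            out.append(int(match.group(1), 16))
        else:
            out.append(ord(match.group(2)))
    return bytes(out)


def leading_zeros(key: bytes) -> int:
    """Count the \\x00 bytes at the start of the key."""
    return len(key) - len(key.lstrip(b"\x00"))


def common_suffix_len(a: bytes, b: bytes) -> int:
    """Length of the longest byte-identical tail shared by both keys."""
    n = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        n += 1
    return n


sync_key = parse_hbase_key(SYNC_KEY_TEXT)
async_key = parse_hbase_key(ASYNC_KEY_TEXT)

print("sync  length:", len(sync_key), "leading zeros:", leading_zeros(sync_key))
print("async length:", len(async_key), "leading zeros:", leading_zeros(async_key))
print("shared tail :", common_suffix_len(sync_key, async_key), "bytes")
```

With the keys as quoted, the synchronous key parses to 40 bytes and the ASYNC
key to 71 bytes, the latter starting with 32 zero bytes. The last 38 bytes are
identical, which is consistent with the 7-character indexed value, one
separator byte, and the 30-byte pk; only the prefix in front of the value
differs.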
