Re: Secondary Indexes - Missing Data in Phoenix

Aleksandr Saraseka Mon, 29 Jul 2019 01:58:09 -0700

Hello Alex.
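Please refer to this JIRA: https://issues.apache.org/jira/browse/PHOENIX-1734
Since v4.8, a local index is just a shadow column family (CF) within the data
table rather than a separate HBase table. The separate _LOCAL_IDX_ table you
are seeing is the pre-4.8 layout.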
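
To see exactly where the two row keys quoted below differ, a rough Python
sketch like the following may help. It is not part of Phoenix or HBase, and
the helper names (decode_shell_bytes, common_suffix_len) are made up for
illustration. It assumes the HBase shell prints non-printable bytes and
backslashes as \xNN and everything else as plain ASCII, and that the last
30 bytes of each key are the data table's BINARY(30) row key in the layout
described below (1 salt byte, then an 8-byte big-endian timestamp).

```python
import re

# Row keys as pasted from the HBase shell in this thread. The shell wraps
# long keys mid-escape, so the fragments are joined before decoding.
SYNC_KEY = "".join(r"""
\x00\x171545413\x00
\x00\x00\x00\x01b\xB2s\xDB
@\x1B\x94\xFA\xD4\x14c\x0B
d$\x82\xAD\xE6\xB3\xDF\x06
\xC9\x07@\xB9\xAE\x00
""".split())

ASYNC_KEY = "".join(r"""
\x00\x00\x00\x00\x00\x00\x
00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x
00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x
151545413\x00\x00\x
00\x00\x01b\xB2s\xDB@\x1B\
x94\xFA\xD4\x14c\x0Bd$\x82
\xAD\xE6\xB3\xDF\x06\xC9\x
07@\xB9\xAE\x00
""".split())

_ESCAPE = re.compile(rb"\\x([0-9A-Fa-f]{2})")


def decode_shell_bytes(text: str) -> bytes:
    """Turn shell-style text (\\xNN escapes plus plain ASCII) into raw bytes."""
    raw = text.encode("ascii")
    return _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


def common_suffix_len(a: bytes, b: bytes) -> int:
    """Number of trailing bytes that a and b have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


sync_key = decode_shell_bytes(SYNC_KEY)
async_key = decode_shell_bytes(ASYNC_KEY)

# Split each key into "what differs" (prefix) and "what is shared" (suffix).
shared = common_suffix_len(sync_key, async_key)
sync_prefix = sync_key[:len(sync_key) - shared]
async_prefix = async_key[:len(async_key) - shared]

# Assumption: the trailing 30 bytes are the data row key (BINARY(30)),
# laid out as 1 salt byte followed by an 8-byte big-endian timestamp.
PK_LEN = 30
data_pk = sync_key[-PK_LEN:]
timestamp_ms = int.from_bytes(data_pk[1:9], "big")

print("sync key length :", len(sync_key))
print("async key length:", len(async_key))
print("shared suffix   :", shared, "bytes")
print("sync prefix     :", sync_prefix)
print("async prefix    :", async_prefix)
print("timestamp (ms)  :", timestamp_ms)
```

With the keys exactly as pasted, this reports a 2-byte prefix for the
synchronously built key and a 33-byte prefix (32 zero bytes plus one more
byte) for the ASYNC one, followed by the same 38 bytes in both. So the
indexed value and the data row key look intact, and only the prefix in
front of them differs.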


On Fri, Jul 26, 2019 at 5:08 AM Alexander Lytchier <
alexanderlytch...@m800.com> wrote:

> Thanks for the reply.
>
> We will attempt to update to Phoenix 4.14.X and re-try adding secondary
> indexes.
>
> Can you help to clarify “local indexes are stored in the same table as the
> data”? When a local index is created in Phoenix, I observe that a new table
> is created in HBase, *_LOCAL_IDX_TABLE_NAME*. It was my assumption that
> this is where the columns for the index table are stored, along with the PK
> values? Moreover, using *EXPLAIN* in Phoenix I can see that it will
> attempt to SCAN OVER *_LOCAL_IDX_TABLE_NAME* when my query is using the
> index.
>
> On 2019/07/25 14:00:25, Josh Elser <e...@apache.org> wrote:
>
> > Local indexes are stored in the same table as the data. They are "local"
> > to the data.
> >
> > I would not be surprised if you are running into issues because you are
> > using such an old version of Phoenix.
> >
> > On 7/24/19 10:35 PM, Alexander Lytchier wrote:
> > > Hi,
> > >
> > > We are currently using Cloudera as a package manager for our Hadoop
> > > cluster, with Phoenix 4.7.0 (CLABS_PHOENIX) and HBase 1.2.0-cdh5.7.6.
> > > Phoenix 4.7.0 appears to be the latest version supported
> > > (http://archive.cloudera.com/cloudera-labs/phoenix/parcels/latest/),
> > > even though it’s old.
> > >
> > > The table in question has a binary row key, pk BINARY(30): 1 byte for
> > > salting, 8 bytes for a timestamp (Long), 20 bytes for a hash of other
> > > record fields, plus 1 extra byte for an unknown issue about updating
> > > the schema in future (not sure if relevant). We are currently facing
> > > performance issues and are attempting to mitigate them by adding
> > > secondary indexes.
> > >
> > > When generating a local index synchronously with the following command:
> > >
> > > CREATE LOCAL INDEX INDEX_TABLE ON "MyTable" ("cf"."type");
> > >
> > > I can see that the resulting index table in Phoenix is populated. In
> > > HBase I can see the row key of the index table, and queries work as
> > > expected:
> > >
> > > \x00\x171545413\x00 column=cf:cf:type, timestamp=1563954319353, value=1545413
> > > \x00\x00\x00\x01b\xB2s\xDB
> > > @\x1B\x94\xFA\xD4\x14c\x0B
> > > d$\x82\xAD\xE6\xB3\xDF\x06
> > > \xC9\x07@\xB9\xAE\x00
> > >
> > > However, for the case where the index is created asynchronously and
> > > then populated using the IndexTool, with the following commands:
> > >
> > > CREATE LOCAL INDEX INDEX_TABLE ON "MyTable" ("cf"."type") ASYNC;
> > >
> > > sudo -u hdfs HADOOP_CLASSPATH=`hbase classpath` hadoop jar
> > > /opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hbase/bin/../lib/hbase-client-1.2.0-cdh5.7.1.jar
> > > org.apache.phoenix.mapreduce.index.IndexTool --data-table "MyTable"
> > > --index-table INDEX_TABLE --output-path hdfs://nameservice1/
> > >
> > > I get the following row key in HBase:
> > >
> > > \x00\x00\x00\x00\x00\x00\x column=cf:cf:type, timestamp=1563954000238, value=1545413
> > > 00\x00\x00\x00\x00\x00\x00
> > > \x00\x00\x00\x00\x00\x00\x
> > > 00\x00\x00\x00\x00\x00\x00
> > > \x00\x00\x00\x00\x00\x00\x
> > > 151545413\x00\x00\x
> > > 00\x00\x01b\xB2s\xDB@\x1B\
> > > x94\xFA\xD4\x14c\x0Bd$\x82
> > > \xAD\xE6\xB3\xDF\x06\xC9\x
> > > 07@\xB9\xAE\x00
> > >
> > > It has 32 additional 0-bytes (\x00). Why is there a difference? Is
> > > one expected? What’s more, the index table in Phoenix is empty (I guess
> > > it’s not able to read the underlying HBase index table with that key?),
> > > so any queries that use the local index in Phoenix return no value.
> > >
> > > Do you have any suggestions? We must use the async method to populate
> > > the index table in production because of the massive amounts of data,
> > > but if Phoenix is not able to read the index table it cannot be used
> > > for queries.
> > >
> > > Is it possible this issue has been fixed in a newer version?
> > >
> > > Thanks


-- 
Aleksandr Saraseka
DBA
380997600401
 *•*  asaras...@eztexting.com  *•*  eztexting.com
