1) How do I create an index the old way, via intermediate HFiles?
I see a "direct" option for IndexTool, but its description says it no longer
has any effect:
private static final Option DIRECT_API_OPTION = new Option("direct", "direct",
    false,
    "This parameter is deprecated. Direct mode will be used whether it is set "
        + "or not. Keeping it for backwards compatibility.");
2) On phoenix-4.14.2 (old indexes), disabling the WAL for the index table was
possible with "ALTER TABLE main_table SET DISABLE_WAL=true".
Maybe we can add this feature to 4.16+?
3) My main table has VERSIONS=>1. Anyway, I decided to major-compact it before
the next run and still got Delete mutations.
According to the table metrics, ~10% of the mutations are Deletes.
I checked my main table; it has IndexRegionObserver loaded:
coprocessor$1 =>
'|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|',
coprocessor$2 =>
'|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|',
coprocessor$3 =>
'|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|',
coprocessor$4 =>
'|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|',
coprocessor$5 =>
'|org.apache.phoenix.hbase.index.IndexRegionObserver|805306366|org.apache.hadoop.hbase.index.codec.class=org.apache.phoenix.index.PhoenixIndexCodec,index.builder=org.apache.phoenix.index.PhoenixIndexBuilder'
By the way, I split the index table into more regions and increased
hbase.hregion.memstore.flush.size and hbase.hstore.blockingStoreFiles, and got
a ~30% speedup.
This is still very slow compared to old index creation.
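
A toy sketch of the replay behaviour described in the quoted reply below, to
show where Delete mutations on the index table would come from. It is not
Phoenix code: the class IndexReplaySketch, its methods, and the
"value|rowkey" index key layout are all made up for illustration. It assumes
only what the reply states: every row version produces an index row, and a
change in the indexed column deletes the previous index row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexReplaySketch {

    // Hypothetical index row key layout: indexed value, then data row key.
    static String indexRowKey(String indexedValue, String dataRowKey) {
        return indexedValue + "|" + dataRowKey;
    }

    /**
     * Replays the versions of one data row, oldest first, and returns the
     * index mutations this toy model would issue. indexedValues holds the
     * value of the indexed column in each version.
     */
    static List<String> replay(String dataRowKey, List<String> indexedValues) {
        List<String> mutations = new ArrayList<>();
        String previous = null;
        for (String value : indexedValues) {
            // The indexed column changed: the old index row is no longer valid.
            if (previous != null && !previous.equals(value)) {
                mutations.add("DELETE " + indexRowKey(previous, dataRowKey));
            }
            // Every version gets its own index row.
            mutations.add("PUT " + indexRowKey(value, dataRowKey));
            previous = value;
        }
        return mutations;
    }

    public static void main(String[] args) {
        // One version only: prints [PUT a|row1]
        System.out.println(replay("row1", Arrays.asList("a")));
        // Indexed column changed: prints [PUT a|row1, DELETE a|row1, PUT b|row1]
        System.out.println(replay("row1", Arrays.asList("a", "b")));
    }
}
```

In this model a row with a single version yields only a Put, so it does not
by itself account for Deletes on a table with VERSIONS=>1 and no updates.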
> On 31 Mar 2021, at 02:55, Kadir Ozdemir <[email protected]> wrote:
>
> I assume that your base table has several versions for a given row. If so,
> creating a consistent index on this base table can be slower than creating an
> old design index. This is because the new design creates an index row for
> every data table row version. It simply replays the mutations on a row
> without updating the data table but makes necessary mutations on the index
> table. It does this to make sure that if you use SCN connections to do
> point-in-time queries, the index will return correct results. During these
> replays, index rows will be deleted if index columns are modified. This is
> the reason I think you see delete mutations on the index table.
>
> 1) Yes
> 2) No
> 3) No
>
> It will be a good improvement to have an option to support (3) by just
> creating indexes using the last data row versions. Please feel free to create
> an improvement Jira for this.
>
> Did you create your base table using 4.16? If not, have you upgraded it to
> the new index design using IndexUpgradeTool? I am asking this to make sure
> that your index actually uses the new index design. You can verify this using
> the HBase shell by describing the data table and checking if the
> IndexRegionObserver coproc is loaded on your base table.
>
>
> On Tue, Mar 30, 2021 at 3:10 PM Alexander Batyrshin <[email protected]> wrote:
> I tried on phoenix-4.16.0
>
> > On 31 Mar 2021, at 00:54, Alexander Batyrshin <[email protected]> wrote:
> >
> > Hello,
> > I tried to create a new consistent index on a mutable table and found that
> > the IndexTool MapReduce job runs 3-5 times slower than with old indexes on
> > 4.14.2. So I have some questions:
> >
> > 1) Is it possible to create an index the old way, via intermediate HFiles
> > and bulk-loading?
> > 2) Is it possible to disable the WAL on the HBase index table during index
> > creation?
> > 3) My main table has no updates, but I observe Delete mutations on the index
> > table. Is it possible to disable this during initial index creation?
> >
>