Hi,

I'm using *Phoenix 4.6* and in my use case I have a table that keeps a
sliding window of 7 days' worth of data. I have 3 local indexes on this
table, and in our use case we have approximately 150 producers inserting
data (in batches of 300-1500 events) in real time.
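
For reference, the setup looks roughly like the sketch below. The table
name comes from the log output; the columns, the primary key, and the
TTL-based 7-day expiry are only illustrative assumptions, not the real
schema:

CREATE TABLE BIDDING_EVENTS (
    EVENT_ID    BIGINT NOT NULL,
    EVENT_TIME  TIMESTAMP NOT NULL,
    CAMPAIGN_ID BIGINT,
    PRICE       DECIMAL(10,4),
    CONSTRAINT PK PRIMARY KEY (EVENT_ID, EVENT_TIME)
) TTL=604800;  -- 7 days in seconds, so the table acts as a sliding window

CREATE LOCAL INDEX IDX_EVENT_TIME  ON BIDDING_EVENTS (EVENT_TIME);
CREATE LOCAL INDEX IDX_CAMPAIGN_ID ON BIDDING_EVENTS (CAMPAIGN_ID);
CREATE LOCAL INDEX IDX_PRICE       ON BIDDING_EVENTS (PRICE);

-- each producer commits batches of 300-1500 upserts like this one
UPSERT INTO BIDDING_EVENTS (EVENT_ID, EVENT_TIME, CAMPAIGN_ID, PRICE)
    VALUES (1, CURRENT_TIME(), 42, 0.0015);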

A few days ago I started to get a lot of errors like the ones below. The
number of errors was so large that cluster performance dropped
significantly, and my disk read bandwidth was extremely high while the
write bandwidth stayed normal. I can confirm that during that period no
readers were running, only producers.

> ERROR [B.defaultRpcServer.handler=25,queue=5,port=16020]
> parallel.BaseTaskRunner: Found a failed task because:
> org.apache.hadoop.hbase.DoNotRetryIOException: *ERROR 2008 (INT10): ERROR
> 2008 (INT10): Unable to find cached index metadata.*
>  key=4276342695061435086
> region=BIDDING_EVENTS,\xFEK\x17\xE4\xB1~K\x08,1458435680333.ee29454d68f5b679a8e8cc775dd0edfa.
> *Index update failed*
> java.util.concurrent.ExecutionException:
> org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2008 (INT10): ERROR
> 2008 (INT10): Unable to find cached index metadata.
>  key=4276342695061435086
> region=BIDDING_EVENTS,\xFEK\x17\xE4\xB1~K\x08,1458435680333.ee29454d68f5b679a8e8cc775dd0edfa.
> Index update failed
> Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2008
> (INT10): ERROR 2008 (INT10): Unable to find cached index metadata.
>  key=4276342695061435086
> region=BIDDING_EVENTS,\xFEK\x17\xE4\xB1~K\x08,1458435680333.ee29454d68f5b679a8e8cc775dd0edfa.
> Index update failed
> Caused by: java.sql.SQLException: ERROR 2008 (INT10): Unable to find
> cached index metadata.  key=4276342695061435086
> region=BIDDING_EVENTS,\xFEK\x17\xE4\xB1~K\x08,1458435680333.ee29454d68f5b679a8e8cc775dd0edfa.
> INFO  [B.defaultRpcServer.handler=25,queue=5,port=16020]
> parallel.TaskBatch: Aborting batch of tasks because Found a failed task
> because: org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2008 (INT10):
> ERROR 2008 (INT10): Unable to find cached index metadata.
>  key=4276342695061435086
> region=BIDDING_EVENTS,\xFEK\x17\xE4\xB1~K\x08,1458435680333.ee29454d68f5b679a8e8cc775dd0edfa.
> Index update failed
> ERROR [B.defaultRpcServer.handler=25,queue=5,port=16020] 
> *builder.IndexBuildManager:
> Found a failed index update!*
> INFO  [B.defaultRpcServer.handler=25,queue=5,port=16020]
> util.IndexManagementUtil: Rethrowing
> org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2008 (INT10): ERROR
> 2008 (INT10): Unable to find cached index metadata.
>  key=4276342695061435086
> region=BIDDING_EVENTS,\xFEK\x17\xE4\xB1~K\x08,1458435680333.ee29454d68f5b679a8e8cc775dd0edfa.
> Index update failed


I searched for the error and made the following changes on the server
side (see the hbase-site.xml sketch after the list):

   - *phoenix.coprocessor.maxServerCacheTimeToLiveMs* from 30s to 2min
   - *phoenix.coprocessor.maxMetaDataCacheSize* from 20MB to 40MB
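
Concretely, this corresponds to something like the following in
hbase-site.xml on the region servers (the byte value for the cache size
is my own conversion of the 40MB figure above, so the exact numbers may
differ from what I actually set):

<property>
  <name>phoenix.coprocessor.maxServerCacheTimeToLiveMs</name>
  <value>120000</value> <!-- 2 min in milliseconds, up from 30s -->
</property>
<property>
  <name>phoenix.coprocessor.maxMetaDataCacheSize</name>
  <value>41943040</value> <!-- 40MB expressed in bytes (40 * 1024 * 1024) -->
</property>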

After I changed these properties I restarted the cluster and the errors
were gone, but the disk read bandwidth was still very high and I was
getting *responseTooSlow* warnings. As a quick workaround I created fresh
tables, and then the problems went away.

Now, after one day running with the new tables, I have started to see the
problem again. I think it happened during a major compaction, but I would
like to better understand the reasons and consequences of these problems.

- What are the major consequences of these errors? I assume that the index
data is not written to the index table, right? Also, why was the read
bandwidth of my disks so high even with no readers running and after
changing those properties?

- Is there any optimal or recommended value for the above properties, or am
I missing some tuning of other properties for the metadata cache?

Thank you,
Pedro
