David Wayne Birdsall created TRAFODION-2017:
-----------------------------------------------

             Summary: Tune HBase row cache sizes so UPDATE STATS completes
                 Key: TRAFODION-2017
                 URL: https://issues.apache.org/jira/browse/TRAFODION-2017
             Project: Apache Trafodion
          Issue Type: Bug
          Components: sql-cmp
    Affects Versions: 2.0-incubating
         Environment: All, though prevalent on clusters with larger tables 
and/or heavier loads
            Reporter: David Wayne Birdsall
            Assignee: David Wayne Birdsall
             Fix For: 2.1-incubating


UPDATE STATISTICS often fails with an HBase socket timeout exception and/or a 
scanner timeout exception when run with sampling on larger tables, or on 
clusters with heavy concurrent workloads.

We have experimented in the past with setting various CQDs on large tables to 
reduce these failures. However, we were loath to set them all the time, fearing 
that doing so would lengthen elapsed time in non-failure scenarios.

Some recent work by Carol Pearson, however, shows that the increase in elapsed 
time is negligible for smaller tables, and that in failure scenarios the failure 
does not occur quickly. Paying a small elapsed-time penalty to increase the 
probability of success therefore seems the better trade-off.

Carol's work involves tables of fewer than 1 billion rows; the existing CQD 
logic is still required for larger tables. For tables of fewer than 1 billion 
rows, however, she recommends setting HBASE_ROWS_CACHED_MIN and 
HBASE_ROWS_CACHED_MAX to '50'. This JIRA is written to cover that change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)