[ 
https://issues.apache.org/jira/browse/IMPALA-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Behm resolved IMPALA-6422.
------------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.12.0

commit 1dfdc6704b74c77d63accb69e9197fd203455be0
Author: Alex Behm <alex.b...@cloudera.com>
Date:   Thu Jan 18 19:06:30 2018 -0800

    IMPALA-6422: Use ldexp() instead of powf() in HLL.
    
    Using ldexp() to compute a floating point power of two is
    over 10x faster than powf().
    
    This change is particularly helpful for speeding up
    COMPUTE STATS TABLESAMPLE which has many calls to
    HllFinalEstimate() where floating point power of two
    computations are relevant.
    
    Testing:
    - core/hdfs run passed
    
    Change-Id: I517614d3f9cf1cf56b15a173c3cfe76e0f2e0382
    Reviewed-on: http://gerrit.cloudera.org:8080/9078
    Reviewed-by: Alex Behm <alex.b...@cloudera.com>
    Tested-by: Impala Public Jenkins


> Compute stats tablesample spends a lot of time in powf()
> --------------------------------------------------------
>
>                 Key: IMPALA-6422
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6422
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.11.0
>            Reporter: Alexander Behm
>            Assignee: Alexander Behm
>            Priority: Major
>              Labels: compute-stats, performance
>             Fix For: Impala 2.12.0
>
>
> [~mmokhtar] did perf profiling for COMPUTE STATS TABLESAMPLE and discovered 
> that a lot of time is spent on finalizing HLL intermediates. Most of that 
> time is spent in powf().
> Relevant snippet from AggregateFunctions::HllFinalEstimate() in 
> aggregate-functions-ir.cc:
> {code}
>   for (int i = 0; i < num_buckets; ++i) {
>     harmonic_mean += powf(2.0f, -buckets[i]);
>     if (buckets[i] == 0) ++num_zero_registers;
>   }
> {code}
> Since we're computing a power of 2, using ldexp() should be much more 
> efficient. I ran a microbenchmark and found that ldexp() is >10x faster 
> than powf() for this scenario.
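
For readers of the archive, the substitution described above can be sketched roughly as follows. This is not the committed patch: the helper names SumPowf and SumLdexp are hypothetical, and the bucket type (uint8_t) is an assumption. It only illustrates that ldexpf(1.0f, -n) yields the same 2^-n terms that powf(2.0f, -n) does, by adjusting the exponent directly instead of evaluating a general power function.

```cpp
#include <math.h>

#include <cstdint>
#include <vector>

// Hypothetical helper (not from the Impala source): accumulates 2^-bucket
// with the general-purpose powf(), as in the snippet quoted in the issue.
float SumPowf(const std::vector<uint8_t>& buckets) {
  float sum = 0.0f;
  for (uint8_t b : buckets) sum += powf(2.0f, -static_cast<float>(b));
  return sum;
}

// Hypothetical helper: the same accumulation using ldexpf(), which computes
// 1.0f * 2^(-bucket) by scaling the floating point exponent.
float SumLdexp(const std::vector<uint8_t>& buckets) {
  float sum = 0.0f;
  for (uint8_t b : buckets) sum += ldexpf(1.0f, -static_cast<int>(b));
  return sum;
}
```

Both helpers should agree for any bucket contents, up to rounding in powf(); the actual speedup depends on the platform's math library, so the >10x figure above should be re-measured rather than assumed.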



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
