[ https://issues.apache.org/jira/browse/SPARK-27586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-27586.
-----------------------------------
       Resolution: Fixed
         Assignee: WoudyGao
    Fix Version/s: 3.0.0

This is resolved via https://github.com/apache/spark/pull/24494

> Improve binary comparison: replace Scala's for-comprehension if statements 
> with while loop
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-27586
>                 URL: https://issues.apache.org/jira/browse/SPARK-27586
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.2
>         Environment: benchmark env:
>  * Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
>  * Linux 4.4.0-33.bm.1-amd64
>  * java version "1.8.0_131"
>  * Scala 2.11.8
>  * perf version 4.4.0
> Run:
> 40,000,000 comparisons of 32-byte binary values
>  
>            Reporter: WoudyGao
>            Assignee: WoudyGao
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> While processing some large Parquet files, I noticed that 
> TypeUtils.compareBinary consumes a noticeable amount of CPU time.
> After profiling with perf, I found that the version using a 
> for-comprehension with an "if" guard executes ≈15X as many instructions as 
> an equivalent while loop.
>  
> *'while-loop' version perf:*
>   
>  {{        886.687949      task-clock (msec)         #    1.257 CPUs utilized}}
>  {{             3,089      context-switches          #    0.003 M/sec}}
>  {{               265      cpu-migrations            #    0.299 K/sec}}
>  {{            12,227      page-faults               #    0.014 M/sec}}
>  {{     2,209,183,920      cycles                    #    2.492 GHz}}
>  {{   <not supported>      stalled-cycles-frontend}}
>  {{   <not supported>      stalled-cycles-backend}}
>  {{     6,865,836,114      instructions              #    3.11  insns per cycle}}
>  {{     1,568,910,228      branches                  # 1769.405 M/sec}}
>  {{         9,172,613      branch-misses             #    0.58% of all branches}}
>   
>  {{       0.705671157 seconds time elapsed}}
>   
> *TypeUtils.compareBinary perf:*
>  {{      16347.242313      task-clock (msec)         #    1.233 CPUs utilized}}
>  {{             8,370      context-switches          #    0.512 K/sec}}
>  {{               481      cpu-migrations            #    0.029 K/sec}}
>  {{           536,671      page-faults               #    0.033 M/sec}}
>  {{    40,857,347,119      cycles                    #    2.499 GHz}}
>  {{   <not supported>      stalled-cycles-frontend}}
>  {{   <not supported>      stalled-cycles-backend}}
>  {{    90,606,381,612      instructions              #    2.22  insns per cycle}}
>  {{    18,107,867,151      branches                  # 1107.702 M/sec}}
>  {{        12,880,296      branch-misses             #    0.07% of all branches}}
>   
>  {{      13.257617118 seconds time elapsed}}
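
For illustration, the two loop styles contrasted in the description can be sketched as below. This is a minimal sketch, not the code from the pull request: the object name BinaryCompareSketch and both method names are hypothetical, and the unsigned byte ordering (masking with 0xff) is an assumption made for the example. The guarded for-comprehension desugars to Range.withFilter(...).foreach(...), and the early return inside it is carried by a control-flow exception, which is a plausible source of the extra instructions reported above.

```scala
object BinaryCompareSketch {
  // Hypothetical for-comprehension version. The compiler rewrites this as
  // (0 until x.length).withFilter(i => i < y.length).foreach(i => ...),
  // so the guard and the body each become a closure, and the early
  // `return` is implemented by throwing and catching a control exception.
  def compareWithFor(x: Array[Byte], y: Array[Byte]): Int = {
    for (i <- 0 until x.length; if i < y.length) {
      // Assumption: bytes are compared as unsigned values.
      val res = (x(i) & 0xff) - (y(i) & 0xff)
      if (res != 0) return res
    }
    // All shared positions are equal: the shorter array sorts first.
    x.length - y.length
  }

  // Hypothetical while-loop version with the same result for every input.
  // It uses a plain index, no closures, and an ordinary local return.
  def compareWithWhile(x: Array[Byte], y: Array[Byte]): Int = {
    val limit = math.min(x.length, y.length)
    var i = 0
    while (i < limit) {
      val res = (x(i) & 0xff) - (y(i) & 0xff)
      if (res != 0) return res
      i += 1
    }
    x.length - y.length
  }
}
```

Both methods return a negative value, zero, or a positive value under the same ordering, for example BinaryCompareSketch.compareWithWhile(Array[Byte](1, 2), Array[Byte](1, 3)) is negative, so the while-loop form can stand in for the for-comprehension form without changing results.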


