[ https://issues.apache.org/jira/browse/SPARK-27586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun resolved SPARK-27586.
-----------------------------------
    Resolution: Fixed
      Assignee: WoudyGao
 Fix Version/s: 3.0.0

This is resolved via https://github.com/apache/spark/pull/24494

> Improve binary comparison: replace Scala's for-comprehension if statements with while loop
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-27586
>                 URL: https://issues.apache.org/jira/browse/SPARK-27586
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.2
>         Environment: benchmark env:
>  * Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
>  * Linux 4.4.0-33.bm.1-amd64
>  * java version "1.8.0_131"
>  * Scala 2.11.8
>  * perf version 4.4.0
> Run:
> 40,000,000 comparisons on 32-byte binary values
>            Reporter: WoudyGao
>            Assignee: WoudyGao
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> I found that the CPU cost of TypeUtils.compareBinary is noticeable when
> handling some big Parquet files.
> After some perf work, I found that the "for-comprehension if statements"
> version executes ≈15X as many instructions as a while loop.
>
> *'while-loop' version perf:*
>
>        886.687949 task-clock (msec)         #    1.257 CPUs utilized
>             3,089 context-switches          #    0.003 M/sec
>               265 cpu-migrations            #    0.299 K/sec
>            12,227 page-faults               #    0.014 M/sec
>     2,209,183,920 cycles                    #    2.492 GHz
>   <not supported> stalled-cycles-frontend
>   <not supported> stalled-cycles-backend
>     6,865,836,114 instructions              #    3.11  insns per cycle
>     1,568,910,228 branches                  # 1769.405 M/sec
>         9,172,613 branch-misses             #    0.58% of all branches
>
>       0.705671157 seconds time elapsed
>
> *TypeUtils.compareBinary perf:*
>
>      16347.242313 task-clock (msec)         #    1.233 CPUs utilized
>             8,370 context-switches          #    0.512 K/sec
>               481 cpu-migrations            #    0.029 K/sec
>           536,671 page-faults               #    0.033 M/sec
>    40,857,347,119 cycles                    #    2.499 GHz
>   <not supported> stalled-cycles-frontend
>   <not supported> stalled-cycles-backend
>    90,606,381,612 instructions              #    2.22  insns per cycle
>    18,107,867,151 branches                  # 1107.702 M/sec
>        12,880,296 branch-misses             #    0.07% of all branches
>
>      13.257617118 seconds time elapsed
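
For illustration, the two loop styles compared in this issue can be sketched as follows. This is a minimal sketch written for Scala 2, not the code from the pull request: the object name BinaryCompareSketch and the method names compareWithFor and compareWithWhile are hypothetical, and an unsigned byte-wise ordering with a length tie-break is assumed.

```scala
object BinaryCompareSketch {

  // Hypothetical name. A for-comprehension with an `if` guard desugars to
  // (0 until x.length).withFilter(...).foreach(...), so the guard and the body
  // run as closures, and the early `return` becomes a non-local return
  // (implemented with a control-flow exception in Scala 2).
  def compareWithFor(x: Array[Byte], y: Array[Byte]): Int = {
    for (i <- 0 until x.length; if i < y.length) {
      val res = (x(i) & 0xff) - (y(i) & 0xff) // compare bytes as unsigned
      if (res != 0) return res
    }
    x.length - y.length // common prefix is equal: shorter array sorts first
  }

  // Hypothetical name. The same comparison as a plain while loop: no Range,
  // no closures, and an ordinary local return.
  def compareWithWhile(x: Array[Byte], y: Array[Byte]): Int = {
    val limit = math.min(x.length, y.length)
    var i = 0
    while (i < limit) {
      val res = (x(i) & 0xff) - (y(i) & 0xff) // compare bytes as unsigned
      if (res != 0) return res
      i += 1
    }
    x.length - y.length
  }

  def main(args: Array[String]): Unit = {
    val a = Array[Byte](1, 2, 3)
    val b = Array[Byte](1, 2, 4)
    println(compareWithFor(a, b))   // prints -1
    println(compareWithWhile(a, b)) // prints -1
  }
}
```

Both methods should return the same result for any pair of inputs; the difference is only in how much work the compiled loop does per element, which is what the perf counters above measure.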