[ 
https://issues.apache.org/jira/browse/HUDI-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17433587#comment-17433587
 ] 

Raymond Xu edited comment on HUDI-1970 at 10/25/21, 7:13 AM:
-------------------------------------------------------------

* 1B records (randomized values in the example trip model)
 * 100 partitions, evenly distributed, `year=*/month=*/day=*`, 50 parquet files / partition
 * hudi: 109.8 GB = 22.4 MB parquet x 5000
 * delta: 70.9 GB = 14.5 MB parquet x 5000
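
The layout figures above can be cross-checked with a few lines of arithmetic. This is only a sketch: the helper name `avg_file_mb` is hypothetical, and it assumes the totals are reported in binary units (1 GB = 1024 MB), which is not stated in the comment but matches the quoted per-file sizes more closely than decimal units do.

```python
# Cross-check of the dataset layout figures quoted in the list above.
PARTITIONS = 100
FILES_PER_PARTITION = 50

# 100 partitions x 50 parquet files per partition
total_files = PARTITIONS * FILES_PER_PARTITION  # 5000


def avg_file_mb(total_gb, n_files, mb_per_gb=1024):
    """Average file size in MB for a table of total_gb spread over n_files.

    mb_per_gb=1024 is an assumption (binary units), not stated in the source.
    """
    return total_gb * mb_per_gb / n_files


hudi_avg_mb = avg_file_mb(109.8, total_files)
delta_avg_mb = avg_file_mb(70.9, total_files)

# Relative on-disk footprint of the two tables for the same 1B records
size_ratio = 109.8 / 70.9

print(f"files: {total_files}")
print(f"hudi avg file size:  {hudi_avg_mb:.2f} MB")
print(f"delta avg file size: {delta_avg_mb:.2f} MB")
print(f"hudi/delta size ratio: {size_ratio:.2f}")
```

Under that assumption the averages land within a tenth of a megabyte of the quoted 22.4 MB and 14.5 MB, and the Hudi table comes out roughly one and a half times the size of the Delta table.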

|SQL|Hudi 0.9.0|
|select fare, begin_lon, begin_lat, ts from hudi_trips_snapshot where fare > 20.0|129.352|108.312|104.914|
|select count(*) from hudi_trips_snapshot|96.001|83.839|66.973|
|select count(*) from hudi_trips_snapshot where year = '2020' and month = '03' and day = '01'|1.880|1.776|1.767|
|select fare, begin_lon, begin_lat, ts from hudi_trips_snapshot where year='2020' and month='03' and day='01' and fare between 20 and 50|3.650|3.147|3.086|


was (Author: xushiyan):
* 1B records (randomized values in the example trip model)
 * 100 partitions, evenly distributed, year=*/month=*/day=*, 50 parquet files / partition
 * EMR 6.2 Spark 3.0.1-amzn-0
 * S3, parquet compression snappy
 * hudi: 109.8 GB = 22.4 MB parquet x 5000
 * delta: 70.9 GB = 14.5 MB parquet x 5000

|SQL|Hudi 0.9.0|
|select fare, begin_lon, begin_lat, ts from hudi_trips_snapshot where fare > 20.0|129.352|108.312|104.914|
|select count(*) from hudi_trips_snapshot|96.001|83.839|66.973|
|select count(*) from hudi_trips_snapshot where year = '2020' and month = '03' and day = '01'|1.880|1.776|1.767|
|select fare, begin_lon, begin_lat, ts from hudi_trips_snapshot where year='2020' and month='03' and day='01' and fare between 20 and 50|3.650|3.147|3.086|

> Performance testing/certification of key SQL DMLs
> -------------------------------------------------
>
>                 Key: HUDI-1970
>                 URL: https://issues.apache.org/jira/browse/HUDI-1970
>             Project: Apache Hudi
>          Issue Type: Sub-task
>          Components: Performance, Spark Integration
>            Reporter: Vinoth Chandar
>            Assignee: Raymond Xu
>            Priority: Blocker
>             Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
