Since the dataset size can cause confusion, as you pointed out, let me explain it briefly for others.
The benchmark size comes from the scale factor of dsdgen (the TPC-"DS" "d"ata "gen"erator). If you take a look at http://eastcirclek.blogspot.kr/2016/12/loading-tpc-ds-data-into-mysql.html, a user can specify the scale factor when executing dsdgen (the -scale parameter in Step 3). Using N as the scale factor produces 24 CSV files (one per TPC-DS table) collectively amounting to N gigabytes.

When I convert the 1TB dataset into ORC and Parquet, the resulting sizes are:
- ORC : 313,787,478,951 bytes
- Parquet : 544,802,530,684 bytes
(The higher compression ratio is one reason why I use ORC over Parquet.)

Individual tables are smaller than the total size. The size of each table for the 1TB workload stored as ORC is as follows:

$ hdfs dfs -du /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db | sort -nr
138604293501  /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/store_sales
95512050042   /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/catalog_sales
50883597954   /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/web_sales
14898403615   /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/store_returns
9381171748    /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/catalog_returns
4467912025    /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/web_returns
23303614      /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/inventory
11922214      /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/customer
2010633       /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/customer_address
1183265       /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/item
698942        /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/customer_demographics
403459        /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/catalog_page
361232        /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/date_dim
132187        /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/time_dim
10943         /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/promotion
6066          /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/web_site
4911          /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/store
4686          /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/call_center
2562          /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/web_page
1836          /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/warehouse
1320          /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/ship_mode
895           /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/household_demographics
888           /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/reason
413           /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/income_band

Since no single query needs all the tables, Presto can process many queries within its memory budget. What can be a problem is a query whose intermediate data, generated by (outer-)joining two large tables, makes the execution use more memory than query.max-memory or query.max-memory-per-node in the middle of the execution, in which case Presto stops executing the query. The 1TB workload causes more queries to exceed the memory limit than the 100GB workload does.

Due to its in-memory nature, Presto doesn't spill during query execution, which results in no disk writes, as shown on page 11 of my slides. I found that Presto is working on spilling data during joins, but the issue is still open: https://github.com/prestodb/presto/issues/5897

- Dongwon

> On Jan 31, 2017, at 9:55 AM, Goden Yao <[email protected]> wrote:
>
> ORC works well with Presto too at least.
> Can you explain a little how you ran the 1TB benchmark on a 5*80 = 400GB total
> memory Presto cluster.
> Did you use compression to fit them all in memory? or partitioned data, etc.
>
> On Mon, Jan 30, 2017 at 3:50 PM Dongwon Kim <[email protected]> wrote:
> Goun : Just to make all the engines use the same data and I usually
> store data in ORC. I know that it can make biased results in favor of
> Hive. I did Spark experiments with Parquet, and Spark works better
> with Parquet as it is believed (not included in the result though).
>
> Goden : Oops, 128GB main memory for the master and all the slaves for
> sure because I'm using 80GB per each node.
>
> Gopal : (yarn logs -application $APPID) doesn't contain a line
> containing HISTORY so it doesn't produce an svg file. Should I turn on
> some option to get the lines containing HISTORY in the yarn application
> log?
>
> 2017-01-31 4:47 GMT+09:00 Goden Yao <[email protected]>:
> > was the master 128MB or 128GB memory?
> >
> > On Mon, Jan 30, 2017 at 3:24 AM Gopal Vijayaraghavan <[email protected]> wrote:
> >>
> >> > Hive LLAP shows better performance than Presto and Spark for most
> >> > queries, but it shows very poor performance on the execution of query 72.
> >>
> >> My suspicion will be the inventory x catalog_sales x warehouse join -
> >> assuming the column statistics are present and valid.
> >>
> >> If you could send the explain formatted plans and swimlanes for LLAP, I
> >> can probably debug this better.
> >>
> >> https://github.com/apache/tez/blob/master/tez-tools/swimlanes/yarn-swimlanes.sh
> >>
> >> Use the "submitted to <appid>" in this to get the diagram.
> >>
> >> Cheers,
> >> Gopal
> >>
> > --
> > Goden
> --
> Goden

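P.S. As a quick sanity check on the numbers above, here is a small Python sketch (not part of the benchmark scripts; it assumes the raw dsdgen CSV output for the 1TB scale factor is exactly 10**12 bytes, since scale factor N yields roughly N gigabytes):

```python
# Back-of-the-envelope check of the sizes quoted above.
# Assumption: raw 1TB CSV output taken as exactly 10**12 bytes.
raw_bytes = 10**12
orc_bytes = 313_787_478_951       # measured total ORC size
parquet_bytes = 544_802_530_684   # measured total Parquet size

print(f"ORC compression ratio:     {raw_bytes / orc_bytes:.2f}x")
print(f"Parquet compression ratio: {raw_bytes / parquet_bytes:.2f}x")

# Cluster memory budget: 5 workers x 80 GB each = 400 GB.
budget_bytes = 5 * 80 * 10**9
# Two largest ORC tables: store_sales + catalog_sales.
largest_two = 138_604_293_501 + 95_512_050_042
print(f"Budget: {budget_bytes / 10**9:.0f} GB, "
      f"two largest ORC tables together: {largest_two / 10**9:.0f} GB")
```

This shows ORC compresses the raw data roughly 3.2x versus roughly 1.8x for Parquet, and that even the two largest ORC tables together (about 234 GB) fit under the 400GB budget, so it is the join intermediates, not the base tables, that push a query over the limit.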