Since the dataset size can cause confusion, as you pointed out, let me explain it briefly for others.
The benchmark size comes from the scale factor of dsdgen (the TPC-"DS" "d"ata "gen"erator). If you take a look at http://eastcirclek.blogspot.kr/2016/12/loading-tpc-ds-data-into-mysql.html, a user can specify the scale factor when executing dsdgen (the -scale parameter in Step 3). Using N as the scale factor produces 24 CSV files (one per TPC-DS table) collectively amounting to N gigabytes.

When I convert the 1TB dataset into ORC and Parquet, the resulting sizes are:
- ORC : 313,787,478,951 bytes
- Parquet : 544,802,530,684 bytes
(The higher compression ratio is one reason why I use ORC over Parquet.)

Individual tables are smaller than the total size. The size of each table for the 1TB workload stored as ORC is as follows:

$ hdfs dfs -du /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db | sort -nr
138604293501  /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/store_sales
95512050042   /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/catalog_sales
50883597954   /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/web_sales
14898403615   /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/store_returns
9381171748    /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/catalog_returns
4467912025    /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/web_returns
23303614      /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/inventory
11922214      /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/customer
2010633       /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/customer_address
1183265       /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/item
698942        /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/customer_demographics
403459        /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/catalog_page
361232        /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/date_dim
132187        /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/time_dim
10943         /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/promotion
6066          /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/web_site
4911          /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/store
4686          /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/call_center
2562          /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/web_page
1836          /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/warehouse
1320          /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/ship_mode
895           /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/household_demographics
888           /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/reason
413           /user/hive/warehouse/tpcds_bin_partitioned_orc_1024.db/income_band

Since no single query needs all the tables, Presto can process many queries within its memory budget. What can be a problem is a query whose intermediate data, generated by (outer-)joining two large tables, makes the execution use more memory than query.max-memory or query.max-memory-per-node in the middle of the execution, in which case Presto stops executing the query. The 1TB workload causes more queries to exceed the memory limit than the 100GB workload does.

Due to its in-memory nature, Presto doesn't spill during query execution, which results in no disk writes, as shown on page 11 of my slides. I found that Presto is working on spilling data during joins, but the issue is still open: https://github.com/prestodb/presto/issues/5897

- Dongwon

> On Jan 31, 2017, at 9:55 AM, Goden Yao <[email protected]> wrote:
>
> ORC works well with Presto too at least.
> Can you explain a little how you ran the 1TB benchmark on a 5*80 = 400GB total
> memory Presto cluster.
> Did you use compression to fit them all in memory? or partitioned data, etc.
>
> On Mon, Jan 30, 2017 at 3:50 PM Dongwon Kim <[email protected]> wrote:
> Goun : Just to make all the engines use the same data and I usually
> store data in ORC. I know that it can make biased results in favor of
> Hive. I did Spark experiments with Parquet, and Spark works better
> with Parquet as it is believed (not included in the result though).
>
> Goden : Oops, 128GB main memory for the master and all the slaves for
> sure because I'm using 80GB per each node.
>
> Gopal : (yarn logs -application $APPID) doesn't contain a line
> containing HISTORY so it doesn't produce an svg file. Should I turn on
> some option to get the lines containing HISTORY in the yarn application
> log?
>
> 2017-01-31 4:47 GMT+09:00 Goden Yao <[email protected]>:
> > was the master 128MB or 128GB memory?
> >
> > On Mon, Jan 30, 2017 at 3:24 AM Gopal Vijayaraghavan <[email protected]> wrote:
> >>
> >> > Hive LLAP shows better performance than Presto and Spark for most
> >> > queries, but it shows very poor performance on the execution of query 72.
> >>
> >> My suspicion will be the inventory x catalog_sales x warehouse join -
> >> assuming the column statistics are present and valid.
> >>
> >> If you could send the explain formatted plans and swimlanes for LLAP, I
> >> can probably debug this better.
> >>
> >> https://github.com/apache/tez/blob/master/tez-tools/swimlanes/yarn-swimlanes.sh
> >>
> >> Use the "submitted to <appid>" in this to get the diagram.
> >>
> >> Cheers,
> >> Gopal
> >>
> > --
> > Goden
> --
> Goden

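P.S. As a quick sanity check on the numbers above, here is a small Python sketch (not part of the benchmark scripts; it assumes the raw dsdgen CSV output for the 1TB scale factor is exactly 10**12 bytes, since scale factor N yields roughly N gigabytes):

```python
# Back-of-the-envelope check of the sizes quoted above.
# Assumption: raw 1TB CSV output taken as exactly 10**12 bytes.
raw_bytes = 10**12
orc_bytes = 313_787_478_951       # measured total ORC size
parquet_bytes = 544_802_530_684   # measured total Parquet size

print(f"ORC compression ratio:     {raw_bytes / orc_bytes:.2f}x")
print(f"Parquet compression ratio: {raw_bytes / parquet_bytes:.2f}x")

# Cluster memory budget: 5 workers x 80 GB each = 400 GB.
budget_bytes = 5 * 80 * 10**9
# Two largest ORC tables: store_sales + catalog_sales.
largest_two = 138_604_293_501 + 95_512_050_042
print(f"Budget: {budget_bytes / 10**9:.0f} GB, "
      f"two largest ORC tables together: {largest_two / 10**9:.0f} GB")
```

This shows ORC compresses the raw data roughly 3.2x versus roughly 1.8x for Parquet, and that even the two largest ORC tables together (about 234 GB) fit under the 400GB budget, so it is the join intermediates, not the base tables, that push a query over the limit.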