Dear Jeevan, Spark 1.2 is quite old, and if I were you I would go for a newer version.
However, is there a parallelism level (e.g., 20, 30) that works for both installations?
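One way to try this (a minimal sketch, not from the original thread) is to pin the same parallelism level on both clusters at submit time, so the two versions run under identical settings. The value 96 and the script name are illustrative; spark.sql.shuffle.partitions only matters if the benchmarks go through Spark SQL.

```shell
# Hypothetical example: pin the same parallelism on both the 1.2 and 2.1
# clusters so the benchmark runs are comparable like-for-like.
spark-submit \
  --conf spark.default.parallelism=96 \
  --conf spark.sql.shuffle.partitions=96 \
  your_benchmark.py
```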
regards, Apostolos On 29/08/2018 4:55 PM, jeevan.ks wrote:
Hi, I have two systems: one built on Spark 1.2 and the other on 2.1. I am benchmarking both with the same benchmarks (wordcount, grep, sort, etc.) and the same data sets from an S3 bucket (sizes ranging from 50 MB to 10 GB). The Spark cluster I used is r3.xlarge: 8 instances, 4 cores each, 28 GB RAM. I observed some strange behaviour while running the benchmarks:
- When I ran Spark 1.2 with the default partition number (sc.defaultParallelism), the jobs took forever to complete. So I changed it to three times the number of cores, i.e., 32 × 3 = 96. This worked like magic and the jobs completed quickly.
- However, when I tried the same magic number on version 2.1, the jobs took forever. The default parallelism works better there, but is still not very efficient.
I am having trouble rationalising this and comparing the two systems. My questions are: what changed from 1.2 to 2.1 with respect to default parallelism that could cause this behaviour? And how can I make both versions behave similarly on the same software/hardware configuration so that I can compare them? I'd really appreciate your help on this! Cheers, Jeevan -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ --------------------------------------------------------------------- To unsubscribe e-mail: user-unsubscr...@spark.apache.org
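The partition count used above (8 nodes × 4 cores × 3 = 96) follows a common rule of thumb of a few tasks per core. A minimal sketch of that heuristic, with an illustrative helper name that is not part of any Spark API:

```python
# Hypothetical helper illustrating the partition-count heuristic described
# in the email: total cores times a small tasks-per-core multiplier.
def suggested_partitions(num_executors: int,
                         cores_per_executor: int,
                         tasks_per_core: int = 3) -> int:
    """Return a partition count giving roughly `tasks_per_core` tasks per core."""
    return num_executors * cores_per_executor * tasks_per_core

# Jeevan's cluster: 8 r3.xlarge instances with 4 cores each.
print(suggested_partitions(8, 4))  # 8 * 4 * 3 = 96
```

The result would then be passed to an operation such as repartition(96) or used as spark.default.parallelism; whether 3 tasks per core is the right multiplier depends on the workload.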
-- Apostolos N. Papadopoulos, Associate Professor Department of Informatics Aristotle University of Thessaloniki Thessaloniki, GREECE tel: ++0030312310991918 email: papad...@csd.auth.gr twitter: @papadopoulos_ap web: http://delab.csd.auth.gr/~apostol