Re: Sorting tuples with byte key and byte value

2019-07-16 Thread Supun Kamburugamuve
Thanks, Keith. We have set SPARK_WORKER_INSTANCES=8. So that means we are running 8 workers on a single machine with 1 thread each, and this gives the 8 threads? Is there a preference for running 1 worker with 8 threads inside it? These are dual-CPU machines, so I believe we need at least 2 worker

Re: Sorting tuples with byte key and byte value

2019-07-15 Thread Keith Chapman
Hi Supun, A couple of things with regard to your question. --executor-cores means the number of worker threads per VM. According to your requirement this should be set to 8. *repartitionAndSortWithinPartitions* is an RDD operation; RDD operations in Spark are not performant both in terms of

Sorting tuples with byte key and byte value

2019-07-15 Thread Supun Kamburugamuve
Hi all, We are trying to measure the sorting performance of Spark. We have a 16-node cluster with 48 cores and 256GB of RAM in each machine and a 10Gbps network. Let's say we are running with 128 parallel tasks and each partition generates about 1GB of data (total 128GB). We are using the method