Hello Spark community,

We have a project where we want to use Spark as the computation engine behind 
REST services that perform calculations and return results.
While working with Spark we have learned how to make things run faster and 
eventually optimized our code to produce results in acceptable time (1-2 seconds). 
But when we tested it under concurrent load we realized that response time grows 
significantly as the number of concurrent requests increases. This was expected, 
and we are trying to find a way to scale our solution so it keeps acceptable 
response times under concurrent load. Here we ran into the fact that adding more 
slave servers does not improve the average timing when there are several 
concurrent requests.

As of now I observe the following behavior: hitting our test REST service with 
100 threads, with 1 master and one slave node, the average time for those 100 
requests is 30.6 seconds per request; in the case with 2 slave nodes the average 
time becomes 29.8 seconds per request, which is pretty much the same as with one 
slave node. While running these tests we monitored server load using htop, and 
the odd thing is that in the first case the slave node's CPUs were loaded at 
90-95%, while in the second case with 2 slaves the CPUs are loaded at only 45-50%.
We are trying to find the bottleneck in our solution but have not succeeded yet. 
We have checked the hardware for possible bottlenecks (network I/O, disk I/O, 
RAM, CPU), but none of them seems to be anywhere close to its limit.

Our main suspect at this moment is the Spark configuration. Our application is a 
self-contained Spark application, and Spark runs in standalone mode (without 
external resource managers). We are not submitting a jar to Spark using the 
shell script; instead we create the SparkContext in our Spring Boot application, 
and it connects to the Spark master and slaves by itself. All requests to Spark 
go through an internal thread pool. We have experimented with thread pool sizes 
and found that the best performance appears with 16 threads in the pool, where 
each thread performs one or several manipulations with RDDs and by itself can be 
considered a Spark job. A rough sketch of this setup follows below.
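
To make this concrete, here is a simplified sketch of how the application is 
wired (class names, the master URL, and the dummy calculation are illustrative, 
not our actual code):

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Illustrative sketch only: a self-contained Spark driver embedded in the
// REST application, with an internal thread pool submitting jobs concurrently.
public class SparkComputeService {

    // One SparkContext shared by the whole application; the driver lives inside
    // the Spring Boot process and connects to the standalone master itself.
    private final JavaSparkContext sc;

    // Internal pool: 16 threads gave us the best results in our experiments.
    private final ExecutorService pool = Executors.newFixedThreadPool(16);

    public SparkComputeService() {
        SparkConf conf = new SparkConf()
                .setAppName("rest-compute")              // illustrative name
                .setMaster("spark://master-host:7077");  // standalone master (assumed URL)
        this.sc = new JavaSparkContext(conf);
    }

    // Each REST request is handed to the pool; each task runs one or several
    // RDD manipulations and is effectively its own Spark job.
    public Future<Long> compute(List<Integer> payload) {
        return pool.submit(() ->
                sc.parallelize(payload)
                  .map(x -> x * 2L)      // placeholder for the real calculation
                  .reduce(Long::sum));
    }
}

Nothing Spark-specific is tuned here beyond the master URL, which is part of why 
we suspect the configuration.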

For now I assume we have either misconfigured Spark due to our lack of 
experience with it, or our use case is not really a use case for Spark and it is 
simply not designed for this kind of parallel load. I've come to this conclusion 
because getting no performance gain after adding new nodes doesn't make any 
sense to me. So any help and ideas would be really appreciated.


Hardware details:
Azure D5 v2 instances for the master and 3 slaves. D5 v2 comes with a 2.4 GHz 
Intel Xeon E5-2673 v3 (16 cores), 56 GB RAM, and local SSD storage. Using iperf 
we tested the network speed and it is around 1 Gb/s.


