Hi,

can you detail the symptom further? Was it that only 12 requests were
serviced and the other 440 timed out? I don't think that Spark is well
suited for this kind of workload, or at least not in the way it is being
used here. How long does a single request take Spark to complete?

Even with fair scheduling, you will only ever have a fixed number of tasks
running on Spark at once. Usually this is bounded by the max cores setting
in the configuration. Since you mention local as a comparison point, I get
the impression you are running Spark Standalone as the cluster. The
implication, if this is reflective of your current setup, is that you
aren't going to get much concurrency for separate Spray requests. Let's say
your max cores is 16 and the number of tasks/partitions per stage of your
Spark DAG is 8. Then at any given time only 2 requests can be serviced (see
the sketch below). It may also be the case that, with fair scheduling, a
single request gets pre-empted after completing one stage of the DAG and
has to wait to continue instead of proceeding directly to the next stage.
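
To make that concrete, here is a minimal sketch of the knobs involved. It
assumes a Standalone cluster and uses only the documented SparkConf /
SparkContext APIs; the app name, pool name, path and handleRequest helper
are all made up for illustration:

  import org.apache.spark.{SparkConf, SparkContext}

  // Sketch only: caps the app at 16 cores cluster-wide and turns on the
  // fair scheduler so jobs from different request threads share those cores.
  val conf = new SparkConf()
    .setAppName("spray-spark-service")        // hypothetical app name
    .set("spark.cores.max", "16")             // total cores for this app
    .set("spark.scheduler.mode", "FAIR")      // fair scheduling across jobs

  val sc = new SparkContext(conf)

  // Each Spray request handler runs on its own thread; the scheduler pool is
  // a thread-local property. With 16 cores and ~8 tasks per stage, only ~2
  // such jobs can have all of their tasks running at the same time.
  def handleRequest(id: String): Array[String] = {
    sc.setLocalProperty("spark.scheduler.pool", "spray-pool") // hypothetical pool
    sc.textFile(s"hdfs:///data/$id").take(10)                 // placeholder RDD work
  }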

This hypothesis would also support the observation that local is no better
than cluster, because you probably have even fewer concurrent Spark task
slots available on the single local machine.


For reference, from the Spark 1.3.1 configuration documentation:

spark.cores.max (default: not set)
When running on a standalone deploy cluster
<https://spark.apache.org/docs/1.3.1/spark-standalone.html> or a Mesos
cluster in "coarse-grained" sharing mode
<https://spark.apache.org/docs/1.3.1/running-on-mesos.html#mesos-run-modes>,
the maximum amount of CPU cores to request for the application from across
the cluster (not from each machine). If not set, the default will be
spark.deploy.defaultCores on Spark's standalone cluster manager, or
infinite (all available cores) on Mesos.
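
If that bound is what's limiting you, spark.cores.max can also be raised at
submit time, for example (the value 32 is just a placeholder, and the master
URL and jar/class names below are made up):

  spark-submit \
    --master spark://your-master:7077 \
    --conf spark.cores.max=32 \
    --conf spark.scheduler.mode=FAIR \
    --class your.server.Main your-server-assembly.jar

or equivalently via spark.cores.max in conf/spark-defaults.conf.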

On Tue, Jun 23, 2015 at 12:44 PM, daunnc <dau...@gmail.com> wrote:

> So the situation is the following: there is a Spray server with a Spark
> context available (fair scheduling, cluster mode, launched via spark-submit).
> Some HTTP URLs trigger Spark RDD operations that collect information from
> Accumulo / HDFS / etc. I noticed that there seems to be some kind of limit
> on requests:
>
> wrk -t8 -c50 -d30s "http://localhost:4444/…/";
> Running 30s test @ http://localhost:4444/…/
>   8 threads and 50 connections
>   Thread Stats   Avg      Stdev     Max   +/- Stdev
>     Latency     1.03s   523.30ms   1.70s    50.00%
>     Req/Sec     6.05      5.49    20.00     71.58%
>   452 requests in 30.04s, 234.39KB read
>   Socket errors: connect 0, read 0, write 0, timeout 440
>
> This happens when making calls that use Spark RDDs (it does not depend on
> which function is called), and in the browser you see ERR_EMPTY_RESPONSE.
>
> For now the workaround was to use a cache, but I want to understand this
> limitation, or maybe which settings control it.
> The error happens both in local mode and in cluster mode, so I guess it
> does not depend on that.
>
> P.S. The logs are clean (or maybe I simply don't know where to look, but
> the stdout of spark-submit in client mode is clean).
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Limitations-using-SparkContext-tp23452.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
