I think you'd have to be more specific. How are you running
shortest-path? How long does it take, and how long do you expect it to
take, roughly? Does the bottleneck seem to be I/O or CPU? Are you
caching what needs to be cached?
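
If, for example, you're running it on GraphX, iterative algorithms
re-read the graph on every superstep, so caching the parsed graph
matters a lot. A minimal sketch, assuming GraphX and the adjacency-list
text file you describe (the HDFS path and the landmark vertex are
placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.graphx.lib.ShortestPaths

val sc = new SparkContext(new SparkConf().setAppName("ShortestPath"))

// Each line: "<nodeId> <neighbor1> <neighbor2> ..."
val edges = sc.textFile("hdfs:///path/to/edges.txt").flatMap { line =>
  val parts = line.split("\\s+")
  val src = parts.head.toLong
  parts.tail.map(dst => Edge(src, dst.toLong, 1))
}

// Cache the graph so each iteration doesn't re-parse 500GB from disk.
val graph = Graph.fromEdges(edges, defaultValue = 0).cache()

// Distances from every vertex to the landmark vertices (here, vertex 1).
val result = ShortestPaths.run(graph, landmarks = Seq(1L))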

If your cluster is virtualized and has little memory, you may be
hitting disk constantly, on top of the overhead of virtualized
I/O. It's unclear what your infrastructure is like.
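
If memory is the issue, one thing worth trying (a sketch, not a tuning
recommendation; the right settings depend on your workload) is Kryo
plus serialized, disk-backed storage, which trade CPU for memory
pressure:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val conf = new SparkConf()
  .setAppName("ShortestPath")
  // Kryo serializes much more compactly than Java serialization.
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
val sc = new SparkContext(conf)

// Persist large intermediates serialized, spilling to disk rather than
// recomputing when executors run short of memory. The path is a
// placeholder for your input.
val lines = sc.textFile("hdfs:///path/to/edges.txt")
  .persist(StorageLevel.MEMORY_AND_DISK_SER)

With 8 VMs at 3.5GB each you have under 28GB of RAM for 500GB of
input, so some spilling is unavoidable; the question is whether you
are spilling sensibly or thrashing.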

"Too slow" is one of those how-long-is-a-piece-of-string questions.
There's no inherent reason 500GB of data can't be processed but how
fast will depend on what you are doing.

On Fri, Aug 22, 2014 at 2:49 AM, Denis RP <qq378789...@gmail.com> wrote:
> Hi,
>
> I'm using Spark on a cluster of 8 VMs, each with two cores and 3.5GB RAM.
>
> But I need to run a shortest-path algorithm on 500+GB of data (a text file
> where each line contains a node id and the nodes it points to).
>
> I've tested it on the cluster, but it seems to be extremely slow, and I
> haven't gotten any result yet.
>
> Is it natural for it to be this slow given such a cluster and data, or is
> something wrong, since the problem could be solved much more efficiently?
> (say, within half an hour after reading the data?)
>
> Thanks!