I think you'd have to be more specific. How are you running shortest-path? How long does it take, and how long do you expect, roughly? Does the bottleneck seem to be I/O or CPU? Are you caching what needs to be cached?
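For context on why caching matters here: shortest-path is iterative, and every superstep re-scans the edge set, so if the graph isn't held in memory you pay disk (and virtualized-I/O) costs on every iteration. Below is a minimal single-machine sketch in plain Python (toy data, not Spark API) of that frontier-expansion pattern, just to make the per-iteration cost concrete:

```python
# Minimal sketch of iterative frontier-expansion shortest path (unweighted BFS).
# Each pass over the frontier touches the adjacency data again -- in a
# distributed job this is the structure that gets re-read or shuffled per
# superstep, hence the advice to cache the graph.
from collections import deque

def shortest_paths(adjacency, source):
    """BFS over an adjacency dict {node: [neighbors]}; returns hop counts."""
    dist = {source: 0}
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in dist:  # first visit gives the shortest hop count
                dist[neighbor] = dist[node] + 1
                frontier.append(neighbor)
    return dist

# Hypothetical toy graph: one line per node and the nodes it points to.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(shortest_paths(graph, "a"))  # {'a': 0, 'b': 1, 'c': 1, 'd': 2}
```

On a cluster the same loop becomes one distributed superstep per hop, which is why memory pressure on 8 small VMs shows up multiplied by the number of iterations.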
If your cluster is virtualized and has little memory, you may be hitting disk constantly, and also paying the overhead of virtualized I/O. It's unclear what your infrastructure is like. "Too slow" is one of those how-long-is-a-piece-of-string questions. There's no inherent reason 500GB of data can't be processed, but how fast will depend on what you are doing.

On Fri, Aug 22, 2014 at 2:49 AM, Denis RP <qq378789...@gmail.com> wrote:
> Hi,
>
> I'm using Spark on a cluster of 8 VMs, each with two cores and 3.5GB RAM,
> but I need to run a shortest-path algorithm on 500+GB of data (a text file
> where each line contains a node id and the nodes it points to).
>
> I've tested it on the cluster, but it seems extremely slow, and I haven't
> gotten any result yet.
>
> Is it natural to be this slow on such a cluster and data, or is something
> wrong, since the problem could be solved much more efficiently? (Say, half
> an hour after reading the data?)
>
> Thanks!
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/The-running-time-of-spark-tp12624.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> ---------------------------------------------------------------------