Could you share more details about the dataset and the algorithm? For
example, if the dataset has 10M+ features, it may be slow for the driver to
collect the weights from executors (just a blind guess). -Xiangrui
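
For a rough sense of scale, here is a back-of-the-envelope sketch (the
feature count and names here are assumptions, not taken from your job):

// Each iteration of a batch optimizer such as OWLQN ships a dense
// vector of length d between the driver and the executors
// (weights out, gradients back).
val d = 10L * 1000 * 1000      // assume 10M features
val vectorBytes = d * 8L       // one dense Array[Double]: roughly 80 MB
// With plain RDD.aggregate the driver merges one such partial result per
// partition serially; treeAggregate (available in newer Spark releases)
// combines partials on the executors first, easing the driver bottleneck.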


On Tue, Jul 29, 2014 at 9:15 PM, Tan Tim <unname...@gmail.com> wrote:

> Hi, all
>
> [Setting]
>
> Input data:
> The data is on HDFS in 10 parts (text files); each part is about 2.3 GB.
>
> Spark cluster:
> Runs on CentOS; 8 machines, each with 8 cores and 128 GB of memory.
>
> The setting for Spark Context:
> val conf = new SparkConf()
>   .setMaster("spark://xxx-xxx-xx001:12036")
>   .setAppName("OWLQN")
>   .setSparkHome("/var/bh/lib/spark-0.9.1-bin-hadoop1")
>   .setJars(List(jarFile))
> conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
> conf.set("spark.kryo.registrator", "LRRegistrator")
> conf.set("spark.executor.memory", "64g")
> conf.set("spark.default.parallelism", "128")
> conf.set("spark.akka.timeout", "60")
> conf.set("spark.storage.memoryFraction", "0.7")
> conf.set("spark.kryoserializer.buffer.mb", "1024")
> conf.set("spark.cores.max", "64")
> conf.set("spark.speculation", "true")
> conf.set("spark.storage.blockManagerTimeoutIntervalMs", "60000")
> val sc = new SparkContext(conf)
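>
> For reference, simple arithmetic on the settings above (assuming they
> all take effect): with spark.cores.max = 64 and
> spark.default.parallelism = 128, each stage runs its tasks in roughly
> two waves:
>
> // hypothetical wave count per stage, from the settings above
> val totalCores = 64   // spark.cores.max
> val numTasks = 128    // spark.default.parallelism
> val waves = math.ceil(numTasks.toDouble / totalCores).toInt  // = 2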
>
> [Trouble]
>
> Executors do not start up concurrently
> For every stage, the executors do not start concurrently: some executors
> finish all their tasks while others have not yet begun any, as the web UI
> shows (some executors finished 10 tasks, and the other two are still not
> shown on the web UI):
>
> As Andrew Xia suggested, I added a sleep after creating the SparkContext,
> but some stages still have this problem (see the sketch below).
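>
> A more targeted alternative to a fixed sleep (a minimal sketch, assuming
> sc.getExecutorMemoryStatus is available in your Spark version and that
> expectedExecutors is a value you pick to match your cluster):
>
> // Poll until the expected number of executors has registered, instead
> // of sleeping for a fixed interval.
> val expectedExecutors = 8  // hypothetical: one executor per machine
> // getExecutorMemoryStatus maps block managers to (max, remaining)
> // memory; the driver's own block manager may be included in the count.
> while (sc.getExecutorMemoryStatus.size < expectedExecutors) {
>   Thread.sleep(500)
> }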
>
> I/O and CPU are never fully used
> When the tasks start, the CPU is not fully used: usage goes above 100% for
> less than 2 seconds and then drops to 1%, even though the tasks have not
> finished. The same thing happens with I/O.
>
>
> The attached file is the log for some stages; each stage averages 3.5
> minutes, which is too slow compared with another experiment (running the
> same task on an Ubuntu cluster instead of CentOS).
>
>
>
