Hi all,
I am new to Spark, and I have a problem: no computations run on the
workers/slave servers in standalone cluster mode.
The Spark version is 1.6.0, and the environment is CentOS. I run the example
code, e.g.
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/s
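A common cause of this symptom is submitting the application with a local master (or no `--master` at all), so everything runs in the driver JVM. A minimal sketch of submitting one of the bundled examples against a standalone master; the host name, jar path, and argument are placeholders for your cluster:

```shell
# Point --master at the standalone master's spark:// URL (shown on the
# master's web UI, port 8080 by default), not at "local[*]".
spark-submit \
  --master spark://master-host:7077 \
  --class org.apache.spark.examples.SparkPi \
  --executor-memory 1G \
  /path/to/spark/lib/spark-examples-1.6.0-hadoop2.6.0.jar 100
```

If tasks still do not reach the workers, the master's web UI (http://master-host:8080) shows whether the workers are registered and whether the application was granted executors.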
Hi all,
I am working with Spark 1.6 and Scala, and I have a big dataset divided into
several small files.
My question is: right now the read operation takes a really long time and often
produces RDD warnings. Is there a way I can read the files in parallel, so that
all nodes or workers read the files at the same time?
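For what it's worth, Spark already reads one partition per executor core in parallel when you pass a directory or glob to `textFile`; with many small files the bottleneck is usually per-file overhead rather than serial reading. A minimal sketch, assuming the files live on a path (HDFS or a shared filesystem) visible to all workers; the paths and app name are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ParallelRead {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ParallelRead")
    val sc = new SparkContext(conf)

    // textFile accepts a directory or glob; each file (or HDFS block)
    // becomes a partition, and partitions are read concurrently by the
    // executors. minPartitions can raise the parallelism if needed.
    val lines = sc.textFile("hdfs:///data/small-files/*", minPartitions = 64)

    // For very many tiny files, wholeTextFiles returns (path, content)
    // pairs and can cut per-file scheduling overhead; coalesce afterwards
    // if it produces too many partitions.
    val files = sc.wholeTextFiles("hdfs:///data/small-files/")

    println(lines.count())
    sc.stop()
  }
}
```

If the warnings persist, posting their exact text would help pin down whether the issue is partitioning, locality, or something else.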