Hi, I am new to Hadoop and Spark, and would like some help understanding how Hadoop works.
Suppose we have a cluster of 5 computers and install Spark on the cluster WITHOUT Hadoop, and then run the following code on one computer:

    val doc = sc.textFile("/home/scalatest.txt", 5)
    doc.count

Can the "count" task be distributed across all 5 computers? Or is it only run by 5 parallel threads on the current computer?

On the other hand, if we install Hadoop on the cluster and upload the data into HDFS, will the "count" task be done by 25 threads when running the same code?

Thank you very much for your help.