Hi, I am a beginner with Hadoop and Spark, and would like some help understanding
how Hadoop works.

Suppose we have a cluster of 5 computers and install Spark on the cluster
WITHOUT Hadoop, and then run the following code on one computer:
val doc = sc.textFile("/home/scalatest.txt",5)
doc.count
Can the "count" task be distributed across all 5 computers? Or is it only
run by 5 parallel threads on the current computer?

On the other hand, if we install Hadoop on the cluster and upload the data
into HDFS, will the "count" task be done by 25 threads when running the same
code?

Thank you very much for your help. 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Need-help-about-how-hadoop-works-tp4638.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
