I'm processing about 10 GB of tab-delimited raw data with a few fields (page
and user ID, along with the timestamp when the user viewed the page) on a
40-node cluster, using Spark SQL to compute the number of unique visitors per
page at various intervals. I'm currently just reading the data with
sc.textFile(), which returns an RDD that I register as a table and run the SQL
query against.

Since the RDD is already a partitioned collection, I wanted to check whether
there are examples of how to use a partitioner or repartition() on the RDD,
and whether they would improve performance further.
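For reference, the setup described above could look roughly like the sketch below. This is a hedged illustration only: the field layout (page \t userId \t timestamp), the input path, the partition count, and the case-class/table names are all assumptions, and the API names (registerTempTable, createSchemaRDD) are from the Spark 1.x era.

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.sql.SQLContext

// Assumed record layout: page \t userId \t timestamp
case class PageView(page: String, userId: String, ts: Long)

val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD  // implicit RDD -> SchemaRDD conversion

val raw = sc.textFile("hdfs:///path/to/pageviews")  // path is illustrative

// Key each record by page and pre-partition with a HashPartitioner so all
// views of a given page land in the same partition. 80 partitions is an
// arbitrary choice (~2 per node on a 40-node cluster).
val keyed = raw.map { line =>
  val Array(page, user, ts) = line.split('\t')
  (page, PageView(page, user, ts.toLong))
}.partitionBy(new HashPartitioner(80))

val views = keyed.values
views.registerTempTable("views")

val uniques = sqlContext.sql(
  "SELECT page, COUNT(DISTINCT userId) AS uniques FROM views GROUP BY page")
```

Note that a custom partitioner mainly benefits key-based RDD operations (reduceByKey, joins on the same key); Spark SQL plans its own shuffles for the GROUP BY, so a plain repartition(n) to tune parallelism may be just as relevant here.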

Thanks!




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/RDD-partitioner-or-repartition-examples-tp11813.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
