Hello All,

I am writing a simple Spark application to count UV (unique views) per host from a log file. Below is my code; it is not right on the line marked with a comment. My idea is that the same cookie on a host should be counted only once, so I want to split the host back out of the key of the previous RDD. But I don't know how to finish it. Any suggestions will be appreciated!

    val url_index = args(1).toInt
    val cookie_index = args(2).toInt
    val textRDD = sc.textFile(args(0))
      .map(_.split("\t"))
      .map(line => ((new java.net.URL(line(url_index)).getHost) + "\t" + line(cookie_index), 1))
      .reduceByKey(_ + _)
      .map(line => (line.split("\t")(0), 1))   // <-- not right: line is a (String, Int) pair here
      .reduceByKey(_ + _)
      .map(item => item.swap)
      .sortByKey(false)
      .map(item => item.swap)
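
One possible way to finish this, offered as a sketch rather than a definitive answer: after the first reduceByKey each element is a (String, Int) pair, so the host has to be taken from the key, for example `line._1.split("\t")(0)`. An alternative that avoids building and re-splitting a string key is to keep (host, cookie) as a tuple, drop duplicates with distinct(), and then count per host. The object name `UniqueVisitors` and the function `uvPerHost` below are made up for illustration, and the code assumes Spark 1.3 or later (older releases would also need `import org.apache.spark.SparkContext._` for reduceByKey).

```scala
import java.net.URL
import org.apache.spark.rdd.RDD

object UniqueVisitors {
  // Hypothetical helper: counts distinct cookies per host.
  // `lines` holds tab-separated log records; urlIndex and cookieIndex
  // are the column positions of the URL and the cookie.
  def uvPerHost(lines: RDD[String], urlIndex: Int, cookieIndex: Int): RDD[(String, Int)] =
    lines
      .map(_.split("\t"))
      // Keep host and cookie as a tuple instead of a joined string.
      .map(fields => (new URL(fields(urlIndex)).getHost, fields(cookieIndex)))
      // The same cookie on the same host is kept only once.
      .distinct()
      // One count per remaining (host, cookie) pair.
      .map { case (host, _) => (host, 1) }
      .reduceByKey(_ + _)
      // Hosts with the most unique cookies first.
      .sortBy(_._2, ascending = false)
}
```

With the variables from the original code, this could be called as `UniqueVisitors.uvPerHost(sc.textFile(args(0)), url_index, cookie_index)`. Using sortBy on the count replaces the swap / sortByKey / swap sequence; the result should be the same ordering.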