Hello all,

I am writing a simple Spark application to count UV (unique views) from a
log file. My code is below, but the line marked in red is not right. My idea
is that the same cookie on the same host should be counted only once, so I
want to split the host back out of the previous RDD. I don't know how to
finish it. Any suggestions would be appreciated!

val url_index = args(1).toInt
val cookie_index = args(2).toInt
val textRDD = sc.textFile(args(0))
  .map(_.split("\t"))
  .map(line => ((new java.net.URL(line(url_index)).getHost) + "\t" + line(cookie_index), 1))
  .reduceByKey(_ + _)
  .map(line => (line.split("\t")(0), 1))
  .reduceByKey(_ + _)
  .map(item => item.swap)
  .sortByKey(false)
  .map(item => item.swap)
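
One possible way to finish this, offered as a sketch rather than a tested answer. After the first reduceByKey, each element is a (String, Int) pair rather than a String, so line.split presumably does not compile. The smallest change would be to split the key instead, for example .map { case (key, _) => (key.split("\t")(0), 1) }. The version below takes a slightly different route: it keeps (host, cookie) as a tuple, removes duplicates with distinct(), and then counts per host. The names UvPerHost and uvPerHost and the local[*] master are assumptions made for illustration; they are not from the original code.

```scala
import java.net.URL
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
// On Spark versions before 1.3 the pair-RDD implicits also need:
// import org.apache.spark.SparkContext._

object UvPerHost {
  // Count distinct cookies per host, highest count first.
  // urlIndex and cookieIndex are the tab-separated column positions.
  def uvPerHost(lines: RDD[String], urlIndex: Int, cookieIndex: Int): RDD[(String, Int)] =
    lines
      .map(_.split("\t"))
      .map(fields => (new URL(fields(urlIndex)).getHost, fields(cookieIndex)))
      .distinct()                          // one record per (host, cookie)
      .map { case (host, _) => (host, 1) } // drop the cookie, keep the host
      .reduceByKey(_ + _)                  // number of unique cookies per host
      .sortBy(_._2, ascending = false)

  def main(args: Array[String]): Unit = {
    // Assumption: local master for a trial run; drop setMaster when
    // the master is supplied by spark-submit.
    val conf = new SparkConf().setAppName("UvPerHost").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      uvPerHost(sc.textFile(args(0)), args(1).toInt, args(2).toInt)
        .collect()
        .foreach { case (host, uv) => println(s"$host\t$uv") }
    } finally {
      sc.stop()
    }
  }
}
```

Keeping the key as a tuple avoids building and re-splitting a tab-joined string, and sortBy replaces the swap / sortByKey / swap sequence. Note that new URL(...) throws on malformed URLs, so real log data may need a filter or a Try around that step.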
--------------------------------