Hello All,

We are using HashPartitioner in the following way on a 3-node cluster (1 master and 2 worker nodes):

    val u = sc.textFile("hdfs://x.x.x.x:8020/user/azureuser/s.txt")
      .map[(Int, Int)](line => {
        line.split("\\|") match {
          case Array(x, y) => (y.toInt, x.toInt)
        }
      })
      .partitionBy(new HashPartitioner(8))
      .setName("u")
      .persist()
    u.count()

If we run this from the spark-shell, the data (52 MB) is split across the two worker nodes. But if we put the same code in a Scala program and run it, all the data goes to only one node. We have run it multiple times, and this behavior does not change. This seems strange. Is there some problem with the way we are using HashPartitioner?

Thanks in advance.

Regards,
Raghava.
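For reference, HashPartitioner itself only decides which partition a key lands in: it takes the key's hashCode modulo the number of partitions, adjusted to be non-negative. Which worker node ends up hosting each partition is decided separately by the scheduler, so an even key spread does not by itself guarantee an even spread across nodes. Below is a minimal, self-contained sketch of that assignment logic (the sample keys are hypothetical stand-ins for the first column of s.txt; this is an illustration, not Spark's actual class):

    // Sketch of HashPartitioner's key-to-partition assignment:
    // hashCode modulo numPartitions, made non-negative.
    object HashPartitionSketch {
      // Scala's % can return a negative result for negative keys,
      // so shift such results up by the modulus.
      def nonNegativeMod(x: Int, mod: Int): Int = {
        val raw = x % mod
        if (raw < 0) raw + mod else raw
      }

      def getPartition(key: Any, numPartitions: Int): Int =
        nonNegativeMod(key.hashCode, numPartitions)

      def main(args: Array[String]): Unit = {
        // Hypothetical sample keys; with 8 partitions each key maps
        // deterministically to one of partitions 0..7.
        val keys = Seq(1, 2, 3, 42, -7, 1000)
        keys.foreach { k =>
          println(s"key $k -> partition ${getPartition(k, 8)}")
        }
      }
    }

Since the key-to-partition mapping is deterministic, identical input data produces identical partition contents in both the shell and the compiled program; any difference in node placement would come from how executors are allocated, not from the partitioner.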