Hello All,

We are using HashPartitioner in the following way on a 3 node cluster (1
master and 2 worker nodes).

val u = sc.textFile("hdfs://x.x.x.x:8020/user/azureuser/s.txt")
  .map[(Int, Int)](line => line.split("\\|") match {
    case Array(x, y) => (y.toInt, x.toInt)
  })
  .partitionBy(new HashPartitioner(8))
  .setName("u")
  .persist()

u.count()

If we run this from the Spark shell, the data (52 MB) is split across the
two worker nodes. But if we put the same code in a Scala program and run it,
all the data ends up on only one node. We have run it multiple times and the
behavior does not change, which seems strange.
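To see the skew concretely, a small diagnostic sketch (assuming the same `sc` and `u` as above; the variable names here are just for illustration) can count how many records land in each of the 8 partitions:

// Hypothetical diagnostic: count the records in each partition of u.
// Reuses the RDD `u` built above; run after u.count() so it is materialized.
val sizes = u.mapPartitionsWithIndex { (idx, iter) =>
  Iterator((idx, iter.size))
}.collect()
sizes.foreach { case (idx, n) => println(s"partition $idx: $n records") }

If the counts are roughly even, the partitioner itself is fine and the imbalance is in where the executors place those partitions (visible in the Spark web UI), not in HashPartitioner.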

Is there some problem with the way we use HashPartitioner?

Thanks in advance.

Regards,
Raghava.
