Hi,

When I run the program below, I see two files in HDFS because the
number of partitions is 2. But one of the files is empty. Why is that? Is
the work not distributed equally across all the tasks?

(textFile.flatMap(lambda line: line.split())
    .map(lambda word: (word, 1))
    .reduceByKey(lambda a, b: a + b)
    .repartition(2)
    .saveAsTextFile("hdfs://localhost:9000/user/praveen/output/"))
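
In case it helps with diagnosing this, here is a minimal sketch of how the number of records in each partition could be checked before saving. The helper name partition_sizes is hypothetical (my own, not part of Spark); it only assumes the standard RDD methods glom(), map() and collect().

```python
def partition_sizes(rdd):
    """Return a list with the number of records held by each partition of `rdd`."""
    # glom() turns every partition into a single list of its records,
    # so mapping len() over the result gives one count per partition.
    return rdd.glom().map(len).collect()

# Hypothetical usage with the word-count RDD from the program above:
#   counts = (textFile.flatMap(lambda line: line.split())
#       .map(lambda word: (word, 1))
#       .reduceByKey(lambda a, b: a + b)
#       .repartition(2))
#   print(partition_sizes(counts))
```

A count of 0 for one of the entries would correspond to the empty output file.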

Thanks,
Praveen
