
Re: Data are partial to a specific partition after sort

2015-01-29, Sean Owen

(By the way, you can use wordRDD.countByValue instead of the map and reduceByKey. It won't make a difference to your issue, but it is more compact.) As you say, the problem is the very limited range of keys (word lengths). I wonder if you can use sortBy instead of map and sortByKey, and instead…
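The point about the limited key range can be illustrated with a small plain-Scala model. This is a hypothetical simplification, not Spark's RangePartitioner: Spark picks its range bounds by sampling the keys, while this sketch uses hand-chosen bounds and made-up keys. What carries over is that the partition is a deterministic function of the key, so every record with the same key lands in the same partition and no choice of bounds can split one hot key.

```scala
// Hypothetical model of range partitioning (NOT Spark's RangePartitioner):
// a key goes to the first partition whose upper bound is >= the key,
// so records with equal keys always share a partition.
object RangeSkewDemo {
  // Index of the first bound >= key, or bounds.length for the last partition.
  def partitionFor(key: Int, bounds: IndexedSeq[Int]): Int = {
    val i = bounds.indexWhere(key <= _)
    if (i == -1) bounds.length else i
  }

  // Number of records that fall into each of the (bounds.length + 1) partitions.
  def partitionSizes(keys: Seq[Int], bounds: IndexedSeq[Int]): IndexedSeq[Int] = {
    val sizes = Array.fill(bounds.length + 1)(0)
    keys.foreach(k => sizes(partitionFor(k, bounds)) += 1)
    sizes.toIndexedSeq
  }

  def main(args: Array[String]): Unit = {
    // Made-up data: 1,000 records with only 3 distinct keys, one of them dominant.
    val keys = Seq.fill(800)(5) ++ Seq.fill(150)(3) ++ Seq.fill(50)(9)
    // Four partitions need three bounds; with only 3 distinct keys one partition
    // stays empty and the hot key's partition receives all 800 of its records.
    val bounds = IndexedSeq(3, 5, 7)
    println(partitionSizes(keys, bounds).mkString(", ")) // 150, 800, 0, 50
  }
}
```

Adding more partitions does not help in this model: beyond the number of distinct keys, extra partitions simply stay empty.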

Data are partial to a specific partition after sort

2015-01-28, 瀬川　卓也

For example, consider a word count over a long text dataset (on the order of 100 GB). The word frequencies are clearly skewed: a word count is expected to produce long-tail data. The most frequent word probably accounts for more than about 1/10 of all words. Word count code: ``` val allWordLineSplited: RDD[String] = // create
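The question's code is cut off, so the following is only a sketch of the word-count step and of the skew it describes, written with plain Scala collections so it runs without Spark. It assumes whitespace-separated words and uses toy input lines; `WordCountSketch`, `wordCounts`, and `topWordShare` are hypothetical names. On an RDD, the same counts come from `flatMap(...).map((_, 1L)).reduceByKey(_ + _)`, or from `countByValue()`, which returns the counts to the driver as a local `Map`.

```scala
// Plain-Scala stand-in for the RDD word count (no Spark dependency).
// Input lines are hypothetical; in the question they come from an RDD[String].
object WordCountSketch {
  // Split lines on whitespace and count occurrences of each word,
  // mirroring flatMap + map((_, 1L)) + reduceByKey(_ + _) on an RDD.
  def wordCounts(lines: Seq[String]): Map[String, Long] =
    lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty) // drop empty tokens from leading whitespace
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size.toLong }

  // Fraction of all words taken by the single most frequent word.
  def topWordShare(counts: Map[String, Long]): Double =
    if (counts.isEmpty) 0.0
    else counts.values.max.toDouble / counts.values.sum

  def main(args: Array[String]): Unit = {
    val counts = wordCounts(Seq("the cat and the dog", "the end"))
    println(counts("the"))        // 3
    println(topWordShare(counts)) // share of "the": 3 / 7
  }
}
```

On the real data, a `topWordShare` above 0.1 would match the estimate in the question that the most frequent word covers more than 1/10 of all words.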