(By the way, you can use wordRDD.countByValue instead of the map and
reduceByKey. It won't make a difference to your issue but is more
compact.)
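A minimal sketch of that substitution, assuming a hypothetical `words: RDD[String]` and a local-mode setup (the names here are illustrative, not from the original thread). Note that `countByValue` returns a `Map` on the driver, while `reduceByKey` keeps the result distributed as an RDD:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object CountByValueDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("countByValue-demo").setMaster("local[*]"))
    val words: RDD[String] = sc.parallelize(Seq("spark", "rdd", "spark", "scala"))

    // map + reduceByKey: counts stay in a distributed RDD until collected
    val viaReduce = words.map(w => (w, 1L)).reduceByKey(_ + _).collectAsMap()

    // countByValue: same counts, returned directly as a Map on the driver
    val viaCount = words.countByValue()

    assert(viaReduce == viaCount)
    sc.stop()
  }
}
```

Because `countByValue` collects everything to the driver, it is only safe when the number of distinct keys is small, which is exactly the situation described below (word lengths).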
As you say, the problem is the very limited range of keys (word
lengths). I wonder if you can use sortBy instead of map and sortByKey.
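As a sketch of that second substitution (again with a hypothetical `words: RDD[String]`; `sortBy` is implemented as `keyBy` + `sortByKey` + `values`, so the two forms do equivalent work, but the second is more compact):

```scala
// map + sortByKey: build (length, word) pairs, sort by the key, drop the key
val viaSortByKey = words.map(w => (w.length, w)).sortByKey().values

// sortBy: one call, no explicit key pair to manage
val viaSortBy = words.sortBy(_.length)
```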
For example, consider a word count over a large text corpus (on the
order of 100 GB). There is clearly a bias in the word distribution;
long-tailed data is to be expected. The most frequent word probably
accounts for more than 1/10 of all occurrences.
Word count code:
```
val allWordLineSplited: RDD[String] = // create