Ted, remember we talked about this last week? The problem (which I think is fixed now) was that when I asked for 20 clusters, every mapper gave me 20 clusters (rather than k log n ~ 200), and the points clumped together, resulting in one cluster holding the vast majority of the points: ~17K out of the ~19K.
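For reference, here is a back-of-the-envelope check of where the ~200 comes from. It is only a sketch: it assumes the natural log and rounds up, and `sketch_size` is a hypothetical helper name for illustration, not a function from the actual code.

```python
import math


def sketch_size(k: int, n: int) -> int:
    """Rough number of intermediate clusters per mapper: k * ln(n), rounded up.

    Assumption: natural log. A different base would change the constant.
    """
    return math.ceil(k * math.log(n))


k = 20       # clusters requested
n = 19_000   # approximate number of points
print(sketch_size(k, n))  # just under 200 for these values
```

So with k = 20 and roughly 19K points, each mapper should be producing on the order of 200 intermediate clusters, not 20.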
Now that I've fixed that and added more tests, which seem to confirm that all StreamingKMeans implementations get about the same results (whether they're local or MapReduce), and given the multiple restarts of BallKMeans, I'm expecting it to be a lot better. Tests on actual data are coming soon (please check that new cluster thread). :)