Hello all, I am interested in using the bisecting k-means algorithm implemented in Spark. While using it, I found that some of my clustering requests on large datasets failed with OOM errors.
As the dataset size is expected to be large, I wanted to use some pre-processing steps to reduce resource requirements. I found that Canopy Clustering helps with that, but I could not find anything equivalent to it in Spark. Is something available, or is it planned for a future release? Please let me know. Thank you.