zhengruifeng opened a new pull request #27784: [SPARK-31032][ML] GMM compute summary and update distributions in one pass
URL: https://github.com/apache/spark/pull/27784

### What changes were proposed in this pull request?
1. Compute the summary and update the distributions in one pass.
2. Remove the logic related to the `shouldDistributeGaussians` check.

### Why are the changes needed?
In the current implementation, GMM needs to trigger two jobs per iteration:
1. one to compute the summary;
2. if `shouldDistributeGaussians = ((k - 1.0) / k) * numFeatures > 25.0`, another to update the distributions.

`shouldDistributeGaussians` is almost always true in practice, since `numFeatures` is likely to be greater than 25. The computation above can be done in a single job.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Existing test suites.
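
As a rough illustration of the `shouldDistributeGaussians` heuristic quoted under "Why are the changes needed?", the sketch below evaluates the expression for a few values of `k`. It is plain Python, not the Spark code touched by this PR; the function names `should_distribute_gaussians` and `min_features_to_distribute` are hypothetical helpers introduced only for this example.

```python
def should_distribute_gaussians(k: int, num_features: int) -> bool:
    """Evaluate the heuristic quoted in the PR description:
    ((k - 1.0) / k) * numFeatures > 25.0
    """
    return ((k - 1.0) / k) * num_features > 25.0


def min_features_to_distribute(k: int) -> int:
    """Smallest numFeatures for which the heuristic is true.

    Hypothetical helper for illustration only. For k = 1 the left-hand
    side is always 0, so the heuristic can never hold.
    """
    if k < 2:
        raise ValueError("the heuristic is never true for k < 2")
    n = 1
    while not should_distribute_gaussians(k, n):
        n += 1
    return n


if __name__ == "__main__":
    for k in (2, 5, 10, 100):
        print(f"k={k}: heuristic is true from numFeatures={min_features_to_distribute(k)}")
```

Under this expression the threshold on `numFeatures` approaches 26 as `k` grows and is higher for small `k` (for example, 51 when `k = 2`), which is consistent with the observation that the second job is triggered for most datasets with more than a few dozen features.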