[ https://issues.apache.org/jira/browse/SPARK-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583886#comment-14583886 ]
Debasish Das edited comment on SPARK-2336 at 6/12/15 6:51 PM:
--------------------------------------------------------------

Very cool idea, Sen. Did you also look into FLANN for randomized KDTree and KMeansTree?

We have a PR for rowSimilarities (https://github.com/apache/spark/pull/6213) for brute-force KNN generation, which we will use to compare the QoR of your PR as soon as you open up a stable version.


was (Author: debasish83):

Very cool idea Sen. Did you also look into FLANN for randomized KDTree and KMeansTree. We have a PR for rowSimilarities which we will use to compare the QoR of your PR as soon as you open up a stable version.

> Approximate k-NN Models for MLLib
> ---------------------------------
>
>                 Key: SPARK-2336
>                 URL: https://issues.apache.org/jira/browse/SPARK-2336
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Brian Gawalt
>            Priority: Minor
>              Labels: clustering, features
>
> After tackling the general k-Nearest Neighbor model as per
> https://issues.apache.org/jira/browse/SPARK-2335 , there's an opportunity to
> also offer approximate k-Nearest Neighbor. A promising approach would involve
> building a kd-tree variant within each partition, a la
> http://www.autonlab.org/autonweb/14714.html?branch=1&language=2
> This could offer a simple non-linear ML model that can label new data with
> much lower latency than the plain-vanilla kNN versions.
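
The per-partition kd-tree idea in the issue description, and the brute-force comparison mentioned in the comment, can be illustrated with a small plain-Python sketch. This is not MLlib code and not the implementation in either PR. It assumes round-robin partitioning, bucketed leaves, and a single descent per tree with no backtracking (which is what makes the result approximate). All function names and parameters below are hypothetical.

```python
import heapq
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_knn(points, query, k):
    """Exact k-NN baseline: indices of the k points closest to query."""
    return heapq.nsmallest(k, range(len(points)),
                           key=lambda i: sq_dist(points[i], query))


def build_kdtree(points, ids, leaf_size, depth=0):
    """Build a kd-tree over the given ids; leaves hold up to leaf_size ids.

    Assumes leaf_size >= 1 so that every split strictly shrinks the id list.
    """
    if len(ids) <= leaf_size:
        return ("leaf", ids)
    axis = depth % len(points[ids[0]])  # cycle through the dimensions
    ids = sorted(ids, key=lambda i: points[i][axis])
    mid = len(ids) // 2
    split = points[ids[mid]][axis]
    return ("node", axis, split,
            build_kdtree(points, ids[:mid], leaf_size, depth + 1),
            build_kdtree(points, ids[mid:], leaf_size, depth + 1))


def build_partition_trees(points, num_partitions, leaf_size):
    """One kd-tree per partition; partitions are assigned round-robin (an assumption)."""
    return [build_kdtree(points, list(range(p, len(points), num_partitions)), leaf_size)
            for p in range(num_partitions)]


def leaf_candidates(tree, query):
    """Descend to a single leaf without backtracking (the approximate step)."""
    while tree[0] == "node":
        _, axis, split, left, right = tree
        tree = left if query[axis] < split else right
    return tree[1]


def partitioned_knn(points, trees, query, k):
    """Approximate k-NN: merge each partition's leaf candidates, keep the k closest."""
    candidates = []
    for tree in trees:
        candidates.extend(leaf_candidates(tree, query))
    return heapq.nsmallest(k, candidates, key=lambda i: sq_dist(points[i], query))


def recall(approx, exact):
    """Fraction of the exact neighbors recovered by the approximate result."""
    return len(set(approx) & set(exact)) / len(exact)


rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(2000)]
trees = build_partition_trees(points, num_partitions=4, leaf_size=16)
query = (0.5, 0.5)
exact = brute_force_knn(points, query, k=5)
approx = partitioned_knn(points, trees, query, k=5)
print("recall@5:", recall(approx, exact))
```

Recall against the brute-force baseline is one simple way to quantify the quality loss of the approximate search. In this sketch, a larger `leaf_size` trades query latency for higher recall, and a leaf size at least as large as each partition reduces to the exact search.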