[ 
https://issues.apache.org/jira/browse/SPARK-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583886#comment-14583886
 ] 

Debasish Das edited comment on SPARK-2336 at 6/12/15 6:51 PM:
--------------------------------------------------------------

Very cool idea, Sen. Did you also look into FLANN for randomized KDTree and 
KMeansTree? We have a PR for rowSimilarities 
(https://github.com/apache/spark/pull/6213) for brute-force KNN generation, 
which we will use to compare the QoR of your PR as soon as you open up a 
stable version.



was (Author: debasish83):
Very cool idea Sen. Did you also look into FLANN for randomized KDTree and 
KMeansTree. We have a PR for rowSimilarities which we will use to compare the 
QoR of your PR as soon as you open up a stable version.


> Approximate k-NN Models for MLLib
> ---------------------------------
>
>                 Key: SPARK-2336
>                 URL: https://issues.apache.org/jira/browse/SPARK-2336
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Brian Gawalt
>            Priority: Minor
>              Labels: clustering, features
>
> After tackling the general k-Nearest Neighbor model as per 
> https://issues.apache.org/jira/browse/SPARK-2335 , there's an opportunity to 
> also offer approximate k-Nearest Neighbor. A promising approach would involve 
> building a kd-tree variant within each partition, a la
> http://www.autonlab.org/autonweb/14714.html?branch=1&language=2
> This could offer a simple non-linear ML model that can label new data with 
> much lower latency than the plain-vanilla kNN versions.
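
For illustration only, here is a minimal sketch of the per-partition kd-tree idea from the description, next to a brute-force baseline of the kind mentioned in the comment. It is plain Python, not Spark or MLlib code. All names (build_kdtree, query, approx_knn, brute_force_knn, recall, leaf_size, max_leaves) are hypothetical, and two things are assumptions rather than anything stated in the issue: the approximation comes from capping the number of leaves visited per tree, and quality is measured as recall against the exact neighbors.

```python
import heapq

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_kdtree(points, depth=0, leaf_size=8):
    """Build a kd-tree over one partition; small point sets become leaves."""
    if len(points) <= leaf_size:
        return {"leaf": points}
    axis = depth % len(points[0])  # cycle through the dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "axis": axis,
        "split": points[mid][axis],
        "left": build_kdtree(points[:mid], depth + 1, leaf_size),
        "right": build_kdtree(points[mid:], depth + 1, leaf_size),
    }

def query(tree, q, k, max_leaves):
    """Return up to k (sq_dist, point) pairs, scanning at most max_leaves leaves."""
    best = []              # max-heap on distance, stored as (-dist, point)
    budget = [max_leaves]  # assumed approximation knob: leaf-visit budget

    def visit(node):
        if budget[0] <= 0:
            return
        if "leaf" in node:
            budget[0] -= 1
            for p in node["leaf"]:
                d = sq_dist(p, q)
                if len(best) < k:
                    heapq.heappush(best, (-d, p))
                elif d < -best[0][0]:
                    heapq.heapreplace(best, (-d, p))
            return
        diff = q[node["axis"]] - node["split"]
        near, far = ((node["left"], node["right"]) if diff < 0
                     else (node["right"], node["left"]))
        visit(near)
        # Cross the splitting plane only if it could still hide a closer point.
        if len(best) < k or diff * diff < -best[0][0]:
            visit(far)

    visit(tree)
    return sorted((-nd, p) for nd, p in best)

def approx_knn(trees, q, k, max_leaves):
    """Query every partition's tree, then merge the per-partition candidates."""
    candidates = []
    for tree in trees:
        candidates.extend(query(tree, q, k, max_leaves))
    return heapq.nsmallest(k, candidates)

def brute_force_knn(points, q, k):
    """Exact baseline: scan every point."""
    return heapq.nsmallest(k, ((sq_dist(p, q), p) for p in points))

def recall(approx, exact):
    """Fraction of the exact neighbors that the approximate result recovered."""
    found = {p for _, p in approx} & {p for _, p in exact}
    return len(found) / len(exact)
```

With a large max_leaves each tree is searched exactly and the merged result matches the brute-force scan; shrinking max_leaves trades recall for fewer distance computations, which is the latency/quality trade-off a comparison against a brute-force baseline would measure.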


