[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-10-21 Thread yu-iskw
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-59885635 Because this patch is not fit for the Spark design concept, I close this PR without merging.

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-10-21 Thread yu-iskw
Github user yu-iskw closed the pull request at: https://github.com/apache/spark/pull/1964 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-09-16 Thread yu-iskw
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-55825405 Hi @mengxr, @erikerlandson, @rnowling Sorry to comment again. Could you review it? thanks --- If your project is set up for it, you can reply to this email and

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-09-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-54952704 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20038/consoleFull) for PR 1964 at commit

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-09-09 Thread yu-iskw
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-54953348 @mengxr, @erikerlandson, @rnowling Sorry for having been kept waiting. I redesigned the interface of distance functions. Here is the full file change.

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-09-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-54960119 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20038/consoleFull) for PR 1964 at commit

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-09-08 Thread aarondav
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-54802662 Jenkins, ok to test. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-54806883 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19976/consoleFull) for PR 1964 at commit

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-09-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-54814790 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19976/consoleFull) for PR 1964 at commit

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-09-05 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-54694468 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-09-04 Thread yu-iskw
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-54564146 I'm sorry for delay to reply in replying for you. Because I didn't concern about Python API, I rethink of the design for distance now. Please give me a few days.

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-09-02 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-54254431 @yu-iskw @erikerlandson @dlwh I prefer simple types for parameters for model serialization and consistent APIs across languages. In a predictive model, we should store

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-09-02 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-54255195 @yu-iskw Thanks for contributing to Breeze! I linked this JIRA to SPARK-3219, which tries to generalize k-means to support more distance measures, and we will try to use

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-29 Thread yu-iskw
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53845417 @mengxr, @dlwh, @erikerlandson, @rnowling, Thank you for your feedback. I agree with that idea which distance metrics/measures are implemented in Breeze.

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-29 Thread erikerlandson
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53936014 @yu-iskw, your use case above with `setMeasure` is definitely how I envisioned it working, in terms of passing in metric functions as parameters. I'm

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53547074 @yu-iskw Sorry for the delay in code review! What do you expect users to do with the distances? For example, users can pick different distance measures in

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-27 Thread yu-iskw
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53567544 @mengxr Thank you for your comment. I expect users are able to pick different distance measure in clustering algorithm, such as k-means. And I agree with

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-27 Thread rnowling
Github user rnowling commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53570185 @mengxr , @yu-iskw I think it is valuable to contribute distance metrics to Breeze, but not all of the metrics provided by @yu-iskw may be of interest to

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-27 Thread erikerlandson
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53581400 It seems most elegant and scala-like to directly supply the distance function as the parameter, instead of just a 'type' that selects some internal function. If

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-27 Thread dlwh
Github user dlwh commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53598635 We'd happily take a distance ufunc, or perhaps a collection of distance ufuncs along the lines of what's here. I'd want it to be more Breeze-like than what's in the

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-26 Thread erikerlandson
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53445323 @yu-iskw, I'm in favor of adopting a standardized distance metric class. How best to proceed is a question of architecture and road map. I'm interested in

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-26 Thread yu-iskw
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53525367 @mengxr, Will you please review the detail on this pull request. Thank you for your trouble in advance. --- If your project is set up for it, you can reply to this

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-25 Thread erikerlandson
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53341467 You want to use abs(vector) instead of vector.map(abs(_)): https://gist.github.com/erikerlandson/488275019e4b54f5cdaf With the test harness above, I am

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-25 Thread yu-iskw
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53347789 Hi @erikerlandson, Thank you for your comment. I use breeze.numerics.abs instead of Math.abs. I tried to check the performance of both again. That of

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-25 Thread erikerlandson
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53348770 One thing I noticed was that this will not compile: val v = new DenseBreezeVector((1 to 10).map(j = 1.0)) val a = abs(v) // fails

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-25 Thread yu-iskw
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53351219 Hi @erikerlandson, I appriciate your example code. I only use abs(v.toArray) in ChebyshevDistanceMetric. But i didn't change in

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-25 Thread yu-iskw
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53378581 If the interface of distance function is valid, merging this issue, and then trying the performance improvement in another issue would be a good idea. Since the

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-23 Thread yu-iskw
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53149119 BYW, I checked the performance of Math.abs() and breeze.numerics.abs. It seems that Math.abs() performs better than breeze.numerics.abs. A performs better than B.

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-22 Thread erikerlandson
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53105970 Hi @yu-iskw, Universal element-wise functions (U-Funcs), including element-wise abs(v), are listed here:

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-22 Thread erikerlandson
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53108224 the implicit conversion function doesn't have to throw an exception, it just needs to override the breeze vector overloading and inherit the default spark vector

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-22 Thread yu-iskw
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53129462 Good Morning @erikerlandson, Thank you for telling me the documentation. But I can not compile, if I changed the code in ChebyshevDistanceMetric. I guess

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-21 Thread rnowling
Github user rnowling commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-52924889 @erikerlandson Breeze uses non-JVM data structures so it can use BLAS. The malloc and copying could be both expensive and lead to 2-3x higher memory usage. This

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-21 Thread yu-iskw
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-52943537 Hi @erikerlandson and @rnowling, Thank you for your advice. I modified my code. Would you please review it. - Improve the performance of calculating

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-21 Thread yu-iskw
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-52994477 Sorry for immediate modification. - CosineDistanceMeasure and TanimotoDistanceMeasure are reverted before 64f7389 -- Because both classes can not be applied

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-21 Thread erikerlandson
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-52996580 I think the design can be made somewhat less complex. I coded up an example here: https://gist.github.com/erikerlandson/f4b9b9a5c9469f2d9006 A couple

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-21 Thread erikerlandson
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53003052 Given how small these classes are, it seems like it would be efficient to define them all in a single file --- If your project is set up for it, you can reply to

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-21 Thread yu-iskw
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53006322 Hello @erikerlandson, Thank you for your comment. Its' in the morning in Japan :) I don't think a Weighted trait improves the design. I agree

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-21 Thread erikerlandson
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53008478 @yu-iskw I took a fast look and that seems good to me. I will try to review it more tomorrow. --- If your project is set up for it, you can reply to this

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-21 Thread erikerlandson
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53008699 one brief comment: with breeze vectors you can say `abs(v)` instead of `v.map(Math.abs(_))` import breeze.numerics.abs --- If your project is set up for

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-21 Thread rnowling
Github user rnowling commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53010747 Overall, changes look great! Thank you for considering our feedback. I like that you were able to reduce the number of files. I think

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-21 Thread yu-iskw
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53012573 one brief comment: with breeze vectors you can say abs(v) instead of v.map(Math.abs(_)) Sorry, I can't find abs API for a vector in Breeze API documentation.

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-21 Thread yu-iskw
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53013311 @rnowling Thank you for your comment. - Remove SquaredEuclideanDistanceMeasure.scala - Aggregate tests to reduce the number of files --- If your

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-20 Thread yu-iskw
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-52756284 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-20 Thread yu-iskw
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-52756243 Hi @erikerlandson, Thank you for your comment. I modify the code according to your advice. Would you please check it. - DistanceMeasure trait

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-20 Thread rnowling
Github user rnowling commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-52810571 Great work! If a class implements the DistanceMetric trait, it should probably be renamed ---DistanceMetric. For example, CosineDistanceMeasure should be

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-20 Thread erikerlandson
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-52848633 I like where this RFE is going. One comment about metric versus measure - cosine distance is a subclass of DistanceMeasure, as it is *not* a metric.

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-20 Thread erikerlandson
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-52858495 I feel ambiguous about whether there needs to be a sub-trait for WeightedDistanceMeasure, but if we have that then we will also need a WeightedDistanceMetric as a

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-18 Thread erikerlandson
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-52516267 I'm wondering if it might be simpler and more idiomatic to just define distance measure directly as any subclass of Function2, like: trait DistanceMeasure

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-18 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-52527707 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-52528032 QA tests have started for PR 1964. This patch merges cleanly. brView progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18761/consoleFull ---

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-18 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-52537821 QA results for PR 1964:br- This patch FAILED unit tests.br- This patch merges cleanlybr- This patch adds the following public classes (experimental):brclass

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-18 Thread erikerlandson
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-52554072 It might be useful to distinguish true metrics from 'measures'. For example, cosine distance is not a true distance metric. In some algorithms, that difference

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-15 Thread yu-iskw
GitHub user yu-iskw opened a pull request: https://github.com/apache/spark/pull/1964 [SPARK-3012] Standardized Distance Functions between two Vectors for MLlib https://issues.apache.org/jira/browse/SPARK-3012 I implemented some distance measures between two Vector

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-52282833 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...

2014-08-15 Thread yu-iskw
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-52375350 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have