Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/216
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabl
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-81983258
@LIDIAgroup Sorry that I don't have enough bandwidth to review this PR.
Since there are unresolved performance issues, do you mind closing this PR for
now? I recommend regi
Github user leizongxiong commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r19710343
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,276 @@
+/*
+ * Licensed
Github user leizongxiong commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-61396457
Can this branch be published with Spark 1.2.0, @mengxr?
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-54694750
Can one of the admins verify this patch?
Github user avulanov commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-51784545
@mengxr I've tested the code on a few examples after making it compatible
with the current version of `LabeledPoint`. It seems to work and produce
results similar to what W
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r16053912
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizerModel.scala
---
@@ -0,0 +1,82 @@
+/*
+ * Licensed
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r16054017
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizerSuite.scala
---
@@ -0,0 +1,71 @@
+/*
+ * Licensed
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r16054027
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizerSuite.scala
---
@@ -0,0 +1,71 @@
+/*
+ * Licensed
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r16053983
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizerSuite.scala
---
@@ -0,0 +1,71 @@
+/*
+ * Licensed
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r16054011
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizerSuite.scala
---
@@ -0,0 +1,71 @@
+/*
+ * Licensed
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r16053868
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizerModel.scala
---
@@ -0,0 +1,82 @@
+/*
+ * Licensed
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r16053704
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,276 @@
+/*
+ * Licensed to t
Github user LIDIAgroup commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-39346452
I've followed your suggestion of using `runJob`. Now `evalThresholds` looks
much simpler and the result is the same. Thanks for your help.
Github user LIDIAgroup commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r11196809
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/ArrayAccumulator.scala
---
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Sof
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r11168845
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,317 @@
+/*
+ * Licensed to the
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r11168529
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,317 @@
+/*
+ * Licensed to the
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r11168017
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizerSuite.scala
---
@@ -0,0 +1,71 @@
+/*
+ * Licensed to
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r11167994
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizerSuite.scala
---
@@ -0,0 +1,71 @@
+/*
+ * Licensed to
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r11167980
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/ArrayAccumulator.scala
---
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Softwar
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38988486
@LIDIAgroup Thanks for the update! The new code didn't pass the style
check. Please run `sbt/sbt scalastyle` to see the error messages! I saw the
following from Travis log
Github user LIDIAgroup commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38944614
I've made some big changes:
1. Big refactor in the architecture of the discretizer. Now I think it's
more coherent with other packages in MLlib.
2. I've changed
Github user LIDIAgroup commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r11065687
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/InfoTheory.scala
---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38881855
Can one of the admins verify this patch?
Github user LIDIAgroup commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r11019805
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,402 @@
+/*
+ * Licensed to
Github user LIDIAgroup commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r11019751
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,402 @@
+/*
+ * Licensed to
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r11000756
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/InfoTheory.scala
---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38727471
@LIDIAgroup For the second item, it is very common to have different
training and discretizing data. For example, we have a labeled dataset
containing a subset of members,
Github user LIDIAgroup commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38718686
I'll make some changes that, imho, will improve the discretizer in some
aspects:
1. I'll change the accumulator from a `Map` to an `Array`. This implies
collecting
Github user LIDIAgroup commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10969423
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/InfoTheory.scala
---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38622854
@LIDIAgroup, I made one pass through the code. My major concern is the
complexity of the algorithm. Could you help answer the following questions?
0. What is the t
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10954204
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,402 @@
+/*
+ * Licensed to the
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10953908
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,402 @@
+/*
+ * Licensed to the
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10951916
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,402 @@
+/*
+ * Licensed to the
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10951721
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizerSuite.scala
---
@@ -0,0 +1,71 @@
+/*
+ * Licensed to
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10951631
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/InfoTheory.scala
---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10951390
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/Utils.scala ---
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10948283
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/Utils.scala ---
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10948233
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/Utils.scala ---
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10948196
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/Utils.scala ---
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10947993
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/MapAccumulator.scala
---
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38607525
Merged build finished.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38607527
One or more automated tests failed
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13438/
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10947648
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/MapAccumulator.scala
---
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10947573
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,402 @@
+/*
+ * Licensed to the
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10947586
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/MapAccumulator.scala
---
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10947517
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,402 @@
+/*
+ * Licensed to the
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10947486
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,402 @@
+/*
+ * Licensed to the
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10947120
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,402 @@
+/*
+ * Licensed to the
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10947062
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/DiscretizerModel.scala
---
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Softwar
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10946987
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/DiscretizerModel.scala
---
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Softwar
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10946507
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,402 @@
+/*
+ * Licensed to the
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10942149
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,402 @@
+/*
+ * Licensed to the
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10941658
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,402 @@
+/*
+ * Licensed to the
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10941200
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,402 @@
+/*
+ * Licensed to the
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10941192
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/DiscretizerModel.scala
---
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Softwar
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38592747
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38592748
Merged build started.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38591501
Jenkins, retest this please.
Github user LIDIAgroup commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38559061
We've tried to follow all suggestions made by @mengxr. If you feel that we
should make any other changes, please don't hesitate to tell us; we're
willing to discuss
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-3846
@LIDIAgroup Thanks for updating
https://github.com/apache/incubator-spark/pull/541
There are some style problems that cause the test build to fail. You can use
`dev/ru
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10898397
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/discretization/EMDDiscretizerSuite.scala
---
@@ -0,0 +1,60 @@
+package org.apache.spark.mllib.discret
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10898309
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/discretization/EMDDiscretizerSuite.scala
---
@@ -0,0 +1,60 @@
+package org.apache.spark.mllib.discret
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10898285
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/discretization/EMDDiscretizerSuite.scala
---
@@ -0,0 +1,60 @@
+package org.apache.spark.mllib.discret
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10898273
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/discretization/EMDDiscretizerSuite.scala
---
@@ -0,0 +1,60 @@
+package org.apache.spark.mllib.discret
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10898245
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/discretization/EMDDiscretizerSuite.scala
---
@@ -0,0 +1,60 @@
+package org.apache.spark.mllib.discret
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10898196
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/MapAccumulator.scala
---
@@ -0,0 +1,53 @@
+/*
+* Licensed to the Apache Software F
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10897344
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/DiscretizerModel.scala
---
@@ -0,0 +1,47 @@
+/*
+* Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10897825
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/MapAccumulator.scala
---
@@ -0,0 +1,53 @@
+/*
+* Licensed to the Apache Software F
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10897487
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EMDDiscretizer.scala
---
@@ -0,0 +1,402 @@
+/*
+* Licensed to the Apache Software
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38472415
One or more automated tests failed
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13400/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38472412
Merged build finished.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38472285
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38472286
Merged build started.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38472118
Jenkins, test this please.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38471741
Can one of the admins verify this patch?
GitHub user LIDIAgroup opened a pull request:
https://github.com/apache/spark/pull/216
[SPARK-1303] [MLLIB] Added discretization capability to MLlib.
https://spark-project.atlassian.net/browse/SPARK-1303
You can merge this pull request into a Git repository by running:
$ git p