[GitHub] spark pull request: [SPARK-2207][SPARK-3272]Add minimum informatio...

2014-09-09 Thread chouqin
GitHub user chouqin opened a pull request: https://github.com/apache/spark/pull/2332 [SPARK-2207][SPARK-3272]Add minimum information gain and minimum instances per node as training parameters for decision tree. These two parameters can act as early stop rules to do pre-pruning
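The two early-stop rules named in this PR can be illustrated with a small standalone sketch. `CandidateSplit`, `isValidSplit`, and `bestSplit` are hypothetical names, not MLlib internals. The sketch assumes three things: a split is rejected when either child has fewer than `minInstancesPerNode` instances, a split is also rejected when its information gain is below `minInfoGain`, and a node with no valid split stays a leaf.

```scala
object PrePruning {
  // Hypothetical summary of one candidate split: instance count per child and the split's gain.
  final case class CandidateSplit(leftCount: Long, rightCount: Long, gain: Double)

  // A split passes only if both children are large enough and the gain is high enough.
  def isValidSplit(split: CandidateSplit, minInstancesPerNode: Int, minInfoGain: Double): Boolean =
    split.leftCount >= minInstancesPerNode &&
      split.rightCount >= minInstancesPerNode &&
      split.gain >= minInfoGain

  // Returns the highest-gain valid split, or None to signal that the node should stay a leaf.
  def bestSplit(
      candidates: Seq[CandidateSplit],
      minInstancesPerNode: Int,
      minInfoGain: Double): Option[CandidateSplit] = {
    val valid = candidates.filter(isValidSplit(_, minInstancesPerNode, minInfoGain))
    if (valid.isEmpty) None else Some(valid.maxBy(_.gain))
  }
}
```

For example, take `minInstancesPerNode = 2` and `minInfoGain = 0.05`. A split that isolates a single instance is rejected even if its gain is the highest. If every candidate is rejected, tree growth stops at that node, which is the pre-pruning effect.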

[GitHub] spark pull request: [SPARK-3272][MLLib]Calculate prediction for no...

2014-09-09 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2180#issuecomment-54944410 Close this PR and move to #2332 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not

[GitHub] spark pull request: [SPARK-3272][MLLib]Calculate prediction for no...

2014-09-09 Thread chouqin
Github user chouqin closed the pull request at: https://github.com/apache/spark/pull/2180

[GitHub] spark pull request: [SPARK-2207][SPARK-3272]Add minimum informatio...

2014-09-09 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2332#discussion_r17334675 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala --- @@ -49,6 +49,13 @@ import

[GitHub] spark pull request: [SPARK-2207][SPARK-3272]Add minimum informatio...

2014-09-09 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2332#discussion_r17334865 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala --- @@ -61,6 +68,8 @@ class Strategy ( val maxBins: Int

[GitHub] spark pull request: [SPARK-2207][SPARK-3272]Add minimum informatio...

2014-09-09 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2332#discussion_r17334970 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala --- @@ -674,6 +676,45 @@ class DecisionTreeSuite extends FunSuite with

[GitHub] spark pull request: [SPARK-2207][SPARK-3272]Add minimum informatio...

2014-09-09 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2332#issuecomment-55053558 @jkbradley thanks for your comments, I will change my code accordingly. As for the Predict class, I still think it is needed, for the following reasons: 1

[GitHub] spark pull request: [SPARK-2207][SPARK-3272][MLLib]Add minimum inf...

2014-09-09 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2332#issuecomment-55056433 @jkbradley, yes, by "array of bins" I meant an array of counts for each class. Sorry for the misleading wording.

[GitHub] spark pull request: [SPARK-2207][SPARK-3272][MLLib]Add minimum inf...

2014-09-09 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2332#discussion_r17338800 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/Split.scala --- @@ -66,3 +68,11 @@ private[tree] class DummyHighSplit(feature: Int

[GitHub] spark pull request: [SPARK-2207][SPARK-3272][MLLib]Add minimum inf...

2014-09-09 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2332#issuecomment-55063959 @jkbradley I have removed the `noSplit` object and added `private[tree]` to `Predict`.

[GitHub] spark pull request: [SPARK-2207][SPARK-3272][MLLib]Add minimum inf...

2014-09-10 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2332#discussion_r17355688 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala --- @@ -674,6 +676,45 @@ class DecisionTreeSuite extends FunSuite with

[GitHub] spark pull request: [SPARK-2207][SPARK-3272][MLLib]Add minimum inf...

2014-09-10 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2332#issuecomment-55122746 @jkbradley thanks for your replies. As I replied in your comments, I have changed `minInstancePerNode` to 2 in the test cases and added one more test case to test that when a

[GitHub] spark pull request: [SPARK-3160] [mllib] DecisionTree: eliminate p...

2014-09-10 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2341#issuecomment-55216875 @jkbradley Thanks for your nice work. I have read your code and have just one question: can we allocate a root node before the loop in `train()`, and allocate

[GitHub] spark pull request: [SPARK-3160] [mllib] DecisionTree: eliminate p...

2014-09-10 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2341#issuecomment-55223947 Can we change the fields from `val` to `var`? `leftNode` and `rightNode` are `var`s; I wonder whether we can change the other fields too.

[GitHub] spark pull request: [SPARK-3160] [mllib] DecisionTree: eliminate p...

2014-09-11 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2341#issuecomment-55346692 Sounds reasonable to me; please go ahead with random forests first.

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-18 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17764131 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DTStatsAggregator.scala --- @@ -189,6 +160,230 @@ private[tree] class DTStatsAggregator

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-18 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17764228 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DTStatsAggregator.scala --- @@ -189,6 +160,230 @@ private[tree] class DTStatsAggregator

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-18 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17765048 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DTStatsAggregator.scala --- @@ -189,6 +160,230 @@ private[tree] class DTStatsAggregator

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-18 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17765183 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DTStatsAggregator.scala --- @@ -189,6 +160,230 @@ private[tree] class DTStatsAggregator

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-18 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2435#discussion_r17765427 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DTStatsAggregator.scala --- @@ -189,6 +160,230 @@ private[tree] class DTStatsAggregator

[GitHub] spark pull request: [SPARK-1545] [mllib] Add Random Forests

2014-09-21 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2435#issuecomment-56319639 @jkbradley thanks, it looks good to me except for the comments in the code.

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-09-30 Thread chouqin
GitHub user chouqin opened a pull request: https://github.com/apache/spark/pull/2595 [SPARK-3366][MLLIB]Compute best splits distributively in decision tree Currently, all best splits are computed on the driver, which makes the driver a bottleneck for both communication and

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-09-30 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2595#discussion_r18258483 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -518,30 +516,58 @@ object DecisionTree extends Serializable with Logging

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-09-30 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2595#discussion_r18258685 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -518,30 +516,58 @@ object DecisionTree extends Serializable with Logging

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-09-30 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2595#issuecomment-57411737 Hi @mengxr @jkbradley, thanks for your comments; I have changed my code accordingly. As for performance, I haven't tested it on a large dataset. Could you gi

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-09-30 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2595#discussion_r18260719 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/InformationGainStats.scala --- @@ -38,6 +38,17 @@ class InformationGainStats

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-09-30 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2595#discussion_r18260795 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -437,6 +433,7 @@ object DecisionTree extends Serializable with Logging

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-09-30 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2595#discussion_r18261127 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/NodeStatsAggregator.scala --- @@ -0,0 +1,180 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-09-30 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2595#discussion_r18261274 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/NodeStatsAggregator.scala --- @@ -0,0 +1,180 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-09-30 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2595#issuecomment-57421044 @jkbradley I have adjusted the code based on your comments.

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-09-30 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2595#issuecomment-57421372 Some unit tests failed in PySpark. How do I run the PySpark unit tests on my own computer?

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-09-30 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2595#issuecomment-57423431 I don't know why the PySpark unit test failed. I have tested in Scala using the same test data, and it passes. For example, here is the code I wrote: ```

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-10-01 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2595#issuecomment-57497897 @mengxr I also found that it terminates early in some cases, but this only occurred in PySpark; using the same data and strategy in Scala gives the correct result. Here is the

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-10-01 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2595#discussion_r18317071 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DTStatsAggregator.scala --- @@ -159,161 +166,15 @@ private[tree] abstract class

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-10-01 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2595#issuecomment-57565298 @mengxr @jkbradley thanks for your comments. It passes the unit tests now; do you have any more suggestions?

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-10-01 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2595#issuecomment-57571761 `NetworkReceiverSuite` in spark-streaming failed; the failure is not related to this PR.

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-10-01 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2595#issuecomment-57571778 Jenkins, retest this please

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-10-02 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2595#issuecomment-57733146 @jkbradley thanks for your comments. I have changed my code; could you please have a look?

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-10-02 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2595#issuecomment-57737757 @manishamde I think `maxMemoryInMB` now applies to executors. An executor's memory is by default larger than the driver's memory (1G vs 512M), but it also needs

[GitHub] spark pull request: [SPARK-3158][MLLIB]Avoid 1 extra aggregation f...

2014-10-08 Thread chouqin
GitHub user chouqin opened a pull request: https://github.com/apache/spark/pull/2708 [SPARK-3158][MLLIB]Avoid 1 extra aggregation for DecisionTree training Currently, the implementation does one unnecessary aggregation step. The aggregation step for level L (to choose splits) gives
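The extra aggregation can be avoided because of how split statistics work. When the best split for a node at level L is chosen, the statistics aggregated for the left and right sides of that split are already the complete statistics of the two children. The children's predictions and impurities can therefore be set without another pass over the data. A minimal sketch follows, assuming classification with per-class counts and Gini impurity. `ChildStats`, `gini`, `predict`, and `childSummaries` are hypothetical helpers, not MLlib's API.

```scala
object ChildStats {
  // Gini impurity of a vector of per-class counts.
  def gini(counts: Array[Double]): Double = {
    val total = counts.sum
    if (total == 0.0) 0.0
    else 1.0 - counts.map { c => val p = c / total; p * p }.sum
  }

  // Majority-class prediction from per-class counts (assumes at least one class).
  def predict(counts: Array[Double]): Int = counts.indices.maxBy(i => counts(i))

  // The counts aggregated for each side of the chosen split are exactly the children's counts,
  // so (prediction, impurity) for both children come for free, with no extra aggregation pass.
  def childSummaries(
      leftCounts: Array[Double],
      rightCounts: Array[Double]): ((Int, Double), (Int, Double)) =
    ((predict(leftCounts), gini(leftCounts)), (predict(rightCounts), gini(rightCounts)))
}
```

For a tree with a fixed maximum depth, this means the last level, which contains only leaves, needs no aggregation step of its own.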

[GitHub] spark pull request: [SPARK-3158][MLLIB]Avoid 1 extra aggregation f...

2014-10-08 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2708#issuecomment-58452580 @jkbradley thanks for your comments; I have adjusted the code accordingly. I look forward to your timing test and hope that it shows some performance gain.

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-10-09 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2595#issuecomment-58603045 @jkbradley I agree with you; we should support both options.

[GitHub] spark pull request: [SPARK-3366][MLLIB]Compute best splits distrib...

2014-10-12 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2595#issuecomment-58836687 @manishamde @jkbradley I have created a JIRA for supporting both options ([SPARK-3920](https://issues.apache.org/jira/browse/SPARK-3920)). I am dealing with [SPARK-3207

[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...

2014-10-12 Thread chouqin
GitHub user chouqin opened a pull request: https://github.com/apache/spark/pull/2780 [SPARK-3207][MLLIB]Choose splits for continuous features in DecisionTree more adaptively DecisionTree splits on continuous features by choosing an array of values from a subsample of the data
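Here is a minimal sketch of choosing continuous-feature thresholds adaptively from a sample. It first counts each distinct value. When there are few distinct values, every one of them becomes a threshold. Otherwise it walks the cumulative counts to place roughly equal-frequency thresholds, so a heavily repeated value is emitted at most once instead of producing duplicate splits. `chooseThresholds` is a hypothetical name, not the PR's `findSplitsForContinuousFeature`. The sketch assumes that an instance goes left when its value is `<=` the threshold.

```scala
import scala.collection.mutable.ArrayBuffer

object AdaptiveSplits {
  // Returns at most numSplits distinct thresholds, sorted ascending.
  def chooseThresholds(sample: Seq[Double], numSplits: Int): List[Double] = {
    require(numSplits > 0, "numSplits must be positive")
    // (distinct value, number of occurrences), sorted by value.
    val valueCounts: Seq[(Double, Int)] =
      sample.groupBy(identity).map { case (v, vs) => (v, vs.size) }.toSeq.sortWith(_._1 < _._1)
    // With "x <= t goes left", the largest value would send everything left, so skip it.
    val candidates = valueCounts.dropRight(1)
    if (candidates.size <= numSplits) {
      candidates.map(_._1).toList
    } else {
      // Aim for numSplits + 1 buckets holding about the same number of sample points.
      val stride = sample.size.toDouble / (numSplits + 1)
      val thresholds = ArrayBuffer.empty[Double]
      var cumulative = 0
      var target = stride
      for ((value, count) <- candidates) {
        cumulative += count
        if (thresholds.size < numSplits && cumulative >= target) {
          thresholds += value
          // A heavily repeated value may cover several targets; skip all of them.
          while (target <= cumulative) target += stride
        }
      }
      thresholds.toList
    }
  }
}
```

Consider a skewed sample where half the points equal `0.0`. Picking values at fixed positions in the sorted sample could return `0.0` more than once. This version returns it once and moves on to the next target.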

[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...

2014-10-13 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2780#issuecomment-58865080 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...

2014-10-13 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2780#issuecomment-58865951 @jkbradley, RandomForestSuite fails because the original splits fit the training data better (for example, 899.5 is a split threshold, which is close to 900). I think

[GitHub] spark pull request: [SPARK-3934] [SPARK-3918] [mllib] Bug fixes fo...

2014-10-13 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2785#issuecomment-58980429 @jkbradley Thanks for the PR! It looks good to me.

[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...

2014-10-14 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2780#issuecomment-59007084 @manishamde thanks for your comments. I will adjust my code after #2785 gets merged. As for performance, yes, this is slower than the current implementation

[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...

2014-10-18 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2780#discussion_r19057305 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -1011,4 +1014,99 @@ object DecisionTree extends Serializable with

[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...

2014-10-18 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2780#discussion_r19057399 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -1011,4 +1014,99 @@ object DecisionTree extends Serializable with

[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...

2014-10-18 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2780#issuecomment-59637054 @manishamde @jkbradley thanks for your comments; I have changed my code now. Do you have any more suggestions?

[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...

2014-10-18 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2780#issuecomment-59637234 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...

2014-10-19 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2780#issuecomment-59676055 @jkbradley I updated the unit test to check that the splits returned by `findSplitsForContinuousFeature` are distinct. I have run the unit test for it and it passed

[GitHub] spark pull request: [SPARK-3161][MLLIB] Adding a node Id caching m...

2014-10-21 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2868#discussion_r19132675 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/NodeIdCache.scala --- @@ -0,0 +1,171 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3161][MLLIB] Adding a node Id caching m...

2014-10-21 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2868#discussion_r19132973 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/NodeIdCache.scala --- @@ -0,0 +1,171 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3161][MLLIB] Adding a node Id caching m...

2014-10-21 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2868#issuecomment-59893259 @codedeft Thanks for your nice work. I have added some comments inline. Here are some high-level comments: 1. Have you tested the performance after this change

[GitHub] spark pull request: [SPARK-3022] [mllib] FindBinsForLevel in decis...

2014-08-14 Thread chouqin
GitHub user chouqin opened a pull request: https://github.com/apache/spark/pull/1941 [SPARK-3022] [mllib] FindBinsForLevel in decision tree should call findBin only once for each feature `findBinsForLevel` is applied to every `LabeledPoint` to find bins for all nodes at a given
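The point of SPARK-3022 is that the bin of a feature value does not depend on which node the instance belongs to. It can therefore be computed once per instance and reused for every node at the level. Here is a minimal sketch, assuming each feature has a sorted array of thresholds and that a value equal to a threshold falls in the lower bin. `binIndex` and `binAllFeatures` are hypothetical helpers, not MLlib's `findBin`.

```scala
object BinOnce {
  // Bin index of x = number of thresholds strictly less than x (binary search over sorted thresholds).
  def binIndex(x: Double, thresholds: Array[Double]): Int = {
    var lo = 0
    var hi = thresholds.length
    while (lo < hi) {
      val mid = (lo + hi) >>> 1
      if (thresholds(mid) < x) lo = mid + 1 else hi = mid
    }
    lo
  }

  // Computes every feature's bin once per instance; node-level aggregation can then reuse the result.
  def binAllFeatures(features: Array[Double], thresholds: Array[Array[Double]]): Array[Int] =
    Array.tabulate(features.length)(f => binIndex(features(f), thresholds(f)))
}
```

A per-level loop would call `binAllFeatures` once per instance. For each node, it would then read the precomputed indices instead of searching again.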

[GitHub] spark pull request: [SPARK-3022] [SPARK-3041] [mllib] Call findBin...

2014-08-14 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/1950#discussion_r16281333 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -689,37 +631,26 @@ object DecisionTree extends Serializable with Logging

[GitHub] spark pull request: [SPARK-3022] [SPARK-3041] [mllib] Call findBin...

2014-08-14 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/1950#discussion_r16281396 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -53,16 +55,28 @@ class DecisionTree (private val strategy: Strategy

[GitHub] spark pull request: [SPARK-3022] [SPARK-3041] [mllib] Call findBin...

2014-08-14 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/1950#discussion_r16281414 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -728,8 +659,10 @@ object DecisionTree extends Serializable with Logging

[GitHub] spark pull request: [SPARK-3022] [mllib] FindBinsForLevel in decis...

2014-08-14 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/1941#issuecomment-52277965 @mengxr @jkbradley never mind, I will help you review #1950 :)

[GitHub] spark pull request: [SPARK-3022] [mllib] FindBinsForLevel in decis...

2014-08-15 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/1941#issuecomment-52279909 I am closing this PR now to focus on #1950.

[GitHub] spark pull request: [SPARK-3022] [mllib] FindBinsForLevel in decis...

2014-08-15 Thread chouqin
Github user chouqin closed the pull request at: https://github.com/apache/spark/pull/1941

[GitHub] spark pull request: Dt predict

2014-08-28 Thread chouqin
GitHub user chouqin opened a pull request: https://github.com/apache/spark/pull/2180 Dt predict In the current implementation, the prediction for a node is calculated along with the information gain stats for each possible split. The value to predict for a specific node is
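A node's prediction depends only on the node's own statistics, not on any candidate split. This is why it can be computed separately from the per-split information gain. Here is a minimal sketch for regression, assuming hypothetical `(count, sum, sumSq)` statistics; `RegStats` and `of` are illustrative names, not MLlib classes. Whichever split partitions the node, adding the left and right statistics gives the same totals and therefore the same mean prediction.

```scala
object NodePrediction {
  // Hypothetical regression sufficient statistics for a set of labels.
  final case class RegStats(count: Double, sum: Double, sumSq: Double) {
    def +(other: RegStats): RegStats =
      RegStats(count + other.count, sum + other.sum, sumSq + other.sumSq)
    // Prediction for a regression node: the mean label.
    def mean: Double = if (count == 0.0) 0.0 else sum / count
    // Impurity for a regression node: the (population) variance of the labels.
    def variance: Double = if (count == 0.0) 0.0 else sumSq / count - mean * mean
  }

  def of(labels: Seq[Double]): RegStats =
    RegStats(labels.size.toDouble, labels.sum, labels.map(y => y * y).sum)
}
```

For labels `1, 2, 3, 6`, both `{1, 2} | {3, 6}` and `{1} | {2, 3, 6}` sum to the same node statistics. Either way, the prediction is the mean `3.0`.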

[GitHub] spark pull request: [SPARK-3272][MLLib]Calculate prediction for no...

2014-08-28 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2180#discussion_r16832992 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala --- @@ -34,9 +34,9 @@ import

[GitHub] spark pull request: [SPARK-3272][MLLib]Calculate prediction for no...

2014-08-28 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2180#issuecomment-53706696 @ScrapCodes thanks for your comments. I have just changed the indentation to meet the Spark style guide.

[GitHub] spark pull request: [SPARK-3291][SQL]TestcaseName in createQueryTe...

2014-08-28 Thread chouqin
GitHub user chouqin opened a pull request: https://github.com/apache/spark/pull/2191 [SPARK-3291][SQL]TestcaseName in createQueryTest should not contain ":" ":" is not allowed in file names on Windows. If a file name contains ":", this
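A check like the following could catch such test case names before a file is created from them. It is a hypothetical helper, not part of Spark's test harness. The reserved set is the standard list of characters that Windows forbids in file names: `<>:"/\|?*`.

```scala
object WindowsSafeNames {
  // Characters that Windows does not allow in a file name.
  private val reserved: Set[Char] = "<>:\"/\\|?*".toSet

  // Returns the reserved characters that appear in name (empty if the name is safe).
  def invalidChars(name: String): Set[Char] = name.filter(c => reserved.contains(c)).toSet

  def isWindowsSafe(name: String): Boolean = invalidChars(name).isEmpty
}
```

Renaming the offending test, as the follow-up comment on this PR proposes, is the other way to avoid the problem.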

[GitHub] spark pull request: [SPARK-3291][SQL]TestcaseName in createQueryTe...

2014-08-29 Thread chouqin
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2191#issuecomment-53852376 Sorry, I should change the file name `case sensitivity: Hive table-0-5d14d21a239daa42b086cc895215009a` to `case sensitivity when query Hive table-0

[GitHub] spark pull request: [SPARK-7181][CORE]fix infinite loop in External...

2015-04-27 Thread chouqin
GitHub user chouqin opened a pull request: https://github.com/apache/spark/pull/5737 [SPARK-7181][CORE]fix infinite loop in ExternalSorter's mergeWithAggregation. See [SPARK-7181](https://issues.apache.org/jira/browse/SPARK-7181). You can merge this pull request into a Git repos

[GitHub] spark pull request: [SPARK-7181][CORE]fix infinite loop in External...

2015-04-28 Thread chouqin
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/5737#discussion_r29304781 --- Diff: core/src/test/scala/org/apache/spark/util/collection/ExternalSorterSuite.scala --- @@ -506,7 +506,10 @@ class ExternalSorterSuite extends FunSuite