[GitHub] spark pull request: [SPARK-7810] [pyspark] solve python rdd socket...

2015-06-29 Thread AiHe
Github user AiHe commented on the pull request: https://github.com/apache/spark/pull/6338#issuecomment-116855167 Thank you guys! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request: [SPARK-7810] [pyspark] solve python rdd socket...

2015-06-29 Thread AiHe
Github user AiHe commented on a diff in the pull request: https://github.com/apache/spark/pull/6338#discussion_r33499870 --- Diff: python/pyspark/rdd.py --- @@ -121,15 +121,30 @@ def _parse_memory(s): def _load_from_socket(port, serializer): -sock

[GitHub] spark pull request: [SPARK-7810] [pyspark] solve python rdd socket...

2015-06-29 Thread AiHe
Github user AiHe commented on the pull request: https://github.com/apache/spark/pull/6338#issuecomment-116774008 When I'm working on python networking problem, I usually follow that example as a python standard usage. Then I just apply the underlying idea here. I'm wond

[GitHub] spark pull request: [SPARK-7810] [pyspark] solve python rdd socket...

2015-06-29 Thread AiHe
Github user AiHe commented on a diff in the pull request: https://github.com/apache/spark/pull/6338#discussion_r33490643 --- Diff: python/pyspark/rdd.py --- @@ -121,15 +121,30 @@ def _parse_memory(s): def _load_from_socket(port, serializer): -sock

[GitHub] spark pull request: [SPARK-7810] [pyspark] solve python rdd socket...

2015-06-23 Thread AiHe
Github user AiHe commented on the pull request: https://github.com/apache/spark/pull/6338#issuecomment-114666888 It's confusing that all tests have passed but it still ends up with a failure.

[GitHub] spark pull request: [SPARK-7810] [pyspark] solve python rdd socket...

2015-06-22 Thread AiHe
Github user AiHe commented on the pull request: https://github.com/apache/spark/pull/6338#issuecomment-114355741 @JoshRosen Hi, can you kindly review this PR again?

[GitHub] spark pull request: [SPARK-7810] [pyspark] solve python rdd socket...

2015-05-30 Thread AiHe
Github user AiHe commented on the pull request: https://github.com/apache/spark/pull/6338#issuecomment-107095064 @JoshRosen Could you let me know how to figure out the reason for this failure?

[GitHub] spark pull request: [SPARK-7810] [pyspark] solve python rdd socket...

2015-05-21 Thread AiHe
GitHub user AiHe opened a pull request: https://github.com/apache/spark/pull/6338 [SPARK-7810] [pyspark] solve python rdd socket connection problem Method "_load_from_socket" in rdd.py cannot load data from jvm socket when ipv6 is used. The current method only works well
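A minimal sketch of the general idea behind this kind of fix, assuming the problem is that the client hard-codes one address family. It is not the code from the pull request: the function name `connect_any_family` and its parameters are hypothetical. It follows the standard-library pattern of asking `socket.getaddrinfo` for every candidate address (IPv4 and IPv6) and trying each in turn.

```python
import socket


def connect_any_family(host, port, timeout=3.0):
    """Connect to (host, port) over whichever address family works.

    Hypothetical helper for illustration only. AF_UNSPEC asks
    getaddrinfo for both IPv4 and IPv6 candidates.
    """
    last_error = None
    candidates = socket.getaddrinfo(
        host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in candidates:
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock  # first address that accepts the connection
        except OSError as exc:
            # Remember the failure and try the next candidate address.
            last_error = exc
            if sock is not None:
                sock.close()
    raise OSError("could not open socket") from last_error
```

With this shape, a server listening only on `::1` or only on `127.0.0.1` is reachable through the same call, because the loop does not stop at the first address family that fails.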

[GitHub] spark pull request: [SPARK-7473][MLLIB] Add reservoir sample in Ra...

2015-05-15 Thread AiHe
Github user AiHe commented on the pull request: https://github.com/apache/spark/pull/5988#issuecomment-102307055 @jkbradley Thanks for pointing that out. Yes, the case does not seem that helpful. I have already removed it.

[GitHub] spark pull request: [SPARK-7473][MLLIB] Add reservoir sample in Ra...

2015-05-11 Thread AiHe
Github user AiHe commented on the pull request: https://github.com/apache/spark/pull/5988#issuecomment-101082530 @jkbradley Please test it. Thank you.

[GitHub] spark pull request: [MLLIB][tree] Add reservoir sample in RandomFo...

2015-05-08 Thread AiHe
Github user AiHe commented on the pull request: https://github.com/apache/spark/pull/5988#issuecomment-100387813 Yes, you are right. That's why the two trees are not identical in the test. Thank you.

[GitHub] spark pull request: [MLLIB][tree] Add reservoir sample in RandomFo...

2015-05-08 Thread AiHe
Github user AiHe commented on the pull request: https://github.com/apache/spark/pull/5988#issuecomment-100324432 Seems like SamplingUtils.reservoirSampleAndCount has a default value for the seed. I don't think that's the reason for the failure.

[GitHub] spark pull request: [MLLIB][tree] Add reservoir sample in RandomFo...

2015-05-07 Thread AiHe
GitHub user AiHe opened a pull request: https://github.com/apache/spark/pull/5988 [MLLIB][tree] Add reservoir sample in RandomForest reservoir feature sample by using existing api You can merge this pull request into a Git repository by running: $ git pull https://github.com
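For readers unfamiliar with the technique named in this pull request, here is a plain-Python sketch of reservoir sampling (Algorithm R). It is an illustration only and not Spark's Scala implementation: the function name and signature below are assumptions, loosely modeled on the "sample and count" idea mentioned in the thread.

```python
import random


def reservoir_sample_and_count(items, k, seed=None):
    """Return (sample, count): up to k items chosen uniformly at random
    from a single pass over `items`, plus the number of items seen.

    Hypothetical helper for illustration; not a Spark API.
    """
    rng = random.Random(seed)
    reservoir = []
    count = 0
    for item in items:
        count += 1
        if len(reservoir) < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / count by drawing
            # a slot uniformly from [0, count).
            j = rng.randrange(count)
            if j < k:
                reservoir[j] = item
    return reservoir, count
```

Passing an explicit `seed` makes the sample reproducible, which matters when a test compares two models built from what should be the same feature subset.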

[GitHub] spark pull request: [MLLIB][tree] Verify size of input rdd > 0 whe...

2015-05-04 Thread AiHe
GitHub user AiHe reopened a pull request: https://github.com/apache/spark/pull/5810 [MLLIB][tree] Verify size of input rdd > 0 when building meta data Require non empty input rdd such that we can take the first labeledpoint and get the feature size You can merge this pull requ

[GitHub] spark pull request: [MLLIB][tree] Verify size of input rdd > 0 whe...

2015-05-04 Thread AiHe
Github user AiHe closed the pull request at: https://github.com/apache/spark/pull/5810

[GitHub] spark pull request: [MLLIB][tree] Verify size of input rdd > 0 whe...

2015-05-04 Thread AiHe
Github user AiHe commented on a diff in the pull request: https://github.com/apache/spark/pull/5810#discussion_r29630338 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DecisionTreeMetadata.scala --- @@ -107,8 +107,11 @@ private[tree] object DecisionTreeMetadata

[GitHub] spark pull request: [MLLIB][tree] Verify size of input rdd > 0 whe...

2015-05-01 Thread AiHe
Github user AiHe commented on a diff in the pull request: https://github.com/apache/spark/pull/5810#discussion_r29541953 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DecisionTreeMetadata.scala --- @@ -107,8 +107,11 @@ private[tree] object DecisionTreeMetadata

[GitHub] spark pull request: [MLLIB][tree] Verify size of input rdd > 0 whe...

2015-04-30 Thread AiHe
Github user AiHe commented on a diff in the pull request: https://github.com/apache/spark/pull/5810#discussion_r29475404 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DecisionTreeMetadata.scala --- @@ -107,8 +107,11 @@ private[tree] object DecisionTreeMetadata

[GitHub] spark pull request: [MLLIB][tree] Verify size of input rdd > 0 whe...

2015-04-30 Thread AiHe
Github user AiHe commented on a diff in the pull request: https://github.com/apache/spark/pull/5810#discussion_r29469236 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DecisionTreeMetadata.scala --- @@ -107,8 +107,11 @@ private[tree] object DecisionTreeMetadata

[GitHub] spark pull request: [MLLIB] Add reservoir sample for feature in Ra...

2015-04-30 Thread AiHe
Github user AiHe commented on the pull request: https://github.com/apache/spark/pull/5752#issuecomment-97895236 @srowen Sorry for this mixed PR. After reading the wiki, I guess I commit "Before proceeding, contributors should evaluate if the proposed change is likely to be rel

[GitHub] spark pull request: [MLLIB][tree] Verify size of input rdd > 0 whe...

2015-04-30 Thread AiHe
GitHub user AiHe opened a pull request: https://github.com/apache/spark/pull/5810 [MLLIB][tree] Verify size of input rdd > 0 when building meta data Require non empty input rdd such that we can take the first labeledpoint and get the feature size You can merge this pull requ

[GitHub] spark pull request: [PYSPARK] Add percentile method in rdd as nump...

2015-04-30 Thread AiHe
Github user AiHe closed the pull request at: https://github.com/apache/spark/pull/5686

[GitHub] spark pull request: [MLLIB] Add reservoir sample for feature in Ra...

2015-04-28 Thread AiHe
Github user AiHe closed the pull request at: https://github.com/apache/spark/pull/5752

[GitHub] spark pull request: [MLLIB] Add reservoir sample for feature in Ra...

2015-04-28 Thread AiHe
GitHub user AiHe opened a pull request: https://github.com/apache/spark/pull/5752 [MLLIB] Add reservoir sample for feature in RandomForest and fix other issues in tree 1. Reservoir sample for feature 2. Verify input rdd size > 0 when building DecisionTreeMetadata 3.

[GitHub] spark pull request: [Minor][MLLIB] Refactor toString method in MLL...

2015-04-25 Thread AiHe
Github user AiHe commented on the pull request: https://github.com/apache/spark/pull/5687#issuecomment-9631 @srowen Sure. It passes the style check, and I will do that for PRs in the future.

[GitHub] spark pull request: [Minor][MLLIB] Refactor toString method in MLL...

2015-04-25 Thread AiHe
Github user AiHe commented on the pull request: https://github.com/apache/spark/pull/5687#issuecomment-96234324 Good to go. @jkbradley Thank you.

[GitHub] spark pull request: [Minor][MLLIB] Refactor toString method in MLL...

2015-04-24 Thread AiHe
Github user AiHe commented on the pull request: https://github.com/apache/spark/pull/5687#issuecomment-96133857 Got your point and changed all toString methods in MLLIB. There are a large number of uses of the 'old' way in statements like logInfo, which makes it hard to

[GitHub] spark pull request: [Minor][MLLIB] Fix a formatting bug in toStrin...

2015-04-24 Thread AiHe
Github user AiHe commented on the pull request: https://github.com/apache/spark/pull/5687#issuecomment-96102493 Okay, sounds better to use a modern style.

[GitHub] spark pull request: [Minor][MLLIB] Fix a formatting bug in toStrin...

2015-04-24 Thread AiHe
Github user AiHe commented on the pull request: https://github.com/apache/spark/pull/5687#issuecomment-96058819 If you're referring to the modification on Predict.scala, I just followed the style of the previous code.

[GitHub] spark pull request: [Minor][MLLIB] Fix a formatting bug in toStrin...

2015-04-24 Thread AiHe
Github user AiHe commented on the pull request: https://github.com/apache/spark/pull/5687#issuecomment-96055725 Modify Predict.scala and decouple the dependency of Node.scala on Predict.scala.

[GitHub] spark pull request: [Minor][MLLIB] Fix a formatting bug in toStrin...

2015-04-24 Thread AiHe
GitHub user AiHe opened a pull request: https://github.com/apache/spark/pull/5687 [Minor][MLLIB] Fix a formatting bug in toString method in Node 1. predict(predict.toString) already outputs the prefix “predict”, so it is redundant to print ", predict = " again

[GitHub] spark pull request: [PYSPARK] Add percentile method in rdd as nump...

2015-04-24 Thread AiHe
GitHub user AiHe opened a pull request: https://github.com/apache/spark/pull/5686 [PYSPARK] Add percentile method in rdd as numpy 1. Add percentile method in rdd 2. By default, get the kth percentile element from the bottom (ascending order) 3. By specifying key, it can
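A small local sketch of what "the kth percentile element from the bottom" could mean. The description above is truncated, so everything here is an assumption: the function name, the `key` parameter's behavior, and the nearest-rank definition are chosen for illustration and are not taken from the pull request. Note that nearest-rank returns an actual element, unlike numpy's default, which interpolates between neighbors.

```python
import math


def percentile(values, k, key=None):
    """Nearest-rank kth percentile of `values`, sorted ascending.

    Hypothetical helper: `key` is an optional sort key, as in sorted().
    """
    if not 0 <= k <= 100:
        raise ValueError("k must be between 0 and 100")
    ordered = sorted(values, key=key)
    if not ordered:
        raise ValueError("cannot take a percentile of empty input")
    # Nearest rank: smallest 1-based rank covering k percent of the data.
    rank = math.ceil(k * len(ordered) / 100)
    return ordered[max(rank - 1, 0)]
```

On a real RDD the sort would be distributed rather than done with `sorted`, so this only pins down the selection rule, not the execution strategy.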

[GitHub] spark pull request: [PYSPARK] Fix a typo in "fold" function in rdd...

2015-04-21 Thread AiHe
Github user AiHe closed the pull request at: https://github.com/apache/spark/pull/5587

[GitHub] spark pull request: [Minor][MLLIB] Fix a minor formatting bug in t...

2015-04-21 Thread AiHe
Github user AiHe commented on the pull request: https://github.com/apache/spark/pull/5621#issuecomment-94979257 @mengxr Okay. Will do that in the future.

[GitHub] spark pull request: [Minor][MLLIB] Fix a minor formatting bug in t...

2015-04-21 Thread AiHe
GitHub user AiHe opened a pull request: https://github.com/apache/spark/pull/5621 [Minor][MLLIB] Fix a minor formatting bug in toString methods in Node.scala add missing comma and space You can merge this pull request into a Git repository by running: $ git pull https

[GitHub] spark pull request: [PYSPARK] Fix a typo in "fold" function in rdd...

2015-04-20 Thread AiHe
Github user AiHe commented on the pull request: https://github.com/apache/spark/pull/5587#issuecomment-94533106 Thanks for replying. I guess it might be better to leave it unchanged and point it out in the docs. I will try to make the modification in the docs.

[GitHub] spark pull request: [PYSPARK] Fix a typo in "fold" function in rdd...

2015-04-20 Thread AiHe
Github user AiHe commented on the pull request: https://github.com/apache/spark/pull/5587#issuecomment-94509911 @JoshRosen I just followed the example of NAStatCounter in the book "Advanced Analytics with Spark". NAStatCounter is supposed to get stats of a dat

[GitHub] spark pull request: [PYSPARK] Fix a typo in "fold" function in rdd...

2015-04-19 Thread AiHe
GitHub user AiHe opened a pull request: https://github.com/apache/spark/pull/5587 [PYSPARK] Fix a typo in "fold" function in rdd.py This will make the “fold” function consistent with the "fold" in rdd.scala and other "aggregate" functions where
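The description above is cut off, so the exact change is not visible here. As an assumption, the sketch below illustrates one way a fold can disagree with other aggregate functions: the order in which the accumulator and the element are passed to the operator. It is a local simulation with hypothetical names, not PySpark code.

```python
from functools import reduce


def fold_partitions(partitions, zero, op, acc_first=True):
    """Fold each partition starting from `zero`, then fold the
    per-partition results. `acc_first` chooses between op(acc, obj)
    and op(obj, acc) inside a partition (hypothetical switch)."""
    def fold_one(items):
        acc = zero
        for obj in items:
            acc = op(acc, obj) if acc_first else op(obj, acc)
        return acc

    partials = [fold_one(part) for part in partitions]
    return reduce(op, partials, zero)


def concat(a, b):
    # String concatenation is associative but not commutative.
    return a + b
```

With a commutative operator such as integer addition both orders give the same result, but `concat` over `[["a", "b"], ["c"]]` does not, which is why the argument order matters for operators that treat the accumulator specially.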