Github user AiHe commented on the pull request:
https://github.com/apache/spark/pull/6338#issuecomment-116855167
Thank you guys!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user AiHe commented on a diff in the pull request:
https://github.com/apache/spark/pull/6338#discussion_r33499870
--- Diff: python/pyspark/rdd.py ---
@@ -121,15 +121,30 @@ def _parse_memory(s):
def _load_from_socket(port, serializer):
-sock
Github user AiHe commented on the pull request:
https://github.com/apache/spark/pull/6338#issuecomment-116774008
When I'm working on Python networking problems, I usually follow that
example as standard Python usage. Then I just applied the underlying idea here.
I'm wond
Github user AiHe commented on a diff in the pull request:
https://github.com/apache/spark/pull/6338#discussion_r33490643
--- Diff: python/pyspark/rdd.py ---
@@ -121,15 +121,30 @@ def _parse_memory(s):
def _load_from_socket(port, serializer):
-sock
Github user AiHe commented on the pull request:
https://github.com/apache/spark/pull/6338#issuecomment-114666888
It's confusing that all tests have passed but it ends up with a failure.
---
Github user AiHe commented on the pull request:
https://github.com/apache/spark/pull/6338#issuecomment-114355741
@JoshRosen Hi, can you kindly review this PR again?
---
Github user AiHe commented on the pull request:
https://github.com/apache/spark/pull/6338#issuecomment-107095064
@JoshRosen Could you let me know how to figure out the reason for this
failure?
---
GitHub user AiHe opened a pull request:
https://github.com/apache/spark/pull/6338
[SPARK-7810] [pyspark] solve python rdd socket connection problem
Method "_load_from_socket" in rdd.py cannot load data from the JVM socket when
IPv6 is used. The current method only works well
Github user AiHe commented on the pull request:
https://github.com/apache/spark/pull/5988#issuecomment-102307055
@jkbradley Thanks for pointing that out. Yes, the case seems not that helpful.
I've already removed it.
---
Github user AiHe commented on the pull request:
https://github.com/apache/spark/pull/5988#issuecomment-101082530
@jkbradley Please test it. Thank you.
---
Github user AiHe commented on the pull request:
https://github.com/apache/spark/pull/5988#issuecomment-100387813
Yes, you are right. That's why the two trees are not identical in the test.
Thank you.
---
Github user AiHe commented on the pull request:
https://github.com/apache/spark/pull/5988#issuecomment-100324432
Seems like SamplingUtils.reservoirSampleAndCount has a default value for
the seed. I don't think that's the reason for the failure.
---
GitHub user AiHe opened a pull request:
https://github.com/apache/spark/pull/5988
[MLLIB][tree] Add reservoir sample in RandomForest
Reservoir feature sampling using the existing API
You can merge this pull request into a Git repository by running:
$ git pull https://github.com
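The PR relies on an existing reservoir-sampling utility (`SamplingUtils.reservoirSampleAndCount` in MLlib). The single-pass idea behind it, sketched here in Python rather than MLlib's Scala, is the classic Algorithm R:

```python
import random

def reservoir_sample(items, k, seed=None):
    # Algorithm R: maintain a uniform k-element sample of a stream
    # in one pass, without knowing the stream length up front.
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a reservoir slot with probability k/(i+1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Seeding the generator (as the default seed in `reservoirSampleAndCount` does) makes the sample reproducible across runs.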
GitHub user AiHe reopened a pull request:
https://github.com/apache/spark/pull/5810
[MLLIB][tree] Verify size of input rdd > 0 when building meta data
Require a non-empty input RDD so that we can take the first LabeledPoint
and get the feature size
You can merge this pull requ
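The check amounts to failing fast on an empty dataset before reading the first element's feature dimension. A schematic Python version of that guard (the function name and error message are hypothetical, not the DecisionTreeMetadata code):

```python
def feature_size(labeled_points):
    # Fail fast on empty input before peeking at the first point;
    # otherwise there is no element to read the feature count from.
    first = next(iter(labeled_points), None)
    if first is None:
        raise ValueError("requires a non-empty input to infer the feature size")
    return len(first)
```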
Github user AiHe closed the pull request at:
https://github.com/apache/spark/pull/5810
---
Github user AiHe commented on a diff in the pull request:
https://github.com/apache/spark/pull/5810#discussion_r29630338
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DecisionTreeMetadata.scala
---
@@ -107,8 +107,11 @@ private[tree] object DecisionTreeMetadata
Github user AiHe commented on a diff in the pull request:
https://github.com/apache/spark/pull/5810#discussion_r29541953
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DecisionTreeMetadata.scala
---
@@ -107,8 +107,11 @@ private[tree] object DecisionTreeMetadata
Github user AiHe commented on a diff in the pull request:
https://github.com/apache/spark/pull/5810#discussion_r29475404
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DecisionTreeMetadata.scala
---
@@ -107,8 +107,11 @@ private[tree] object DecisionTreeMetadata
Github user AiHe commented on a diff in the pull request:
https://github.com/apache/spark/pull/5810#discussion_r29469236
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impl/DecisionTreeMetadata.scala
---
@@ -107,8 +107,11 @@ private[tree] object DecisionTreeMetadata
Github user AiHe commented on the pull request:
https://github.com/apache/spark/pull/5752#issuecomment-97895236
@srowen Sorry for this mixed PR. After reading the wiki, I guess I commit
"Before proceeding, contributors should evaluate if the proposed change is
likely to be rel
GitHub user AiHe opened a pull request:
https://github.com/apache/spark/pull/5810
[MLLIB][tree] Verify size of input rdd > 0 when building meta data
Require a non-empty input RDD so that we can take the first LabeledPoint
and get the feature size
You can merge this pull requ
Github user AiHe closed the pull request at:
https://github.com/apache/spark/pull/5686
---
Github user AiHe closed the pull request at:
https://github.com/apache/spark/pull/5752
---
GitHub user AiHe opened a pull request:
https://github.com/apache/spark/pull/5752
[MLLIB] Add reservoir sample for feature in RandomForest and fix other
issues in tree
1. Reservoir sample for feature
2. Verify input rdd size > 0 when building DecisionTreeMetadata
3.
Github user AiHe commented on the pull request:
https://github.com/apache/spark/pull/5687#issuecomment-9631
@srowen Sure. I'll pass the style checking and will do that for PRs in the
future.
---
Github user AiHe commented on the pull request:
https://github.com/apache/spark/pull/5687#issuecomment-96234324
Good to go.
@jkbradley Thank you.
---
Github user AiHe commented on the pull request:
https://github.com/apache/spark/pull/5687#issuecomment-96133857
I get your point and have changed all toString methods in MLlib.
There are a large number of uses of the 'old' way in statements like
logInfo, which makes it hard to
Github user AiHe commented on the pull request:
https://github.com/apache/spark/pull/5687#issuecomment-96102493
Okay, sounds better to use a modern style.
---
Github user AiHe commented on the pull request:
https://github.com/apache/spark/pull/5687#issuecomment-96058819
If you're referring to the modification to Predict.scala, I just followed the
style of the previous code.
---
Github user AiHe commented on the pull request:
https://github.com/apache/spark/pull/5687#issuecomment-96055725
Modified Predict.scala and decoupled the dependency of Node.scala on
Predict.scala.
---
GitHub user AiHe opened a pull request:
https://github.com/apache/spark/pull/5687
[Minor][MLLIB] Fix a formatting bug in toString method in Node
1. predict(predict.toString) has already output the prefix "predict", thus
it's duplicated to print ", predict = " again
GitHub user AiHe opened a pull request:
https://github.com/apache/spark/pull/5686
[PYSPARK] Add percentile method in rdd as numpy
1. Add percentile method in rdd
2. By default, get the kth percentile element from the bottom (ascending
order)
3. By specifying key, it can
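A nearest-rank sketch of the described behavior, with the optional `key` controlling the ordering (the function name and the exact rounding rule are illustrative; the PR's precise semantics are truncated in this archive):

```python
def percentile(data, p, key=None):
    # Return the element at the p-th percentile (0-100) of the
    # ascending order defined by `key` (nearest-rank method).
    xs = sorted(data, key=key)
    if not xs:
        raise ValueError("percentile of an empty sequence")
    # Map p in [0, 100] onto a valid index, clamping at the ends.
    k = max(0, min(len(xs) - 1, int(round(p / 100.0 * (len(xs) - 1)))))
    return xs[k]
```

With `key=lambda x: -x` the same call selects from the top instead of the bottom, mirroring the PR's third bullet.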
Github user AiHe closed the pull request at:
https://github.com/apache/spark/pull/5587
---
Github user AiHe commented on the pull request:
https://github.com/apache/spark/pull/5621#issuecomment-94979257
@mengxr Okay. Will do that in the future.
---
GitHub user AiHe opened a pull request:
https://github.com/apache/spark/pull/5621
[Minor][MLLIB] Fix a minor formatting bug in toString methods in Node.scala
add missing comma and space
You can merge this pull request into a Git repository by running:
$ git pull https
Github user AiHe commented on the pull request:
https://github.com/apache/spark/pull/5587#issuecomment-94533106
Thanks for replying. I guess it might be better to leave it unchanged and
point it out in the docs. I will try to make the modification in the docs.
---
Github user AiHe commented on the pull request:
https://github.com/apache/spark/pull/5587#issuecomment-94509911
@JoshRosen
I just followed the example of NAStatCounter in the book "Advanced
Analytics with Spark". NAStatCounter is supposed to get stats of a dat
GitHub user AiHe opened a pull request:
https://github.com/apache/spark/pull/5587
[PYSPARK] Fix a typo in "fold" function in rdd.py
This will make the "fold" function consistent with the "fold" in
rdd.scala and other "aggregate" functions where
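The consistency point hinges on fold's semantics: the zero value is folded into every partition and then once more when merging the partial results, matching Scala's RDD.fold. A plain-Python simulation of that behavior (not Spark code; `simulate_fold` and its list-of-lists partitions are illustrative):

```python
def simulate_fold(partitions, zero_value, op):
    # Fold each partition starting from zero_value, then fold the
    # per-partition results, again starting from zero_value.
    partials = []
    for part in partitions:
        acc = zero_value
        for x in part:
            acc = op(acc, x)
        partials.append(acc)
    result = zero_value
    for r in partials:
        result = op(result, r)
    return result
```

Note the consequence: with N partitions a non-neutral zero value is applied N+1 times, which is why fold's docs recommend a neutral element for `op`.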