Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-56242298
Merged. Thanks a lot!
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/2378
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-56241679
@davies LGTM except for a few linear algebra operators and caching, but those are
orthogonal to this PR. I'm merging this, and we will update the linear algebra
ops later.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-56216817
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20576/consoleFull)
for PR 2378 at commit
[`dffbba2`](https://github.com/a
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-56211052
@mengxr In this PR, I just tried to avoid changes other than
serialization; we could change the caching behavior or compression later.
It will be good to have so
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-56210084
@mengxr PickleSerializer does not compress data; CompressedSerializer
can do it using gzip (level 1). Compression can help for a small range of doubles
or repeated values
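The claim that compression pays off mainly for repeated values (and much less for doubles spread over a wide range) can be checked roughly with the standard library. This is a minimal sketch, not PySpark's CompressedSerializer: it pickles plain Python lists with `pickle` and compresses them with `zlib` at level 1, and the helper name and sample data are illustrative assumptions.

```python
import pickle
import random
import zlib


def pickled_and_compressed_sizes(values):
    """Return (pickled size, size after zlib level-1 compression) in bytes."""
    raw = pickle.dumps(values, protocol=2)
    # Level 1 trades compression ratio for speed.
    return len(raw), len(zlib.compress(raw, 1))


n = 10000
rng = random.Random(0)
samples = {
    # Repeated values: the pickled byte stream is highly redundant.
    "repeated": [1.0, 2.0] * (n // 2),
    # Doubles spread over a wide range: mostly random mantissa bytes.
    "spread": [rng.uniform(-1e9, 1e9) for _ in range(n)],
}

results = {name: pickled_and_compressed_sizes(vals) for name, vals in samples.items()}
for name, (raw_size, zipped_size) in results.items():
    print(f"{name}: pickled={raw_size} bytes, zlib level 1={zipped_size} bytes")
```

Level 1 is the cheapest setting, so it mostly removes obvious repetition; on the "spread" sample the output stays close to the pickled size because the mantissa bits carry little redundancy.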
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-56207099
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20576/consoleFull)
for PR 2378 at commit
[`dffbba2`](https://github.com/ap
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-56147622
@davies Does `PickleSerializer` compress data? If not, maybe we should
cache the deserialized RDD instead of the one from `_.reserialize`. They have
the same storage. I un
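The point that an uncompressed serialized copy "has the same storage" as the deserialized data can be illustrated outside Spark: pickling packed doubles without compression costs roughly 8 bytes per value plus a small fixed header. This sketch assumes CPython's `array` module and pickle protocol 4 (which stores the array's raw buffer); it does not involve PySpark's PickleSerializer or the JVM-side representation, so treat the numbers as illustrative only.

```python
import array
import pickle

n = 100000
values = array.array("d", (float(i) for i in range(n)))

# Memory held by the raw doubles themselves: itemsize (8) bytes per element.
raw_bytes = values.itemsize * len(values)

# Uncompressed pickle of the same array; protocol 4 writes the raw buffer.
pickled = pickle.dumps(values, protocol=4)

overhead = len(pickled) - raw_bytes
print(f"raw={raw_bytes} bytes, pickled={len(pickled)} bytes, overhead={overhead} bytes")
```

Because the overhead is only a few dozen bytes, caching the serialized form saves essentially no memory here while still paying a deserialization cost on every access, which is the trade-off the comment raises.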
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-56136476
test this please
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-56122608
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20560/consoleFull)
for PR 2378 at commit
[`810f97f`](https://github.com/a
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-56117852
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20560/consoleFull)
for PR 2378 at commit
[`810f97f`](https://github.com/ap
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-56116566
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/132/consoleFull)
for PR 2378 at commit
[`032cd62`](https://github.com/
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-56114946
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20554/consoleFull)
for PR 2378 at commit
[`032cd62`](https://github.com/a
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17760498
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -476,259 +436,167 @@ class PythonMLLibAPI extends Serializable
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-56112010
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/132/consoleFull)
for PR 2378 at commit
[`032cd62`](https://github.com/a
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-56110037
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20554/consoleFull)
for PR 2378 at commit
[`032cd62`](https://github.com/ap
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-56110091
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20551/consoleFull)
for PR 2378 at commit
[`bd738ab`](https://github.com/a
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-56109944
@jkbradley I think I have addressed all your comments, or left comments where
I have not yet figured out how to do it. Thanks for reviewing this huge PR.
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17757949
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -476,259 +436,167 @@ class PythonMLLibAPI extends Serializable {
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17757207
--- Diff: python/pyspark/mllib/tests.py ---
@@ -198,41 +212,36 @@ def test_serialize(self):
lil[1, 0] = 1
lil[3, 0] = 2
s
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17756431
--- Diff: python/pyspark/mllib/tests.py ---
@@ -198,41 +212,36 @@ def test_serialize(self):
lil[1, 0] = 1
lil[3, 0] = 2
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-56104238
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20551/consoleFull)
for PR 2378 at commit
[`bd738ab`](https://github.com/ap
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17752597
--- Diff: python/pyspark/mllib/linalg.py ---
@@ -23,14 +23,148 @@
SciPy is available in their environment.
"""
-import numpy
-from numpy
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17752588
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
---
@@ -64,6 +64,12 @@ class DenseMatrix(val numRows: Int, val numCols: Int,
val v
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17752465
--- Diff: python/pyspark/mllib/linalg.py ---
@@ -23,14 +23,148 @@
SciPy is available in their environment.
"""
-import numpy
-from numpy
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17752055
--- Diff: python/pyspark/mllib/linalg.py ---
@@ -257,10 +410,34 @@ def stringify(vector):
>>> Vectors.stringify(Vectors.dense([0.0, 1.0]))
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17751963
--- Diff: python/pyspark/mllib/tests.py ---
@@ -198,41 +212,36 @@ def test_serialize(self):
lil[1, 0] = 1
lil[3, 0] = 2
s
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-55987147
@davies This looks like a great PR! I don't see major issues, though +1
to the remarks about checking for performance regressions. Pending performance
testing and
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17703595
--- Diff: python/pyspark/mllib/tree.py ---
@@ -90,53 +89,24 @@ class DecisionTree(object):
EXPERIMENTAL: This is an experimental API.
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17703466
--- Diff: python/pyspark/mllib/tests.py ---
@@ -198,41 +212,36 @@ def test_serialize(self):
lil[1, 0] = 1
lil[3, 0] = 2
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17702101
--- Diff: python/pyspark/mllib/linalg.py ---
@@ -257,10 +410,34 @@ def stringify(vector):
>>> Vectors.stringify(Vectors.dense([0.0, 1.0]))
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17702050
--- Diff: python/pyspark/mllib/linalg.py ---
@@ -257,10 +410,34 @@ def stringify(vector):
>>> Vectors.stringify(Vectors.dense([0.0, 1.0]))
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17701626
--- Diff: python/pyspark/mllib/linalg.py ---
@@ -61,16 +195,19 @@ def __init__(self, size, *args):
if type(pairs) == dict:
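The `__init__(self, size, *args)` hunk dispatches on the argument shape (for example `if type(pairs) == dict:`). As a hedged illustration of that pattern, the hypothetical helper below normalizes the three input forms documented for `pyspark.mllib.linalg.SparseVector` (a dict, a list of `(index, value)` pairs, or separate index and value lists) into sorted index/value lists. It is not the code from this PR; the name `normalize_sparse_args` and its validation rules are assumptions.

```python
def normalize_sparse_args(size, *args):
    """Hypothetical helper: return sorted (indices, values) lists.

    Accepted forms (assumed, modelled on SparseVector's documented inputs):
      normalize_sparse_args(4, {1: 1.0, 3: 5.5})       # dict
      normalize_sparse_args(4, [(1, 1.0), (3, 5.5)])   # list of pairs
      normalize_sparse_args(4, [1, 3], [1.0, 5.5])     # indices, values
    """
    if len(args) == 1:
        pairs = args[0]
        if isinstance(pairs, dict):
            pairs = pairs.items()
        # Sort by index so later lookups can use binary search.
        items = sorted((int(i), float(v)) for i, v in pairs)
        indices = [i for i, _ in items]
        values = [v for _, v in items]
    elif len(args) == 2:
        indices = [int(i) for i in args[0]]
        values = [float(v) for v in args[1]]
        if len(indices) != len(values):
            raise ValueError("indices and values must have the same length")
    else:
        raise TypeError("expected a dict, a list of pairs, or indices and values")
    # Strictly increasing indices: also rejects duplicate indices.
    if any(a >= b for a, b in zip(indices, indices[1:])):
        raise ValueError("indices must be strictly increasing")
    if any(i < 0 or i >= size for i in indices):
        raise ValueError("index out of range for size %d" % size)
    return indices, values
```

Using `isinstance` rather than `type(...) == dict` also accepts dict subclasses such as `collections.OrderedDict`; whether that is desirable is a judgment call for the real API.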
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17701227
--- Diff: python/pyspark/mllib/linalg.py ---
@@ -61,16 +195,19 @@ def __init__(self, size, *args):
if type(pairs) == dict:
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17701086
--- Diff: python/pyspark/mllib/linalg.py ---
@@ -61,16 +195,19 @@ def __init__(self, size, *args):
if type(pairs) == dict:
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17700987
--- Diff: python/pyspark/mllib/linalg.py ---
@@ -23,14 +23,148 @@
SciPy is available in their environment.
"""
-import numpy
-from nu
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17700471
--- Diff: python/pyspark/mllib/linalg.py ---
@@ -23,14 +23,148 @@
SciPy is available in their environment.
"""
-import numpy
-from nu
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17700424
--- Diff: python/pyspark/mllib/linalg.py ---
@@ -23,14 +23,148 @@
SciPy is available in their environment.
"""
-import numpy
-from nu
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17700232
--- Diff: python/pyspark/mllib/linalg.py ---
@@ -23,14 +23,148 @@
SciPy is available in their environment.
"""
-import numpy
-from nu
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17698673
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
---
@@ -64,6 +64,12 @@ class DenseMatrix(val numRows: Int, val numCols: Int,
va
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17698519
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -476,259 +436,167 @@ class PythonMLLibAPI extends Serializable
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17697397
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -476,259 +436,167 @@ class PythonMLLibAPI extends Serializable
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17697102
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -476,259 +436,167 @@ class PythonMLLibAPI extends Serializable
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17694320
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -40,11 +43,11 @@ import org.apache.spark.mllib.util.MLUtils
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17693196
--- Diff: python/pyspark/mllib/recommendation.py ---
@@ -54,34 +64,51 @@ def __del__(self):
def predict(self, user, product):
return se
Github user staple commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17687682
--- Diff: python/pyspark/mllib/recommendation.py ---
@@ -54,34 +64,51 @@ def __del__(self):
def predict(self, user, product):
return self.
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17686887
--- Diff: python/pyspark/mllib/recommendation.py ---
@@ -54,34 +64,51 @@ def __del__(self):
def predict(self, user, product):
return self.
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17686849
--- Diff: python/pyspark/mllib/recommendation.py ---
@@ -54,34 +64,51 @@ def __del__(self):
def predict(self, user, product):
return self.
Github user staple commented on a diff in the pull request:
https://github.com/apache/spark/pull/2378#discussion_r17686208
--- Diff: python/pyspark/mllib/recommendation.py ---
@@ -54,34 +64,51 @@ def __del__(self):
def predict(self, user, product):
return self.
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-55916761
@mengxr it's ready to review now, thanks.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-55860805
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20453/consoleFull)
for PR 2378 at commit
[`e431377`](https://github.com/a
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-55860743
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/122/consoleFull)
for PR 2378 at commit
[`e431377`](https://github.com/
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-55855377
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20453/consoleFull)
for PR 2378 at commit
[`e431377`](https://github.com/ap
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2378#issuecomment-55855348
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/122/consoleFull)
for PR 2378 at commit
[`e431377`](https://github.com/a