[GitHub] spark pull request: [SPARK-2679] [MLLib] Ser/De for Double

2014-07-27 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1581 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: [SPARK-2679] [MLLib] Ser/De for Double

2014-07-25 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1581#issuecomment-50114457 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-2679] [MLLib] Ser/De for Double

2014-07-25 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1581#issuecomment-50114441 I agree it is safer to put the magic byte in front of every record. However, this is not a public API where users can throw in an arbitrary RDD and ask the serializer to

[GitHub] spark pull request: [SPARK-2679] [MLLib] Ser/De for Double

2014-07-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1581#issuecomment-50114833 QA tests have started for PR 1581. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17175/consoleFull

[GitHub] spark pull request: [SPARK-2679] [MLLib] Ser/De for Double

2014-07-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1581#issuecomment-50199723 QA tests have started for PR 1581. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17196/consoleFull

[GitHub] spark pull request: [SPARK-2679] [MLLib] Ser/De for Double

2014-07-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1581#issuecomment-50204094 QA results for PR 1581: This patch FAILED unit tests. This patch merges cleanly. This patch adds no public classes. For more information see test

[GitHub] spark pull request: [SPARK-2679] [MLLib] Ser/De for Double

2014-07-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1581#issuecomment-50204778 QA tests have started for PR 1581. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17198/consoleFull

[GitHub] spark pull request: [SPARK-2679] [MLLib] Ser/De for Double

2014-07-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1581#issuecomment-50209389 QA results for PR 1581: This patch FAILED unit tests. This patch merges cleanly. This patch adds no public classes. For more information see test

[GitHub] spark pull request: [SPARK-2679] [MLLib] Ser/De for Double

2014-07-25 Thread dorx
Github user dorx commented on the pull request: https://github.com/apache/spark/pull/1581#issuecomment-50213453 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-2679] [MLLib] Ser/De for Double

2014-07-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1581#issuecomment-50213667 QA tests have started for PR 1581. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17204/consoleFull

[GitHub] spark pull request: [SPARK-2679] [MLLib] Ser/De for Double

2014-07-25 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1581#issuecomment-50215860 QA results for PR 1581: This patch PASSES unit tests. This patch merges cleanly. This patch adds no public classes. For more information see test

[GitHub] spark pull request: [SPARK-2679] [MLLib] Ser/De for Double

2014-07-24 Thread dorx
GitHub user dorx opened a pull request: https://github.com/apache/spark/pull/1581 [SPARK-2679] [MLLib] Ser/De for Double Added a serializer/deserializer pair for Double in _common.py and PythonMLLibAPI in MLLib. You can merge this pull request into a Git repository by running:

[GitHub] spark pull request: [SPARK-2679] [MLLib] Ser/De for Double

2014-07-24 Thread dorx
Github user dorx commented on the pull request: https://github.com/apache/spark/pull/1581#issuecomment-50091128 @falaki @mengxr Created a separate PR for this so I can use it in both the python correlation and python randomRDD additions.

[GitHub] spark pull request: [SPARK-2679] [MLLib] Ser/De for Double

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1581#issuecomment-50091566 QA tests have started for PR 1581. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17142/consoleFull

[GitHub] spark pull request: [SPARK-2679] [MLLib] Ser/De for Double

2014-07-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/1581#discussion_r15379709 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/api/python/PythonMLLibAPISuite.scala --- @@ -57,4 +57,12 @@ class PythonMLLibAPISuite extends FunSuite

[GitHub] spark pull request: [SPARK-2679] [MLLib] Ser/De for Double

2014-07-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1581#issuecomment-50092457 @dorx Does double SerDe need the magic byte?

[GitHub] spark pull request: [SPARK-2679] [MLLib] Ser/De for Double

2014-07-24 Thread dorx
Github user dorx commented on the pull request: https://github.com/apache/spark/pull/1581#issuecomment-50092950 @mengxr Given the current list of supported types, no, but if someone down the road adds Long or arrays of chars/shorts, etc, which isn't far-fetched, then it becomes

[GitHub] spark pull request: [SPARK-2679] [MLLib] Ser/De for Double

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1581#issuecomment-50093947 QA tests have started for PR 1581. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17143/consoleFull

[GitHub] spark pull request: [SPARK-2679] [MLLib] Ser/De for Double

2014-07-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1581#issuecomment-50094529 QA results for PR 1581: This patch PASSES unit tests. This patch merges cleanly. This patch adds no public classes. For more information see test

[GitHub] spark pull request: [SPARK-2679] [MLLib] Ser/De for Double

2014-07-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1581#issuecomment-50096544 This is a SerDe for Double only; there is no other double type. It is different from the vector SerDe but similar to the Rating SerDe.

[GitHub] spark pull request: [SPARK-2679] [MLLib] Ser/De for Double

2014-07-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1581#issuecomment-50096692 If we need a complex type in the future, we can add a separate SerDe for it. Since a double has only 8 bytes, adding the magic byte increases the storage by 12.5%.
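
To make the 12.5% figure concrete, here is a minimal sketch of the two layouts being debated: a bare 8-byte double versus the same double prefixed with a one-byte type tag. This is not the code from the PR. The tag value `MAGIC_DOUBLE`, the function names, and the big-endian byte order are all assumptions chosen for illustration.

```python
import struct

# Hypothetical tag value; the real magic byte used by MLlib is not shown in this thread.
MAGIC_DOUBLE = 1


def serialize_double(value):
    """Pack a double as 8 raw bytes (big-endian assumed), with no type tag."""
    return struct.pack(">d", value)


def deserialize_double(data):
    """Unpack 8 raw bytes as a double; the caller must already know the type."""
    if len(data) != 8:
        raise ValueError("expected exactly 8 bytes, got %d" % len(data))
    return struct.unpack(">d", data)[0]


def serialize_double_tagged(value):
    """Pack a one-byte tag followed by the 8-byte double (9 bytes total)."""
    return struct.pack(">bd", MAGIC_DOUBLE, value)


def deserialize_double_tagged(data):
    """Check the tag before trusting the payload."""
    magic, value = struct.unpack(">bd", data)
    if magic != MAGIC_DOUBLE:
        raise ValueError("unexpected magic byte: %d" % magic)
    return value


plain = serialize_double(3.14)
tagged = serialize_double_tagged(3.14)

# One extra byte on top of eight: 1 / 8 = 0.125, i.e. 12.5% more storage.
overhead = (len(tagged) - len(plain)) / float(len(plain))
```

The tagged form costs one extra byte per record, but lets the reader reject a payload of the wrong type instead of silently misreading it, which is the trade-off the two reviewers are weighing.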

[GitHub] spark pull request: [SPARK-2679] [MLLib] Ser/De for Double

2014-07-24 Thread dorx
Github user dorx commented on the pull request: https://github.com/apache/spark/pull/1581#issuecomment-50101155 The issue is what other things we can reasonably serialize into 8 bytes. Not sure how other types of doubles are relevant here since the size would be different and cause
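
The concern about "what other things we can reasonably serialize into 8 bytes" can be illustrated with a short sketch: without a tag, the same 8 bytes decode without error as either a double or a 64-bit integer. This is an illustration only, assuming big-endian byte order; it does not reproduce the PR's actual wire format.

```python
import struct

# Eight bytes produced by serializing the double 1.0 (big-endian assumed).
payload = struct.pack(">d", 1.0)

# Read back with the matching format: the original value.
as_double = struct.unpack(">d", payload)[0]

# Read back as a signed 64-bit integer: no error is raised, but the
# result is the raw IEEE 754 bit pattern of 1.0, not a meaningful number.
as_long = struct.unpack(">q", payload)[0]
```

Because both reads succeed, a deserializer that supports more than one 8-byte type has no way to tell them apart from the bytes alone; that is the case a leading magic byte is meant to cover.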