spark git commit: [SPARK-8541] [PYSPARK] test the absolute error in approx doctests

2015-06-23 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 9b618fb0d - f0dcbe8a7 [SPARK-8541] [PYSPARK] test the absolute error in approx doctests A minor change but one which is (presumably) visible on the public api docs webpage. Author: Scott Taylor git...@megatron.me.uk Closes #6942 from

spark git commit: [SPARK-8541] [PYSPARK] test the absolute error in approx doctests

2015-06-23 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.4 22cc1ab66 - d0943afbc [SPARK-8541] [PYSPARK] test the absolute error in approx doctests A minor change but one which is (presumably) visible on the public api docs webpage. Author: Scott Taylor git...@megatron.me.uk Closes #6942

spark git commit: [SPARK-8541] [PYSPARK] test the absolute error in approx doctests

2015-06-23 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.3 45b4527e3 - 716dcf631 [SPARK-8541] [PYSPARK] test the absolute error in approx doctests A minor change but one which is (presumably) visible on the public api docs webpage. Author: Scott Taylor git...@megatron.me.uk Closes #6942

spark git commit: [SPARK-8462] [DOCS] Documentation fixes for Spark SQL

2015-06-18 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.4 152f4465d - bd9bbd611 [SPARK-8462] [DOCS] Documentation fixes for Spark SQL This fixes various minor documentation issues on the Spark SQL page Author: Lars Francke lars.fran...@gmail.com Closes #6890 from lfrancke/SPARK-8462 and

spark git commit: [SPARK-8462] [DOCS] Documentation fixes for Spark SQL

2015-06-18 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 43f50decd - 4ce3bab89 [SPARK-8462] [DOCS] Documentation fixes for Spark SQL This fixes various minor documentation issues on the Spark SQL page Author: Lars Francke lars.fran...@gmail.com Closes #6890 from lfrancke/SPARK-8462 and

spark git commit: [SPARK-8135] Don't load defaults when reconstituting Hadoop Configurations

2015-06-18 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master dc4131389 - 43f50decd [SPARK-8135] Don't load defaults when reconstituting Hadoop Configurations Author: Sandy Ryza sa...@cloudera.com Closes #6679 from sryza/sandy-spark-8135 and squashes the following commits: c5554ff [Sandy Ryza]

spark git commit: [SPARK-8202] [PYSPARK] fix infinite loop during external sort in PySpark

2015-06-18 Thread joshrosen
that the items usually have similar size, so we don't need to adjust the batch size after first spill. cc JoshRosen rxin angelini Author: Davies Liu dav...@databricks.com Closes #6714 from davies/batch_size and squashes the following commits: b170dfb [Davies Liu] update test b9be832 [Davies Liu] Merge

spark git commit: [SPARK-8353] [DOCS] Show anchor links when hovering over documentation headers

2015-06-18 Thread joshrosen
(it was introduced for the old AMPCamp training, but isn't used anymore). Author: Josh Rosen joshro...@databricks.com Closes #6808 from JoshRosen/SPARK-8353 and squashes the following commits: e59d8a7 [Josh Rosen] Suppress underline on hover f518b6a [Josh Rosen] Turn on for all headers, since we use

spark git commit: [SPARK-8353] [DOCS] Show anchor links when hovering over documentation headers

2015-06-18 Thread joshrosen
(it was introduced for the old AMPCamp training, but isn't used anymore). Author: Josh Rosen joshro...@databricks.com Closes #6808 from JoshRosen/SPARK-8353 and squashes the following commits: e59d8a7 [Josh Rosen] Suppress underline on hover f518b6a [Josh Rosen] Turn on for all headers, since we use H1s

spark git commit: [SPARK-7017] [BUILD] [PROJECT INFRA] Refactor dev/run-tests into Python

2015-06-17 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 6765ef98d - 50a0496a4 [SPARK-7017] [BUILD] [PROJECT INFRA] Refactor dev/run-tests into Python All, this is a first attempt at refactoring `dev/run-tests` into Python. Initially I merely converted all Bash calls over to Python, then moved

spark git commit: [SPARK-8381][SQL]reuse typeConvert when convert Seq[Row] to catalyst type

2015-06-17 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 3b6107704 - 9db73ec12 [SPARK-8381][SQL]reuse typeConvert when convert Seq[Row] to catalyst type reuse-typeConvert when convert Seq[Row] to CatalystType Author: Lianhui Wang lianhuiwan...@gmail.com Closes #6831 from

spark git commit: [HOTFIX] [PROJECT-INFRA] Fix bug in dev/run-tests for MLlib-only PRs

2015-06-17 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master d1069cba4 - 165f52f2f [HOTFIX] [PROJECT-INFRA] Fix bug in dev/run-tests for MLlib-only PRs Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/165f52f2 Tree:

spark git commit: [SPARK-8354] [SQL] Fix off-by-factor-of-8 error when allocating scratch space in UnsafeFixedWidthAggregationMap

2015-06-14 Thread joshrosen
of UnsafeRows, since UnsafeRowConverter already used integers when calculating the size requirements for rows. Author: Josh Rosen joshro...@databricks.com Closes #6809 from JoshRosen/sql-bytes-vs-words-fix and squashes the following commits: 6520339 [Josh Rosen] Updates to reflect fact

spark git commit: [SPARK-8354] [SQL] Fix off-by-factor-of-8 error when allocating scratch space in UnsafeFixedWidthAggregationMap

2015-06-14 Thread joshrosen
of UnsafeRows, since UnsafeRowConverter already used integers when calculating the size requirements for rows. Author: Josh Rosen joshro...@databricks.com Closes #6809 from JoshRosen/sql-bytes-vs-words-fix and squashes the following commits: 6520339 [Josh Rosen] Updates to reflect fact

spark git commit: [SPARK-8319] [CORE] [SQL] Update logic related to key orderings in shuffle dependencies

2015-06-13 Thread joshrosen
should be safe. Author: Josh Rosen joshro...@databricks.com Closes #6773 from JoshRosen/SPARK-8319 and squashes the following commits: 7a14129 [Josh Rosen] Revise comments; add handler to guard against future ShuffleManager implementations 07bb2c9 [Josh Rosen] Update comment to clarify

spark git commit: [SPARK-8062] Fix NullPointerException in SparkHadoopUtil.getFileSystemThreadStatistics (branch-1.2)

2015-06-08 Thread joshrosen
this in followup patches. Author: Josh Rosen joshro...@databricks.com Closes #6618 from JoshRosen/SPARK-8062-branch-1.2 and squashes the following commits: 652fa3c [Josh Rosen] Re-name test and reapply fix 66fc600 [Josh Rosen] Fix and minimize regression test (verified that it still fails) 1d8d125

spark git commit: [HOTFIX] Remove trailing whitespace to fix Scalastyle checks

2015-05-31 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.4 f1d4e7e31 - df0bf71ee [HOTFIX] Remove trailing whitespace to fix Scalastyle checks 866652c903d06d1cb4356283e0741119d84dcc21 enabled this check. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [HOTFIX] Replace FunSuite with SparkFunSuite.

2015-05-30 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 1281a3518 - 66a53a696 [HOTFIX] Replace FunSuite with SparkFunSuite. This fixes a build break introduced by merging a6430028ecd7a6130f1eb15af9ec00e242c46725, which fails the new style checks that ensure that we use SparkFunSuite instead of

spark git commit: [SPARK-7910] [TINY] [JAVAAPI] expose partitioner information in javardd

2015-05-29 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 1c5b19827 - 82a396c2f [SPARK-7910] [TINY] [JAVAAPI] expose partitioner information in javardd Author: Holden Karau hol...@pigscanfly.ca Closes #6464 from holdenk/SPARK-7910-expose-partitioner-information-in-javardd and squashes the

spark git commit: [SPARK-7766] KryoSerializerInstance reuse is unsafe when auto-reset is disabled

2015-05-22 Thread joshrosen
only occurs when auto-reset is disabled and reference-tracking is enabled. Author: Josh Rosen joshro...@databricks.com Closes #6293 from JoshRosen/kryo-instance-reuse-bug and squashes the following commits: e19726d [Josh Rosen] Add fix for SPARK-7766. 71845e3 [Josh Rosen] Add failing regression

spark git commit: [SPARK-7766] KryoSerializerInstance reuse is unsafe when auto-reset is disabled

2015-05-22 Thread joshrosen
that this problem only occurs when auto-reset is disabled and reference-tracking is enabled. Author: Josh Rosen joshro...@databricks.com Closes #6293 from JoshRosen/kryo-instance-reuse-bug and squashes the following commits: e19726d [Josh Rosen] Add fix for SPARK-7766. 71845e3 [Josh Rosen] Add failing

spark git commit: [SPARK-7760] add /json back into master worker pages; add test

2015-05-22 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.4 d6cb04463 - afde4019b [SPARK-7760] add /json back into master worker pages; add test Author: Imran Rashid iras...@cloudera.com Closes #6284 from squito/SPARK-7760 and squashes the following commits: 5e02d8a [Imran Rashid] style;

spark git commit: [SPARK-7795] [CORE] Speed up task scheduling in standalone mode by reusing serializer

2015-05-22 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 63a5ce75e - a16357413 [SPARK-7795] [CORE] Speed up task scheduling in standalone mode by reusing serializer My experiments with scheduling very short tasks in standalone cluster mode indicated that a significant amount of time was being

spark git commit: [SPARK-7711] Add a startTime property to match the corresponding one in Scala

2015-05-21 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.4 e597692ac - c9a80fc40 [SPARK-7711] Add a startTime property to match the corresponding one in Scala Author: Holden Karau hol...@pigscanfly.ca Closes #6275 from holdenk/SPARK-771-startTime-is-missing-from-pyspark and squashes the

spark git commit: [SPARK-7711] Add a startTime property to match the corresponding one in Scala

2015-05-21 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 3d085 - 6b18cdc1b [SPARK-7711] Add a startTime property to match the corresponding one in Scala Author: Holden Karau hol...@pigscanfly.ca Closes #6275 from holdenk/SPARK-771-startTime-is-missing-from-pyspark and squashes the

spark git commit: [BUILD] Always run SQL tests in master build.

2015-05-21 Thread joshrosen
build and a regular build. If a build is a regular one, we always set _RUN_SQL_TESTS to true. cc JoshRosen nchammas Author: Yin Huai yh...@databricks.com Closes #5955 from yhuai/runSQLTests and squashes the following commits: 3d399bc [Yin Huai] Always run SQL tests in master build. (cherry

spark git commit: [BUILD] Always run SQL tests in master build.

2015-05-21 Thread joshrosen
build and a regular build. If a build is a regular one, we always set _RUN_SQL_TESTS to true. cc JoshRosen nchammas Author: Yin Huai yh...@databricks.com Closes #5955 from yhuai/runSQLTests and squashes the following commits: 3d399bc [Yin Huai] Always run SQL tests in master build. Project: http

spark git commit: [SPARK-7800] isDefined should not marked too early in putNewKey

2015-05-21 Thread joshrosen
and will cause problems because it happens too early, before some assertion checks. E.g., if an attempt with an incorrect `keyLengthBytes` marks `isDefined` as true, the location cannot be used later. ping JoshRosen Author: Liang-Chi Hsieh vii...@gmail.com Closes #6324 from viirya/dup_isdefined

spark git commit: [SPARK-7251] Perform sequential scan when iterating over BytesToBytesMap

2015-05-20 Thread joshrosen
JoshRosen/SPARK-7251 and squashes the following commits: 05bd90a [Josh Rosen] Compare capacity, not size, to MAX_CAPACITY 2a20d71 [Josh Rosen] Fix maximum BytesToBytesMap capacity bc4854b [Josh Rosen] Guard against overflow when growing BytesToBytesMap f5feadf [Josh Rosen] Add test for iterating over

spark git commit: [SPARK-7719] Re-add UnsafeShuffleWriterSuite test that was removed for Java 6 compat

2015-05-20 Thread joshrosen
now use Guava's `Iterators.emptyIterator()` in place of `Collections.emptyIterator()`, which isn't present in all Java 6 versions. Author: Josh Rosen joshro...@databricks.com Closes #6298 from JoshRosen/SPARK-7719-fix-java-6-test-code and squashes the following commits: 5c9bd85 [Josh Rosen] Re

spark git commit: [SPARK-7719] Re-add UnsafeShuffleWriterSuite test that was removed for Java 6 compat

2015-05-20 Thread joshrosen
. We now use Guava's `Iterators.emptyIterator()` in place of `Collections.emptyIterator()`, which isn't present in all Java 6 versions. Author: Josh Rosen joshro...@databricks.com Closes #6298 from JoshRosen/SPARK-7719-fix-java-6-test-code and squashes the following commits: 5c9bd85 [Josh Rosen

spark git commit: [SPARK-7698] Cache and reuse buffers in ExecutorMemoryAllocator when using heap allocation

2015-05-20 Thread joshrosen
on a stress test which launches huge numbers of short-lived shuffle map tasks back-to-back in the same JVM. Author: Josh Rosen joshro...@databricks.com Closes #6227 from JoshRosen/SPARK-7698 and squashes the following commits: fd6cb55 [Josh Rosen] SoftReference -> WeakReference b154e86 [Josh

spark git commit: [SPARK-7698] Cache and reuse buffers in ExecutorMemoryAllocator when using heap allocation

2015-05-20 Thread joshrosen
on a stress test which launches huge numbers of short-lived shuffle map tasks back-to-back in the same JVM. Author: Josh Rosen joshro...@databricks.com Closes #6227 from JoshRosen/SPARK-7698 and squashes the following commits: fd6cb55 [Josh Rosen] SoftReference -> WeakReference b154e86 [Josh Rosen

spark git commit: [SPARK-6216] [PYSPARK] check python version of worker with driver

2015-05-18 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 9dadf019b - 32fbd297d [SPARK-6216] [PYSPARK] check python version of worker with driver This PR reverts #5404 and instead passes the driver's Python version into the JVM and checks it in the worker before deserializing the closure, so that it can work with

spark git commit: [SPARK-7660] Wrap SnappyOutputStream to work around snappy-java bug

2015-05-17 Thread joshrosen
fix. Author: Josh Rosen joshro...@databricks.com Closes #6176 from JoshRosen/SPARK-7660-wrap-snappy and squashes the following commits: 8b77aae [Josh Rosen] Wrap SnappyOutputStream to fix SPARK-7660 (cherry picked from commit f2cc6b5bccc3a70fd7d69183b1a068800831fe19) Signed-off-by: Josh Rosen

spark git commit: [SPARK-7660] Wrap SnappyOutputStream to work around snappy-java bug

2015-05-17 Thread joshrosen
. Author: Josh Rosen joshro...@databricks.com Closes #6176 from JoshRosen/SPARK-7660-wrap-snappy and squashes the following commits: 8b77aae [Josh Rosen] Wrap SnappyOutputStream to fix SPARK-7660 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org

spark git commit: [SPARK-7660] Wrap SnappyOutputStream to work around snappy-java bug

2015-05-17 Thread joshrosen
fix. Author: Josh Rosen joshro...@databricks.com Closes #6176 from JoshRosen/SPARK-7660-wrap-snappy and squashes the following commits: 8b77aae [Josh Rosen] Wrap SnappyOutputStream to fix SPARK-7660 (cherry picked from commit f2cc6b5bccc3a70fd7d69183b1a068800831fe19) Signed-off-by: Josh Rosen

spark git commit: [SPARK-7660] Wrap SnappyOutputStream to work around snappy-java bug

2015-05-17 Thread joshrosen
fix. Author: Josh Rosen joshro...@databricks.com Closes #6176 from JoshRosen/SPARK-7660-wrap-snappy and squashes the following commits: 8b77aae [Josh Rosen] Wrap SnappyOutputStream to fix SPARK-7660 (cherry picked from commit f2cc6b5bccc3a70fd7d69183b1a068800831fe19) Signed-off-by: Josh Rosen

spark git commit: [HOTFIX] Add workaround for SPARK-7660 to fix JavaAPISuite failures.

2015-05-15 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master e8f0e016e - 7da33ce50 [HOTFIX] Add workaround for SPARK-7660 to fix JavaAPISuite failures. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7da33ce5 Tree:

spark git commit: [HOTFIX] Add workaround for SPARK-7660 to fix JavaAPISuite failures.

2015-05-15 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.4 7aa269f4b - 1206a5597 [HOTFIX] Add workaround for SPARK-7660 to fix JavaAPISuite failures. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1206a559 Tree:

spark git commit: [SPARK-7436] Fixed instantiation of custom recovery mode factory and added tests

2015-05-08 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 008a60dd3 - 35d6a99cb [SPARK-7436] Fixed instantiation of custom recovery mode factory and added tests Author: Jacek Lewandowski lewandowski.ja...@gmail.com Closes #5977 from jacek-lewandowski/SPARK-7436 and squashes the following

[2/2] spark git commit: [SPARK-6627] Finished rename to ShuffleBlockResolver

2015-05-08 Thread joshrosen
[SPARK-6627] Finished rename to ShuffleBlockResolver The previous cleanup-commit for SPARK-6627 renamed ShuffleBlockManager to ShuffleBlockResolver, but didn't rename the associated subclasses and variables; this commit does that. I'm unsure whether it's ok to rename ExternalShuffleBlockManager,

[2/2] spark git commit: [SPARK-6627] Finished rename to ShuffleBlockResolver

2015-05-08 Thread joshrosen
[SPARK-6627] Finished rename to ShuffleBlockResolver The previous cleanup-commit for SPARK-6627 renamed ShuffleBlockManager to ShuffleBlockResolver, but didn't rename the associated subclasses and variables; this commit does that. I'm unsure whether it's ok to rename ExternalShuffleBlockManager,

[1/2] spark git commit: [SPARK-6627] Finished rename to ShuffleBlockResolver

2015-05-08 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 2d05f325d - 4b3bb0e43 http://git-wip-us.apache.org/repos/asf/spark/blob/4b3bb0e4/network/shuffle/src/test/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolverSuite.java

spark git commit: [SPARK-7436] Fixed instantiation of custom recovery mode factory and added tests

2015-05-08 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.3 edcd3643a - 7fd212b57 [SPARK-7436] Fixed instantiation of custom recovery mode factory and added tests Author: Jacek Lewandowski lewandowski.ja...@gmail.com Closes #5975 from jacek-lewandowski/SPARK-7436-1.3 and squashes the following

spark git commit: [SPARK-7436] Fixed instantiation of custom recovery mode factory and added tests

2015-05-08 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.4 4f01f5b56 - 89d94878f [SPARK-7436] Fixed instantiation of custom recovery mode factory and added tests Author: Jacek Lewandowski lewandowski.ja...@gmail.com Closes #5976 from jacek-lewandowski/SPARK-7436-1.4 and squashes the following

spark git commit: Add `Private` annotation.

2015-05-06 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 002c12384 - 845d1d4d0 Add `Private` annotation. This was originally added as part of #4435, which was reverted. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: Add `Private` annotation.

2015-05-06 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.4 d651e2838 - 2163367ea Add `Private` annotation. This was originally added as part of #4435, which was reverted. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-7311] Introduce internal Serializer API for determining if serializers support object relocation

2015-05-06 Thread joshrosen
and comments clarifying when this works for KryoSerializer. This change allows the optimizations in #4450 to be applied for shuffles that use `SqlSerializer2`. Author: Josh Rosen joshro...@databricks.com Closes #5924 from JoshRosen/SPARK-7311 and squashes the following commits: 50a68ca [Josh Rosen

spark git commit: [SPARK-7311] Introduce internal Serializer API for determining if serializers support object relocation

2015-05-06 Thread joshrosen
and comments clarifying when this works for KryoSerializer. This change allows the optimizations in #4450 to be applied for shuffles that use `SqlSerializer2`. Author: Josh Rosen joshro...@databricks.com Closes #5924 from JoshRosen/SPARK-7311 and squashes the following commits: 50a68ca [Josh Rosen

spark git commit: Some minor cleanup after SPARK-4550.

2015-05-05 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master c688e3c5e - 0092abb47 Some minor cleanup after SPARK-4550. JoshRosen this PR addresses the comments you left on #4450 after it got merged. Author: Sandy Ryza sa...@cloudera.com Closes #5916 from sryza/sandy-spark-4550-cleanup

spark git commit: Some minor cleanup after SPARK-4550.

2015-05-05 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.4 4afb578b7 - 762ff2e11 Some minor cleanup after SPARK-4550. JoshRosen this PR addresses the comments you left on #4450 after it got merged. Author: Sandy Ryza sa...@cloudera.com Closes #5916 from sryza/sandy-spark-4550-cleanup

spark git commit: [SPARK-6661] Python type errors should print type, not object

2015-04-20 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 968ad9721 - 77176619a [SPARK-6661] Python type errors should print type, not object Author: Elisey Zanko elisey.za...@gmail.com Closes #5361 from 31z4/spark-6661 and squashes the following commits: 73c5d79 [Elisey Zanko] Python type

[5/6] spark git commit: [SPARK-4897] [PySpark] Python 3 support

2015-04-17 Thread joshrosen
http://git-wip-us.apache.org/repos/asf/spark/blob/04e44b37/examples/src/main/python/sort.py -- diff --git a/examples/src/main/python/sort.py b/examples/src/main/python/sort.py index bb686f1..f6b0ecb 100755 ---

[1/6] spark git commit: [SPARK-4897] [PySpark] Python 3 support

2015-04-17 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 55f553a97 - 04e44b37c http://git-wip-us.apache.org/repos/asf/spark/blob/04e44b37/python/pyspark/tests.py -- diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py

[6/6] spark git commit: [SPARK-4897] [PySpark] Python 3 support

2015-04-16 Thread joshrosen
[SPARK-4897] [PySpark] Python 3 support This PR updates PySpark to support Python 3 (tested with 3.4). Known issue: unpickling arrays from Pyrolite is broken in Python 3, so those tests are skipped. TODO: ec2/spark-ec2.py is not fully tested with Python 3. Author: Davies Liu dav...@databricks.com

[2/6] spark git commit: [SPARK-4897] [PySpark] Python 3 support

2015-04-16 Thread joshrosen
http://git-wip-us.apache.org/repos/asf/spark/blob/04e44b37/python/pyspark/sql/types.py -- diff --git a/python/pyspark/sql/types.py b/python/pyspark/sql/types.py deleted file mode 100644 index ef76d84..000 ---

[3/6] spark git commit: [SPARK-4897] [PySpark] Python 3 support

2015-04-16 Thread joshrosen
http://git-wip-us.apache.org/repos/asf/spark/blob/04e44b37/python/pyspark/sql/_types.py -- diff --git a/python/pyspark/sql/_types.py b/python/pyspark/sql/_types.py new file mode 100644 index 000..492c0cb --- /dev/null +++

spark git commit: [SPARK-6886] [PySpark] fix big closure with shuffle

2015-04-15 Thread joshrosen
in Python may be GCed, and then the broadcast will be destroyed in the JVM before the PythonRDD. This PR changes this to use PythonRDD to track the lifecycle of the broadcast object. It also refactors getNumPartitions() to avoid unnecessary creation of PythonRDD, which could be heavy. cc JoshRosen

spark git commit: Revert [SPARK-5634] [core] Show correct message in HS when no incomplete apps f...

2015-04-15 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.2 964f54478 - 8e9fc27aa Revert [SPARK-5634] [core] Show correct message in HS when no incomplete apps f... This reverts commit 5845a62361c39eb97df5de01c982821c8858de76. This was reverted because it broke compilation for branch-1.2.

spark git commit: [SPARK-6886] [PySpark] fix big closure with shuffle

2015-04-15 Thread joshrosen
may be GCed, and then the broadcast will be destroyed in the JVM before the PythonRDD. This PR changes this to use PythonRDD to track the lifecycle of the broadcast object. It also refactors getNumPartitions() to avoid unnecessary creation of PythonRDD, which could be heavy. cc JoshRosen Author

spark git commit: [SPARK-6886] [PySpark] fix big closure with shuffle

2015-04-15 Thread joshrosen
in Python may be GCed, and then the broadcast will be destroyed in the JVM before the PythonRDD. This PR changes this to use PythonRDD to track the lifecycle of the broadcast object. It also refactors getNumPartitions() to avoid unnecessary creation of PythonRDD, which could be heavy. cc JoshRosen

spark git commit: [SPARK-6905] Upgrade to snappy-java 1.1.1.7

2015-04-14 Thread joshrosen
/xerial/snappy-java/issues/100). Author: Josh Rosen joshro...@databricks.com Closes #5512 from JoshRosen/snappy-1.1.1.7 and squashes the following commits: f1ac0f8 [Josh Rosen] Upgrade to snappy-java 1.1.1.7. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip

spark git commit: [SPARK-6905] Upgrade to snappy-java 1.1.1.7

2015-04-14 Thread joshrosen
://github.com/xerial/snappy-java/issues/100). Author: Josh Rosen joshro...@databricks.com Closes #5512 from JoshRosen/snappy-1.1.1.7 and squashes the following commits: f1ac0f8 [Josh Rosen] Upgrade to snappy-java 1.1.1.7. (cherry picked from commit 6adb8bcbf0a1a7bfe2990de18c59c66cd7a0aeb8) Signed-off

spark git commit: [SPARK-6905] Upgrade to snappy-java 1.1.1.7

2015-04-14 Thread joshrosen
://github.com/xerial/snappy-java/issues/100). Author: Josh Rosen joshro...@databricks.com Closes #5512 from JoshRosen/snappy-1.1.1.7 and squashes the following commits: f1ac0f8 [Josh Rosen] Upgrade to snappy-java 1.1.1.7. (cherry picked from commit 6adb8bcbf0a1a7bfe2990de18c59c66cd7a0aeb8) Signed-off

spark git commit: Revert [SPARK-6352] [SQL] Add DirectParquetOutputCommitter

2015-04-14 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 4d4b24927 - a76b921a9 Revert [SPARK-6352] [SQL] Add DirectParquetOutputCommitter This reverts commit b29663eeea440b1d1a288d41b5ddf67e77c5bd54. I'm reverting this because it broke test compilation for the Hadoop 1.x profiles. Project:

spark git commit: [HOTFIX] Add explicit return types to fix lint errors

2015-04-11 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 5c2844c51 - dea5dacc5 [HOTFIX] Add explicit return types to fix lint errors Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dea5dacc Tree:

spark git commit: [SPARK-6677] [SQL] [PySpark] fix cached classes

2015-04-11 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.3 ea13948b9 - 8d4176132 [SPARK-6677] [SQL] [PySpark] fix cached classes It's possible to have two DataType objects with the same id (memory address) at different times, so we should check the cached classes to verify that it's generated by

spark git commit: [SPARK-6677] [SQL] [PySpark] fix cached classes

2015-04-11 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 0cc8fcb4c - 5d8f7b9e8 [SPARK-6677] [SQL] [PySpark] fix cached classes It's possible to have two DataType objects with the same id (memory address) at different times, so we should check the cached classes to verify that it's generated by given
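
The id-reuse hazard described in this commit message can be illustrated with a minimal sketch. This is not the actual patch; `IdCache`, `DataType`, and `build` are hypothetical names chosen for illustration. The idea is only that a cache keyed by `id(obj)` should confirm that the stored entry was created for the very object being looked up, since CPython may hand the same id to a new object after the old one is garbage collected.

```python
import weakref


class DataType:
    """Hypothetical stand-in for a schema object; not PySpark's DataType."""

    def __init__(self, name):
        self.name = name


class IdCache:
    """Cache keyed by id(owner) that verifies each hit against its owner."""

    def __init__(self, build):
        self._build = build   # function that creates the cached value
        self._entries = {}    # id(owner) -> (weak reference to owner, value)

    def get(self, owner):
        key = id(owner)
        entry = self._entries.get(key)
        # Trust the entry only if it was created for this exact object;
        # a dead or different referent means the id was reused.
        if entry is not None and entry[0]() is owner:
            return entry[1]
        value = self._build(owner)
        self._entries[key] = (weakref.ref(owner), value)
        return value
```

Holding only a weak reference keeps the cache from pinning the owner in memory, which is exactly the situation in which an id can be reused and the identity check matters.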

spark git commit: [SPARK-5969][PySpark] Fix descending pyspark.rdd.sortByKey.

2015-04-10 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master b9baa4cd9 - 0375134f4 [SPARK-5969][PySpark] Fix descending pyspark.rdd.sortByKey. The samples should always be sorted in ascending order, because bisect.bisect_left is used on it. The reverse order of the result is already achieved in
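
A minimal sketch of why the samples need to stay in ascending order, assuming a simple range partitioner; this is not PySpark's implementation, and `partition_for`, `bounds`, and `num_partitions` are hypothetical names. `bisect.bisect_left` assumes an ascending list, so the lookup is always done against ascending bounds and the resulting index is mirrored when a descending result is wanted.

```python
import bisect


def partition_for(key, bounds, num_partitions, ascending=True):
    """Return the partition index for key.

    bounds must be sorted in ascending order, because bisect_left
    assumes that order when it searches.
    """
    p = bisect.bisect_left(bounds, key)
    # For descending output, mirror the index instead of reversing bounds.
    return p if ascending else num_partitions - 1 - p


# Hypothetical sampled boundaries (ascending) that split keys into 3 partitions.
bounds = [10, 20]
```

With these bounds, small keys land in the first partition for an ascending sort and in the last partition for a descending sort, without `bounds` ever being reordered.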

spark git commit: [SPARK-6216] [PySpark] check the python version in worker

2015-04-10 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 0375134f4 - 4740d6a15 [SPARK-6216] [PySpark] check the python version in worker Author: Davies Liu dav...@databricks.com Closes #5404 from davies/check_version and squashes the following commits: e559248 [Davies Liu] add tests ec33b5f

spark git commit: [SPARK-5969][PySpark] Fix descending pyspark.rdd.sortByKey.

2015-04-10 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.3 ec3e76f1e - 48321b83d [SPARK-5969][PySpark] Fix descending pyspark.rdd.sortByKey. The samples should always be sorted in ascending order, because bisect.bisect_left is used on it. The reverse order of the result is already achieved

spark git commit: [SPARK-5969][PySpark] Fix descending pyspark.rdd.sortByKey.

2015-04-10 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.2 7a1583917 - daec1c635 [SPARK-5969][PySpark] Fix descending pyspark.rdd.sortByKey. The samples should always be sorted in ascending order, because bisect.bisect_left is used on it. The reverse order of the result is already achieved

spark git commit: [SPARK-6506] [pyspark] Do not try to retrieve SPARK_HOME when not needed...

2015-04-08 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 15e0d2bd1 - f7e21dd1e [SPARK-6506] [pyspark] Do not try to retrieve SPARK_HOME when not needed... In particular, this makes pyspark in yarn-cluster mode fail unless SPARK_HOME is set, when it's not really needed. Author: Marcelo

spark git commit: [SPARK-6506] [pyspark] Do not try to retrieve SPARK_HOME when not needed...

2015-04-08 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.3 cdef7d080 - e967ecaca [SPARK-6506] [pyspark] Do not try to retrieve SPARK_HOME when not needed... In particular, this makes pyspark in yarn-cluster mode fail unless SPARK_HOME is set, when it's not really needed. Author: Marcelo

spark git commit: [SPARK-6753] Clone SparkConf in ShuffleSuite tests

2015-04-08 Thread joshrosen
that subclass ShuffleSuite.scala. This commit fixes that problem. JoshRosen would be great if you could take a look at this, since you wrote this test originally. Author: Kay Ousterhout kayousterh...@gmail.com Closes #5401 from kayousterhout/SPARK-6753 and squashes the following commits: 368c540 [Kay

spark git commit: [SPARK-6753] Clone SparkConf in ShuffleSuite tests

2015-04-08 Thread joshrosen
that subclass ShuffleSuite.scala. This commit fixes that problem. JoshRosen would be great if you could take a look at this, since you wrote this test originally. Author: Kay Ousterhout kayousterh...@gmail.com Closes #5401 from kayousterhout/SPARK-6753 and squashes the following commits: 368c540 [Kay

spark git commit: [SPARK-6753] Clone SparkConf in ShuffleSuite tests

2015-04-08 Thread joshrosen
that subclass ShuffleSuite.scala. This commit fixes that problem. JoshRosen would be great if you could take a look at this, since you wrote this test originally. Author: Kay Ousterhout kayousterh...@gmail.com Closes #5401 from kayousterhout/SPARK-6753 and squashes the following commits: 368c540 [Kay

spark git commit: [SPARK-6636] Use public DNS hostname everywhere in spark_ec2.py

2015-04-07 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.3 1cde04f21 - ab1b8edb8 [SPARK-6636] Use public DNS hostname everywhere in spark_ec2.py The spark_ec2.py script uses public_dns_name everywhere in the script except for testing ssh availability, which is done using the public ip address

spark git commit: [SPARK-6636] Use public DNS hostname everywhere in spark_ec2.py

2015-04-07 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master a0846c4b6 - 6f0d55d76 [SPARK-6636] Use public DNS hostname everywhere in spark_ec2.py The spark_ec2.py script uses public_dns_name everywhere in the script except for testing ssh availability, which is done using the public ip address of

spark git commit: [SPARK-6716] Change SparkContext.DRIVER_IDENTIFIER from <driver> to driver

2015-04-07 Thread joshrosen
by metrics users, but it's probably okay to do this in a major release as long as we document it in the release notes. Author: Josh Rosen joshro...@databricks.com Closes #5372 from JoshRosen/driver-id-fix and squashes the following commits: 42d3c10 [Josh Rosen] Clarify comment 0c5d04b [Josh Rosen

spark git commit: [SPARK-6737] Fix memory leak in OutputCommitCoordinator

2015-04-07 Thread joshrosen
: Josh Rosen joshro...@databricks.com Closes #5397 from JoshRosen/SPARK-6737 and squashes the following commits: af3b02f [Josh Rosen] Consolidate stage completion handling code in a single method. e96ce3a [Josh Rosen] Consolidate stage completion handling code in a single method. 3052aea [Josh

spark git commit: [SPARK-6737] Fix memory leak in OutputCommitCoordinator

2015-04-07 Thread joshrosen
. Author: Josh Rosen joshro...@databricks.com Closes #5397 from JoshRosen/SPARK-6737 and squashes the following commits: af3b02f [Josh Rosen] Consolidate stage completion handling code in a single method. e96ce3a [Josh Rosen] Consolidate stage completion handling code in a single method. 3052aea

spark git commit: [SPARK-6209] Clean up connections in ExecutorClassLoader after failing to load classes (branch-1.2)

2015-04-05 Thread joshrosen
reproduction. This patch fixes this issue by ensuring proper cleanup of these resources. It also adds logging for unexpected error cases. (See #4944 for the corresponding PR for 1.3/1.4). Author: Josh Rosen joshro...@databricks.com Closes #5174 from JoshRosen/executorclassloaderleak-branch-1.2

spark git commit: SPARK-6414: Spark driver failed with NPE on job cancelation

2015-04-02 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 0cce5451a -> e3202aa2e SPARK-6414: Spark driver failed with NPE on job cancelation Use Option for ActiveJob.properties to avoid NPE bug Author: Hung Lin hung@gmail.com Closes #5124 from hunglin/SPARK-6414 and squashes the following

spark git commit: SPARK-6414: Spark driver failed with NPE on job cancelation

2015-04-02 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.3 a6664dcd8 -> 58e2b3fcd SPARK-6414: Spark driver failed with NPE on job cancelation Use Option for ActiveJob.properties to avoid NPE bug Author: Hung Lin hung@gmail.com Closes #5124 from hunglin/SPARK-6414 and squashes the

spark git commit: [SPARK-6079] Use index to speed up StatusTracker.getJobIdsForGroup()

2015-04-02 Thread joshrosen
operation if there are many (e.g. thousands) of retained jobs. This patch adds a new map to `JobProgressListener` in order to speed up these lookups. Author: Josh Rosen joshro...@databricks.com Closes #4830 from JoshRosen/statustracker-job-group-indexing and squashes the following commits
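The indexing idea in the preview above can be illustrated with a minimal Python sketch. It is not the patch itself (which changes the Scala `JobProgressListener`); the class `JobIndex` and all of its method and field names are hypothetical, chosen only to show how a secondary map from job group to job IDs avoids scanning every retained job on each lookup.

```python
from collections import defaultdict


class JobIndex:
    """Hypothetical stand-in for a listener that tracks jobs by group."""

    def __init__(self):
        self.job_group = {}                    # job_id -> group
        self.group_to_jobs = defaultdict(set)  # group -> set of job_ids (the index)

    def on_job_start(self, job_id, group):
        # Record the job and keep the index in step with it.
        self.job_group[job_id] = group
        self.group_to_jobs[group].add(job_id)

    def on_job_evicted(self, job_id):
        # Remove the job from both maps so the index does not grow without bound.
        group = self.job_group.pop(job_id, None)
        jobs = self.group_to_jobs.get(group)
        if jobs is not None:
            jobs.discard(job_id)
            if not jobs:
                del self.group_to_jobs[group]

    def job_ids_for_group(self, group):
        # One dictionary lookup instead of a scan over all retained jobs.
        return sorted(self.group_to_jobs.get(group, ()))
```

The trade-off in this sketch is extra bookkeeping on job start and eviction in exchange for lookups whose cost depends only on the size of the requested group.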

spark git commit: SPARK-6414: Spark driver failed with NPE on job cancelation

2015-04-02 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.2 a73055f7f -> 8fa09a480 SPARK-6414: Spark driver failed with NPE on job cancelation Use Option for ActiveJob.properties to avoid NPE bug Author: Hung Lin hung@gmail.com Closes #5124 from hunglin/SPARK-6414 and squashes the

spark git commit: [SPARK-6621][Core] Fix the bug that calling EventLoop.stop in EventLoop.onReceive/onError/onStart doesn't call onStop

2015-04-02 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 6e1c1ec67 -> 440ea31b7 [SPARK-6621][Core] Fix the bug that calling EventLoop.stop in EventLoop.onReceive/onError/onStart doesn't call onStop Author: zsxwing zsxw...@gmail.com Closes #5280 from zsxwing/SPARK-6621 and squashes the following

spark git commit: [SPARK-6621][Core] Fix the bug that calling EventLoop.stop in EventLoop.onReceive/onError/onStart doesn't call onStop

2015-04-02 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.3 d21f77988 -> ac705aa83 [SPARK-6621][Core] Fix the bug that calling EventLoop.stop in EventLoop.onReceive/onError/onStart doesn't call onStop Author: zsxwing zsxw...@gmail.com Closes #5280 from zsxwing/SPARK-6621 and squashes the

spark git commit: [SPARK-6667] [PySpark] remove setReuseAddress

2015-04-02 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.3 1160cc9e1 -> ee2bd70a4 [SPARK-6667] [PySpark] remove setReuseAddress The reused address on server side had caused the server can not acknowledge the connected connections, remove it. This PR will retry once after timeout, it also add

spark git commit: [SPARK-6667] [PySpark] remove setReuseAddress

2015-04-02 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 424e987df -> 0cce5451a [SPARK-6667] [PySpark] remove setReuseAddress The reused address on server side had caused the server can not acknowledge the connected connections, remove it. This PR will retry once after timeout, it also add a

spark git commit: [SPARK-6667] [PySpark] remove setReuseAddress

2015-04-02 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.2 758ebf77d -> a73055f7f [SPARK-6667] [PySpark] remove setReuseAddress The reused address on server side had caused the server can not acknowledge the connected connections, remove it. This PR will retry once after timeout, it also add

spark git commit: [SPARK-6553] [pyspark] Support functools.partial as UDF

2015-04-01 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.3 bc04fa2e2 -> 98f72dfc1 [SPARK-6553] [pyspark] Support functools.partial as UDF Use `f.__repr__()` instead of `f.__name__` when instantiating `UserDefinedFunction`s, so `functools.partial`s may be used. Author: ksonj k...@siberie.de

spark git commit: [SPARK-6553] [pyspark] Support functools.partial as UDF

2015-04-01 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 86b439935 -> 757b2e917 [SPARK-6553] [pyspark] Support functools.partial as UDF Use `f.__repr__()` instead of `f.__name__` when instantiating `UserDefinedFunction`s, so `functools.partial`s may be used. Author: ksonj k...@siberie.de
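The reason `f.__name__` is a problem for `functools.partial` can be seen with the standard library alone. The sketch below does not touch PySpark; `add` and `label_for` are hypothetical names used only for illustration, and `label_for` merely mirrors the `f.__repr__()` approach named in the commit message rather than reproducing the actual `UserDefinedFunction` code.

```python
import functools


def add(a, b):
    return a + b


# Bind the first argument; the result is a functools.partial object.
add_one = functools.partial(add, 1)

# Plain functions carry a __name__, but partial objects do not,
# so code that reads f.__name__ unconditionally raises AttributeError.
has_name_function = hasattr(add, "__name__")
has_name_partial = hasattr(add_one, "__name__")


def label_for(f):
    """Hypothetical helper: build a label that works for any callable."""
    return f.__repr__()


print(has_name_function, has_name_partial)
print(label_for(add_one))
```

Using the repr trades a short, readable name for a label that is always available, whatever kind of callable is passed in.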

spark git commit: [SPARK-6614] OutputCommitCoordinator should clear authorized committer only after authorized committer fails, not after any failure

2015-03-31 Thread joshrosen
. Author: Josh Rosen joshro...@databricks.com Closes #5276 from JoshRosen/SPARK-6614 and squashes the following commits: d532ba7 [Josh Rosen] Check whether failed task was authorized committer cbb3784 [Josh Rosen] Add regression test for SPARK-6614 Project: http://git-wip-us.apache.org/repos/asf

spark git commit: [SPARK-6614] OutputCommitCoordinator should clear authorized committer only after authorized committer fails, not after any failure

2015-03-31 Thread joshrosen
: Josh Rosen joshro...@databricks.com Closes #5276 from JoshRosen/SPARK-6614 and squashes the following commits: d532ba7 [Josh Rosen] Check whether failed task was authorized committer cbb3784 [Josh Rosen] Add regression test for SPARK-6614 Project: http://git-wip-us.apache.org/repos/asf/spark

spark git commit: [SPARK-3266] Use intermediate abstract classes to fix type erasure issues in Java APIs

2015-03-24 Thread joshrosen
to this bug. Author: Josh Rosen joshro...@databricks.com Closes #5050 from JoshRosen/javardd-si-8905-fix and squashes the following commits: 2feb068 [Josh Rosen] Use intermediate abstract classes to work around SPARK-3266 d5f3e5d [Josh Rosen] Add failing regression tests for SPARK-3266 (cherry

spark git commit: [SPARK-6219] [Build] Check that Python code compiles

2015-03-19 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 3b5aaa6a5 -> f17d43b03 [SPARK-6219] [Build] Check that Python code compiles This PR expands the Python lint checks so that they check for obvious compilation errors in our Python code. For example: ``` $ ./dev/lint-python Python lint

spark git commit: [SPARK-6394][Core] cleanup BlockManager companion object and improve the getCacheLocs method in DAGScheduler

2015-03-18 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 3db138742 -> 540b2a4ea [SPARK-6394][Core] cleanup BlockManager companion object and improve the getCacheLocs method in DAGScheduler The current implementation include searching a HashMap many times, we can avoid this. Actually if you look
