git commit: [SPARK-1745] Move interrupted flag from TaskContext constructor (minor)

2014-05-10 Thread adav
Repository: spark Updated Branches: refs/heads/master 44dd57fb6 -> c3f8b78c2 [SPARK-1745] Move interrupted flag from TaskContext constructor (minor) It makes little sense to start a TaskContext that is interrupted. Indeed, I searched for all use cases of it and didn't find a single instance i

git commit: MINOR: Removing dead code.

2014-05-10 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 7db47c463 -> 4c60fd1e8 MINOR: Removing dead code. Meant to do this when patching up the last merge. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4c60fd1e Tree: http

git commit: [SPARK-1688] Propagate PySpark worker stderr to driver

2014-05-10 Thread adav
Repository: spark Updated Branches: refs/heads/branch-1.0 0759ee790 -> 82c8e89c9 [SPARK-1688] Propagate PySpark worker stderr to driver When at least one of the following conditions is true, PySpark cannot be loaded: 1. PYTHONPATH is not set 2. PYTHONPATH does not contain the python directory

git commit: [SPARK-1743][MLLIB] add loadLibSVMFile and saveAsLibSVMFile to pyspark

2014-05-10 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.0 879bd -> bb90e87f6 [SPARK-1743][MLLIB] add loadLibSVMFile and saveAsLibSVMFile to pyspark Make loading/saving labeled data easier for pyspark users. Also changed type check in `SparseVector` to allow numpy integers. Author: Xiangr

git commit: Include the sbin/spark-config.sh in spark-executor

2014-05-10 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 191279ce4 -> 2fd2752e5 Include the sbin/spark-config.sh in spark-executor This is needed because broadcast values are broken on pyspark on Mesos: it tries to import pyspark but can't, because the PYTHONPATH is not set due to changes in ff5be9a

git commit: SPARK-1565 (Addendum): Replace `run-example` with `spark-submit`.

2014-05-10 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.0 7e1933451 -> f6323eb3b SPARK-1565 (Addendum): Replace `run-example` with `spark-submit`. Gives a nicely formatted message to the user when `run-example` is run to tell them to use `spark-submit`. Author: Patrick Wendell Closes #704 f

git commit: Fixing typo in als.py

2014-05-10 Thread shivaram
Repository: spark Updated Branches: refs/heads/branch-1.0 6f701ff55 -> 98944a973 Fixing typo in als.py XtY should be Xty. Author: Evan Sparks Closes #696 from etrain/patch-2 and squashes the following commits: 634cb8d [Evan Sparks] Fixing typo in als.py Project: http://git-wip-us.apache.

git commit: Bug fix of sparse vector conversion

2014-05-10 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.0 34529975e -> 9ed17ff34 Bug fix of sparse vector conversion Fixed a small bug caused by the inconsistency of index/data array size and vector length. Author: Funes Author: funes Closes #661 from funes/bugfix and squashes the followi

git commit: [HOTFIX] SPARK-1637: There are some Streaming examples added after the PR #571 was last updated.

2014-05-10 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.0 51e277557 -> ade47562b [HOTFIX] SPARK-1637: There are some Streaming examples added after the PR #571 was last updated. This resulted in Compilation Errors. cc @mateiz project not compiling currently. Author: Sandeep Closes #673 fro

git commit: [SQL] Improve SparkSQL Aggregates

2014-05-10 Thread rxin
Repository: spark Updated Branches: refs/heads/master 6ed7e2cd0 -> 19c8fb02b [SQL] Improve SparkSQL Aggregates * Add native min/max (was using hive before). * Handle nulls correctly in Avg and Sum. Author: Michael Armbrust Closes #683 from marmbrus/aggFixes and squashes the following commit

git commit: MLlib documentation fix

2014-05-10 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 322b1808d -> d38febee4 MLlib documentation fix Fixed the documentation to reflect that `loadLibSVMData` has been changed to `loadLibSVMFile`. Author: DB Tsai Closes #703 from dbtsai/dbtsai-docfix and squashes the following commits: 71dd508 [DB Tsai

git commit: Add Python includes to path before depickling broadcast values

2014-05-10 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.0 71ad53f81 -> 2a669a70d Add Python includes to path before depickling broadcast values This fixes https://issues.apache.org/jira/browse/SPARK-1731 by adding the Python includes to the PYTHONPATH before depickling the broadcast values @

git commit: [SQL] Upgrade parquet library.

2014-05-10 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 561510867 -> 4d6055329 [SQL] Upgrade parquet library. I think we are hitting this issue in some perf tests: https://github.com/Parquet/parquet-mr/commit/6aed5288fd4a1398063a5a219b2ae4a9f71b02cf Credit to @aarondav ! Author: Michael Armbr

git commit: SPARK-1775: Unneeded lock in ShuffleMapTask.deserializeInfo

2014-05-10 Thread matei
Repository: spark Updated Branches: refs/heads/branch-0.9 9e2c59efe -> bea2be308 SPARK-1775: Unneeded lock in ShuffleMapTask.deserializeInfo This was used in the past to have a cache of deserialized ShuffleMapTasks, but that's been removed, so there's no need for a lock. It slows down Spark w

git commit: SPARK-1686: keep schedule() calling in the main thread

2014-05-10 Thread adav
Repository: spark Updated Branches: refs/heads/branch-1.0 8202276c9 -> adf8cdd0b SPARK-1686: keep schedule() calling in the main thread https://issues.apache.org/jira/browse/SPARK-1686 moved from original JIRA (by @markhamstra): In deploy.master.Master, the completeRecovery method is the las

git commit: Add Python includes to path before depickling broadcast values

2014-05-10 Thread pwendell
Repository: spark Updated Branches: refs/heads/master c05d11bb3 -> 3776f2f28 Add Python includes to path before depickling broadcast values This fixes https://issues.apache.org/jira/browse/SPARK-1731 by adding the Python includes to the PYTHONPATH before depickling the broadcast values @airh

git commit: SPARK-1686: keep schedule() calling in the main thread

2014-05-10 Thread adav
Repository: spark Updated Branches: refs/heads/master 59577df14 -> 2f452cbaf SPARK-1686: keep schedule() calling in the main thread https://issues.apache.org/jira/browse/SPARK-1686 moved from original JIRA (by @markhamstra): In deploy.master.Master, the completeRecovery method is the last th

git commit: [SPARK-1778] [SQL] Add 'limit' transformation to SchemaRDD.

2014-05-10 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.0 a61b71cad -> 7486474d6 [SPARK-1778] [SQL] Add 'limit' transformation to SchemaRDD. Add `limit` transformation to `SchemaRDD`. Author: Takuya UESHIN Closes #711 from ueshin/issues/SPARK-1778 and squashes the following commits: 33169d

git commit: [SPARK-1690] Tolerating empty elements when saving Python RDD to text files

2014-05-10 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.0 2a669a70d -> ac86af8ac [SPARK-1690] Tolerating empty elements when saving Python RDD to text files Tolerate empty strings in PythonRDD Author: Kan Zhang Closes #644 from kanzhang/SPARK-1690 and squashes the following commits: c62ad3

git commit: [SPARK-1690] Tolerating empty elements when saving Python RDD to text files

2014-05-10 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 3776f2f28 -> 6c2691d0a [SPARK-1690] Tolerating empty elements when saving Python RDD to text files Tolerate empty strings in PythonRDD Author: Kan Zhang Closes #644 from kanzhang/SPARK-1690 and squashes the following commits: c62ad33 [K

git commit: SPARK-1708. Add a ClassTag on Serializer and things that depend on it

2014-05-10 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 8e94d2721 -> 7eefc9d2b SPARK-1708. Add a ClassTag on Serializer and things that depend on it This pull request contains a rebased patch from @heathermiller (https://github.com/heathermiller/spark/pull/1) to add ClassTags on Serializer and

git commit: fix broken in link in python docs

2014-05-10 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.0 9fbb22c20 -> 71ad53f81 fix broken in link in python docs Author: Andy Konwinski Closes #650 from andyk/python-docs-link-fix and squashes the following commits: a1f9d51 [Andy Konwinski] fix broken in link in python docs (cherry picked

git commit: SPARK-1789. Multiple versions of Netty dependencies cause FlumeStreamSuite failure

2014-05-10 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 905173df5 -> 2b7bd29eb SPARK-1789. Multiple versions of Netty dependencies cause FlumeStreamSuite failure TL;DR: there is a bit of JAR hell trouble with Netty, which can be mostly resolved, and doing so will resolve a test failure. I hit the erro

git commit: SPARK-1789. Multiple versions of Netty dependencies cause FlumeStreamSuite failure

2014-05-10 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.0 4e9a0cb4a -> c7253daeb SPARK-1789. Multiple versions of Netty dependencies cause FlumeStreamSuite failure TL;DR: there is a bit of JAR hell trouble with Netty, which can be mostly resolved, and doing so will resolve a test failure. I hit the

git commit: [SPARK-1774] Respect SparkSubmit --jars on YARN (client)

2014-05-10 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 2b7bd29eb -> 83e0424d8 [SPARK-1774] Respect SparkSubmit --jars on YARN (client) SparkSubmit ignores `--jars` for YARN client. This is a bug. This PR also automatically adds the application jar to `spark.jar`. Previously, when running as y

git commit: [SPARK-1774] Respect SparkSubmit --jars on YARN (client)

2014-05-10 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.0 c7253daeb -> 012f90427 [SPARK-1774] Respect SparkSubmit --jars on YARN (client) SparkSubmit ignores `--jars` for YARN client. This is a bug. This PR also automatically adds the application jar to `spark.jar`. Previously, when running

git commit: Enabled incremental build that comes with sbt 0.13.2

2014-05-10 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 83e0424d8 -> 70bcdef48 Enabled incremental build that comes with sbt 0.13.2 More info at https://github.com/sbt/sbt/issues/1010 Author: Prashant Sharma Closes #525 from ScrapCodes/sbt-inc-opt and squashes the following commits: ba8fa42

git commit: Enabled incremental build that comes with sbt 0.13.2

2014-05-10 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.0 012f90427 -> 71ce7eb0e Enabled incremental build that comes with sbt 0.13.2 More info at https://github.com/sbt/sbt/issues/1010 Author: Prashant Sharma Closes #525 from ScrapCodes/sbt-inc-opt and squashes the following commits: ba8

git commit: Revert "Enabled incremental build that comes with sbt 0.13.2"

2014-05-10 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.0 71ce7eb0e -> 758e5439f Revert "Enabled incremental build that comes with sbt 0.13.2" This reverts commit 71ce7eb0e5878f0bafd64bdd201ae257a3bfe106. I meant only to merge this into master. It's an experimental build feature. Project: h

git commit: [SPARK-1778] [SQL] Add 'limit' transformation to SchemaRDD.

2014-05-10 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 4d6055329 -> 8e94d2721 [SPARK-1778] [SQL] Add 'limit' transformation to SchemaRDD. Add `limit` transformation to `SchemaRDD`. Author: Takuya UESHIN Closes #711 from ueshin/issues/SPARK-1778 and squashes the following commits: 33169df [T

git commit: SPARK-1708. Add a ClassTag on Serializer and things that depend on it

2014-05-10 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.0 7486474d6 -> 9fbb22c20 SPARK-1708. Add a ClassTag on Serializer and things that depend on it This pull request contains a rebased patch from @heathermiller (https://github.com/heathermiller/spark/pull/1) to add ClassTags on Serializer