[1/3] [SPARK-2179][SQL] Public API for DataTypes and Schema

2014-07-30 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 4ce92ccaf -> 7003c163d http://git-wip-us.apache.org/repos/asf/spark/blob/7003c163/sql/core/src/test/java/org/apache/spark/sql/api/java/JavaApplySchemaSuite.java -- diff --gi

[2/3] [SPARK-2179][SQL] Public API for DataTypes and Schema

2014-07-30 Thread marmbrus
http://git-wip-us.apache.org/repos/asf/spark/blob/7003c163/sql/core/src/main/java/org/apache/spark/sql/api/java/types/DecimalType.java -- diff --git a/sql/core/src/main/java/org/apache/spark/sql/api/java/types/DecimalType.java b/

[3/3] git commit: [SPARK-2179][SQL] Public API for DataTypes and Schema

2014-07-30 Thread marmbrus
[SPARK-2179][SQL] Public API for DataTypes and Schema The current PR contains the following changes: * Expose `DataType`s in the sql package (internal details are private to sql). * Users can create Rows. * Introduce `applySchema` to create a `SchemaRDD` by applying a `schema: StructType` to an `
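A rough sketch of what the newly public API looks like, assuming the Spark 1.1-era Scala interface where the data types and `Row` are exposed directly under `org.apache.spark.sql`; the RDD contents and column names are illustrative:

```scala
import org.apache.spark.sql._

// Illustrative only: build a schema from the now-public DataTypes and apply it
// to an RDD[Row] to get a SchemaRDD with an explicit schema.
val sqlContext = new SQLContext(sc)   // `sc` is an existing SparkContext

val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age", IntegerType, nullable = false)))

val rowRDD = sc.parallelize(Seq(Row("alice", 30), Row("bob", 25)))
val people = sqlContext.applySchema(rowRDD, schema)
```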

git commit: SPARK-2543: Allow user to set maximum Kryo buffer size

2014-07-30 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 7003c163d -> 7c5fc28af SPARK-2543: Allow user to set maximum Kryo buffer size Author: Koert Kuipers Closes #735 from koertkuipers/feat-kryo-max-buffersize and squashes the following commits: 15f6d81 [Koert Kuipers] change default for sp
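A minimal sketch of what this enables, assuming the configuration keys of that era (`spark.kryoserializer.buffer.mb` for the initial buffer and the new `spark.kryoserializer.buffer.max.mb` for the ceiling); the exact key names and values are illustrative and have changed in later versions:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: raise the maximum size Kryo may grow its serialization buffer to,
// for jobs that serialize unusually large objects.
val conf = new SparkConf()
  .setAppName("kryo-buffer-example")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryoserializer.buffer.mb", "2")        // initial buffer size (MB)
  .set("spark.kryoserializer.buffer.max.mb", "512")  // user-settable maximum (MB)

val sc = new SparkContext(conf)
```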

git commit: SPARK-2748 [MLLIB] [GRAPHX] Loss of precision for small arguments to Math.exp, Math.log

2014-07-30 Thread meng
Repository: spark Updated Branches: refs/heads/master 7c5fc28af -> ee07541e9 SPARK-2748 [MLLIB] [GRAPHX] Loss of precision for small arguments to Math.exp, Math.log In a few places in MLlib, an expression of the form `log(1.0 + p)` is evaluated. When p is so small that `1.0 + p == 1.0`, the
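The fix pattern described here is the standard `log1p` substitution; a small self-contained illustration (not MLlib's actual code):

```scala
// When p is below machine epsilon, 1.0 + p rounds to exactly 1.0 and the naive
// form loses p entirely; log1p evaluates log(1 + p) without forming 1 + p.
val p = 1e-20

val naive  = math.log(1.0 + p)   // 0.0 -- the small argument is lost
val stable = math.log1p(p)       // ~1e-20 -- full precision near zero

println(s"naive=$naive stable=$stable")
```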

git commit: [SPARK-2521] Broadcast RDD object (instead of sending it along with every task)

2014-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master ee07541e9 -> 774142f55 [SPARK-2521] Broadcast RDD object (instead of sending it along with every task) This is a resubmission of #1452. It was reverted because it broke the build. Currently (as of Spark 1.0.1), Spark sends RDD object (whic

git commit: [SPARK-2747] git diff --dirstat can miss sql changes and not run Hive tests

2014-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master 774142f55 -> 3bc3f1801 [SPARK-2747] git diff --dirstat can miss sql changes and not run Hive tests dev/run-tests use "git diff --dirstat master" to check whether sql is changed. However, --dirstat won't show sql if sql's change is negligib

git commit: Avoid numerical instability

2014-07-30 Thread meng
Repository: spark Updated Branches: refs/heads/master 3bc3f1801 -> e3d85b7e4 Avoid numerical instability This avoids basically doing 1 - 1, for example: ```python >>> from math import exp >>> margin = -40 >>> 1 - 1 / (1 + exp(margin)) 0.0 >>> exp(margin) / (1 + exp(margin)) 4.248354255291589e
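The same comparison in Scala terms, as a sketch of the rewrite the commit describes (compute the small probability from `exp(margin)` directly instead of subtracting a value that has rounded to 1.0):

```scala
// For a large negative margin, 1.0 / (1.0 + exp(margin)) rounds to exactly 1.0,
// so subtracting it from 1 gives 0; the algebraically equivalent form
// exp(margin) / (1 + exp(margin)) keeps the tiny probability.
val margin = -40.0

val unstable = 1.0 - 1.0 / (1.0 + math.exp(margin))        // 0.0
val stable   = math.exp(margin) / (1.0 + math.exp(margin)) // ~4.25e-18

println(s"unstable=$unstable stable=$stable")
```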

git commit: [SPARK-2544][MLLIB] Improve ALS algorithm resource usage

2014-07-30 Thread meng
Repository: spark Updated Branches: refs/heads/master e3d85b7e4 -> fc47bb696 [SPARK-2544][MLLIB] Improve ALS algorithm resource usage Author: GuoQiang Li Author: witgo Closes #929 from witgo/improve_als and squashes the following commits: ea25033 [GuoQiang Li] checkpoint products 3,6,9 ...

git commit: [SPARK-2746] Set SBT_MAVEN_PROFILES only when it is not set explicitly by the user.

2014-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master fc47bb696 -> ff511bacf [SPARK-2746] Set SBT_MAVEN_PROFILES only when it is not set explicitly by the user. Author: Reynold Xin Closes #1655 from rxin/SBT_MAVEN_PROFILES and squashes the following commits: b268c4b [Reynold Xin] [SPARK-27

git commit: Wrap FWDIR in quotes.

2014-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master ff511bacf -> f2eb84fe7 Wrap FWDIR in quotes. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f2eb84fe Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f2eb84fe D

git commit: Wrap FWDIR in quotes in dev/check-license.

2014-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master f2eb84fe7 -> 95cf20393 Wrap FWDIR in quotes in dev/check-license. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/95cf2039 Tree: http://git-wip-us.apache.org/repos/asf/

git commit: More wrapping FWDIR in quotes.

2014-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master 95cf20393 -> 0feb349ea More wrapping FWDIR in quotes. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0feb349e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0

git commit: [SQL] Fix compiling of catalyst docs.

2014-07-30 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 0feb349ea -> 2248891a4 [SQL] Fix compiling of catalyst docs. Author: Michael Armbrust Closes #1653 from marmbrus/fixDocs and squashes the following commits: 0aa1feb [Michael Armbrust] Fix compiling of catalyst docs. Project: http://git

git commit: dev/check-license wrap folders in quotes.

2014-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master 2248891a4 -> 437dc8c5b dev/check-license wrap folders in quotes. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/437dc8c5 Tree: http://git-wip-us.apache.org/repos/asf/s

[2/2] git commit: [SPARK-2024] Add saveAsSequenceFile to PySpark

2014-07-30 Thread joshrosen
[SPARK-2024] Add saveAsSequenceFile to PySpark JIRA issue: https://issues.apache.org/jira/browse/SPARK-2024 This PR is a followup to #455 and adds capabilities for saving PySpark RDDs using SequenceFile or any Hadoop OutputFormats. * Added RDD methods ```saveAsSequenceFile```, ```saveAsHadoopFi

[1/2] [SPARK-2024] Add saveAsSequenceFile to PySpark

2014-07-30 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 437dc8c5b -> 94d1f46fc http://git-wip-us.apache.org/repos/asf/spark/blob/94d1f46f/python/pyspark/tests.py -- diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py i

git commit: Wrap JAR_DL in dev/check-license.

2014-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master 94d1f46fc -> 7c7ce5452 Wrap JAR_DL in dev/check-license. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7c7ce545 Tree: http://git-wip-us.apache.org/repos/asf/spark/tre

git commit: Set AMPLAB_JENKINS_BUILD_PROFILE.

2014-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master 7c7ce5452 -> 109732753 Set AMPLAB_JENKINS_BUILD_PROFILE. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/10973275 Tree: http://git-wip-us.apache.org/repos/asf/spark/tre

git commit: Properly pass SBT_MAVEN_PROFILES into sbt.

2014-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master 109732753 -> 2f4b17056 Properly pass SBT_MAVEN_PROFILES into sbt. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2f4b1705 Tree: http://git-wip-us.apache.org/repos/asf/

git commit: SPARK-2749 [BUILD]. Spark SQL Java tests aren't compiling in Jenkins' Maven builds; missing junit:junit dep

2014-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master 2f4b17056 -> 6ab96a6fd SPARK-2749 [BUILD]. Spark SQL Java tests aren't compiling in Jenkins' Maven builds; missing junit:junit dep The Maven-based builds in the build matrix have been failing for a few days: https://amplab.cs.berkeley.edu

git commit: SPARK-2741 - Publish version of spark assembly which does not contain Hive

2014-07-30 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 6ab96a6fd -> 2ac37db7a SPARK-2741 - Publish version of spark assembly which does not contain Hive Provide a version of the Spark tarball which does not package Hive. This is meant for Hive + Spark users. Author: Brock Noland Closes #166

git commit: [SPARK-2734][SQL] Remove tables from cache when DROP TABLE is run.

2014-07-30 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 2ac37db7a -> 88a519db9 [SPARK-2734][SQL] Remove tables from cache when DROP TABLE is run. Author: Michael Armbrust Closes #1650 from marmbrus/dropCached and squashes the following commits: e6ab80b [Michael Armbrust] Support if exists. 83
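A sketch of the user-visible behavior this fixes, assuming a `HiveContext` and an existing Hive table named `src` (names illustrative; whether the statement goes through `sql` or the older `hql` call depends on the exact version):

```scala
import org.apache.spark.sql.hive.HiveContext

// Illustrative only: after this change, dropping a cached table also evicts it
// from the in-memory cache instead of leaving a stale cached copy behind.
val hiveContext = new HiveContext(sc)   // `sc` is an existing SparkContext

hiveContext.cacheTable("src")
hiveContext.sql("SELECT COUNT(*) FROM src").collect()  // populates the cache

hiveContext.sql("DROP TABLE IF EXISTS src")             // now also uncaches `src`
```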

git commit: SPARK-2341 [MLLIB] loadLibSVMFile doesn't handle regression datasets

2014-07-30 Thread meng
Repository: spark Updated Branches: refs/heads/master 88a519db9 -> e9b275b76 SPARK-2341 [MLLIB] loadLibSVMFile doesn't handle regression datasets Per discussion at https://issues.apache.org/jira/browse/SPARK-2341 , this is a look at deprecating the multiclass parameter. Thoughts welcome of co
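A minimal sketch of the loader in question, assuming the `MLUtils.loadLibSVMFile` entry point and an illustrative path; the point of the change is that labels are read as-is, so regression targets are not forced into class indices:

```scala
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.mllib.regression.LabeledPoint

// Sketch: load a LIBSVM-format file into an RDD[LabeledPoint]; labels keep their
// original (possibly continuous) values, which matters for regression datasets.
val data = MLUtils.loadLibSVMFile(sc, "data/sample_libsvm_data.txt")

val firstLabel: Double = data.first().label
println(s"first label = $firstLabel")
```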

git commit: Update DecisionTreeRunner.scala

2014-07-30 Thread meng
Repository: spark Updated Branches: refs/heads/master e9b275b76 -> da5017668 Update DecisionTreeRunner.scala Author: strat0sphere Closes #1676 from strat0sphere/patch-1 and squashes the following commits: 044d2fa [strat0sphere] Update DecisionTreeRunner.scala Project: http://git-wip-us.ap

[2/2] git commit: SPARK-2045 Sort-based shuffle

2014-07-30 Thread rxin
SPARK-2045 Sort-based shuffle This adds a new ShuffleManager based on sorting, as described in https://issues.apache.org/jira/browse/SPARK-2045. The bulk of the code is in an ExternalSorter class that is similar to ExternalAppendOnlyMap, but sorts key-value pairs by partition ID and can be used

[1/2] SPARK-2045 Sort-based shuffle

2014-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master da5017668 -> e96628440 http://git-wip-us.apache.org/repos/asf/spark/blob/e9662844/core/src/test/scala/org/apache/spark/ContextCleanerSuite.scala -- diff --git a/core/src/tes

git commit: [SPARK-2758] UnionRDD's UnionPartition should not reference parent RDDs

2014-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master e96628440 -> 894d48ffb [SPARK-2758] UnionRDD's UnionPartition should not reference parent RDDs Author: Reynold Xin Closes #1675 from rxin/unionrdd and squashes the following commits: 941d316 [Reynold Xin] Clear RDDs for checkpointing. c9

git commit: Required AM memory is "amMem", not "args.amMemory"

2014-07-30 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 894d48ffb -> 118c1c422 Required AM memory is "amMem", not "args.amMemory" "ERROR yarn.Client: Required AM memory (1024) is above the max threshold (1048) of this cluster" appears if this code is not changed. Obviously, 1024 is less than 1
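A sketch of the bug described above, with hypothetical variable names standing in for the YARN client code; the point is that the comparison uses the requested AM memory plus overhead, so the error message should report that same value:

```scala
// Hypothetical names illustrating the fix: `amMem` (requested memory + overhead)
// is what gets compared against the cluster maximum, so it is also what the
// error message should print -- otherwise the output can read
// "Required AM memory (1024) is above the max threshold (1048)".
val requestedAmMemory = 1024   // MB requested by the user
val memoryOverhead    = 384    // MB of overhead the client adds (illustrative)
val amMem             = requestedAmMemory + memoryOverhead
val maxMem            = 1048   // MB allowed by the cluster

if (amMem > maxMem) {
  sys.error(s"Required AM memory ($amMem) is above the max threshold ($maxMem) of this cluster!")
}
```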

git commit: [SPARK-2340] Resolve event logging and History Server paths properly

2014-07-30 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 118c1c422 -> a7c305b86 [SPARK-2340] Resolve event logging and History Server paths properly We resolve relative paths to the local `file:/` system for `--jars` and `--files` in spark submit (#853). We should do the same for the history ser

git commit: [SPARK-2737] Add retag() method for changing RDDs' ClassTags.

2014-07-30 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master a7c305b86 -> 4fb259353 [SPARK-2737] Add retag() method for changing RDDs' ClassTags. The Java API's use of fake ClassTags doesn't seem to cause any problems for Java users, but it can lead to issues when passing JavaRDDs' underlying RDDs t
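The `retag()` method itself is not shown in the excerpt; as a rough, self-contained illustration of the underlying problem (a "fake" `ClassTag` erases the element type to `AnyRef`, so arrays come back as `Object[]` where `String[]` is expected), with no Spark classes involved:

```scala
import scala.reflect.ClassTag

// Illustration only (not Spark code): the runtime array class produced from a
// ClassTag. A fake tag -- the kind the Java API uses internally -- yields
// Object[] even when the static type claims String; retag() exists so an RDD
// can be given its real ClassTag before Scala code relies on it.
def runtimeArrayClass[T](implicit ct: ClassTag[T]): Class[_] = ct.newArray(0).getClass

val realTag = implicitly[ClassTag[String]]
val fakeTag = ClassTag.AnyRef.asInstanceOf[ClassTag[String]]

println(runtimeArrayClass(realTag))  // class [Ljava.lang.String;
println(runtimeArrayClass(fakeTag))  // class [Ljava.lang.Object;
```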

git commit: [SPARK-2497] Included checks for module symbols too.

2014-07-30 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 4fb259353 -> 5a110da25 [SPARK-2497] Included checks for module symbols too. Author: Prashant Sharma Closes #1463 from ScrapCodes/SPARK-2497/mima-exclude-all and squashes the following commits: 72077b1 [Prashant Sharma] Check separately

git commit: automatically set master according to `spark.master` in `spark-defaults....

2014-07-30 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 5a110da25 -> 669e3f058 automatically set master according to `spark.master` in `spark-defaults automatically set master according to `spark.master` in `spark-defaults.conf` Author: CrazyJvm Closes #1644 from CrazyJvm/standalone-guide