[1/3] [SPARK-2179][SQL] Public API for DataTypes and Schema

2014-07-30 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 4ce92ccaf - 7003c163d http://git-wip-us.apache.org/repos/asf/spark/blob/7003c163/sql/core/src/test/java/org/apache/spark/sql/api/java/JavaApplySchemaSuite.java -- diff

[3/3] git commit: [SPARK-2179][SQL] Public API for DataTypes and Schema

2014-07-30 Thread marmbrus
[SPARK-2179][SQL] Public API for DataTypes and Schema The current PR contains the following changes: * Expose `DataType`s in the sql package (internal details are private to sql). * Users can create Rows. * Introduce `applySchema` to create a `SchemaRDD` by applying a `schema: StructType` to an

git commit: SPARK-2543: Allow user to set maximum Kryo buffer size

2014-07-30 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 7003c163d - 7c5fc28af SPARK-2543: Allow user to set maximum Kryo buffer size Author: Koert Kuipers ko...@tresata.com Closes #735 from koertkuipers/feat-kryo-max-buffersize and squashes the following commits: 15f6d81 [Koert Kuipers]

git commit: SPARK-2748 [MLLIB] [GRAPHX] Loss of precision for small arguments to Math.exp, Math.log

2014-07-30 Thread meng
Repository: spark Updated Branches: refs/heads/master 7c5fc28af - ee07541e9 SPARK-2748 [MLLIB] [GRAPHX] Loss of precision for small arguments to Math.exp, Math.log In a few places in MLlib, an expression of the form `log(1.0 + p)` is evaluated. When p is so small that `1.0 + p == 1.0`, the
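
The precision loss described above can be reproduced outside MLlib. The following is a minimal Python sketch, not the change made in this commit: the value of `p` is an arbitrary illustration, and `math.log1p` is assumed here as the numerically stable alternative.

```python
import math

# Illustrative value only: small enough that 1.0 + p rounds to exactly 1.0.
p = 1e-20

# Naive form: the sum is rounded to 1.0 before the logarithm is taken,
# so the contribution of p is lost entirely.
naive = math.log(1.0 + p)

# log1p evaluates log(1 + p) without ever forming the rounded sum.
stable = math.log1p(p)

print("1.0 + p == 1.0:", 1.0 + p == 1.0)
print("log(1.0 + p) =", naive)
print("log1p(p)     =", stable)
```

For small `p`, `log(1 + p)` is approximately `p`, so a result of zero from the naive form is a complete loss of the quantity being computed rather than a small rounding error.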

git commit: [SPARK-2747] git diff --dirstat can miss sql changes and not run Hive tests

2014-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master 774142f55 - 3bc3f1801 [SPARK-2747] git diff --dirstat can miss sql changes and not run Hive tests dev/run-tests uses `git diff --dirstat master` to check whether sql is changed. However, --dirstat won't show sql if sql's change is negligible

git commit: Avoid numerical instability

2014-07-30 Thread meng
Repository: spark Updated Branches: refs/heads/master 3bc3f1801 - e3d85b7e4 Avoid numerical instability This avoids basically doing 1 - 1, for example:

```python
from math import exp

margin = -40
1 - 1 / (1 + exp(margin))        # 0.0
exp(margin) / (1 + exp(margin))  # 4.248354255291589e-18
```

git commit: [SPARK-2544][MLLIB] Improve ALS algorithm resource usage

2014-07-30 Thread meng
Repository: spark Updated Branches: refs/heads/master e3d85b7e4 - fc47bb696 [SPARK-2544][MLLIB] Improve ALS algorithm resource usage Author: GuoQiang Li wi...@qq.com Author: witgo wi...@qq.com Closes #929 from witgo/improve_als and squashes the following commits: ea25033 [GuoQiang Li]

git commit: [SPARK-2746] Set SBT_MAVEN_PROFILES only when it is not set explicitly by the user.

2014-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master fc47bb696 - ff511bacf [SPARK-2746] Set SBT_MAVEN_PROFILES only when it is not set explicitly by the user. Author: Reynold Xin r...@apache.org Closes #1655 from rxin/SBT_MAVEN_PROFILES and squashes the following commits: b268c4b [Reynold

git commit: More wrapping FWDIR in quotes.

2014-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master 95cf20393 - 0feb349ea More wrapping FWDIR in quotes. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0feb349e Tree:

git commit: [SQL] Fix compiling of catalyst docs.

2014-07-30 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 0feb349ea - 2248891a4 [SQL] Fix compiling of catalyst docs. Author: Michael Armbrust mich...@databricks.com Closes #1653 from marmbrus/fixDocs and squashes the following commits: 0aa1feb [Michael Armbrust] Fix compiling of catalyst docs.

git commit: dev/check-license wrap folders in quotes.

2014-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master 2248891a4 - 437dc8c5b dev/check-license wrap folders in quotes. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/437dc8c5 Tree:

[2/2] git commit: [SPARK-2024] Add saveAsSequenceFile to PySpark

2014-07-30 Thread joshrosen
[SPARK-2024] Add saveAsSequenceFile to PySpark JIRA issue: https://issues.apache.org/jira/browse/SPARK-2024 This PR is a followup to #455 and adds capabilities for saving PySpark RDDs using SequenceFile or any Hadoop OutputFormats. * Added RDD methods ```saveAsSequenceFile```,

[1/2] [SPARK-2024] Add saveAsSequenceFile to PySpark

2014-07-30 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 437dc8c5b - 94d1f46fc http://git-wip-us.apache.org/repos/asf/spark/blob/94d1f46f/python/pyspark/tests.py -- diff --git a/python/pyspark/tests.py b/python/pyspark/tests.py

git commit: Wrap JAR_DL in dev/check-license.

2014-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master 94d1f46fc - 7c7ce5452 Wrap JAR_DL in dev/check-license. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7c7ce545 Tree:

git commit: Properly pass SBT_MAVEN_PROFILES into sbt.

2014-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master 109732753 - 2f4b17056 Properly pass SBT_MAVEN_PROFILES into sbt. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2f4b1705 Tree:

git commit: SPARK-2749 [BUILD]. Spark SQL Java tests aren't compiling in Jenkins' Maven builds; missing junit:junit dep

2014-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master 2f4b17056 - 6ab96a6fd SPARK-2749 [BUILD]. Spark SQL Java tests aren't compiling in Jenkins' Maven builds; missing junit:junit dep The Maven-based builds in the build matrix have been failing for a few days:

git commit: SPARK-2741 - Publish version of spark assembly which does not contain Hive

2014-07-30 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 6ab96a6fd - 2ac37db7a SPARK-2741 - Publish version of spark assembly which does not contain Hive Provide a version of the Spark tarball which does not package Hive. This is meant for Hive + Spark users. Author: Brock Noland

git commit: Update DecisionTreeRunner.scala

2014-07-30 Thread meng
Repository: spark Updated Branches: refs/heads/master e9b275b76 - da5017668 Update DecisionTreeRunner.scala Author: strat0sphere stratos.dimopou...@gmail.com Closes #1676 from strat0sphere/patch-1 and squashes the following commits: 044d2fa [strat0sphere] Update DecisionTreeRunner.scala

[1/2] SPARK-2045 Sort-based shuffle

2014-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master da5017668 - e96628440 http://git-wip-us.apache.org/repos/asf/spark/blob/e9662844/core/src/test/scala/org/apache/spark/ContextCleanerSuite.scala -- diff --git

git commit: [SPARK-2758] UnionRDD's UnionPartition should not reference parent RDDs

2014-07-30 Thread rxin
Repository: spark Updated Branches: refs/heads/master e96628440 - 894d48ffb [SPARK-2758] UnionRDD's UnionPartition should not reference parent RDDs Author: Reynold Xin r...@apache.org Closes #1675 from rxin/unionrdd and squashes the following commits: 941d316 [Reynold Xin] Clear RDDs for

git commit: Required AM memory is amMem, not args.amMemory

2014-07-30 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 894d48ffb - 118c1c422 Required AM memory is amMem, not args.amMemory The error `ERROR yarn.Client: Required AM memory (1024) is above the max threshold (1048) of this cluster` appears if this code is not changed. Obviously, 1024 is less than 1048,

git commit: [SPARK-2340] Resolve event logging and History Server paths properly

2014-07-30 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 118c1c422 - a7c305b86 [SPARK-2340] Resolve event logging and History Server paths properly We resolve relative paths to the local `file:/` system for `--jars` and `--files` in spark submit (#853). We should do the same for the history
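
The path resolution rule described above can be illustrated with a short Python sketch. This is not Spark's implementation: `resolve_uri` is a hypothetical helper, and the example paths are made up. The assumption is that a path which already names a scheme is kept as-is, while a bare or relative path is resolved against the local file system.

```python
from pathlib import Path
from urllib.parse import urlparse


def resolve_uri(path: str) -> str:
    """Hypothetical helper: return `path` as a URI with an explicit scheme."""
    # A path that already carries a scheme (hdfs://, file:/, ...) is left alone.
    if urlparse(path).scheme:
        return path
    # Otherwise treat it as a local path and make it an absolute file: URI.
    return Path(path).absolute().as_uri()


# Made-up example paths, for illustration only.
print(resolve_uri("hdfs://namenode/spark-events"))
print(resolve_uri("spark-events"))
```

Note that this sketch would misread a Windows drive letter such as `C:` as a scheme; it is only meant to show the scheme check followed by the local fallback.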

git commit: [SPARK-2737] Add retag() method for changing RDDs' ClassTags.

2014-07-30 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master a7c305b86 - 4fb259353 [SPARK-2737] Add retag() method for changing RDDs' ClassTags. The Java API's use of fake ClassTags doesn't seem to cause any problems for Java users, but it can lead to issues when passing JavaRDDs' underlying RDDs