spark git commit: minor doc fix for Row.scala

2016-10-12 Thread rxin
Repository: spark Updated Branches: refs/heads/master 064d6650e -> 7222a25a1 minor doc fix for Row.scala ## What changes were proposed in this pull request? minor doc fix for "getAnyValAs" in class Row ## How was this patch tested? None.

spark git commit: minor doc fix for Row.scala

2016-10-12 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 ab00e410c -> d38f38a09 minor doc fix for Row.scala ## What changes were proposed in this pull request? minor doc fix for "getAnyValAs" in class Row ## How was this patch tested? None.

spark git commit: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicates

2016-10-12 Thread wenchen
Repository: spark Updated Branches: refs/heads/master edeb51a39 -> 064d6650e [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicates ## What changes were proposed in this pull request? Two issues regarding Dataset.dropduplicates: 1. Dataset.dropDuplicates should consider the columns with

spark git commit: [SPARK-17876] Write StructuredStreaming WAL to a stream instead of materializing all at once

2016-10-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 5903dabc5 -> ab00e410c [SPARK-17876] Write StructuredStreaming WAL to a stream instead of materializing all at once ## What changes were proposed in this pull request? The CompactibleFileStreamLog materializes the whole metadata log

spark git commit: [SPARK-17876] Write StructuredStreaming WAL to a stream instead of materializing all at once

2016-10-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 21cb59f1c -> edeb51a39 [SPARK-17876] Write StructuredStreaming WAL to a stream instead of materializing all at once ## What changes were proposed in this pull request? The CompactibleFileStreamLog materializes the whole metadata log in
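
The general idea named in the SPARK-17876 title, writing log entries to a stream rather than building the whole payload first, can be sketched roughly as follows. This is not Spark's code: the function names, the JSON-lines encoding, and the entry fields are hypothetical, chosen only to contrast the two approaches.

```python
import io
import json
from typing import BinaryIO, Iterable


def serialize_all_at_once(entries: Iterable[dict]) -> bytes:
    """Build the entire log in memory and return it as one buffer."""
    return "\n".join(json.dumps(e) for e in entries).encode("utf-8")


def serialize_to_stream(entries: Iterable[dict], out: BinaryIO) -> int:
    """Write entries one by one; only a single entry is held in memory."""
    count = 0
    for entry in entries:
        if count:
            out.write(b"\n")  # separator between entries, none after the last
        out.write(json.dumps(entry).encode("utf-8"))
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical log entries; a generator keeps the input lazy as well.
    log = ({"path": f"part-{i}", "size": i} for i in range(3))
    sink = io.BytesIO()
    print(serialize_to_stream(log, sink), "entries written")
```

Both functions produce the same bytes; the streaming variant simply avoids holding the full serialized log at any point, which matters once the log grows large.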

spark git commit: [SPARK-16827][BRANCH-2.0] Avoid reporting spill metrics as shuffle metrics

2016-10-12 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 050b8177e -> 5903dabc5 [SPARK-16827][BRANCH-2.0] Avoid reporting spill metrics as shuffle metrics ## What changes were proposed in this pull request? Fix a bug where spill metrics were being reported as shuffle metrics. Eventually

spark git commit: [SPARK-17835][ML][MLLIB] Optimize NaiveBayes mllib wrapper to eliminate extra pass on data

2016-10-12 Thread yliang
Repository: spark Updated Branches: refs/heads/master 0d4a69527 -> 21cb59f1c [SPARK-17835][ML][MLLIB] Optimize NaiveBayes mllib wrapper to eliminate extra pass on data ## What changes were proposed in this pull request? [SPARK-14077](https://issues.apache.org/jira/browse/SPARK-14077) copied

spark git commit: [SPARK-17745][ML][PYSPARK] update NB python api - add weight col parameter

2016-10-12 Thread yliang
Repository: spark Updated Branches: refs/heads/master 6f20a92ca -> 0d4a69527 [SPARK-17745][ML][PYSPARK] update NB python api - add weight col parameter ## What changes were proposed in this pull request? update python api for NaiveBayes: add weight col parameter. ## How was this patch

spark git commit: [SPARK-17845] [SQL] More self-evident window function frame boundary API

2016-10-12 Thread davies
Repository: spark Updated Branches: refs/heads/master f9a56a153 -> 6f20a92ca [SPARK-17845] [SQL] More self-evident window function frame boundary API ## What changes were proposed in this pull request? This patch improves the window function frame boundary API to make it more obvious to read

spark git commit: [SPARK-17782][STREAMING][KAFKA] alternative eliminate race condition of poll twice

2016-10-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 d55ba3063 -> 050b8177e [SPARK-17782][STREAMING][KAFKA] alternative eliminate race condition of poll twice ## What changes were proposed in this pull request? Alternative approach to https://github.com/apache/spark/pull/15387 Author:

spark git commit: [SPARK-17782][STREAMING][KAFKA] alternative eliminate race condition of poll twice

2016-10-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 9ce7d3e54 -> f9a56a153 [SPARK-17782][STREAMING][KAFKA] alternative eliminate race condition of poll twice ## What changes were proposed in this pull request? Alternative approach to https://github.com/apache/spark/pull/15387 Author:

spark git commit: [SPARK-17675][CORE] Expand Blacklist for TaskSets

2016-10-12 Thread irashid
Repository: spark Updated Branches: refs/heads/master 47776e7c0 -> 9ce7d3e54 [SPARK-17675][CORE] Expand Blacklist for TaskSets ## What changes were proposed in this pull request? This is a step along the way to SPARK-8425. To enable incremental review, the first step proposed here is to

spark git commit: [SPARK-17850][CORE] Add a flag to ignore corrupt files

2016-10-12 Thread mridulm80
Repository: spark Updated Branches: refs/heads/master eb69335cd -> 47776e7c0 [SPARK-17850][CORE] Add a flag to ignore corrupt files ## What changes were proposed in this pull request? Add a flag to ignore corrupt files. For Spark core, the configuration is `spark.files.ignoreCorruptFiles`.
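
As a rough illustration of what an "ignore corrupt files" switch usually means, the sketch below reads a list of gzip files and either fails on the first unreadable one or skips it. It is plain Python, not Spark: `read_lines` and its `ignore_corrupt_files` parameter are hypothetical names, and the exact semantics of `spark.files.ignoreCorruptFiles` may differ.

```python
import gzip
import zlib
from typing import Iterable, Iterator


def read_lines(paths: Iterable[str], ignore_corrupt_files: bool = False) -> Iterator[str]:
    """Yield text lines from gzip files, optionally skipping unreadable ones."""
    for path in paths:
        try:
            with gzip.open(path, "rt", encoding="utf-8") as handle:
                for line in handle:
                    yield line.rstrip("\n")
        except (OSError, EOFError, zlib.error):
            # Assumption of this sketch: lines already yielded from a
            # partially readable file are kept; the rest of it is dropped.
            if not ignore_corrupt_files:
                raise
```

With the flag off, a single bad input aborts the whole read; with it on, the job finishes but silently loses data from the skipped files, so the flag is best treated as an explicit opt-in.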

spark git commit: [BUILD] Closing stale PRs

2016-10-12 Thread vanzin
Repository: spark Updated Branches: refs/heads/master f8062b63f -> eb69335cd [BUILD] Closing stale PRs Closes #15303 Closes #15078 Closes #15080 Closes #15135 Closes #14565 Closes #12355 Closes #15404 Author: Sean Owen Closes #15451 from srowen/CloseStalePRs. Project:

spark git commit: [SPARK-17840][DOCS] Add some pointers for wiki/CONTRIBUTING.md in README.md and some warnings in PULL_REQUEST_TEMPLATE

2016-10-12 Thread rxin
Repository: spark Updated Branches: refs/heads/master 5cc503f4f -> f8062b63f [SPARK-17840][DOCS] Add some pointers for wiki/CONTRIBUTING.md in README.md and some warnings in PULL_REQUEST_TEMPLATE ## What changes were proposed in this pull request? Link to contributing wiki in PR template,

spark git commit: [SPARK-17790][SPARKR] Support for parallelizing R data.frame larger than 2GB

2016-10-12 Thread felixcheung
Repository: spark Updated Branches: refs/heads/branch-2.0 5451541d1 -> d55ba3063 [SPARK-17790][SPARKR] Support for parallelizing R data.frame larger than 2GB ## What changes were proposed in this pull request? If the R data structure that is being parallelized is larger than `INT_MAX` we use

spark git commit: [SPARK-17790][SPARKR] Support for parallelizing R data.frame larger than 2GB

2016-10-12 Thread felixcheung
Repository: spark Updated Branches: refs/heads/master d5580ebaa -> 5cc503f4f [SPARK-17790][SPARKR] Support for parallelizing R data.frame larger than 2GB ## What changes were proposed in this pull request? If the R data structure that is being parallelized is larger than `INT_MAX` we use

spark git commit: [SPARK-17884][SQL] To resolve Null pointer exception when casting from empty string to interval type.

2016-10-12 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 4dcbde48d -> 5451541d1 [SPARK-17884][SQL] To resolve Null pointer exception when casting from empty string to interval type. ## What changes were proposed in this pull request? This change adds a check in castToInterval method of Cast

spark git commit: [SPARK-17884][SQL] To resolve Null pointer exception when casting from empty string to interval type.

2016-10-12 Thread rxin
Repository: spark Updated Branches: refs/heads/master 8880fd13e -> d5580ebaa [SPARK-17884][SQL] To resolve Null pointer exception when casting from empty string to interval type. ## What changes were proposed in this pull request? This change adds a check in castToInterval method of Cast

spark git commit: [SPARK-14761][SQL] Reject invalid join methods when join columns are not specified in PySpark DataFrame join.

2016-10-12 Thread rxin
Repository: spark Updated Branches: refs/heads/master 8d33e1e5b -> 8880fd13e [SPARK-14761][SQL] Reject invalid join methods when join columns are not specified in PySpark DataFrame join. ## What changes were proposed in this pull request? In PySpark, the invalid join type will not throw

spark git commit: [SPARK-17808][PYSPARK] Upgraded version of Pyrolite to 4.13

2016-10-12 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-2.0 f12b74c02 -> 4dcbde48d [SPARK-17808][PYSPARK] Upgraded version of Pyrolite to 4.13 ## What changes were proposed in this pull request? Upgraded to a newer version of Pyrolite which supports serialization of a BinaryType StructField

spark git commit: [SPARK-17853][STREAMING][KAFKA][DOC] make it clear that reusing group.id is bad

2016-10-12 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-2.0 f3d82b53c -> f12b74c02 [SPARK-17853][STREAMING][KAFKA][DOC] make it clear that reusing group.id is bad ## What changes were proposed in this pull request? Documentation fix to make it clear that reusing group id for different streams

spark git commit: [SPARK-17853][STREAMING][KAFKA][DOC] make it clear that reusing group.id is bad

2016-10-12 Thread rxin
Repository: spark Updated Branches: refs/heads/master b512f04f8 -> c264ef9b1 [SPARK-17853][STREAMING][KAFKA][DOC] make it clear that reusing group.id is bad ## What changes were proposed in this pull request? Documentation fix to make it clear that reusing group id for different streams is