spark git commit: Closes #12407 Closes #12408 Closes #12401

2016-04-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master b9613239d -> a9324a06e Closes #12407 Closes #12408 Closes #12401 Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a9324a06 Tree: http://git-wip-us.apache.org/repos/asf/s

spark git commit: [SPARK-14374][ML][PYSPARK] PySpark ml GBTClassifier, Regressor support export/import

2016-04-14 Thread meng
Repository: spark Updated Branches: refs/heads/master 297ba3f1b -> b9613239d [SPARK-14374][ML][PYSPARK] PySpark ml GBTClassifier, Regressor support export/import ## What changes were proposed in this pull request? PySpark ml GBTClassifier, Regressor support export/import. ## How was this pat

spark git commit: [SPARK-14275][SQL] Reimplement TypedAggregateExpression to DeclarativeAggregate

2016-04-14 Thread wenchen
Repository: spark Updated Branches: refs/heads/master b5c60bcdc -> 297ba3f1b [SPARK-14275][SQL] Reimplement TypedAggregateExpression to DeclarativeAggregate ## What changes were proposed in this pull request? `ExpressionEncoder` is just a container for serialization and deserialization expre

spark git commit: [SPARK-14447][SQL] Speed up TungstenAggregate w/ keys using VectorizedHashMap

2016-04-14 Thread yhuai
Repository: spark Updated Branches: refs/heads/master ff9ae61a3 -> b5c60bcdc [SPARK-14447][SQL] Speed up TungstenAggregate w/ keys using VectorizedHashMap ## What changes were proposed in this pull request? This patch speeds up group-by aggregates by around 3-5x by leveraging an in-memory `A

spark git commit: [SPARK-14601][DOC] Minor doc/usage changes related to removal of Spark assembly

2016-04-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master c80586d9e -> ff9ae61a3 [SPARK-14601][DOC] Minor doc/usage changes related to removal of Spark assembly ## What changes were proposed in this pull request? Removing references to assembly jar in documentation. Adding an additional (previous

spark git commit: [SPARK-12869] Implemented an improved version of the toIndexedRowMatrix

2016-04-14 Thread meng
Repository: spark Updated Branches: refs/heads/master 01dd1f5c0 -> c80586d9e [SPARK-12869] Implemented an improved version of the toIndexedRowMatrix Hi guys, I've implemented an improved version of the `toIndexedRowMatrix` function on the `BlockMatrix`. I needed this for a project, but would

spark git commit: [SPARK-14565][ML] RandomForest should use parseInt and parseDouble for feature subset size instead of regexes

2016-04-14 Thread meng
Repository: spark Updated Branches: refs/heads/master d7e124edf -> 01dd1f5c0 [SPARK-14565][ML] RandomForest should use parseInt and parseDouble for feature subset size instead of regexes ## What changes were proposed in this pull request? This fix tries to change RandomForest's supported str
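
The idea named in the subject line above, parsing the feature subset size with numeric conversion rather than regular expressions, can be sketched in Python as follows. This is an illustration only, not Spark's code: the function name `parse_feature_subset`, the set of named strategies, and the accepted ranges (a positive integer count, or a fraction in (0, 1]) are assumptions made for the example.

```python
from typing import Tuple, Union

# Hypothetical set of named strategies, used only for this sketch.
NAMED_STRATEGIES = {"auto", "all", "sqrt", "log2", "onethird"}


def parse_feature_subset(value: str) -> Tuple[str, Union[str, int, float]]:
    """Classify a feature-subset setting as a name, a count, or a fraction."""
    text = value.strip().lower()
    if text in NAMED_STRATEGIES:
        return ("named", text)
    # Try integer parsing first, so "5" is read as a count of features.
    try:
        count = int(text)
    except ValueError:
        pass
    else:
        if count > 0:
            return ("count", count)
        raise ValueError(f"feature count must be positive: {value!r}")
    # Fall back to float parsing, so "0.5" is read as a fraction of features.
    try:
        fraction = float(text)
    except ValueError:
        raise ValueError(f"unsupported feature subset setting: {value!r}") from None
    if 0.0 < fraction <= 1.0:
        return ("fraction", fraction)
    raise ValueError(f"fraction must be in (0, 1]: {value!r}")
```

Letting the numeric parsers decide what counts as a valid number avoids maintaining a separate pattern for each numeric form; the range checks then stay as plain comparisons.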

spark git commit: [SPARK-14545][SQL] Improve `LikeSimplification` by adding `a%b` rule

2016-04-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master bc748b7b8 -> d7e124edf [SPARK-14545][SQL] Improve `LikeSimplification` by adding `a%b` rule ## What changes were proposed in this pull request? Current `LikeSimplification` handles the following four rules. - 'a%' => expr.StartsWith("a") -
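
A rough Python sketch of this kind of rewrite is shown below. Only the `'a%'` prefix rule and the existence of an `a%b` rule come from the message above; the other cases (suffix, infix, exact match), the function names `simplify_like` and `matches`, and the tuple representation are assumptions for illustration and do not reflect Spark's Catalyst code.

```python
from typing import Optional, Tuple


def simplify_like(pattern: str) -> Optional[Tuple[str, ...]]:
    """Return a cheaper string check for a simple LIKE pattern, else None."""
    # This sketch gives up on escapes and single-character wildcards.
    if "\\" in pattern or "_" in pattern:
        return None
    parts = pattern.split("%")
    if len(parts) == 1:
        return ("equals", parts[0])
    if len(parts) == 2:
        prefix, suffix = parts
        if prefix and suffix:
            return ("starts_and_ends_with", prefix, suffix)  # the a%b case
        if prefix:
            return ("starts_with", prefix)
        if suffix:
            return ("ends_with", suffix)
        return None  # the pattern is just "%"
    if len(parts) == 3 and not parts[0] and not parts[2] and parts[1]:
        return ("contains", parts[1])
    return None


def matches(check: Tuple[str, ...], value: str) -> bool:
    """Evaluate a check produced by simplify_like against a string."""
    kind = check[0]
    if kind == "equals":
        return value == check[1]
    if kind == "starts_with":
        return value.startswith(check[1])
    if kind == "ends_with":
        return value.endswith(check[1])
    if kind == "contains":
        return check[1] in value
    if kind == "starts_and_ends_with":
        prefix, suffix = check[1], check[2]
        # The length check stops prefix and suffix from overlapping,
        # so the pattern "a%a" does not match the one-character string "a".
        return (len(value) >= len(prefix) + len(suffix)
                and value.startswith(prefix) and value.endswith(suffix))
    raise ValueError(f"unknown check: {kind}")
```

Patterns the sketch cannot simplify return `None`, which stands in for falling back to the general LIKE evaluation.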

spark git commit: [SPARK-14238][ML][MLLIB][PYSPARK] Add binary toggle Param to PySpark HashingTF in ML & MLlib

2016-04-14 Thread mlnick
Repository: spark Updated Branches: refs/heads/master bf65c87f7 -> bc748b7b8 [SPARK-14238][ML][MLLIB][PYSPARK] Add binary toggle Param to PySpark HashingTF in ML & MLlib ## What changes were proposed in this pull request? This fix tries to add binary toggle Param to PySpark HashingTF in ML &
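
As a plain-Python illustration of what a binary toggle means for hashed term frequencies (this is not the PySpark API), the sketch below maps terms to buckets and, when `binary` is set, records presence instead of counts. The function name `hashing_tf`, the default bucket count, and the use of `zlib.crc32` as the hash are assumptions chosen to keep the example self-contained.

```python
import zlib
from collections import Counter
from typing import Dict, Iterable


def hashing_tf(terms: Iterable[str], num_features: int = 16,
               binary: bool = False) -> Dict[int, float]:
    """Map terms to hashed bucket indices and return a sparse frequency vector."""
    counts: Counter = Counter()
    for term in terms:
        # crc32 is a stand-in hash, picked because it is deterministic.
        index = zlib.crc32(term.encode("utf-8")) % num_features
        counts[index] += 1
    if binary:
        # Binary mode: every non-zero bucket is recorded as 1.0.
        return {index: 1.0 for index in counts}
    return {index: float(n) for index, n in counts.items()}
```

Calling `hashing_tf(["a", "a", "b"], binary=True)` yields the same bucket indices as the default mode, but with each value set to 1.0 rather than the term count.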

spark git commit: [SPARK-14618][ML][DOC] Updated RegressionEvaluator.metricName param doc

2016-04-14 Thread jkbradley
Repository: spark Updated Branches: refs/heads/branch-1.5 cb7a90ad5 -> 6043fa8df [SPARK-14618][ML][DOC] Updated RegressionEvaluator.metricName param doc ## What changes were proposed in this pull request? In Spark 1.4, we negated some metrics from RegressionEvaluator since CrossValidator alw

spark git commit: [SPARK-14618][ML][DOC] Updated RegressionEvaluator.metricName param doc

2016-04-14 Thread jkbradley
Repository: spark Updated Branches: refs/heads/branch-1.6 413d0600e -> 93c9a63ea [SPARK-14618][ML][DOC] Updated RegressionEvaluator.metricName param doc ## What changes were proposed in this pull request? In Spark 1.4, we negated some metrics from RegressionEvaluator since CrossValidator alw

spark git commit: [SPARK-14618][ML][DOC] Updated RegressionEvaluator.metricName param doc

2016-04-14 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master c5172f820 -> bf65c87f7 [SPARK-14618][ML][DOC] Updated RegressionEvaluator.metricName param doc ## What changes were proposed in this pull request? In Spark 1.4, we negated some metrics from RegressionEvaluator since CrossValidator always

spark git commit: [SPARK-13967][PYSPARK][ML] Added binary Param to Python CountVectorizer

2016-04-14 Thread mlnick
Repository: spark Updated Branches: refs/heads/master 28efdd3fd -> c5172f820 [SPARK-13967][PYSPARK][ML] Added binary Param to Python CountVectorizer Added binary toggle param to CountVectorizer feature transformer in PySpark. Created a unit test for using CountVectorizer with the binary toggl

spark git commit: [SPARK-14592][SQL] Native support for CREATE TABLE LIKE DDL command

2016-04-14 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master c971aee40 -> 28efdd3fd [SPARK-14592][SQL] Native support for CREATE TABLE LIKE DDL command ## What changes were proposed in this pull request? JIRA: https://issues.apache.org/jira/browse/SPARK-14592 This patch adds native support for DDL c

spark git commit: [SPARK-14499][SQL][TEST] Drop Partition Does Not Delete Data of External Tables

2016-04-14 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 1d04c86fc -> c971aee40 [SPARK-14499][SQL][TEST] Drop Partition Does Not Delete Data of External Tables What changes were proposed in this pull request? This PR is to add a test to ensure drop partitions of an external table will not d

spark git commit: [SPARK-14558][CORE] In ClosureCleaner, clean the outer pointer if it's a REPL line object

2016-04-14 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master a46f98d3f -> 1d04c86fc [SPARK-14558][CORE] In ClosureCleaner, clean the outer pointer if it's a REPL line object ## What changes were proposed in this pull request? When we clean a closure, if its outermost parent is not a closure, we won

spark git commit: [SPARK-14617] Remove deprecated APIs in TaskMetrics

2016-04-14 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master dac40b68d -> a46f98d3f [SPARK-14617] Remove deprecated APIs in TaskMetrics ## What changes were proposed in this pull request? This patch removes some of the deprecated APIs in TaskMetrics. This is part of my bigger effort to simplify accu

spark git commit: [SPARK-14619] Track internal accumulators (metrics) by stage attempt

2016-04-14 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 9fa43a33b -> dac40b68d [SPARK-14619] Track internal accumulators (metrics) by stage attempt ## What changes were proposed in this pull request? When there are multiple attempts for a stage, we currently only reset internal accumulator valu

spark git commit: [SPARK-14612][ML] Consolidate the version of dependencies in mllib and mllib-local into one place

2016-04-14 Thread dbtsai
Repository: spark Updated Branches: refs/heads/master 3e27940a1 -> 9fa43a33b [SPARK-14612][ML] Consolidate the version of dependencies in mllib and mllib-local into one place ## What changes were proposed in this pull request? Move json4s, breeze dependency declaration into parent ## How wa

spark git commit: [SPARK-14630][BUILD][CORE][SQL][STREAMING] Code style: public abstract methods should have explicit return types

2016-04-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master de2ad5285 -> 3e27940a1 [SPARK-14630][BUILD][CORE][SQL][STREAMING] Code style: public abstract methods should have explicit return types ## What changes were proposed in this pull request? Currently many public abstract methods (in abstrac

spark git commit: [SPARK-14625] TaskUIData and ExecutorUIData shouldn't be case classes

2016-04-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master 0d22092cd -> de2ad5285 [SPARK-14625] TaskUIData and ExecutorUIData shouldn't be case classes ## What changes were proposed in this pull request? I was trying to understand the accumulator and metrics update source code and these two classe

spark git commit: [SPARK-14125][SQL] Native DDL Support: Alter View

2016-04-14 Thread yhuai
Repository: spark Updated Branches: refs/heads/master f83ba454a -> 0d22092cd [SPARK-14125][SQL] Native DDL Support: Alter View What changes were proposed in this pull request? This PR is to provide a native DDL support for the following three Alter View commands: Based on the Hive DDL d

spark git commit: [SPARK-14572][DOC] Update config docs to allow -Xms in extraJavaOptions

2016-04-14 Thread tgraves
Repository: spark Updated Branches: refs/heads/master 3cf3db17b -> f83ba454a [SPARK-14572][DOC] Update config docs to allow -Xms in extraJavaOptions ## What changes were proposed in this pull request? The configuration docs are updated to reflect the changes introduced with [SPARK-12384](http

spark git commit: [SPARK-14518][SQL] Support Comment in CREATE VIEW

2016-04-14 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 6fc3dc883 -> 3cf3db17b [SPARK-14518][SQL] Support Comment in CREATE VIEW What changes were proposed in this pull request? **HQL Syntax**: [Create View](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManual

spark git commit: [MINOR][SQL] Remove extra anonymous closure within functional transformations

2016-04-14 Thread srowen
Repository: spark Updated Branches: refs/heads/master 478af2f45 -> 6fc3dc883 [MINOR][SQL] Remove extra anonymous closure within functional transformations ## What changes were proposed in this pull request? This PR removes extra anonymous closure within functional transformations. For exampl

spark git commit: [SPARK-14573][PYSPARK][BUILD] Fix PyDoc Makefile & highlighting issues

2016-04-14 Thread srowen
Repository: spark Updated Branches: refs/heads/master b4819404a -> 478af2f45 [SPARK-14573][PYSPARK][BUILD] Fix PyDoc Makefile & highlighting issues ## What changes were proposed in this pull request? The PyDoc Makefile used "=" rather than "?=" for setting env variables so it overwrote the u

spark git commit: [SPARK-14596][SQL] Remove not used SqlNewHadoopRDD and some more unused imports

2016-04-14 Thread wenchen
Repository: spark Updated Branches: refs/heads/master 62b7f306f -> b4819404a [SPARK-14596][SQL] Remove not used SqlNewHadoopRDD and some more unused imports ## What changes were proposed in this pull request? Old `HadoopFsRelation` API includes `buildInternalScan()` which uses `SqlNewHadoopR