spark git commit: [SPARK-12988][SQL] Can't drop top level columns that contain dots

2016-05-31 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 7f240eaee -> b8de4ad7d [SPARK-12988][SQL] Can't drop top level columns that contain dots ## What changes were proposed in this pull request? Fixes "Can't drop top level columns that contain dots". This work is based on dilipbiswal's
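The ambiguity behind this fix can be sketched outside Spark: a dot in a column reference normally means a nested struct field, so a top-level column literally named `a.b` must be backtick-quoted to be distinguishable. The helper below is an illustrative sketch of that parsing rule, not Spark's implementation.

```python
def parse_column_reference(name):
    """Split a column reference into name parts, honoring backtick quoting.

    `a.b` (quoted)  -> one top-level column named "a.b"
    a.b   (unquoted) -> nested field b inside struct column a
    """
    if name.startswith("`") and name.endswith("`") and len(name) > 2:
        return [name[1:-1]]      # quoted: the dots are part of the name
    return name.split(".")       # unquoted: dots separate nesting levels

assert parse_column_reference("`a.b`") == ["a.b"]
assert parse_column_reference("a.b") == ["a", "b"]
```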

svn commit: r1748776 [2/2] - in /spark: news/_posts/ site/ site/graphx/ site/mllib/ site/news/ site/releases/ site/screencasts/ site/sql/ site/streaming/

2016-06-16 Thread yhuai
Modified: spark/site/news/spark-tips-from-quantifind.html URL: http://svn.apache.org/viewvc/spark/site/news/spark-tips-from-quantifind.html?rev=1748776&r1=1748775&r2=1748776&view=diff == ---

svn commit: r1748776 [1/2] - in /spark: news/_posts/ site/ site/graphx/ site/mllib/ site/news/ site/releases/ site/screencasts/ site/sql/ site/streaming/

2016-06-16 Thread yhuai
Author: yhuai Date: Thu Jun 16 22:14:05 2016 New Revision: 1748776 URL: http://svn.apache.org/viewvc?rev=1748776&view=rev Log: Add a new news for CFP of Spark Summit 2016 EU Added: spark/news/_posts/2016-06-16-submit-talks-to-spark-summit-eu-2016.md spark/site/news/submit-talks-to-spark

spark git commit: [SPARK-16037][SQL] Follow-up: add DataFrameWriter.insertInto() test cases for by position resolution

2016-06-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/master b76e35537 -> f4a3d45e3 [SPARK-16037][SQL] Follow-up: add DataFrameWriter.insertInto() test cases for by position resolution ## What changes were proposed in this pull request? This PR migrates some test cases introduced in #12313 as a

spark git commit: [SPARK-16037][SQL] Follow-up: add DataFrameWriter.insertInto() test cases for by position resolution

2016-06-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 f805b989b -> 0d7e1d11d [SPARK-16037][SQL] Follow-up: add DataFrameWriter.insertInto() test cases for by position resolution ## What changes were proposed in this pull request? This PR migrates some test cases introduced in #12313 as

spark git commit: [SPARK-16002][SQL] Sleep when no new data arrives to avoid 100% CPU usage

2016-06-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 0d7e1d11d -> afa14b71b [SPARK-16002][SQL] Sleep when no new data arrives to avoid 100% CPU usage ## What changes were proposed in this pull request? Add a configuration to allow people to set a minimum polling delay when no new data

spark git commit: [SPARK-16002][SQL] Sleep when no new data arrives to avoid 100% CPU usage

2016-06-21 Thread yhuai
Repository: spark Updated Branches: refs/heads/master f4a3d45e3 -> c399c7f0e [SPARK-16002][SQL] Sleep when no new data arrives to avoid 100% CPU usage ## What changes were proposed in this pull request? Add a configuration to allow people to set a minimum polling delay when no new data
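The idea behind this change can be sketched in plain Python: when a polling loop finds no new data, it should back off for a configurable minimum delay rather than spin. The names below (`poll_interval_ms`, `has_new_data`) are illustrative assumptions, not Spark's actual configuration keys.

```python
import time

def run_batches(has_new_data, process_batch, poll_interval_ms=10, max_batches=5):
    """Poll up to max_batches times; sleep when the source reports no new data."""
    processed = 0
    for _ in range(max_batches):
        if has_new_data():
            process_batch()
            processed += 1
        else:
            # Back off instead of busy-waiting, avoiding 100% CPU when idle.
            time.sleep(poll_interval_ms / 1000.0)
    return processed

# A source that has data on the 1st, 3rd, and 5th polls only.
calls = iter([True, False, True, False, True])
n = run_batches(lambda: next(calls), lambda: None)
assert n == 3
```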

spark git commit: [SPARK-16033][SQL] insertInto() can't be used together with partitionBy()

2016-06-17 Thread yhuai
Repository: spark Updated Branches: refs/heads/master ebb9a3b6f -> 10b671447 [SPARK-16033][SQL] insertInto() can't be used together with partitionBy() ## What changes were proposed in this pull request? When inserting into an existing partitioned table, partitioning columns should always be
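The rule this fix enforces can be sketched as follows: `insertInto()` targets an existing table whose partitioning is already fixed, so combining it with `partitionBy()` is rejected instead of being silently ignored. This is an illustrative sketch of the semantics, not Spark's code.

```python
def insert_into(table_partition_cols, writer_partition_cols=None):
    """Reject writer-side partitioning when inserting into an existing table."""
    if writer_partition_cols is not None:
        raise ValueError(
            "insertInto() can't be used together with partitionBy(); "
            "partition columns are defined by the existing table")
    return table_partition_cols  # partitioning always comes from the table

try:
    insert_into(["dt"], writer_partition_cols=["dt"])
except ValueError as e:
    assert "insertInto" in str(e)
```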

spark git commit: [SPARK-16036][SPARK-16037][SQL] fix various table insertion problems

2016-06-18 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 f159eb521 -> 8d2fc010b [SPARK-16036][SPARK-16037][SQL] fix various table insertion problems ## What changes were proposed in this pull request? The current table insertion has some weird behaviours: 1. inserting into a partitioned

spark git commit: [SPARK-16036][SPARK-16037][SQL] fix various table insertion problems

2016-06-18 Thread yhuai
Repository: spark Updated Branches: refs/heads/master e574c9973 -> 3d010c837 [SPARK-16036][SPARK-16037][SQL] fix various table insertion problems ## What changes were proposed in this pull request? The current table insertion has some weird behaviours: 1. inserting into a partitioned table

spark git commit: [SPARK-16034][SQL] Checks the partition columns when calling dataFrame.write.mode("append").saveAsTable

2016-06-18 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 8d2fc010b -> ee6eea644 [SPARK-16034][SQL] Checks the partition columns when calling dataFrame.write.mode("append").saveAsTable ## What changes were proposed in this pull request? `DataFrameWriter` can be used to append data to

spark git commit: [SPARK-16034][SQL] Checks the partition columns when calling dataFrame.write.mode("append").saveAsTable

2016-06-18 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 3d010c837 -> ce3b98bae [SPARK-16034][SQL] Checks the partition columns when calling dataFrame.write.mode("append").saveAsTable ## What changes were proposed in this pull request? `DataFrameWriter` can be used to append data to existing

spark git commit: [SPARK-15443][SQL] Fix 'explain' for streaming Dataset

2016-06-23 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 6cb24de99 -> 05677bb5a [SPARK-15443][SQL] Fix 'explain' for streaming Dataset ## What changes were proposed in this pull request? - Fix the `explain` command for streaming Dataset/DataFrame. E.g., ``` == Parsed Logical Plan ==

spark git commit: [SPARK-15443][SQL] Fix 'explain' for streaming Dataset

2016-06-23 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 91b1ef28d -> 0e4bdebec [SPARK-15443][SQL] Fix 'explain' for streaming Dataset ## What changes were proposed in this pull request? - Fix the `explain` command for streaming Dataset/DataFrame. E.g., ``` == Parsed Logical Plan ==

spark git commit: [SPARK-13709][SQL] Initialize deserializer with both table and partition properties when reading partitioned tables

2016-06-24 Thread yhuai
Repository: spark Updated Branches: refs/heads/master cc6778ee0 -> 2d2f607bf [SPARK-13709][SQL] Initialize deserializer with both table and partition properties when reading partitioned tables ## What changes were proposed in this pull request? When reading partitions of a partitioned Hive

spark git commit: [SPARK-13709][SQL] Initialize deserializer with both table and partition properties when reading partitioned tables

2016-06-24 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 3d8d95644 -> 3ccdd6b9c [SPARK-13709][SQL] Initialize deserializer with both table and partition properties when reading partitioned tables ## What changes were proposed in this pull request? When reading partitions of a partitioned

spark git commit: [SPARK-12744][SQL] Change parsing JSON integers to timestamps to treat integers as number of seconds

2016-01-11 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 8fe928b4f -> 9559ac5f7 [SPARK-12744][SQL] Change parsing JSON integers to timestamps to treat integers as number of seconds JIRA: https://issues.apache.org/jira/browse/SPARK-12744 This PR makes parsing JSON integers to timestamps
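The semantics after this change can be illustrated without Spark: a JSON integer read into a timestamp column is interpreted as seconds since the Unix epoch (previously milliseconds). The sketch below shows that interpretation; Spark's actual parser differs.

```python
from datetime import datetime, timezone

def json_int_to_timestamp(value):
    """Treat a JSON integer as seconds since the epoch (the new behavior)."""
    return datetime.fromtimestamp(value, tz=timezone.utc)

# 1452470400 seconds after the epoch is 2016-01-11 00:00:00 UTC.
ts = json_int_to_timestamp(1452470400)
assert ts == datetime(2016, 1, 11, tzinfo=timezone.utc)
```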

spark git commit: [SPARK-12833][HOT-FIX] Fix scala 2.11 compilation.

2016-01-15 Thread yhuai
ks.com> Closes #10774 from yhuai/fixScala211Compile. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/513266c0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/513266c0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff

spark git commit: [SPARK-12692][BUILD][HOT-FIX] Fix the scala style of KinesisBackedBlockRDDSuite.scala.

2016-01-13 Thread yhuai
sue. Author: Yin Huai <yh...@databricks.com> Closes #10742 from yhuai/fixStyle. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d6fd9b37 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d6fd9b37 Diff: http:

spark git commit: [SPARK-12833][HOT-FIX] Reset the locale after we set it.

2016-01-15 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 5f843781e -> f6ddbb360 [SPARK-12833][HOT-FIX] Reset the locale after we set it. Author: Yin Huai <yh...@databricks.com> Closes #10778 from yhuai/resetLocale. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: h

spark git commit: [SPARK-12558][FOLLOW-UP] AnalysisException when multiple functions applied in GROUP BY clause

2016-01-18 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 233d6cee9 -> db9a86058 [SPARK-12558][FOLLOW-UP] AnalysisException when multiple functions applied in GROUP BY clause Addresses the comments from Yin. https://github.com/apache/spark/pull/10520 Author: Dilip Biswal

spark git commit: [SPARK-12558][FOLLOW-UP] AnalysisException when multiple functions applied in GROUP BY clause

2016-01-18 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.6 5803fce90 -> 53184ce77 [SPARK-12558][FOLLOW-UP] AnalysisException when multiple functions applied in GROUP BY clause Addresses the comments from Yin. https://github.com/apache/spark/pull/10520 Author: Dilip Biswal

spark git commit: [SPARK-12841][SQL][BRANCH-1.6] fix cast in filter

2016-01-18 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.6 d43704d7f -> 68265ac23 [SPARK-12841][SQL][BRANCH-1.6] fix cast in filter In SPARK-10743 we wrap cast with `UnresolvedAlias` to give `Cast` a better alias if possible. However, for cases like filter, the `UnresolvedAlias` can't be

spark git commit: [SPARK-12841][SQL] fix cast in filter

2016-01-18 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 38c3c0e31 -> 4f11e3f2a [SPARK-12841][SQL] fix cast in filter In SPARK-10743 we wrap cast with `UnresolvedAlias` to give `Cast` a better alias if possible. However, for cases like `filter`, the `UnresolvedAlias` can't be resolved and

spark git commit: [SQL][MINOR] Simplify data source predicate filter translation.

2016-06-24 Thread yhuai
Repository: spark Updated Branches: refs/heads/master d48935400 -> 5f8de2160 [SQL][MINOR] Simplify data source predicate filter translation. ## What changes were proposed in this pull request? This is a small patch to rewrite the predicate filter translation in DataSourceStrategy. The

spark git commit: [SPARK-15706][SQL] Fix Wrong Answer when using IF NOT EXISTS in INSERT OVERWRITE for DYNAMIC PARTITION

2016-06-16 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 5ada60614 -> e5d703bca [SPARK-15706][SQL] Fix Wrong Answer when using IF NOT EXISTS in INSERT OVERWRITE for DYNAMIC PARTITION ## What changes were proposed in this pull request? `IF NOT EXISTS` in `INSERT OVERWRITE` should not support

spark git commit: [SPARK-15706][SQL] Fix Wrong Answer when using IF NOT EXISTS in INSERT OVERWRITE for DYNAMIC PARTITION

2016-06-16 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-2.0 3994372f4 -> b82abde06 [SPARK-15706][SQL] Fix Wrong Answer when using IF NOT EXISTS in INSERT OVERWRITE for DYNAMIC PARTITION ## What changes were proposed in this pull request? `IF NOT EXISTS` in `INSERT OVERWRITE` should not

spark git commit: [SPARK-12476][SQL] Implement JdbcRelation#unhandledFilters for removing unnecessary Spark Filter

2016-02-09 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 9267bc68f -> 6f710f9fd [SPARK-12476][SQL] Implement JdbcRelation#unhandledFilters for removing unnecessary Spark Filter Input: SELECT * FROM jdbcTable WHERE col0 = 'xxx' Current plan: ``` == Optimized Logical Plan == Project

spark git commit: [SPARK-12728][SQL] Integrates SQL generation with native view

2016-01-26 Thread yhuai
Repository: spark Updated Branches: refs/heads/master ce38a35b7 -> 58f5d8c1d [SPARK-12728][SQL] Integrates SQL generation with native view This PR is a follow-up of PR #10541. It integrates the newly introduced SQL generation feature with native view to make native view canonical. In this

spark git commit: [SPARK-10847][SQL][PYSPARK] Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure

2016-01-27 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 41f0c85f9 -> edd473751 [SPARK-10847][SQL][PYSPARK] Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure The error message is now changed from "Do not support type class scala.Tuple2." to "Do not support type class

spark git commit: [SPARK-10847][SQL][PYSPARK] Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure

2016-01-27 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 ae6fcc6bc -> 49dc8e7d3 [SPARK-10847][SQL][PYSPARK] Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure The error message is now changed from "Do not support type class scala.Tuple2." to "Do not support type

spark git commit: [SPARK-10847][SQL][PYSPARK] Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure

2016-01-27 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.6 17d1071ce -> 96e32db5c [SPARK-10847][SQL][PYSPARK] Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure The error message is now changed from "Do not support type class scala.Tuple2." to "Do not support type

spark git commit: [SPARK-13020][SQL][TEST] fix random generator for map type

2016-02-02 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 6de6a9772 -> 672032d0a [SPARK-13020][SQL][TEST] fix random generator for map type When we generate a map, we first randomly pick a length, then create a seq of key/value pairs with the expected length, and finally call `toMap`. However,
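The bug described above can be sketched in a few lines: drawing N random key/value pairs and building a map can yield fewer than N entries when keys collide, so drawing distinct keys first preserves the expected length. Illustrative sketch only, not Spark's test generator.

```python
import random

def random_map(n, key_space, rng):
    """Generate a map with exactly n entries by sampling distinct keys first."""
    keys = rng.sample(key_space, n)            # distinct keys -> exact length
    return {k: rng.random() for k in keys}

m = random_map(5, range(10), random.Random(0))
assert len(m) == 5                             # never shrinks from collisions
```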

spark git commit: [SPARK-13087][SQL] Fix group by function for sort based aggregation

2016-02-02 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.6 70fcbf68e -> bd8efba8f [SPARK-13087][SQL] Fix group by function for sort based aggregation It is not valid to call `toAttribute` on a `NamedExpression` unless we know for sure that the child produced that `NamedExpression`. The

spark git commit: [SPARK-13087][SQL] Fix group by function for sort based aggregation

2016-02-02 Thread yhuai
Repository: spark Updated Branches: refs/heads/master b8666fd0e -> 22ba21348 [SPARK-13087][SQL] Fix group by function for sort based aggregation It is not valid to call `toAttribute` on a `NamedExpression` unless we know for sure that the child produced that `NamedExpression`. The current

spark git commit: [SPARK-13021][CORE] Fail fast when custom RDDs violate RDD.partition's API contract

2016-01-27 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 87abcf7df -> 32f741115 [SPARK-13021][CORE] Fail fast when custom RDDs violate RDD.partition's API contract Spark's `Partition` and `RDD.partitions` APIs have a contract which requires custom implementations of `RDD.partitions` to ensure

spark git commit: [HOTFIX] Fix Scala 2.11 compilation

2016-01-27 Thread yhuai
Repository: spark Updated Branches: refs/heads/master ef96cd3c5 -> d702f0c17 [HOTFIX] Fix Scala 2.11 compilation by explicitly marking annotated parameters as vals (SI-8813). Caused by #10835. Author: Andrew Or Closes #10955 from andrewor14/fix-scala211. Project:

spark git commit: [SPARK-13475][TESTS][SQL] HiveCompatibilitySuite should still run in PR builder even if a PR only changes sql/core

2016-02-24 Thread yhuai
run in PR build even if a PR only changes sql/core. So, I am going to remove `ExtendedHiveTest` annotation from `HiveCompatibilitySuite`. https://issues.apache.org/jira/browse/SPARK-13475 Author: Yin Huai <yh...@databricks.com> Closes #11351 from yhuai/SPARK-13475. (cherry picked fr

spark git commit: [SPARK-12941][SQL][MASTER] Spark-SQL JDBC Oracle dialect fails to map string datatypes to Oracle VARCHAR datatype

2016-02-25 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 50e60e36f -> 8afe49141 [SPARK-12941][SQL][MASTER] Spark-SQL JDBC Oracle dialect fails to map string datatypes to Oracle VARCHAR datatype ## What changes were proposed in this pull request? This Pull request is used for the fix

spark git commit: [SPARK-13487][SQL] User-facing RuntimeConfig interface

2016-02-25 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 8afe49141 -> 26ac60806 [SPARK-13487][SQL] User-facing RuntimeConfig interface ## What changes were proposed in this pull request? This patch creates the public API for runtime configuration and an implementation for it. The public runtime

spark git commit: [SPARK-13454][SQL] Allow users to drop a table with a name starting with an underscore.

2016-02-26 Thread yhuai
ugh). ## How was this patch tested? Add a test to make sure we can drop a table with a name starting with an underscore. https://issues.apache.org/jira/browse/SPARK-13454 Author: Yin Huai <yh...@databricks.com> Closes #11349 from yhuai/fixDropTable. Project: http://git-wip-us.apache.org/

spark git commit: [SPARK-12941][SQL][MASTER] Spark-SQL JDBC Oracle dialect fails to map string datatypes to Oracle VARCHAR datatype

2016-02-28 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.6 dcf60d79e -> fedb81360 [SPARK-12941][SQL][MASTER] Spark-SQL JDBC Oracle dialect fails to map string datatypes to Oracle VARCHAR datatype ## What changes were proposed in this pull request? This Pull request is used for the fix

spark git commit: [SPARK-13383][SQL] Fix test

2016-02-24 Thread yhuai
ded by SPARK-13383. So, I am fixing the test. Author: Yin Huai <yh...@databricks.com> Closes #11355 from yhuai/SPARK-13383-fix-test. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cbb0b65a Tree: http://git-wip-us.apache.org/

spark git commit: [SPARK-13092][SQL] Add ExpressionSet for constraint tracking

2016-02-24 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 5a7af9e7a -> 2b042577f [SPARK-13092][SQL] Add ExpressionSet for constraint tracking This PR adds a new abstraction called an `ExpressionSet` which attempts to canonicalize expressions to remove cosmetic differences. Deterministic

spark git commit: [SPARK-12941][SQL][BRANCH-1.4] Spark-SQL JDBC Oracle dialect fails to map string datatypes to Oracle VARCHAR datatype

2016-02-29 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.5 0e920411f -> f521c4470 [SPARK-12941][SQL][BRANCH-1.4] Spark-SQL JDBC Oracle dialect fails to map string datatypes to Oracle VARCHAR datatype Adding a getJDBCType() method to the JdbcDialects.scala which would create a VARCHAR type

spark git commit: [SPARK-12941][SQL][BRANCH-1.4] Spark-SQL JDBC Oracle dialect fails to map string datatypes to Oracle VARCHAR datatype

2016-02-29 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.4 a9b1b8025 -> 9d8404bf8 [SPARK-12941][SQL][BRANCH-1.4] Spark-SQL JDBC Oracle dialect fails to map string datatypes to Oracle VARCHAR datatype Adding a getJDBCType() method to the JdbcDialects.scala which would create a VARCHAR type

svn commit: r1732240 [2/2] - in /spark: _layouts/ site/ site/graphx/ site/mllib/ site/news/ site/releases/ site/screencasts/ site/sql/ site/streaming/

2016-02-24 Thread yhuai
Modified: spark/site/news/two-weeks-to-spark-summit-2014.html URL: http://svn.apache.org/viewvc/spark/site/news/two-weeks-to-spark-summit-2014.html?rev=1732240&r1=1732239&r2=1732240&view=diff == ---

svn commit: r1732240 [1/2] - in /spark: _layouts/ site/ site/graphx/ site/mllib/ site/news/ site/releases/ site/screencasts/ site/sql/ site/streaming/

2016-02-24 Thread yhuai
Author: yhuai Date: Wed Feb 24 23:07:27 2016 New Revision: 1732240 URL: http://svn.apache.org/viewvc?rev=1732240&view=rev Log: Add links to ASF on the nav bar Modified: spark/_layouts/global.html spark/site/community.html spark/site/documentation.html spark/site/downloads.html

spark git commit: [SPARK-13475][TESTS][SQL] HiveCompatibilitySuite should still run in PR builder even if a PR only changes sql/core

2016-02-24 Thread yhuai
PR build even if a PR only changes sql/core. So, I am going to remove `ExtendedHiveTest` annotation from `HiveCompatibilitySuite`. https://issues.apache.org/jira/browse/SPARK-13475 Author: Yin Huai <yh...@databricks.com> Closes #11351 from yhuai/SPARK-13475. Project: http:

spark git commit: [SPARK-8968] [SQL] [HOT-FIX] Fix scala 2.11 build.

2016-01-20 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 015c8efb3 -> d60f8d74a [SPARK-8968] [SQL] [HOT-FIX] Fix scala 2.11 build. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d60f8d74 Tree:

spark git commit: [SPARK-12870][SQL] better format bucket id in file name

2016-01-19 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 0ddba6d88 -> e14817b52 [SPARK-12870][SQL] better format bucket id in file name For a normal Parquet file without buckets, its file name ends with a jobUUID which may be all numbers and mistakenly regarded as a bucket id. This PR improves the

spark git commit: [SPARK-12901][SQL][HOT-FIX] Fix scala 2.11 compilation.

2016-01-25 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 7d877c343 -> 00026fa99 [SPARK-12901][SQL][HOT-FIX] Fix scala 2.11 compilation. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/00026fa9 Tree:

svn commit: r1726699 [1/3] - in /spark: ./ _layouts/ site/ site/graphx/ site/mllib/ site/news/ site/releases/ site/screencasts/ site/sql/ site/streaming/

2016-01-25 Thread yhuai
Author: yhuai Date: Mon Jan 25 21:57:32 2016 New Revision: 1726699 URL: http://svn.apache.org/viewvc?rev=1726699&view=rev Log: Update the Spark example page to include examples using high level APIs Modified: spark/_config.yml spark/_layouts/global.html spark/examples.md spark/site

svn commit: r1726699 [2/3] - in /spark: ./ _layouts/ site/ site/graphx/ site/mllib/ site/news/ site/releases/ site/screencasts/ site/sql/ site/streaming/

2016-01-25 Thread yhuai
Modified: spark/site/examples.html URL: http://svn.apache.org/viewvc/spark/site/examples.html?rev=1726699&r1=1726698&r2=1726699&view=diff == --- spark/site/examples.html (original) +++ spark/site/examples.html Mon Jan 25 21:57:32

svn commit: r1726699 [3/3] - in /spark: ./ _layouts/ site/ site/graphx/ site/mllib/ site/news/ site/releases/ site/screencasts/ site/sql/ site/streaming/

2016-01-25 Thread yhuai
Modified: spark/site/news/spark-screencasts-published.html URL: http://svn.apache.org/viewvc/spark/site/news/spark-screencasts-published.html?rev=1726699&r1=1726698&r2=1726699&view=diff == ---

spark git commit: [SPARK-12624][PYSPARK] Checks row length when converting Java arrays to Python rows

2016-01-24 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.6 f913f7ea0 -> 88614dd0f [SPARK-12624][PYSPARK] Checks row length when converting Java arrays to Python rows When actual row length doesn't conform to specified schema field length, we should give a better error message instead of

spark git commit: [SPARK-12682][SQL] Add support for (optionally) not storing tables in hive metadata format

2016-01-26 Thread yhuai
Repository: spark Updated Branches: refs/heads/master ae0309a88 -> 08c781ca6 [SPARK-12682][SQL] Add support for (optionally) not storing tables in hive metadata format This PR adds a new table option (`skip_hive_metadata`) that'd allow the user to skip storing the table metadata in hive

spark git commit: [SPARK-12682][SQL] Add support for (optionally) not storing tables in hive metadata format

2016-01-26 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.6 572bc3999 -> f0c98a60f [SPARK-12682][SQL] Add support for (optionally) not storing tables in hive metadata format This PR adds a new table option (`skip_hive_metadata`) that'd allow the user to skip storing the table metadata in hive

spark git commit: [SPARK-12682][SQL][HOT-FIX] Fix test compilation

2016-01-26 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.6 f0c98a60f -> 6ce3dd940 [SPARK-12682][SQL][HOT-FIX] Fix test compilation Author: Yin Huai <yh...@databricks.com> Closes #10925 from yhuai/branch-1.6-hot-fix. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Com

spark git commit: [SPARK-12611][SQL][PYSPARK][TESTS] Fix test_infer_schema_to_local

2016-01-26 Thread yhuai
Repository: spark Updated Branches: refs/heads/branch-1.6 6ce3dd940 -> 85518eda4 [SPARK-12611][SQL][PYSPARK][TESTS] Fix test_infer_schema_to_local Previously (when the PR was first created) not specifying b= explicitly was fine (and treated as default null) - instead be explicit about b

spark git commit: [SPARK-13759][SQL] Add IsNotNull constraints for expressions with an inequality

2016-03-10 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 235f4ac6f -> 19f4ac6dc [SPARK-13759][SQL] Add IsNotNull constraints for expressions with an inequality ## What changes were proposed in this pull request? This PR adds support for inferring `IsNotNull` constraints from expressions with
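The inference described above rests on a simple observation: a binary comparison such as `a > b` can only evaluate to true when neither side is null, so an `IsNotNull` constraint can be derived for both operands. The sketch below illustrates the rule on a toy predicate representation; it is not Spark's `ExpressionSet` machinery.

```python
def infer_not_null(predicate):
    """Derive IsNotNull constraints from a (op, left, right) comparison."""
    op, left, right = predicate
    if op in (">", ">=", "<", "<=", "="):
        # Null on either side makes the comparison null, never true.
        return {("IsNotNull", left), ("IsNotNull", right)}
    return set()

assert infer_not_null((">", "a", "b")) == {("IsNotNull", "a"), ("IsNotNull", "b")}
```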

[1/4] spark git commit: [SPARK-13244][SQL] Migrates DataFrame to Dataset

2016-03-10 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 27fe6bacc -> 1d542785b http://git-wip-us.apache.org/repos/asf/spark/blob/1d542785/sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameSuite.java -- diff --git

[4/4] spark git commit: [SPARK-13244][SQL] Migrates DataFrame to Dataset

2016-03-10 Thread yhuai
[SPARK-13244][SQL] Migrates DataFrame to Dataset ## What changes were proposed in this pull request? This PR unifies DataFrame and Dataset by migrating existing DataFrame operations to Dataset and make `DataFrame` a type alias of `Dataset[Row]`. Most Scala code changes are source compatible,

[2/4] spark git commit: [SPARK-13244][SQL] Migrates DataFrame to Dataset

2016-03-10 Thread yhuai
http://git-wip-us.apache.org/repos/asf/spark/blob/1d542785/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects.scala -- diff --git

[3/4] spark git commit: [SPARK-13244][SQL] Migrates DataFrame to Dataset

2016-03-10 Thread yhuai
http://git-wip-us.apache.org/repos/asf/spark/blob/1d542785/examples/src/main/java/org/apache/spark/examples/ml/JavaSimpleParamsExample.java -- diff --git

spark git commit: [SPARK-13207][SQL] Make partitioning discovery ignore _SUCCESS files.

2016-03-14 Thread yhuai
uet) and data files, which requires more changes. https://issues.apache.org/jira/browse/SPARK-13207 Author: Yin Huai <yh...@databricks.com> Closes #11088 from yhuai/SPARK-13207. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spa

spark git commit: [SPARK-13895][SQL] DataFrameReader.text should return Dataset[String]

2016-03-15 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 41eaabf59 -> 643649dcb [SPARK-13895][SQL] DataFrameReader.text should return Dataset[String] ## What changes were proposed in this pull request? This patch changes DataFrameReader.text()'s return type from DataFrame to Dataset[String].

spark git commit: [SPARK-13139][SQL] Parse Hive DDL commands ourselves

2016-03-11 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 42afd72c6 -> 66d9d0edf [SPARK-13139][SQL] Parse Hive DDL commands ourselves ## What changes were proposed in this pull request? This patch is ported over from viirya's changes in #11048. Currently for most DDLs we just pass the query

spark git commit: [SPARK-13139][SQL] Follow-ups to #11573

2016-03-14 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 250832c73 -> 9a1680c2c [SPARK-13139][SQL] Follow-ups to #11573 Addressing outstanding comments in #11573. Jenkins, new test case in `DDLCommandSuite` Author: Andrew Or Closes #11667 from

spark git commit: [SPARK-13740][SQL] add null check for _verify_type in types.py

2016-03-08 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 9740954f3 -> d5ce61722 [SPARK-13740][SQL] add null check for _verify_type in types.py ## What changes were proposed in this pull request? This PR adds null check in `_verify_type` according to the nullability information. ## How was
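The null check described above can be sketched as follows: verification should reject `None` for a non-nullable field up front, instead of failing later with a cryptic error. This is an illustrative sketch, not PySpark's actual `_verify_type`.

```python
def verify_value(value, nullable):
    """Fail fast when None is supplied for a non-nullable field."""
    if value is None:
        if not nullable:
            raise ValueError("This field is not nullable, but got None")
        return  # None is fine for a nullable field
    # ... type-specific checks would continue here ...

verify_value(None, nullable=True)          # accepted
try:
    verify_value(None, nullable=False)
except ValueError as e:
    assert "not nullable" in str(e)
```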

spark git commit: [SPARK-13972][SQL][FOLLOW-UP] When creating the query execution for a converted SQL query, we eagerly trigger analysis

2016-03-19 Thread yhuai
tested? Existing tests. Author: Yin Huai <yh...@databricks.com> Closes #11825 from yhuai/SPARK-13972-follow-up. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/238fb485 Tree: http://git-wip-us.apache.org/repos/asf

[1/2] spark git commit: [SPARK-13923][SQL] Implement SessionCatalog

2016-03-19 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 92b70576e -> ca9ef86c8 http://git-wip-us.apache.org/repos/asf/spark/blob/ca9ef86c/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala

spark git commit: [SPARK-13827][SQL] Can't add subquery to an operator with same-name outputs while generate SQL string

2016-03-19 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 91984978e -> 1d1de28a3 [SPARK-13827][SQL] Can't add subquery to an operator with same-name outputs while generate SQL string ## What changes were proposed in this pull request? This PR tries to solve a fundamental issue in the

spark git commit: [SPARK-13869][SQL] Remove redundant conditions while combining filters

2016-03-19 Thread yhuai
Repository: spark Updated Branches: refs/heads/master f96997ba2 -> 77ba3021c [SPARK-13869][SQL] Remove redundant conditions while combining filters ## What changes were proposed in this pull request? **[I'll link it to the JIRA once ASF JIRA is back online]** This PR modifies the existing
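The optimization described above can be sketched with conditions modeled as strings: when two adjacent filters are merged, conditions already present in the outer filter need not be repeated from the inner one. Illustrative only; Spark compares canonicalized expressions, not strings.

```python
def combine_filters(outer_conds, inner_conds):
    """Merge two conjunctive filters, dropping redundant duplicate conditions."""
    merged = list(outer_conds)
    for cond in inner_conds:
        if cond not in merged:   # skip conditions the outer filter already has
            merged.append(cond)
    return merged

assert combine_filters(["a > 1", "b = 2"], ["a > 1", "c < 3"]) == [
    "a > 1", "b = 2", "c < 3"]
```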

spark git commit: [SPARK-13760][SQL] Fix BigDecimal constructor for FloatType

2016-03-09 Thread yhuai
oat)`. The latter is deprecated and can result in inconsistencies due to an implicit conversion to `Double`. ## How was this patch tested? N/A cc yhuai Author: Sameer Agarwal <sam...@databricks.com> Closes #11597 from sameeragarwal/bigdecimal. (cherry picked fr

spark git commit: [SPARK-13760][SQL] Fix BigDecimal constructor for FloatType

2016-03-09 Thread yhuai
oat)`. The latter is deprecated and can result in inconsistencies due to an implicit conversion to `Double`. ## How was this patch tested? N/A cc yhuai Author: Sameer Agarwal <sam...@databricks.com> Closes #11597 from sameeragarwal/bigdecimal. Project: http://git-wip-us.apache.org/repos/asf/s

spark git commit: Revert "[SPARK-13760][SQL] Fix BigDecimal constructor for FloatType"

2016-03-09 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 926e9c45a -> 790646125 Revert "[SPARK-13760][SQL] Fix BigDecimal constructor for FloatType" This reverts commit 926e9c45a21c5b71ef0832d63b8dae7d4f3d8826. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-13713][SQL][TEST-MAVEN] Add Antlr4 maven plugin.

2016-03-28 Thread yhuai
uai <yh...@databricks.com> Closes #12010 from yhuai/mavenAntlr4. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7007f72b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7007f72b Diff: http://git-wip-us.apache.org/

spark git commit: [SPARK-14206][SQL] buildReader() implementation for CSV

2016-03-30 Thread yhuai
Repository: spark Updated Branches: refs/heads/master da54abfd8 -> 26445c2e4 [SPARK-14206][SQL] buildReader() implementation for CSV ## What changes were proposed in this pull request? Major changes: 1. Implement `FileFormat.buildReader()` for the CSV data source. 1. Add an extra argument

spark git commit: [SPARK-14259][SQL] Add a FileSourceStrategy option for limiting #files in a partition

2016-03-30 Thread yhuai
Repository: spark Updated Branches: refs/heads/master ca458618d -> dadf0138b [SPARK-14259][SQL] Add a FileSourceStrategy option for limiting #files in a partition ## What changes were proposed in this pull request? This pr is to add a config to control the maximum number of files as even

spark git commit: [SPARK-14418][PYSPARK] fix unpersist of Broadcast in Python

2016-04-06 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 59236e5c5 -> 90ca18448 [SPARK-14418][PYSPARK] fix unpersist of Broadcast in Python ## What changes were proposed in this pull request? Currently, Broadcast.unpersist() will remove the file of broadcast, which should be the behavior of

spark git commit: [SPARK-14320][SQL] Make ColumnarBatch.Row mutable

2016-04-06 Thread yhuai
Repository: spark Updated Branches: refs/heads/master af73d9737 -> bb1fa5b21 [SPARK-14320][SQL] Make ColumnarBatch.Row mutable ## What changes were proposed in this pull request? In order to leverage a data structure like `AggregateHashMap` (https://github.com/apache/spark/pull/12055) to

spark git commit: [SPARK-14410][SQL] Push functions existence check into catalog

2016-04-07 Thread yhuai
Repository: spark Updated Branches: refs/heads/master aa852215f -> ae1db91d1 [SPARK-14410][SQL] Push functions existence check into catalog ## What changes were proposed in this pull request? This is a followup to #12117 and addresses some of the TODOs introduced there. In particular, the

spark git commit: [SPARK-14270][SQL] whole stage codegen support for typed filter

2016-04-07 Thread yhuai
Repository: spark Updated Branches: refs/heads/master ae1db91d1 -> 49fb23708 [SPARK-14270][SQL] whole stage codegen support for typed filter ## What changes were proposed in this pull request? We implement typed filter by `MapPartitions`, which doesn't work well with whole stage codegen.

spark git commit: [SPARK-14535][SQL] Remove buildInternalScan from FileFormat

2016-04-12 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 52a801124 -> 678b96e77 [SPARK-14535][SQL] Remove buildInternalScan from FileFormat ## What changes were proposed in this pull request? Now `HadoopFsRelation` with all kinds of file formats can be handled in `FileSourceStrategy`, we can

spark git commit: [SPARK-14362][SPARK-14406][SQL][FOLLOW-UP] DDL Native Support: Drop View and Drop Table

2016-04-11 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 83fb96403 -> 2d81ba542 [SPARK-14362][SPARK-14406][SQL][FOLLOW-UP] DDL Native Support: Drop View and Drop Table What changes were proposed in this pull request? In this PR, we are trying to address the comment in the original PR:

spark git commit: [SPARK-14554][SQL] disable whole stage codegen if there are too many input columns

2016-04-11 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 2d81ba542 -> 52a801124 [SPARK-14554][SQL] disable whole stage codegen if there are too many input columns ## What changes were proposed in this pull request? In

spark git commit: [SPARK-14132][SPARK-14133][SQL] Alter table partition DDLs

2016-04-11 Thread yhuai
Repository: spark Updated Branches: refs/heads/master e9e1adc03 -> 83fb96403 [SPARK-14132][SPARK-14133][SQL] Alter table partition DDLs ## What changes were proposed in this pull request? This implements a few alter table partition commands using the `SessionCatalog`. In particular: ```

spark git commit: [SPARK-14335][SQL] Describe function command returns wrong output

2016-04-09 Thread yhuai
Repository: spark Updated Branches: refs/heads/master f7ec854f1 -> cd2fed701 [SPARK-14335][SQL] Describe function command returns wrong output ## What changes were proposed in this pull request? …because some of the built-in functions are not in the function registry. This fix tries to fix issues

spark git commit: [SPARK-14362][SPARK-14406][SQL] DDL Native Support: Drop View and Drop Table

2016-04-09 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 9be5558e0 -> dfce9665c [SPARK-14362][SPARK-14406][SQL] DDL Native Support: Drop View and Drop Table What changes were proposed in this pull request? This PR is to provide a native support for DDL `DROP VIEW` and `DROP TABLE`. The PR

[2/2] spark git commit: [SPARK-14415][SQL] All functions should show usages by command `DESC FUNCTION`

2016-04-10 Thread yhuai
[SPARK-14415][SQL] All functions should show usages by command `DESC FUNCTION` ## What changes were proposed in this pull request? Currently, many functions do not show usages like the following. ``` scala> sql("desc function extended `sin`").collect().foreach(println) [Function: sin] [Class:

[1/2] spark git commit: [SPARK-14415][SQL] All functions should show usages by command `DESC FUNCTION`

2016-04-10 Thread yhuai
Repository: spark Updated Branches: refs/heads/master b5c785629 -> a7ce473bd http://git-wip-us.apache.org/repos/asf/spark/blob/a7ce473b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala

spark git commit: [SPARK-14481][SQL] Issue Exceptions for All Unsupported Options during Parsing

2016-04-09 Thread yhuai
e should either support it or throw an exception." A comment from yhuai in another PR https://github.com/apache/spark/pull/12146 - Can `Explain` be an exception? The `Formatted` clause is used in `HiveCompatibilitySuite`. - Two unsupported clauses in `Drop Table` are handled in a separa

spark git commit: [SPARK-14362][SPARK-14406][SQL][FOLLOW-UP] DDL Native Support: Drop View and Drop Table

2016-04-10 Thread yhuai
Repository: spark Updated Branches: refs/heads/master fbf8d0088 -> 9f838bd24 [SPARK-14362][SPARK-14406][SQL][FOLLOW-UP] DDL Native Support: Drop View and Drop Table What changes were proposed in this pull request? This PR is to address the comment:

spark git commit: [SPARK-14129][SPARK-14128][SQL] Alter table DDL commands

2016-04-05 Thread yhuai
Repository: spark Updated Branches: refs/heads/master c59abad05 -> 45d8cdee3 [SPARK-14129][SPARK-14128][SQL] Alter table DDL commands ## What changes were proposed in this pull request? In Spark 2.0, we want to handle the most common `ALTER TABLE` commands ourselves instead of passing the

spark git commit: [SPARK-14128][SQL] Alter table DDL followup

2016-04-05 Thread yhuai
Repository: spark Updated Branches: refs/heads/master f6456fa80 -> adbfdb878 [SPARK-14128][SQL] Alter table DDL followup ## What changes were proposed in this pull request? This is just a followup to #12121, which implemented the alter table DDLs using the `SessionCatalog`. Specifically, this

spark git commit: [SPARK-14394][SQL] Generate AggregateHashMap class for LongTypes during TungstenAggregate codegen

2016-04-08 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 02757535b -> f8c9beca3 [SPARK-14394][SQL] Generate AggregateHashMap class for LongTypes during TungstenAggregate codegen ## What changes were proposed in this pull request? This PR adds support for generating the `AggregateHashMap` class

spark git commit: [SPARK-13871][SQL] Support for inferring filters from data constraints

2016-03-19 Thread yhuai
Repository: spark Updated Branches: refs/heads/master b90c0206f -> f96997ba2 [SPARK-13871][SQL] Support for inferring filters from data constraints ## What changes were proposed in this pull request? This PR generalizes the `NullFiltering` optimizer rule in catalyst to

spark git commit: Revert "[SPARK-12719][HOTFIX] Fix compilation against Scala 2.10"

2016-03-19 Thread yhuai
Repository: spark Updated Branches: refs/heads/master edf8b8775 -> 4c08e2c08 Revert "[SPARK-12719][HOTFIX] Fix compilation against Scala 2.10" This reverts commit 3ee7996187bbef008c10681bc4e048c6383f5187. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-10680][TESTS] Increase 'connectionTimeout' to make RequestTimeoutIntegrationSuite more stable

2016-03-19 Thread yhuai
Repository: spark Updated Branches: refs/heads/master dcaa01661 -> d630a203d [SPARK-10680][TESTS] Increase 'connectionTimeout' to make RequestTimeoutIntegrationSuite more stable ## What changes were proposed in this pull request? Increase 'connectionTimeout' to make
