spark git commit: [SPARK-14678][SQL] Add a file sink log to support versioning and compaction

2016-04-20 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 296c384af -> 7bc948557 [SPARK-14678][SQL] Add a file sink log to support versioning and compaction ## What changes were proposed in this pull request? This PR adds a special log for FileStreamSink for two purposes: - Versioning. A future

spark git commit: [SPARK-14741][SQL] Fixed error in reading json file stream inside a partitioned directory

2016-04-20 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master acc7e592c -> cb8ea9e1f [SPARK-14741][SQL] Fixed error in reading json file stream inside a partitioned directory ## What changes were proposed in this pull request? Consider the following directory structure dir/col=X/some-files If we

spark git commit: [SPARK-14555] First cut of Python API for Structured Streaming

2016-04-20 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 834277884 -> 80bf48f43 [SPARK-14555] First cut of Python API for Structured Streaming ## What changes were proposed in this pull request? This patch provides a first cut of python APIs for structured streaming. This PR provides the new

spark git commit: [SPARK-14474][SQL] Move FileSource offset log into checkpointLocation

2016-04-12 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master da60b34d2 -> 6bf692147 [SPARK-14474][SQL] Move FileSource offset log into checkpointLocation ## What changes were proposed in this pull request? Now that we have a single location for storing checkpointed state, this PR just propagates

spark git commit: [SPARK-14494][SQL] Fix the race conditions in MemoryStream and MemorySink

2016-04-11 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 5de26194a -> 2dacc81ec [SPARK-14494][SQL] Fix the race conditions in MemoryStream and MemorySink ## What changes were proposed in this pull request? Make sure accessing mutable variables in MemoryStream and MemorySink are protected by

spark git commit: [SPARK-14456][SQL][MINOR] Remove unused variables and logics in DataSource

2016-04-07 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 3aa7d7639 -> 8dcb0c7c9 [SPARK-14456][SQL][MINOR] Remove unused variables and logics in DataSource ## What changes were proposed in this pull request? In DataSource#write method, the variables `dataSchema` and `equality`, and related

spark git commit: [SPARK-14411][SQL] Add a note to warn that onQueryProgress is asynchronous

2016-04-05 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 45d8cdee3 -> 7329fe272 [SPARK-14411][SQL] Add a note to warn that onQueryProgress is asynchronous ## What changes were proposed in this pull request? onQueryProgress is asynchronous so the user may see some future status of

spark git commit: [SPARK-14402][SQL] initcap UDF doesn't match Hive/Oracle behavior in lowercasing rest of string

2016-04-05 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 9ee5c2571 -> c59abad05 [SPARK-14402][SQL] initcap UDF doesn't match Hive/Oracle behavior in lowercasing rest of string ## What changes were proposed in this pull request? Currently, SparkSQL `initCap` uses the `toTitleCase` function.

spark git commit: [SPARK-14257][SQL] Allow multiple continuous queries to be started from the same DataFrame

2016-04-05 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master f77f11c67 -> 463bac001 [SPARK-14257][SQL] Allow multiple continuous queries to be started from the same DataFrame ## What changes were proposed in this pull request? Make StreamingRelation store the closure to create the source in

spark git commit: [SPARK-14345][SQL] Decouple deserializer expression resolution from ObjectOperator

2016-04-05 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master e4bd50412 -> f77f11c67 [SPARK-14345][SQL] Decouple deserializer expression resolution from ObjectOperator ## What changes were proposed in this pull request? This PR decouples deserializer expression resolution from `ObjectOperator`, so

spark git commit: [SPARK-14287] isStreaming method for Dataset

2016-04-04 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 7201f033c -> ba24d1ee9 [SPARK-14287] isStreaming method for Dataset With the addition of StreamExecution (ContinuousQuery) to Datasets, data will become unbounded. With unbounded data, the execution of some methods and operations will

[1/2] spark git commit: [SPARK-14255][SQL] Streaming Aggregation

2016-04-01 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 0b7d4966c -> 0fc4aaa71 http://git-wip-us.apache.org/repos/asf/spark/blob/0fc4aaa7/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreRDDSuite.scala

[2/2] spark git commit: [SPARK-14255][SQL] Streaming Aggregation

2016-04-01 Thread marmbrus
` that checks only the output of the last batch has been added to simulate the future addition of output modes. Author: Michael Armbrust <mich...@databricks.com> Closes #12048 from marmbrus/statefulAgg. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org

spark git commit: [SPARK-14160] Time Windowing functions for Datasets

2016-04-01 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 1e8861598 -> 1b829ce13 [SPARK-14160] Time Windowing functions for Datasets ## What changes were proposed in this pull request? This PR adds the function `window` as a column expression. `window` can be used to bucket rows into time

spark git commit: [SPARK-14070][SQL] Use ORC data source for SQL queries on ORC tables

2016-04-01 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master a884daad8 -> 1e8861598 [SPARK-14070][SQL] Use ORC data source for SQL queries on ORC tables ## What changes were proposed in this pull request? This patch enables use of OrcRelation for SQL queries which read data from Hive tables.

spark git commit: [SPARK-14191][SQL] Remove invalid Expand operator constraints

2016-04-01 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master df68beb85 -> a884daad8 [SPARK-14191][SQL] Remove invalid Expand operator constraints `Expand` operator now uses its child plan's constraints as its valid constraints (i.e., the base of constraints). This is not correct because `Expand`

spark git commit: [SPARK-13995][SQL] Extract correct IsNotNull constraints for Expression

2016-04-01 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 381358fbe -> df68beb85 [SPARK-13995][SQL] Extract correct IsNotNull constraints for Expression ## What changes were proposed in this pull request? JIRA: https://issues.apache.org/jira/browse/SPARK-13995 We infer relative `IsNotNull`

spark git commit: [SPARK-14268][SQL] rename toRowExpressions and fromRowExpression to serializer and deserializer in ExpressionEncoder

2016-03-30 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 816f359cf -> d46c71b39 [SPARK-14268][SQL] rename toRowExpressions and fromRowExpression to serializer and deserializer in ExpressionEncoder ## What changes were proposed in this pull request? In `ExpressionEncoder`, we use

spark git commit: [SPARK-12443][SQL] encoderFor should support Decimal

2016-03-25 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 11fa8741c -> ca003354d [SPARK-12443][SQL] encoderFor should support Decimal ## What changes were proposed in this pull request? JIRA: https://issues.apache.org/jira/browse/SPARK-12443 `constructorFor` will call `dataTypeFor` to determine

spark git commit: [SPARK-14078] Streaming Parquet Based FileSink

2016-03-23 Thread marmbrus
ess test that checks the answer after non-deterministic injected failures. Author: Michael Armbrust <mich...@databricks.com> Closes #11897 from marmbrus/fileSink. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6bc4

spark git commit: [SPARK-13985][SQL] Deterministic batches with ids

2016-03-22 Thread marmbrus
ion with the `StateStore` (#11645). Author: Michael Armbrust <mich...@databricks.com> Closes #11804 from marmbrus/batchIds. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/caea1521 Tree: http://git-wip-us.apache.

spark git commit: [SPARK-14029][SQL] Improve BooleanSimplification optimization by implementing `Not` canonicalization.

2016-03-22 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 0ce01635c -> c632bdc01 [SPARK-14029][SQL] Improve BooleanSimplification optimization by implementing `Not` canonicalization. ## What changes were proposed in this pull request? Currently, **BooleanSimplification** optimization can handle

spark git commit: [SPARK-13883][SQL] Parquet Implementation of FileFormat.buildReader

2016-03-21 Thread marmbrus
ded. This code should be tested by the many existing tests for parquet. Author: Michael Armbrust <mich...@databricks.com> Author: Sameer Agarwal <sam...@databricks.com> Author: Nong Li <n...@databricks.com> Closes #11709 from marmbrus/parquetReader. Project: http://git-wip-us

spark git commit: [SPARK-13427][SQL] Support USING clause in JOIN.

2016-03-19 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 65b75e66e -> 637a78f1d [SPARK-13427][SQL] Support USING clause in JOIN. ## What changes were proposed in this pull request? Support queries that JOIN tables with USING clause. SELECT * from table1 JOIN table2 USING USING clause can be

spark git commit: [SPARK-13791][SQL] Add MetadataLog and HDFSMetadataLog

2016-03-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 8e0b03060 -> b5e3bd87f [SPARK-13791][SQL] Add MetadataLog and HDFSMetadataLog ## What changes were proposed in this pull request? - Add a MetadataLog interface for reliable metadata storage. - Add HDFSMetadataLog as a MetadataLog

spark git commit: [SPARK-13664][SQL] Add a strategy for planning partitioned and bucketed scans of files

2016-03-14 Thread marmbrus
ternal APIs to avoid unnecessary `toArray` calls - Rename `Partition` to `PartitionDirectory` to differentiate partitions used earlier in pruning from those where we have already enumerated the files and their sizes. Author: Michael Armbrust <mich...@databricks.com> Closes #11646 f

spark git commit: [SPARK-13658][SQL] BooleanSimplification rule is slow with large boolean expressions

2016-03-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 63f642aea -> 6a4bfcd62 [SPARK-13658][SQL] BooleanSimplification rule is slow with large boolean expressions JIRA: https://issues.apache.org/jira/browse/SPARK-13658 ## What changes were proposed in this pull request? Quoted from JIRA

svn commit: r1734450 - in /spark: ./ _layouts/ js/ news/_posts/ releases/_posts/ site/ site/docs/ site/docs/1.6.1/ site/docs/1.6.1/api/ site/docs/1.6.1/api/R/ site/docs/1.6.1/api/java/ site/docs/1.6.1

2016-03-10 Thread marmbrus
Author: marmbrus Date: Thu Mar 10 19:28:30 2016 New Revision: 1734450 URL: http://svn.apache.org/viewvc?rev=1734450&view=rev Log: Release Spark 1.6.1 [This commit notification would consist of 933 parts, which exceeds the limit of 50 ones, so it was shortened to the summary

svn commit: r12718 - /dev/spark/spark-1.6.1-rc1/ /release/spark/spark-1.6.1/

2016-03-10 Thread marmbrus
Author: marmbrus Date: Thu Mar 10 19:14:45 2016 New Revision: 12718 Log: Release Spark 1.6.1 Added: release/spark/spark-1.6.1/ - copied from r12717, dev/spark/spark-1.6.1-rc1/ Removed: dev/spark/spark-1.6.1-rc1

svn commit: r12717 - /dev/spark/spark-1.6.1-rc1/

2016-03-10 Thread marmbrus
Author: marmbrus Date: Thu Mar 10 19:10:54 2016 New Revision: 12717 Log: Add spark-1.6.1-rc1 Added: dev/spark/spark-1.6.1-rc1/ dev/spark/spark-1.6.1-rc1/spark-1.6.1-bin-cdh4.tgz (with props) dev/spark/spark-1.6.1-rc1/spark-1.6.1-bin-cdh4.tgz.asc dev/spark/spark-1.6.1-rc1/spark

[spark] Git Push Summary

2016-03-09 Thread marmbrus
Repository: spark Updated Tags: refs/tags/v1.6.1 [created] 15de51c23 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org

[spark] Git Push Summary

2016-03-09 Thread marmbrus
Repository: spark Updated Tags: refs/tags/v1.6.1 [deleted] 152252f15

[spark] Git Push Summary

2016-03-09 Thread marmbrus
Repository: spark Updated Tags: refs/tags/v1.6.1-rc1 [deleted] 15de51c23

[spark] Git Push Summary

2016-03-09 Thread marmbrus
Repository: spark Updated Tags: refs/tags/v1.6.1 [created] 152252f15

spark git commit: [SPARK-13781][SQL] Use ExpressionSets in ConstraintPropagationSuite

2016-03-09 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master e1772d3f1 -> dbf2a7cfa [SPARK-13781][SQL] Use ExpressionSets in ConstraintPropagationSuite ## What changes were proposed in this pull request? This PR is a small follow up on https://github.com/apache/spark/pull/11338

spark git commit: [SPARK-13527][SQL] Prune Filters based on Constraints

2016-03-09 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 3dc9ae2e1 -> c6aa356cd [SPARK-13527][SQL] Prune Filters based on Constraints What changes were proposed in this pull request? Remove all the deterministic conditions in a [[Filter]] that are contained in the Child's Constraints.

spark git commit: [SPARK-13728][SQL] Fix ORC PPD test so that pushed filters can be checked.

2016-03-09 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 23369c3bd -> cad29a40b [SPARK-13728][SQL] Fix ORC PPD test so that pushed filters can be checked. ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-13728

spark git commit: [SPARK-13763][SQL] Remove Project when its Child's Output is Nil

2016-03-09 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 256704c77 -> 23369c3bd [SPARK-13763][SQL] Remove Project when its Child's Output is Nil What changes were proposed in this pull request? As shown in another PR: https://github.com/apache/spark/pull/11596, we are using `SELECT 1` as

spark git commit: [SPARK-13754] Keep old data source name for backwards compatibility

2016-03-08 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 982ef2b87 -> cc4ab37ee [SPARK-13754] Keep old data source name for backwards compatibility ## Motivation CSV data source was contributed by Databricks. It is the inlined version of https://github.com/databricks/spark-csv. The data source

spark git commit: [SPARK-13750][SQL] fix sizeInBytes of HadoopFsRelation

2016-03-08 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master d8813fa04 -> 982ef2b87 [SPARK-13750][SQL] fix sizeInBytes of HadoopFsRelation ## What changes were proposed in this pull request? This PR fixes the sizeInBytes of HadoopFsRelation. ## How was this patch tested? Added regression test for

spark git commit: [SPARK-13648] Add Hive Cli to classes for isolated classloader

2016-03-07 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 cf4e62ec2 -> 695c8a257 [SPARK-13648] Add Hive Cli to classes for isolated classloader ## What changes were proposed in this pull request? Adding the hive-cli classes to the classloader ## How was this patch tested? The hive

spark git commit: [SPARK-13648] Add Hive Cli to classes for isolated classloader

2016-03-07 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master e720dda42 -> 46f25c241 [SPARK-13648] Add Hive Cli to classes for isolated classloader ## What changes were proposed in this pull request? Adding the hive-cli classes to the classloader ## How was this patch tested? The hive

spark git commit: [SPARK-13722][SQL] No Push Down for Non-deterministics Predicates through Generate

2016-03-07 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master a3ec50a4b -> b6071a700 [SPARK-13722][SQL] No Push Down for Non-deterministics Predicates through Generate What changes were proposed in this pull request? Non-deterministic predicates should not be pushed through Generate. How

spark git commit: [SPARK-13694][SQL] QueryPlan.expressions should always include all expressions

2016-03-07 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master d7eac9d79 -> 489641117 [SPARK-13694][SQL] QueryPlan.expressions should always include all expressions ## What changes were proposed in this pull request? It's weird that expressions don't always have all the expressions in it. This PR

spark git commit: [SPARK-13544][SQL] Rewrite/Propagate Constraints for Aliases in Aggregate

2016-02-29 Thread marmbrus
ect` and `Aggregate`. So far, we only rewrite and propagate constraints if `Alias` is defined in `Project`. This PR is to resolve this issue in `Aggregate`. How was this patch tested? Added a test case for `Aggregate` in `ConstraintPropagationSuite`. marmbrus sameeragarwal Author: gatorsmile <g

[spark] Git Push Summary

2016-02-26 Thread marmbrus
Repository: spark Updated Tags: refs/tags/v1.6.1-rc1 [deleted] 152252f15

spark git commit: [SPARK-13383][SQL] Keep broadcast hint after column pruning

2016-02-24 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 893018183 -> f37398699 [SPARK-13383][SQL] Keep broadcast hint after column pruning JIRA: https://issues.apache.org/jira/browse/SPARK-13383 ## What changes were proposed in this pull request? When we do column pruning in Optimizer, we put

spark git commit: [SPARK-13440][SQL] ObjectType should accept any ObjectType, If should not care about nullability

2016-02-23 Thread marmbrus
ite` for the reported failure. - all the unit tests in `ExpressionEncoderSuite` are augmented to also confirm successful analysis. These tests are actually what pointed out the additional issues with `If` resolution. Author: Michael Armbrust <mich...@databricks.com> Closes #11316 from

spark git commit: Update branch-1.6 for 1.6.1 release

2016-02-22 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 f7898f9e2 -> 40d11d049 Update branch-1.6 for 1.6.1 release Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/40d11d04 Tree:

spark git commit: [SPARK-11624][SPARK-11972][SQL] fix commands that need hive to exec

2016-02-22 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 85e6a2205 -> f7898f9e2 [SPARK-11624][SPARK-11972][SQL] fix commands that need hive to exec In SparkSQLCLI, we have created a `CliSessionState`, but then we call `SparkSQLEnv.init()`, which will start another `SessionState`. This would

spark git commit: [SPARK-11624][SPARK-11972][SQL] fix commands that need hive to exec

2016-02-22 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master a11b39951 -> 5d80fac58 [SPARK-11624][SPARK-11972][SQL] fix commands that need hive to exec In SparkSQLCLI, we have created a `CliSessionState`, but then we call `SparkSQLEnv.init()`, which will start another `SessionState`. This would

spark git commit: [SPARK-12546][SQL] Change default number of open parquet files

2016-02-22 Thread marmbrus
uet allocates a significant amount of memory that is not accounted for by our own mechanisms. As a workaround, we can ensure that only a single file is open per task unless the user explicitly asks for more. Author: Michael Armbrust <mich...@databricks.com> Closes #11308 from marmbrus/parque

spark git commit: [SPARK-12546][SQL] Change default number of open parquet files

2016-02-22 Thread marmbrus
tes a significant amount of memory that is not accounted for by our own mechanisms. As a workaround, we can ensure that only a single file is open per task unless the user explicitly asks for more. Author: Michael Armbrust <mich...@databricks.com> Closes #11308 from marmbrus/parque

spark git commit: [SPARK-13091][SQL] Rewrite/Propagate constraints for Aliases

2016-02-19 Thread marmbrus
any constraints on `a` now also apply to `b`. JIRA: https://issues.apache.org/jira/browse/SPARK-13091 cc marmbrus Author: Sameer Agarwal <sam...@databricks.com> Closes #11144 from sameeragarwal/alias. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.

spark git commit: [SPARK-13261][SQL] Expose maxCharactersPerColumn as a user configurable option

2016-02-19 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master dbb08cdd5 -> 14844118b [SPARK-13261][SQL] Expose maxCharactersPerColumn as a user configurable option This patch exposes `maxCharactersPerColumn` and `maxColumns` to users in the CSV data source. Author: Hossein

spark git commit: [SPARK-12966][SQL] ArrayType(DecimalType) support in Postgres JDBC

2016-02-19 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master c7c55637b -> dbb08cdd5 [SPARK-12966][SQL] ArrayType(DecimalType) support in Postgres JDBC Fixes error `org.postgresql.util.PSQLException: Unable to find server array type for provided name decimal(38,18)`. * Passes scale metadata to JDBC

spark git commit: [SPARK-13384][SQL] Keep attribute qualifiers after dedup in Analyzer

2016-02-19 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 6915cc23b -> c7c55637b [SPARK-13384][SQL] Keep attribute qualifiers after dedup in Analyzer JIRA: https://issues.apache.org/jira/browse/SPARK-13384 ## What changes were proposed in this pull request? When we de-duplicate attributes in

spark git commit: [SPARK-13101][SQL] nullability of array type element should not fail analysis of encoder

2016-02-08 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 06f0df6df -> 8e4d15f70 [SPARK-13101][SQL] nullability of array type element should not fail analysis of encoder nullability should only be considered as an optimization rather than part of the type system, so instead of failing analysis

spark git commit: [SPARK-12939][SQL] migrate encoder resolution logic to Analyzer

2016-02-05 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 7b73f1719 -> 1ed354a53 [SPARK-12939][SQL] migrate encoder resolution logic to Analyzer https://issues.apache.org/jira/browse/SPARK-12939 Now we will catch `ObjectOperator` in `Analyzer` and resolve the `fromRowExpression/deserializer`

spark git commit: [SPARK-10820][SQL] Support for the continuous execution of structured queries

2016-02-02 Thread marmbrus
agata Das <tathagata.das1...@gmail.com> Author: Josh Rosen <rosenvi...@gmail.com> Closes #11006 from marmbrus/structured-streaming. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/12a20c14 Tree: http://git-wip-us.apach

spark git commit: [SPARK-13094][SQL] Add encoders for seq/array of primitives

2016-02-02 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 bd8efba8f -> 99594b213 [SPARK-13094][SQL] Add encoders for seq/array of primitives Author: Michael Armbrust <mich...@databricks.com> Closes #11014 from marmbrus/seqEncoders. (cherry picked fr

spark git commit: [DOCS] Update StructType.scala

2016-02-02 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master d0df2ca40 -> b377b0353 [DOCS] Update StructType.scala The example will throw an error like :20: error: not found: value StructType. Need to add this line: import org.apache.spark.sql.types._ Author: Kevin (Sangwoo) Kim

spark git commit: [DOCS] Update StructType.scala

2016-02-02 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 3c92333ee -> e81333be0 [DOCS] Update StructType.scala The example will throw an error like :20: error: not found: value StructType. Need to add this line: import org.apache.spark.sql.types._ Author: Kevin (Sangwoo) Kim

spark git commit: [SPARK-13056][SQL] map column would throw NPE if value is null

2016-02-02 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 9c0cf22f7 -> 3c92333ee [SPARK-13056][SQL] map column would throw NPE if value is null Jira: https://issues.apache.org/jira/browse/SPARK-13056 Create a map like { "a": "somestring", "b": null} Query like SELECT col["b"] FROM t1; NPE

spark git commit: [SPARK-13056][SQL] map column would throw NPE if value is null

2016-02-02 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master cba1d6b65 -> 358300c79 [SPARK-13056][SQL] map column would throw NPE if value is null Jira: https://issues.apache.org/jira/browse/SPARK-13056 Create a map like { "a": "somestring", "b": null} Query like SELECT col["b"] FROM t1; NPE would

spark git commit: [SPARK-12957][SQL] Initial support for constraint propagation in SparkSQL

2016-02-02 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master e86f8f63b -> 138c300f9 [SPARK-12957][SQL] Initial support for constraint propagation in SparkSQL Based on the semantics of a query, we can derive a number of data constraints on output of each (logical or physical) operator. For instance,

spark git commit: [SPARK-12989][SQL] Delaying Alias Cleanup after ExtractWindowExpressions

2016-02-01 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 6075573a9 -> 33c8a490f [SPARK-12989][SQL] Delaying Alias Cleanup after ExtractWindowExpressions JIRA: https://issues.apache.org/jira/browse/SPARK-12989 In the rule `ExtractWindowExpressions`, we simply replace alias by the corresponding

spark git commit: [SPARK-12705][SPARK-10777][SQL] Analyzer Rule ResolveSortReferences

2016-02-01 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 33c8a490f -> 8f26eb5ef [SPARK-12705][SPARK-10777][SQL] Analyzer Rule ResolveSortReferences JIRA: https://issues.apache.org/jira/browse/SPARK-12705 **Scope:** This PR is a general fix for sorting reference resolution when the child's

spark git commit: [SPARK-11780][SQL] Add catalyst type aliases backwards compatibility

2016-02-01 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 215d5d884 -> 70fcbf68e [SPARK-11780][SQL] Add catalyst type aliases backwards compatibility Changed a target at branch-1.6 from #10635. Author: Takeshi YAMAMURO Closes #10915 from maropu/pr9935-v3. Project:

spark git commit: [DOCS] Fix the jar location of datanucleus in sql-programming-guid.md

2016-02-01 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 9a5b25d0f -> 215d5d884 [DOCS] Fix the jar location of datanucleus in sql-programming-guid.md ISTM `lib` is better because `datanucleus` jars are located in `lib` for release builds. Author: Takeshi YAMAMURO

spark git commit: [SPARK-12989][SQL] Delaying Alias Cleanup after ExtractWindowExpressions

2016-02-01 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 ddb963304 -> 9a5b25d0f [SPARK-12989][SQL] Delaying Alias Cleanup after ExtractWindowExpressions JIRA: https://issues.apache.org/jira/browse/SPARK-12989 In the rule `ExtractWindowExpressions`, we simply replace alias by the

spark git commit: [DOCS] Fix the jar location of datanucleus in sql-programming-guid.md

2016-02-01 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 8f26eb5ef -> da9146c91 [DOCS] Fix the jar location of datanucleus in sql-programming-guid.md ISTM `lib` is better because `datanucleus` jars are located in `lib` for release builds. Author: Takeshi YAMAMURO

spark git commit: [SPARK-12926][SQL] SQLContext to display warning message when non-sql configs are being set

2016-01-28 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 415d0a859 -> 676803963 [SPARK-12926][SQL] SQLContext to display warning message when non-sql configs are being set Users unknowingly try to set core Spark configs in SQLContext but later realise that it didn't work. eg.

spark git commit: [SPARK-12975][SQL] Throwing Exception when Bucketing Columns are part of Partitioning Columns

2016-01-25 Thread marmbrus
It is just wasting extra CPU when reading or writing bucket tables. Thus, like Hive, we can issue an exception and let users do the change. Also added a test case for checking if the information of `sortBy` and `bucketBy` columns are correctly saved in the metastore table. Could you check if my unders

spark git commit: [SPARK-12816][SQL] De-alias type when generating schemas

2016-01-19 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 4dbd31612 -> c78e2080e [SPARK-12816][SQL] De-alias type when generating schemas Call `dealias` on local types to fix schema generation for abstract type members, such as ```scala type KeyValue = (Int, String) ``` Add simple test

spark git commit: [SQL][MINOR] BoundReference do not need to be NamedExpression

2016-01-15 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 61c45876f -> 3f1c58d60 [SQL][MINOR] BoundReference do not need to be NamedExpression We made it a `NamedExpression` to work around some hacky cases a long time ago, and it now seems safe to remove it. Author: Wenchen Fan

spark git commit: [SPARK-12813][SQL] Eliminate serialization for back to back operations

2016-01-14 Thread marmbrus
ion. - Eliminate serializations in more cases by adding more cases to `EliminateSerialization` Author: Michael Armbrust <mich...@databricks.com> Closes #10747 from marmbrus/encoderExpressions. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/

spark git commit: [HOT-FIX] bypass hive test when parse logical plan to json

2016-01-12 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 03e523e52 -> f71e5cc12 [HOT-FIX] bypass hive test when parse logical plan to json https://github.com/apache/spark/pull/10311 introduces some rare, non-deterministic flakiness for hive udf tests, see

spark git commit: [SPARK-9843][SQL] Make catalyst optimizer pass pluggable at runtime

2016-01-12 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 1d8887953 -> 508592b1b [SPARK-9843][SQL] Make catalyst optimizer pass pluggable at runtime Let me know whether you'd like to see it in other place Author: Robert Kruszewski Closes #10210 from

spark git commit: [SPARK-12758][SQL] add note to Spark SQL Migration guide about TimestampType casting

2016-01-11 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master a44991453 -> a767ee8a0 [SPARK-12758][SQL] add note to Spark SQL Migration guide about TimestampType casting Warning users about casting changes. Author: Brandon Bradley Closes #10708 from blbradley/spark-12758.

spark git commit: [SPARK-12758][SQL] add note to Spark SQL Migration guide about TimestampType casting

2016-01-11 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 3b32aa9e2 -> dd2cf64f3 [SPARK-12758][SQL] add note to Spark SQL Migration guide about TimestampType casting Warning users about casting changes. Author: Brandon Bradley Closes #10708 from

[2/2] spark git commit: [SPARK-12696] Backport Dataset Bug fixes to 1.6

2016-01-08 Thread marmbrus
enc...@databricks.com> Author: gatorsmile <gatorsm...@gmail.com> Author: Liang-Chi Hsieh <vii...@gmail.com> Author: Cheng Lian <l...@databricks.com> Author: Nong Li <n...@databricks.com> Closes #10650 from marmbrus/dataset-backports. Project: http://git-wip-us.apache.org/repos/asf/spa

[1/2] spark git commit: [SPARK-12696] Backport Dataset Bug fixes to 1.6

2016-01-08 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 faf094c7c -> a6190508b http://git-wip-us.apache.org/repos/asf/spark/blob/a6190508/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java -- diff --git

spark git commit: [SPARK-11878][SQL] Eliminate distribute by in case group by is present with exactly the same grouping expressi

2016-01-06 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 94c202c7d -> 9061e777f [SPARK-11878][SQL] Eliminate distribute by in case group by is present with exactly the same grouping expressi For queries like : select <> from table group by a distribute by a we can eliminate distribute by ;

svn commit: r1723144 - in /spark: releases/_posts/2016-01-04-spark-release-1-6-0.md site/releases/spark-release-1-6-0.html

2016-01-05 Thread marmbrus
Author: marmbrus Date: Tue Jan 5 18:18:18 2016 New Revision: 1723144 URL: http://svn.apache.org/viewvc?rev=1723144&view=rev Log: Update Spark 1.6 release notes Modified: spark/releases/_posts/2016-01-04-spark-release-1-6-0.md spark/site/releases/spark-release-1-6-0.html Modified: spark

spark git commit: [SPARK-12438][SQL] Add SQLUserDefinedType support for encoder

2016-01-05 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 1cdc42d2b -> b3c48e39f [SPARK-12438][SQL] Add SQLUserDefinedType support for encoder JIRA: https://issues.apache.org/jira/browse/SPARK-12438 ScalaReflection lacks the support of SQLUserDefinedType. We should add it. Author: Liang-Chi

spark git commit: [SPARK-12439][SQL] Fix toCatalystArray and MapObjects

2016-01-05 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 8ce645d4e -> d202ad2fc [SPARK-12439][SQL] Fix toCatalystArray and MapObjects JIRA: https://issues.apache.org/jira/browse/SPARK-12439 In toCatalystArray, we should look at the data type returned by dataTypeFor instead of silentSchemaFor,

spark git commit: [DOC] Adjust coverage for partitionBy()

2016-01-04 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 573ac55d7 -> 40d03960d [DOC] Adjust coverage for partitionBy() This is the related thread: http://search-hadoop.com/m/q3RTtO3ReeJ1iF02=Re+partitioning+json+data+in+spark Michael suggested fixing the doc. Please review. Author: tedyu

spark git commit: [DOC] Adjust coverage for partitionBy()

2016-01-04 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 7f37c1e45 -> 1005ee396 [DOC] Adjust coverage for partitionBy() This is the related thread: http://search-hadoop.com/m/q3RTtO3ReeJ1iF02=Re+partitioning+json+data+in+spark Michael suggested fixing the doc. Please review. Author:

spark git commit: [SPARK-12421][SQL] Prevent Internal/External row from exposing state.

2016-01-04 Thread marmbrus
sed by the fact that scala's ArrayOps ```toArray``` (returned by calling ```toSeq```) will return the backing array instead of a copy. This PR fixes this problem. This PR was inspired by https://github.com/apache/spark/pull/10374 by apo1. cc apo1 sarutak marmbrus cloud-fan nongli (every

spark git commit: [SPARK-12512][SQL] support column name with dot in withColumn()

2016-01-04 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 43706bf8b -> 573ac55d7 [SPARK-12512][SQL] support column name with dot in withColumn() Author: Xiu Guo Closes #10500 from xguo27/SPARK-12512. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

[1/2] spark git commit: [SPARK-12600][SQL] Remove deprecated methods in Spark SQL

2016-01-04 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master fdfac22d0 -> 77ab49b85 http://git-wip-us.apache.org/repos/asf/spark/blob/77ab49b8/sql/core/src/main/scala/org/apache/spark/sql/functions.scala -- diff --git

[2/2] spark git commit: [SPARK-12600][SQL] Remove deprecated methods in Spark SQL

2016-01-04 Thread marmbrus
[SPARK-12600][SQL] Remove deprecated methods in Spark SQL Author: Reynold Xin Closes #10559 from rxin/remove-deprecated-sql. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/77ab49b8 Tree:

spark git commit: [SPARK-12568][SQL] Add BINARY to Encoders

2016-01-04 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.6 8950482ee -> d9e4438b5 [SPARK-12568][SQL] Add BINARY to Encoders Author: Michael Armbrust <mich...@databricks.com> Closes #10516 from marmbrus/datasetCleanup. (cherry picked from commit 53beddc5bf04a35ab73de99158919c2

spark git commit: [SPARK-12568][SQL] Add BINARY to Encoders

2016-01-04 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 7058dc115 -> 53beddc5b [SPARK-12568][SQL] Add BINARY to Encoders Author: Michael Armbrust <mich...@databricks.com> Closes #10516 from marmbrus/datasetCleanup. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: h

svn commit: r11815 - /dev/spark/spark-1.6.0-rc4/ /release/spark/spark-1.6.0/

2016-01-04 Thread marmbrus
Author: marmbrus Date: Mon Jan 4 16:20:48 2016 New Revision: 11815 Log: Release Spark 1.6.0 Added: release/spark/spark-1.6.0/ - copied from r11814, dev/spark/spark-1.6.0-rc4/ Removed: dev/spark/spark-1.6.0-rc4

svn commit: r1722911 [3/3] - in /spark: ./ js/ news/_posts/ releases/_posts/ site/ site/graphx/ site/js/ site/mllib/ site/news/ site/releases/ site/screencasts/ site/sql/ site/streaming/

2016-01-04 Thread marmbrus
Added: spark/site/releases/spark-release-1-6-0.html URL: http://svn.apache.org/viewvc/spark/site/releases/spark-release-1-6-0.html?rev=1722911&view=auto == --- spark/site/releases/spark-release-1-6-0.html (added) +++

svn commit: r1722911 [2/3] - in /spark: ./ js/ news/_posts/ releases/_posts/ site/ site/graphx/ site/js/ site/mllib/ site/news/ site/releases/ site/screencasts/ site/sql/ site/streaming/

2016-01-04 Thread marmbrus
Added: spark/site/news/spark-1-6-0-released.html URL: http://svn.apache.org/viewvc/spark/site/news/spark-1-6-0-released.html?rev=1722911&view=auto == --- spark/site/news/spark-1-6-0-released.html (added) +++

svn commit: r1722932 - in /spark/site/docs: ./ 1.6.0/ 1.6.0/api/ 1.6.0/api/R/ 1.6.0/api/java/ 1.6.0/api/java/lib/ 1.6.0/api/java/org/ 1.6.0/api/java/org/apache/ 1.6.0/api/java/org/apache/spark/ 1.6.0/

2016-01-04 Thread marmbrus
Author: marmbrus Date: Mon Jan 4 17:53:21 2016 New Revision: 1722932 URL: http://svn.apache.org/viewvc?rev=1722932&view=rev Log: Add Spark 1.6.0 docs [This commit notification would consist of 344 parts, which exceeds the limit of 50 ones, so it was shortened to the summary

svn commit: r1722911 [1/3] - in /spark: ./ js/ news/_posts/ releases/_posts/ site/ site/graphx/ site/js/ site/mllib/ site/news/ site/releases/ site/screencasts/ site/sql/ site/streaming/

2016-01-04 Thread marmbrus
Author: marmbrus Date: Mon Jan 4 16:22:05 2016 New Revision: 1722911 URL: http://svn.apache.org/viewvc?rev=1722911&view=rev Log: Update site for Spark 1.6.0 Added: spark/news/_posts/2016-01-04-spark-1-6-0-released.md spark/releases/_posts/2016-01-04-spark-release-1-6-0.md spark/site/news
