spark git commit: [SPARK-20703][SQL][FOLLOW-UP] Associate metrics with data writes onto DataFrameWriter operations

2017-07-06 · wenchen
Repository: spark. Updated Branches: refs/heads/master c09b31eb8 -> 5df99bd36. [SPARK-20703][SQL][FOLLOW-UP] Associate metrics with data writes onto DataFrameWriter operations. What changes were proposed in this pull request? Remove time metrics, since there seems to be no way to measure it in non...

spark git commit: [SPARK-21217][SQL] Support ColumnVector.Array.toArray()

2017-07-06 · wenchen
Repository: spark. Updated Branches: refs/heads/master 53c2eb59b -> c09b31eb8. [SPARK-21217][SQL] Support ColumnVector.Array.toArray(). What changes were proposed in this pull request? This PR implements bulk-copy for `ColumnVector.Array.toArray()` methods (e.g. `toIntArray()`) in...
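The preview is cut off above, but the idea behind a bulk copy is easy to illustrate. Below is a minimal Scala sketch, not the actual `ColumnVector` code; the `data`, `offset` and `length` fields are assumptions made for the example. It contrasts a per-element loop with a single `System.arraycopy`:

```scala
// Illustrative sketch only, not the actual ColumnVector code; the data,
// offset and length fields are assumptions made for this example.
class IntArraySliceSketch(data: Array[Int], offset: Int, length: Int) {

  // Per-element copy: the slower path that a bulk-copy method replaces.
  def toIntArraySlow: Array[Int] = {
    val out = new Array[Int](length)
    var i = 0
    while (i < length) {
      out(i) = data(offset + i)
      i += 1
    }
    out
  }

  // Bulk copy: the whole slice is moved with one System.arraycopy call.
  def toIntArrayBulk: Array[Int] = {
    val out = new Array[Int](length)
    System.arraycopy(data, offset, out, 0, length)
    out
  }
}
```

For primitive-backed columns, a single `arraycopy` avoids per-element loop overhead, which is the point of a bulk `toIntArray()`-style method.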

spark git commit: [SPARK-21327][SQL][PYSPARK] ArrayConstructor should handle an array of typecode 'l' as long rather than int in Python 2.

2017-07-06 · ueshin
Repository: spark. Updated Branches: refs/heads/master d451b7f43 -> 53c2eb59b. [SPARK-21327][SQL][PYSPARK] ArrayConstructor should handle an array of typecode 'l' as long rather than int in Python 2. What changes were proposed in this pull request? Currently `ArrayConstructor` handles an...
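As a hedged illustration of the underlying type issue (this is not Spark's real `ArrayConstructor`; the object, method names and little-endian byte order below are assumptions): on 64-bit CPython 2, an array of typecode 'l' holds 8-byte C longs, so the JVM side has to read those elements as `Long` rather than `Int`:

```scala
// Illustrative sketch only, not Spark's real ArrayConstructor: map a
// Python 2 array typecode to the element width used when decoding the
// raw bytes on the JVM side. Names and little-endian byte order are
// assumptions made for this example.
import java.nio.{ByteBuffer, ByteOrder}

object PythonArrayDecodeSketch {
  def decodeElements(typecode: Char, bytes: Array[Byte]): Seq[Any] = {
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    typecode match {
      // 'i' is a 4-byte signed int in CPython's array module.
      case 'i' => Seq.fill(bytes.length / 4)(buf.getInt)
      // On 64-bit CPython 2, 'l' is an 8-byte C long, so it must be read
      // as a JVM Long rather than an Int to avoid overflow.
      case 'l' => Seq.fill(bytes.length / 8)(buf.getLong)
      case other =>
        throw new UnsupportedOperationException(s"typecode '$other' is not covered by this sketch")
    }
  }
}
```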

spark git commit: [SPARK-21326][SPARK-21066][ML] Use TextFileFormat in LibSVMFileFormat and allow multiple input paths for determining numFeatures

2017-07-06 · wenchen
Repository: spark. Updated Branches: refs/heads/master e5bb26174 -> d451b7f43. [SPARK-21326][SPARK-21066][ML] Use TextFileFormat in LibSVMFileFormat and allow multiple input paths for determining numFeatures. What changes were proposed in this pull request? This is related to...
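A usage sketch of the case this touches, reading LibSVM data from more than one input path; the file paths are placeholders, and the commented `numFeatures` option shows how to provide the dimension up front instead of scanning for it:

```scala
// Usage sketch: read LibSVM data from more than one path. The paths are
// placeholders; the numFeatures option can skip the dimension scan.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[2]")
  .appName("libsvm-multipath-example")
  .getOrCreate()

val df = spark.read
  .format("libsvm")
  // .option("numFeatures", "780")  // optional: provide the dimension up front
  .load("/data/part1.libsvm", "/data/part2.libsvm")

df.printSchema()  // label: double, features: vector
```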

spark git commit: [SPARK-21329][SS] Make EventTimeWatermarkExec explicitly UnaryExecNode

2017-07-06 · zsxwing
Repository: spark. Updated Branches: refs/heads/master 40c7add3a -> e5bb26174. [SPARK-21329][SS] Make EventTimeWatermarkExec explicitly UnaryExecNode. What changes were proposed in this pull request? Make EventTimeWatermarkExec explicitly a UnaryExecNode. /cc tdas zsxwing. How was this...
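For readers unfamiliar with the class hierarchy, here is a generic Scala sketch (these are not Spark's actual classes) of why declaring a one-child operator through a dedicated unary trait is preferable to extending only the generic base type:

```scala
// Generic illustration, not Spark's actual classes: a dedicated unary trait
// makes the single-child shape part of the type rather than a convention on
// the generic children sequence.
trait TreeNode { def children: Seq[TreeNode] }

trait UnaryNode extends TreeNode {
  def child: TreeNode
  final override def children: Seq[TreeNode] = child :: Nil
}

case object LeafNode extends TreeNode { val children: Seq[TreeNode] = Nil }

// A watermark-style operator stated explicitly as a one-child node.
case class WatermarkNode(delayMs: Long, child: TreeNode) extends UnaryNode
```

With the shape encoded in the type, rules that match on unary nodes apply uniformly and the single-child invariant cannot drift.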

spark git commit: [SPARK-20946][SQL] Do not update conf for existing SparkContext in SparkSession.getOrCreate

2017-07-06 · wenchen
Repository: spark. Updated Branches: refs/heads/master 0217dfd26 -> 40c7add3a. [SPARK-20946][SQL] Do not update conf for existing SparkContext in SparkSession.getOrCreate. What changes were proposed in this pull request? SparkContext is shared by all sessions, so we should not update its conf...
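A hedged usage sketch of the semantics this change pins down: a second `getOrCreate()` reuses the already-running SparkContext, and, as the summary reads, its options should no longer rewrite that shared context's conf. The config key and values below are only examples:

```scala
// Usage sketch: the second builder call reuses the one existing SparkContext.
// As the change summary reads, the extra config applies at the session level
// and no longer rewrites the shared SparkContext's conf. The config key and
// values below are only examples.
import org.apache.spark.sql.SparkSession

val spark1 = SparkSession.builder()
  .master("local[2]")
  .appName("first-session")
  .getOrCreate()

val spark2 = SparkSession.builder()
  .config("spark.sql.shuffle.partitions", "4")
  .getOrCreate()

// Both sessions are backed by the same SparkContext instance.
assert(spark1.sparkContext eq spark2.sparkContext)
```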

spark git commit: [SPARK-21267][SS][DOCS] Update Structured Streaming Documentation

2017-07-06 · zsxwing
Repository: spark. Updated Branches: refs/heads/branch-2.2 4e53a4edd -> 576fd4c3a. [SPARK-21267][SS][DOCS] Update Structured Streaming Documentation. What changes were proposed in this pull request? A few changes to the Structured Streaming documentation: clarify that the entire stream input...

spark git commit: [SPARK-21323][SQL] Rename plans.logical.statsEstimation.Range to ValueInterval

2017-07-06 · rxin
Repository: spark. Updated Branches: refs/heads/master 48e44b24a -> bf66335ac. [SPARK-21323][SQL] Rename plans.logical.statsEstimation.Range to ValueInterval. What changes were proposed in this pull request? Rename org.apache.spark.sql.catalyst.plans.logical.statsEstimation.Range to...

spark git commit: [SPARK-21204][SQL] Add support for Scala Set collection types in serialization

2017-07-06 · wenchen
Repository: spark. Updated Branches: refs/heads/master 26ac085de -> 48e44b24a. [SPARK-21204][SQL] Add support for Scala Set collection types in serialization. What changes were proposed in this pull request? Currently we can't produce a `Dataset` containing a `Set` in Spark SQL. This PR tries...
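A usage sketch of what the change enables; the local master and app name are only for illustration:

```scala
// Usage sketch: with Set support in the encoders, a Dataset of Sets can be
// created directly. The local master/appName are only for illustration.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[2]")
  .appName("set-encoder-example")
  .getOrCreate()
import spark.implicits._

val ds = Seq(Set(1, 2, 3), Set(4, 5)).toDS()
ds.printSchema()  // the Set values are stored in an array-typed column
ds.show()
```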

spark git commit: [SPARK-21228][SQL] InSet incorrect handling of structs

2017-07-06 · wenchen
Repository: spark. Updated Branches: refs/heads/master 565e7a8d4 -> 26ac085de. [SPARK-21228][SQL] InSet incorrect handling of structs. What changes were proposed in this pull request? When the data type is a struct, InSet now uses TypeUtils.getInterpretedOrdering (similar to EqualTo) to build a...
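A hedged illustration of the kind of predicate involved (the data and literal values are invented, and whether `In` is actually rewritten to `InSet` depends on the optimizer's conversion threshold): an IN list over a struct-typed value, which is where a struct-aware ordering matters:

```scala
// Illustration only: an IN predicate whose operands are struct values. With
// a long enough literal list the optimizer may rewrite In to InSet, which is
// the code path this fix corrects; the data and literals are invented.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.struct

val spark = SparkSession.builder()
  .master("local[2]")
  .appName("inset-struct-example")
  .getOrCreate()
import spark.implicits._

Seq((1, 1), (2, 3), (3, 5)).toDF("a", "b")
  .select(struct($"a", $"b").as("s"))
  .createOrReplaceTempView("t")

spark.sql(
  """
    |SELECT s FROM t
    |WHERE s IN (
    |  named_struct('a', 1, 'b', 1),  named_struct('a', 2, 'b', 2),
    |  named_struct('a', 3, 'b', 3),  named_struct('a', 4, 'b', 4),
    |  named_struct('a', 5, 'b', 5),  named_struct('a', 6, 'b', 6),
    |  named_struct('a', 7, 'b', 7),  named_struct('a', 8, 'b', 8),
    |  named_struct('a', 9, 'b', 9),  named_struct('a', 10, 'b', 10),
    |  named_struct('a', 11, 'b', 11)
    |)
  """.stripMargin).show()
```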

spark git commit: [SPARK-20950][CORE] add a new config for diskWriteBufferSize, which was hard coded before

2017-07-06 · wenchen
Repository: spark. Updated Branches: refs/heads/master d540dfbff -> 565e7a8d4. [SPARK-20950][CORE] add a new config for diskWriteBufferSize, which was hard coded before. What changes were proposed in this pull request? This PR improves two things: 1. With spark.shuffle.spill.diskWriteBufferSize...
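A usage sketch of setting the newly exposed config; the 2 MB value (given in bytes) is arbitrary and only for illustration:

```scala
// Usage sketch: the buffer size that used to be hard coded can now be set
// through the new config. The 2 MB value (in bytes) is arbitrary.
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("disk-write-buffer-example")
  .set("spark.shuffle.spill.diskWriteBufferSize", "2097152")

val spark = SparkSession.builder().config(conf).getOrCreate()
```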

spark git commit: [SPARK-21273][SQL][FOLLOW-UP] Add missing test cases back and revise code style

2017-07-06 · wenchen
Repository: spark. Updated Branches: refs/heads/master b8e4d567a -> d540dfbff. [SPARK-21273][SQL][FOLLOW-UP] Add missing test cases back and revise code style. What changes were proposed in this pull request? Add missing test cases back and revise code style. Follow-up to the previous PR:...

spark-website git commit: Update Sandy.

2017-07-06 · srowen
Repository: spark-website. Updated Branches: refs/heads/asf-site 8a9ae6b7d -> 878dcfd84. Update Sandy. Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/878dcfd8 Tree:...

spark-website git commit: Update committer page.

2017-07-06 · srowen
Repository: spark-website. Updated Branches: refs/heads/asf-site 9749c8e2f -> 8a9ae6b7d. Update committer page. Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/8a9ae6b7 Tree:...

spark git commit: [SPARK-21324][TEST] Improve statistics test suites

2017-07-06 · wenchen
Repository: spark. Updated Branches: refs/heads/master 6ff05a66f -> b8e4d567a. [SPARK-21324][TEST] Improve statistics test suites. What changes were proposed in this pull request? 1. Move `StatisticsCollectionTestBase` to a separate file. 2. Move some test cases to...

spark git commit: [SPARK-20703][SQL] Associate metrics with data writes onto DataFrameWriter operations

2017-07-06 · wenchen
Repository: spark. Updated Branches: refs/heads/master 5800144a5 -> 6ff05a66f. [SPARK-20703][SQL] Associate metrics with data writes onto DataFrameWriter operations. What changes were proposed in this pull request? Right now in the UI, after SPARK-20213, we can show the operations to write...

spark git commit: [SPARK-21012][SUBMIT] Add glob support for resources added to Spark

2017-07-06 · wenchen
Repository: spark. Updated Branches: refs/heads/master 60043f224 -> 5800144a5. [SPARK-21012][SUBMIT] Add glob support for resources added to Spark. Currently, "--jars (spark.jars)", "--files (spark.files)", "--py-files (spark.submit.pyFiles)" and "--archives (spark.yarn.dist.archives)" only...
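A hedged usage sketch of what glob support means for these configs once it is in place; the paths are placeholders:

```scala
// Usage sketch: with glob support these resource configs can point at
// patterns instead of enumerating files. Paths are placeholders.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("glob-resources-example")
  .set("spark.jars", "/opt/app/libs/*.jar")
  .set("spark.files", "/opt/app/conf/*.properties")
```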

spark git commit: [SS][MINOR] Fix flaky test in DatastreamReaderWriterSuite. temp checkpoint dir should be deleted

2017-07-06 · tdas
Repository: spark. Updated Branches: refs/heads/branch-2.2 6e1081cbe -> 4e53a4edd. [SS][MINOR] Fix flaky test in DatastreamReaderWriterSuite. temp checkpoint dir should be deleted. What changes were proposed in this pull request? Stopping a query while it is being initialized can throw...

spark git commit: [SS][MINOR] Fix flaky test in DatastreamReaderWriterSuite. temp checkpoint dir should be deleted

2017-07-06 · tdas
Repository: spark. Updated Branches: refs/heads/master 14a3bb3a0 -> 60043f224. [SS][MINOR] Fix flaky test in DatastreamReaderWriterSuite. temp checkpoint dir should be deleted. What changes were proposed in this pull request? Stopping a query while it is being initialized can throw interrupt...

spark git commit: [SPARK-21312][SQL] correct offsetInBytes in UnsafeRow.writeToStream

2017-07-06 · wenchen
Repository: spark. Updated Branches: refs/heads/branch-2.1 8f1ca6957 -> 7f7b63bb6. [SPARK-21312][SQL] correct offsetInBytes in UnsafeRow.writeToStream. What changes were proposed in this pull request? Corrects offsetInBytes calculation in UnsafeRow.writeToStream. Known failures include...

spark git commit: [SPARK-21312][SQL] correct offsetInBytes in UnsafeRow.writeToStream

2017-07-06 · wenchen
Repository: spark. Updated Branches: refs/heads/branch-2.2 770fd2a23 -> 6e1081cbe. [SPARK-21312][SQL] correct offsetInBytes in UnsafeRow.writeToStream. What changes were proposed in this pull request? Corrects offsetInBytes calculation in UnsafeRow.writeToStream. Known failures include...

spark git commit: [SPARK-21312][SQL] correct offsetInBytes in UnsafeRow.writeToStream

2017-07-06 · wenchen
Repository: spark. Updated Branches: refs/heads/master 75b168fd3 -> 14a3bb3a0. [SPARK-21312][SQL] correct offsetInBytes in UnsafeRow.writeToStream. What changes were proposed in this pull request? Corrects offsetInBytes calculation in UnsafeRow.writeToStream. Known failures include writes...
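The previews above stop before the details, so here is only a generic Scala sketch of this class of bug (the names are illustrative, not `UnsafeRow`'s): when a value is backed by a slice of a larger byte array, streaming it out must start from the slice's own offset, and getting that origin wrong emits the wrong bytes:

```scala
// Generic sketch (names are illustrative, not UnsafeRow's): a value backed
// by a slice of a shared byte array. Writing it to a stream must start at
// the slice's own index into the backing array; a bug of this kind would
// use the wrong origin and emit the wrong bytes.
import java.io.{ByteArrayOutputStream, OutputStream}

final class ByteSlice(backing: Array[Byte], startIndex: Int, sizeInBytes: Int) {
  def writeToStream(out: OutputStream): Unit =
    out.write(backing, startIndex, sizeInBytes)
}

val backing = "xxHELLOyy".getBytes("UTF-8")
val slice = new ByteSlice(backing, startIndex = 2, sizeInBytes = 5)
val out = new ByteArrayOutputStream()
slice.writeToStream(out)
assert(new String(out.toByteArray, "UTF-8") == "HELLO")
```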

spark git commit: [SPARK-21308][SQL] Remove SQLConf parameters from the optimizer

2017-07-06 · wenchen
Repository: spark. Updated Branches: refs/heads/master ab866f117 -> 75b168fd3. [SPARK-21308][SQL] Remove SQLConf parameters from the optimizer. What changes were proposed in this pull request? This PR removes SQLConf parameters from the optimizer rules. How was this patch tested? The...
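The preview cuts off before the mechanism is described, so the following is only a generic Scala sketch of the refactoring pattern (constructor-threaded configuration versus a shared accessor), not necessarily how this PR wires it up inside Catalyst:

```scala
// Generic sketch of the refactoring pattern, not Spark's actual classes:
// instead of threading a config object through every rule's constructor,
// a rule reads settings from a shared accessor when it runs.
object ConfAccess {
  @volatile var activeConf: Map[String, String] = Map.empty
  def get(key: String, default: String): String = activeConf.getOrElse(key, default)
}

trait Rule[P] { def apply(plan: P): P }

// Before: the config is a constructor parameter of the rule.
final class PruneWithConf(conf: Map[String, String]) extends Rule[String] {
  def apply(plan: String): String =
    if (conf.getOrElse("pruning.enabled", "true") == "true") plan.trim else plan
}

// After: the rule is an object with no config parameter.
object Prune extends Rule[String] {
  def apply(plan: String): String =
    if (ConfAccess.get("pruning.enabled", "true") == "true") plan.trim else plan
}
```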