spark git commit: [MINOR][DOC] automatic type inference supports also Date and Timestamp

2017-11-01 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master d43e1f06b -> b04eefae4 [MINOR][DOC] automatic type inference supports also Date and Timestamp ## What changes were proposed in this pull request? Easy fix in the documentation, which is reporting that only numeric types and string are

spark git commit: [MINOR][DOC] automatic type inference supports also Date and Timestamp

2017-11-01 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.2 ab87a92a1 -> c311c5e79 [MINOR][DOC] automatic type inference supports also Date and Timestamp ## What changes were proposed in this pull request? Easy fix in the documentation, which is reporting that only numeric types and string

spark git commit: [MINOR][DOC] automatic type inference supports also Date and Timestamp

2017-11-01 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.1 3d6d88996 -> 8b572116f [MINOR][DOC] automatic type inference supports also Date and Timestamp ## What changes were proposed in this pull request? Easy fix in the documentation, which is reporting that only numeric types and string

spark git commit: [BUILD] Close stale PRs

2017-11-07 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 9df08e218 -> ed1478cfe [BUILD] Close stale PRs Closes #11494 Closes #14158 Closes #16803 Closes #16864 Closes #17455 Closes #17936 Closes #19377 Added: Closes #19380 Closes #18642 Closes #18377 Closes #19632 Added: Closes #14471 Closes

spark git commit: [SPARK-22376][TESTS] Makes dev/run-tests.py script compatible with Python 3

2017-11-07 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master ed1478cfe -> 160a54061 [SPARK-22376][TESTS] Makes dev/run-tests.py script compatible with Python 3 ## What changes were proposed in this pull request? This PR proposes to fix `dev/run-tests.py` script to support Python 3. Here are some

spark git commit: [SPARK-22466][SPARK SUBMIT] export SPARK_CONF_DIR while conf is default

2017-11-08 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 6447d7bc1 -> ee571d79e [SPARK-22466][SPARK SUBMIT] export SPARK_CONF_DIR while conf is default ## What changes were proposed in this pull request? We use SPARK_CONF_DIR to switch spark conf directory and can be visited if we explicitly
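The entry above concerns exporting `SPARK_CONF_DIR` even when the default conf directory is used. A minimal pure-Python sketch of that resolve-then-export pattern (the env var names are the real ones; the function itself is only an illustration, not Spark's launch-script code):

```python
import os

def resolve_conf_dir(spark_home: str) -> str:
    """Honor SPARK_CONF_DIR if already set, otherwise fall back to
    $SPARK_HOME/conf, and export the result so child processes see
    the same value either way."""
    conf_dir = os.environ.get("SPARK_CONF_DIR") or os.path.join(spark_home, "conf")
    os.environ["SPARK_CONF_DIR"] = conf_dir
    return conf_dir

print(resolve_conf_dir("/opt/spark"))
```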

spark git commit: [SPARK-22456][SQL] Add support for dayofweek function

2017-11-08 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master ee571d79e -> d01044233 [SPARK-22456][SQL] Add support for dayofweek function ## What changes were proposed in this pull request? This PR adds support for a new function called `dayofweek` that returns the day of the week of the given
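The `dayofweek` SQL function described above returns 1 for Sunday through 7 for Saturday. A pure-Python sketch of those semantics (not Spark's implementation), shifting from Python's Monday=0 convention:

```python
from datetime import date

def dayofweek(d: date) -> int:
    """Day of week in the SQL convention: 1 = Sunday, ..., 7 = Saturday.
    Python's date.weekday() is 0 = Monday, so shift accordingly."""
    return (d.weekday() + 1) % 7 + 1

print(dayofweek(date(2000, 1, 1)))  # 7 (2000-01-01 was a Saturday)
print(dayofweek(date(2000, 1, 2)))  # 1 (Sunday)
```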

spark git commit: [SPARK-21640][SQL][PYTHON][R][FOLLOWUP] Add errorifexists in SparkR and other documentations

2017-11-08 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master d01044233 -> 695647bf2 [SPARK-21640][SQL][PYTHON][R][FOLLOWUP] Add errorifexists in SparkR and other documentations ## What changes were proposed in this pull request? This PR proposes to add `errorifexists` to SparkR API and fix the

spark git commit: [SPARK-22222][CORE][TEST][FOLLOW-UP] Remove redundant and deprecated `Timeouts`

2017-11-08 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 695647bf2 -> 98be55c0f [SPARK-22222][CORE][TEST][FOLLOW-UP] Remove redundant and deprecated `Timeouts` ## What changes were proposed in this pull request? Since SPARK-21939, Apache Spark uses `TimeLimits` instead of the deprecated

spark git commit: [SPARK-22437][PYSPARK] default mode for jdbc is wrongly set to None

2017-11-04 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master bc1e10103 -> e7adb7d7a [SPARK-22437][PYSPARK] default mode for jdbc is wrongly set to None ## What changes were proposed in this pull request? When writing using jdbc with python currently we are wrongly assigning by default None as

spark git commit: [SPARK-22655][PYSPARK] Throw exception rather than exit silently in PythonRunner when Spark session is stopped

2017-12-08 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master f28b1a4c4 -> 26e66453d [SPARK-22655][PYSPARK] Throw exception rather than exit silently in PythonRunner when Spark session is stopped ## What changes were proposed in this pull request? During Spark shutdown, if there are some active

spark git commit: [SPARK-12297][SQL] Adjust timezone for int96 data from impala

2017-12-08 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master e4639fa68 -> acf7ef315 [SPARK-12297][SQL] Adjust timezone for int96 data from impala ## What changes were proposed in this pull request? Int96 data written by impala vs data written by hive & spark is stored slightly differently -- they

spark git commit: [SPARK-20728][SQL][FOLLOWUP] Use an actionable exception message

2017-12-06 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 00d176d2f -> fb6a92275 [SPARK-20728][SQL][FOLLOWUP] Use an actionable exception message ## What changes were proposed in this pull request? This is a follow-up of https://github.com/apache/spark/pull/19871 to improve an exception

spark git commit: [SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `spark.conf`

2017-12-09 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master acf7ef315 -> 251b2c03b [SPARK-22672][SQL][TEST][FOLLOWUP] Fix to use `spark.conf` ## What changes were proposed in this pull request? During https://github.com/apache/spark/pull/19882, `conf` is mistakenly used to switch ORC

spark git commit: [SPARK-3685][CORE] Prints explicit warnings when configured local directories are set to URIs

2017-12-12 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master ecc179eca -> bc8933faf [SPARK-3685][CORE] Prints explicit warnings when configured local directories are set to URIs ## What changes were proposed in this pull request? This PR proposes to print warnings before creating local by

spark git commit: [SPARK-22375][TEST] Test script can fail if eggs are installed by set…

2017-10-29 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 7fdacbc77 -> 544a1ba67 [SPARK-22375][TEST] Test script can fail if eggs are installed by set… …up.py during test process ## What changes were proposed in this pull request? Ignore the python/.eggs folder when running lint-python ##

spark git commit: [SPARK-22302][INFRA] Remove manual backports for subprocess and print explicit message for < Python 2.7

2017-10-21 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master a763607e4 -> ff8de99a1 [SPARK-22302][INFRA] Remove manual backports for subprocess and print explicit message for < Python 2.7 ## What changes were proposed in this pull request? Seems there was a mistake - missing import for
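The entry above replaces manual backports with an explicit failure message for interpreters older than Python 2.7. A hypothetical sketch of such a fail-fast version check (not the actual `dev/` script code):

```python
import sys

def require_python27():
    """Fail fast with an explicit message on pre-2.7 interpreters instead of
    carrying manual backports of newer stdlib APIs."""
    if sys.version_info < (2, 7):
        print("Python versions prior to 2.7 are not supported.", file=sys.stderr)
        sys.exit(-1)

require_python27()
print("interpreter OK")
```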

[2/2] spark git commit: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-21 Thread gurwls223
[SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0 ## What changes were proposed in this pull request? Upgrade Spark to Arrow 0.8.0 for Java and Python. Also includes an upgrade of Netty to 4.1.17 to resolve dependency requirements. The highlights that pertain to Spark for the update from

[1/2] spark git commit: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-21 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master cb9fc8d9b -> 59d52631e http://git-wip-us.apache.org/repos/asf/spark/blob/59d52631/sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowConvertersSuite.scala --

spark git commit: [SPARK-23627][SQL] Provide isEmpty in Dataset

2018-05-15 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 9059f1ee6 -> e29176fd7 [SPARK-23627][SQL] Provide isEmpty in Dataset ## What changes were proposed in this pull request? This PR adds isEmpty() in DataSet ## How was this patch tested? Unit tests added Please review
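The `isEmpty` addition above can answer emptiness without counting the whole Dataset. A pure-Python sketch of the same idea, inspecting at most one element:

```python
def is_empty(iterable) -> bool:
    """Check emptiness without materializing everything: take at most
    one element, analogous to checking limit(1) rather than a full count."""
    for _ in iterable:
        return False
    return True

print(is_empty([]))             # True
print(is_empty(range(10**9)))   # False; only the first element is inspected
```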

spark git commit: [SPARK-24128][SQL] Mention configuration option in implicit CROSS JOIN error

2018-05-07 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.3 3a22feab4 -> 4dc6719e9 [SPARK-24128][SQL] Mention configuration option in implicit CROSS JOIN error ## What changes were proposed in this pull request? Mention `spark.sql.crossJoin.enabled` in error message when an implicit `CROSS

spark git commit: [SPARK-24128][SQL] Mention configuration option in implicit CROSS JOIN error

2018-05-07 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 0d63eb888 -> cd12c5c3e [SPARK-24128][SQL] Mention configuration option in implicit CROSS JOIN error ## What changes were proposed in this pull request? Mention `spark.sql.crossJoin.enabled` in error message when an implicit `CROSS JOIN`

spark git commit: [SPARK-24068] Propagating DataFrameReader's options to Text datasource on schema inferring

2018-05-08 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 487faf17a -> e3de6ab30 [SPARK-24068] Propagating DataFrameReader's options to Text datasource on schema inferring ## What changes were proposed in this pull request? While reading CSV or JSON files, DataFrameReader's options are

spark git commit: [SPARK-23355][SQL][DOC][FOLLOWUP] Add migration doc for TBLPROPERTIES

2018-05-08 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master e3de6ab30 -> 9498e528d [SPARK-23355][SQL][DOC][FOLLOWUP] Add migration doc for TBLPROPERTIES ## What changes were proposed in this pull request? In Apache Spark 2.4, [SPARK-23355](https://issues.apache.org/jira/browse/SPARK-23355) fixes

spark git commit: [SPARK-23736][SQL][FOLLOWUP] Error message should contains SQL types

2018-04-27 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 1fb46f30f -> ad94e8592 [SPARK-23736][SQL][FOLLOWUP] Error message should contains SQL types ## What changes were proposed in this pull request? In the error messages we should return the SQL types (like `string` rather than the internal

spark git commit: [SPARK-24068][BACKPORT-2.3] Propagating DataFrameReader's options to Text datasource on schema inferring

2018-05-10 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.3 8889d7864 -> eab10f994 [SPARK-24068][BACKPORT-2.3] Propagating DataFrameReader's options to Text datasource on schema inferring ## What changes were proposed in this pull request? While reading CSV or JSON files, DataFrameReader's

spark git commit: [SPARK-24198][SPARKR][SQL] Adding slice function to SparkR

2018-05-12 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master e3dabdf6e -> 5902125ac [SPARK-24198][SPARKR][SQL] Adding slice function to SparkR ## What changes were proposed in this pull request? The PR adds the `slice` function to SparkR. The function returns a subset of consecutive elements from
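The `slice` function above mirrors Spark SQL's `slice(array, start, length)`, which is 1-based. A pure-Python sketch for positive starts (negative starts, which Spark also supports, are omitted here):

```python
def slice_(xs, start, length):
    """1-based slice of a list, mirroring Spark SQL's slice(array, start, length)
    for positive starts only."""
    if start < 1:
        raise ValueError("only positive 1-based starts in this sketch")
    return xs[start - 1 : start - 1 + length]

print(slice_([1, 2, 3, 4, 5], 2, 3))  # [2, 3, 4]
```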

spark git commit: [SPARK-23907] Removes regr_* functions in functions.scala

2018-05-11 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master f27a035da -> e3dabdf6e [SPARK-23907] Removes regr_* functions in functions.scala ## What changes were proposed in this pull request? This patch removes the various regr_* functions in functions.scala. They are so uncommon that I don't

spark git commit: [SPARK-24186][R][SQL] change reverse and concat to collection functions in R

2018-05-13 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 2fa33649d -> 3f0e801c1 [SPARK-24186][R][SQL] change reverse and concat to collection functions in R ## What changes were proposed in this pull request? reverse and concat are already in functions.R as column string functions. Since now

spark git commit: [SPARK-17916][SQL] Fix empty string being parsed as null when nullValue is set.

2018-05-13 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 3f0e801c1 -> 7a2d4895c [SPARK-17916][SQL] Fix empty string being parsed as null when nullValue is set. ## What changes were proposed in this pull request? I propose to bump version of uniVocity parser up to 2.6.3 where quoted empty
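The fix above hinges on distinguishing a quoted empty string from an unquoted empty field when `nullValue` is set. A hypothetical sketch of that distinction (not the uniVocity parser's actual logic):

```python
def parse_value(raw: str, was_quoted: bool, null_value: str = ""):
    """Only an unquoted field matching nullValue should become None;
    a quoted empty string ("") stays an empty string."""
    if not was_quoted and raw == null_value:
        return None
    return raw

print(repr(parse_value("", was_quoted=False)))  # None
print(repr(parse_value("", was_quoted=True)))   # ''
```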

spark git commit: [SPARK-24228][SQL] Fix Java lint errors

2018-05-13 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 7a2d4895c -> b6c50d782 [SPARK-24228][SQL] Fix Java lint errors ## What changes were proposed in this pull request? This PR fixes the following Java lint errors due to importing unimport classes ``` $ dev/lint-java Using `mvn` from path:

spark git commit: [SPARK-22938][SQL][FOLLOWUP] Assert that SQLConf.get is accessed only on the driver

2018-05-10 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master d3c426a5b -> a4206d58e [SPARK-22938][SQL][FOLLOWUP] Assert that SQLConf.get is accessed only on the driver ## What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/20136 . #20136

spark git commit: [SPARK-24197][SPARKR][SQL] Adding array_sort function to SparkR

2018-05-10 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master a4206d58e -> 75cf369c7 [SPARK-24197][SPARKR][SQL] Adding array_sort function to SparkR ## What changes were proposed in this pull request? The PR adds array_sort function to SparkR. ## How was this patch tested? Tests added into
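The `array_sort` function above sorts ascending with null elements placed last (as I understand Spark's semantics, in contrast to `sort_array`). A pure-Python sketch:

```python
def array_sort(xs):
    """Ascending sort with None placed at the end, sketching Spark's
    array_sort null-handling (an assumption about its semantics, not its code)."""
    return sorted(xs, key=lambda x: (x is None, x))

print(array_sort([3, None, 1]))  # [1, 3, None]
```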

spark git commit: [SPARK-24131][PYSPARK][FOLLOWUP] Add majorMinorVersion API to PySpark for determining Spark versions

2018-05-08 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master e17567ca7 -> b54bbe57b [SPARK-24131][PYSPARK][FOLLOWUP] Add majorMinorVersion API to PySpark for determining Spark versions ## What changes were proposed in this pull request? More close to Scala API behavior when can't parse input by

spark git commit: [SPARK-23094][SPARK-23723][SPARK-23724][SQL][FOLLOW-UP] Support custom encoding for json files

2018-05-08 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master b54bbe57b -> 2f6fe7d67 [SPARK-23094][SPARK-23723][SPARK-23724][SQL][FOLLOW-UP] Support custom encoding for json files ## What changes were proposed in this pull request? This is to add a test case to check the behaviors when users write

spark git commit: [SPARK-24185][SPARKR][SQL] add flatten function to SparkR

2018-05-05 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 47b5b6852 -> dd4b1b9c7 [SPARK-24185][SPARKR][SQL] add flatten function to SparkR ## What changes were proposed in this pull request? add array flatten function to SparkR ## How was this patch tested? Unit tests were added in

spark git commit: [MINOR][PROJECT-INFRA] Check if 'original_head' variable is defined in clean_up at merge script

2018-05-20 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 8eac62122 -> f32b7faf7 [MINOR][PROJECT-INFRA] Check if 'original_head' variable is defined in clean_up at merge script ## What changes were proposed in this pull request? This PR proposes to check if global variable exists or not in

spark git commit: Correct reference to Offset class

2018-05-22 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.3 efe183f7b -> ed0060ce7 Correct reference to Offset class This is a documentation-only correction; `org.apache.spark.sql.sources.v2.reader.Offset` is actually `org.apache.spark.sql.sources.v2.reader.streaming.Offset`. Author: Seth

spark git commit: Correct reference to Offset class

2018-05-22 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 79e06faa4 -> 00c13cfad Correct reference to Offset class This is a documentation-only correction; `org.apache.spark.sql.sources.v2.reader.Offset` is actually `org.apache.spark.sql.sources.v2.reader.streaming.Offset`. Author: Seth

spark git commit: [SPARK-24364][SS] Prevent InMemoryFileIndex from failing if file path doesn't exist

2018-05-23 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.3 75e2cd131 -> 068c4ae34 [SPARK-24364][SS] Prevent InMemoryFileIndex from failing if file path doesn't exist ## What changes were proposed in this pull request? This PR proposes to follow up https://github.com/apache/spark/pull/15153

spark git commit: [SPARK-24364][SS] Prevent InMemoryFileIndex from failing if file path doesn't exist

2018-05-23 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master e108f84f5 -> 8a545822d [SPARK-24364][SS] Prevent InMemoryFileIndex from failing if file path doesn't exist ## What changes were proposed in this pull request? This PR proposes to follow up https://github.com/apache/spark/pull/15153 and

spark git commit: [SPARK-22269][BUILD] Run Java linter via SBT for Jenkins

2018-05-24 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 8a545822d -> 4a14dc0af [SPARK-22269][BUILD] Run Java linter via SBT for Jenkins ## What changes were proposed in this pull request? This PR proposes to check Java lint via SBT for Jenkins. It uses the SBT wrapper for checkstyle. I

spark git commit: [SPARK-19112][CORE][FOLLOW-UP] Add missing shortCompressionCodecNames to configuration.

2018-05-26 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 1b1528a50 -> ed1a65448 [SPARK-19112][CORE][FOLLOW-UP] Add missing shortCompressionCodecNames to configuration. ## What changes were proposed in this pull request? Spark provides four codecs: `lz4`, `lzf`, `snappy`, and `zstd`. This pr

spark git commit: [SPARK-24329][SQL] Test for skipping multi-space lines

2018-05-24 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 3469f5c98 -> 13bedc05c [SPARK-24329][SQL] Test for skipping multi-space lines ## What changes were proposed in this pull request? The PR is a continue of https://github.com/apache/spark/pull/21380 . It checks cases that are handled by

spark git commit: [SPARK-24378][SQL] Fix date_trunc function incorrect examples

2018-05-24 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 13bedc05c -> 0d8994344 [SPARK-24378][SQL] Fix date_trunc function incorrect examples ## What changes were proposed in this pull request? Fix `date_trunc` function incorrect examples. ## How was this patch tested? N/A Author: Yuming
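The `date_trunc` examples being corrected above truncate a timestamp to a named unit. A minimal pure-Python sketch of that behavior for a few units (illustrative only, not Spark's implementation):

```python
from datetime import datetime

def date_trunc(unit: str, ts: datetime) -> datetime:
    """Truncate a timestamp to the given unit, analogous to SQL date_trunc."""
    if unit == "HOUR":
        return ts.replace(minute=0, second=0, microsecond=0)
    if unit == "DAY":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "MONTH":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported unit: {unit}")

print(date_trunc("HOUR", datetime(2018, 5, 24, 13, 37, 42)))  # 2018-05-24 13:00:00
```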

spark git commit: [SPARK-24378][SQL] Fix date_trunc function incorrect examples

2018-05-24 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.3 f48d62400 -> d0f30e3f3 [SPARK-24378][SQL] Fix date_trunc function incorrect examples ## What changes were proposed in this pull request? Fix `date_trunc` function incorrect examples. ## How was this patch tested? N/A Author: Yuming

spark git commit: [SPARK-24367][SQL] Parquet: use JOB_SUMMARY_LEVEL instead of deprecated flag ENABLE_JOB_SUMMARY

2018-05-24 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 0fd68cb72 -> 3b20b34ab [SPARK-24367][SQL] Parquet: use JOB_SUMMARY_LEVEL instead of deprecated flag ENABLE_JOB_SUMMARY ## What changes were proposed in this pull request? In current parquet version,the conf ENABLE_JOB_SUMMARY is

spark git commit: [SPARK-21945][YARN][PYTHON] Make --py-files work with PySpark shell in Yarn client mode

2018-05-16 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master bfd75cdfb -> 9a641e7f7 [SPARK-21945][YARN][PYTHON] Make --py-files work with PySpark shell in Yarn client mode ## What changes were proposed in this pull request? ### Problem When we run _PySpark shell with Yarn client mode_, specified

spark git commit: [SPARK-24242][SQL] RangeExec should have correct outputOrdering and outputPartitioning

2018-05-21 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master f32b7faf7 -> 6d7d45a1a [SPARK-24242][SQL] RangeExec should have correct outputOrdering and outputPartitioning ## What changes were proposed in this pull request? Logical `Range` node has been added with `outputOrdering` recently. It's

spark git commit: [SPARK-24323][SQL] Fix lint-java errors

2018-05-21 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 6d7d45a1a -> e480eccd9 [SPARK-24323][SQL] Fix lint-java errors ## What changes were proposed in this pull request? This PR fixes the following errors reported by `lint-java` ``` % dev/lint-java Using `mvn` from path: /usr/bin/mvn

spark git commit: [SPARK-23732][DOCS] Fix source links in generated scaladoc.

2018-06-11 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 048197749 -> dc22465f3 [SPARK-23732][DOCS] Fix source links in generated scaladoc. Apply the suggestion on the bug to fix source links. Tested with the 2.3.1 release docs. Author: Marcelo Vanzin Closes #21521 from vanzin/SPARK-23732.

spark git commit: [SPARK-23732][DOCS] Fix source links in generated scaladoc.

2018-06-11 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.3 1582945d0 -> 4d4548a6e [SPARK-23732][DOCS] Fix source links in generated scaladoc. Apply the suggestion on the bug to fix source links. Tested with the 2.3.1 release docs. Author: Marcelo Vanzin Closes #21521 from

spark git commit: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid return type mismatch in PythonTransformFunction

2018-06-08 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 1a644afba -> b070ded28 [SPARK-17756][PYTHON][STREAMING] Workaround to avoid return type mismatch in PythonTransformFunction ## What changes were proposed in this pull request? This PR proposes to wrap the transformed rdd within

spark git commit: [SPARK-23772][SQL] Provide an option to ignore column of all null values or empty array during JSON schema inference

2018-06-18 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master b0a935255 -> e219e692e [SPARK-23772][SQL] Provide an option to ignore column of all null values or empty array during JSON schema inference ## What changes were proposed in this pull request? This pr added a new JSON option
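The option above lets JSON schema inference drop columns that are null in every record. A hypothetical pure-Python sketch of that inference pass (the function and flag names are invented for illustration):

```python
def infer_fields(records, drop_if_all_null=True):
    """Collect field names across records, optionally dropping fields whose
    values are null (None) in every record."""
    seen_non_null = {}
    for rec in records:
        for key, value in rec.items():
            seen_non_null.setdefault(key, False)
            if value is not None:
                seen_non_null[key] = True
    if drop_if_all_null:
        return [k for k, non_null in seen_non_null.items() if non_null]
    return list(seen_non_null)

rows = [{"a": 1, "b": None}, {"a": 2, "b": None}]
print(infer_fields(rows))  # ['a']
```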

spark git commit: [SPARK-24526][BUILD][TEST-MAVEN] Spaces in the build dir causes failures in the build/mvn script

2018-06-18 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master e219e692e -> bce177552 [SPARK-24526][BUILD][TEST-MAVEN] Spaces in the build dir causes failures in the build/mvn script ## What changes were proposed in this pull request? Fix the call to ${MVN_BIN} to be wrapped in quotes so it will

spark git commit: [SPARK-22239][SQL][PYTHON] Enable grouped aggregate pandas UDFs as window functions with unbounded window frames

2018-06-12 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master f53818d35 -> 9786ce66c [SPARK-22239][SQL][PYTHON] Enable grouped aggregate pandas UDFs as window functions with unbounded window frames ## What changes were proposed in this pull request? This PR enables using a grouped aggregate pandas

spark git commit: [SPARK-24485][SS] Measure and log elapsed time for filesystem operations in HDFSBackedStateStoreProvider

2018-06-12 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 3352d6fe9 -> 4c388bccf [SPARK-24485][SS] Measure and log elapsed time for filesystem operations in HDFSBackedStateStoreProvider ## What changes were proposed in this pull request? This patch measures and logs elapsed time for each

spark git commit: [SPARK-23754][PYTHON][FOLLOWUP][BACKPORT-2.3] Move UDF stop iteration wrapping from driver to executor

2018-06-12 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.3 a55de387d -> 470cacd49 [SPARK-23754][PYTHON][FOLLOWUP][BACKPORT-2.3] Move UDF stop iteration wrapping from driver to executor SPARK-23754 was fixed in #21383 by changing the UDF code to wrap the user function, but this required a

spark git commit: [SPARK-24479][SS] Added config for registering streamingQueryListeners

2018-06-13 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 4c388bccf -> 7703b46d2 [SPARK-24479][SS] Added config for registering streamingQueryListeners ## What changes were proposed in this pull request? Currently a "StreamingQueryListener" can only be registered programatically. We could have

spark git commit: [SPARK-24573][INFRA] Runs SBT checkstyle after the build to work around a side-effect

2018-06-18 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master c7c0b086a -> b0a935255 [SPARK-24573][INFRA] Runs SBT checkstyle after the build to work around a side-effect ## What changes were proposed in this pull request? Seems checkstyle affects the build in the PR builder in Jenkins. I can't

spark git commit: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration wrapping from driver to executor

2018-06-10 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master f07c5064a -> 3e5b4ae63 [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration wrapping from driver to executor ## What changes were proposed in this pull request? SPARK-23754 was fixed in #21383 by changing the UDF code to wrap the user

spark git commit: [MINOR][CORE][TEST] Remove unnecessary sort in UnsafeInMemorySorterSuite

2018-06-14 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.3 470cacd49 -> a2f65eb79 [MINOR][CORE][TEST] Remove unnecessary sort in UnsafeInMemorySorterSuite ## What changes were proposed in this pull request? We don't require specific ordering of the input data, the sort action is not

spark git commit: [MINOR][CORE][TEST] Remove unnecessary sort in UnsafeInMemorySorterSuite

2018-06-14 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 3bf76918f -> 534065efe [MINOR][CORE][TEST] Remove unnecessary sort in UnsafeInMemorySorterSuite ## What changes were proposed in this pull request? We don't require specific ordering of the input data, the sort action is not necessary

spark git commit: [PYTHON] Fix typo in serializer exception

2018-06-15 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.3 e6bf325de -> 7f1708a44 [PYTHON] Fix typo in serializer exception ## What changes were proposed in this pull request? Fix typo in exception raised in Python serializer ## How was this patch tested? No code changes Please review

spark git commit: [PYTHON] Fix typo in serializer exception

2018-06-15 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 22daeba59 -> 6567fc43a [PYTHON] Fix typo in serializer exception ## What changes were proposed in this pull request? Fix typo in exception raised in Python serializer ## How was this patch tested? No code changes Please review

spark git commit: [SPARK-23754][BRANCH-2.3][PYTHON] Re-raising StopIteration in client code

2018-05-30 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.3 66289a3e0 -> e1c0ab16c [SPARK-23754][BRANCH-2.3][PYTHON] Re-raising StopIteration in client code ## What changes are proposed Make sure that `StopIteration`s raised in users' code do not silently interrupt processing by spark, but are

spark git commit: [SPARK-19613][SS][TEST] Random.nextString is not safe for directory namePrefix

2018-05-28 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master fa2ae9d20 -> b31b587cd [SPARK-19613][SS][TEST] Random.nextString is not safe for directory namePrefix ## What changes were proposed in this pull request? `Random.nextString` is good for generating random string data, but it's not proper
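The entry above notes that arbitrary random strings (as from Scala's `Random.nextString`) can contain characters unsafe in file system paths. A pure-Python sketch of the safer alternative: restrict the alphabet to alphanumerics when generating a directory name prefix:

```python
import random
import string

def safe_name_prefix(n: int = 10) -> str:
    """Random prefix drawn only from lowercase letters and digits, which are
    safe in directory names, unlike arbitrary Unicode code points."""
    return "".join(random.choices(string.ascii_lowercase + string.digits, k=n))

print(safe_name_prefix())
```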

spark git commit: [SPARK-19613][SS][TEST] Random.nextString is not safe for directory namePrefix

2018-05-28 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.3 a9700cb4a -> fec43fe1b [SPARK-19613][SS][TEST] Random.nextString is not safe for directory namePrefix ## What changes were proposed in this pull request? `Random.nextString` is good for generating random string data, but it's not

spark git commit: [SPARK-24377][SPARK SUBMIT] make --py-files work in non pyspark application

2018-05-28 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master b31b587cd -> 2ced6193b [SPARK-24377][SPARK SUBMIT] make --py-files work in non pyspark application ## What changes were proposed in this pull request? For some Spark applications, though they're a java program, they require not only jar

spark git commit: [SPARK-23754][PYTHON] Re-raising StopIteration in client code

2018-05-30 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master a4be981c0 -> 0ebb0c0d4 [SPARK-23754][PYTHON] Re-raising StopIteration in client code ## What changes were proposed in this pull request? Make sure that `StopIteration`s raised in users' code do not silently interrupt processing by spark,
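The fix above ensures a `StopIteration` raised in user code surfaces as an error rather than silently ending the surrounding iteration, in the spirit of PEP 479. A self-contained sketch of such a wrapper (illustrative names, not the actual PySpark code):

```python
import functools

def fail_on_stop_iteration(func):
    """Wrap a user function so a stray StopIteration becomes a RuntimeError
    instead of silently terminating the consuming generator."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except StopIteration as exc:
            raise RuntimeError("user code raised StopIteration") from exc
    return wrapper

@fail_on_stop_iteration
def bad_udf(x):
    raise StopIteration

try:
    list(map(bad_udf, [1, 2]))
except RuntimeError as e:
    print("caught:", e)
```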

spark git commit: [SPARK-24455][CORE] fix typo in TaskSchedulerImpl comment

2018-06-03 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.3 21800b878 -> 18194544f [SPARK-24455][CORE] fix typo in TaskSchedulerImpl comment change runTasks to submitTasks in the TaskSchedulerImpl.scala 's comment Author: xueyu Author: Xue Yu <278006...@qq.com> Closes #21485 from

spark git commit: [SPARK-24455][CORE] fix typo in TaskSchedulerImpl comment

2018-06-03 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master de4feae3c -> a2166ecdd [SPARK-24455][CORE] fix typo in TaskSchedulerImpl comment change runTasks to submitTasks in the TaskSchedulerImpl.scala 's comment Author: xueyu Author: Xue Yu <278006...@qq.com> Closes #21485 from

spark git commit: [SPARK-24444][DOCS][PYTHON][BRANCH-2.3] Improve Pandas UDF docs to explain column assignment

2018-06-01 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.3 b37e76fa4 -> e56266ad7 [SPARK-24444][DOCS][PYTHON][BRANCH-2.3] Improve Pandas UDF docs to explain column assignment ## What changes were proposed in this pull request? Added sections to pandas_udf docs, in the grouped map section, to

spark git commit: [SPARK-24215][PYSPARK] Implement _repr_html_ for dataframes in PySpark

2018-06-04 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master ff0501b0c -> dbb4d8382 [SPARK-24215][PYSPARK] Implement _repr_html_ for dataframes in PySpark ## What changes were proposed in this pull request? Implement `_repr_html_` for PySpark while in notebook and add config named
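The `_repr_html_` hook mentioned above is the standard protocol notebooks use to render rich output. A toy class showing the hook (not PySpark's DataFrame rendering):

```python
class MiniFrame:
    """Toy tabular object implementing the _repr_html_ hook that notebook
    frontends call to display rich HTML output."""

    def __init__(self, columns, rows):
        self.columns, self.rows = columns, rows

    def _repr_html_(self):
        head = "".join(f"<th>{c}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{v}</td>" for v in row) + "</tr>"
            for row in self.rows
        )
        return f"<table><tr>{head}</tr>{body}</table>"

html = MiniFrame(["id", "name"], [[1, "a"]])._repr_html_()
print(html)
```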

spark git commit: [SPARK-16451][REPL] Fail shell if SparkSession fails to start.

2018-06-04 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master dbb4d8382 -> b3417b731 [SPARK-16451][REPL] Fail shell if SparkSession fails to start. Currently, in spark-shell, if the session fails to start, the user sees a bunch of unrelated errors which are caused by code in the shell initialization

spark git commit: [SPARK-24187][R][SQL] Add array_join function to SparkR

2018-06-05 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 93df3cd03 -> e9efb62e0 [SPARK-24187][R][SQL] Add array_join function to SparkR ## What changes were proposed in this pull request? This PR adds array_join function to SparkR ## How was this patch tested? Add unit test in test_sparkSQL.R

spark git commit: [SPARK-24392][PYTHON] Label pandas_udf as Experimental

2018-05-27 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master de01a8d50 -> fa2ae9d20 [SPARK-24392][PYTHON] Label pandas_udf as Experimental ## What changes were proposed in this pull request? The pandas_udf functionality was introduced in 2.3.0, but is not completely stable and still evolving.

spark git commit: [SPARK-24392][PYTHON] Label pandas_udf as Experimental

2018-05-27 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.3 9b0f6f530 -> 8bb6c2285 [SPARK-24392][PYTHON] Label pandas_udf as Experimental The pandas_udf functionality was introduced in 2.3.0, but is not completely stable and still evolving. This adds a label to indicate it is still an

spark git commit: [SPARK-24665][PYSPARK] Use SQLConf in PySpark to manage all sql configs

2018-07-02 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master f825847c8 -> 8f91c697e [SPARK-24665][PYSPARK] Use SQLConf in PySpark to manage all sql configs ## What changes were proposed in this pull request? Use SQLConf for PySpark to manage all sql configs, drop all the hard code in config usage.

spark git commit: [SPARK-24645][SQL] Skip parsing when csvColumnPruning enabled and partitions scanned only

2018-06-27 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master c5aa54d54 -> bd32b509a [SPARK-24645][SQL] Skip parsing when csvColumnPruning enabled and partitions scanned only ## What changes were proposed in this pull request? In the master, when `csvColumnPruning`(implemented in [this

spark git commit: [SPARK-24206][SQL][FOLLOW-UP] Update DataSourceReadBenchmark benchmark results

2018-06-27 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master bd32b509a -> 1c9acc243 [SPARK-24206][SQL][FOLLOW-UP] Update DataSourceReadBenchmark benchmark results ## What changes were proposed in this pull request? This pr corrected the default configuration (`spark.master=local[1]`) for

spark git commit: [SPARK-24603][SQL] Fix findTightestCommonType reference in comments

2018-06-27 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 1c9acc243 -> 6a97e8eb3 [SPARK-24603][SQL] Fix findTightestCommonType reference in comments findTightestCommonTypeOfTwo has been renamed to findTightestCommonType ## What changes were proposed in this pull request? (Please fill in changes
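The renamed helper implements the "tightest common type" idea: promote two input types to the narrowest type that can represent both. A simplified stand-in lattice (not Spark's full coercion rules) illustrates it:

```python
# Widening order for numeric types; a simplified stand-in, not Spark's
# complete type-coercion table.
_ORDER = ["byte", "short", "int", "long", "float", "double"]

def find_tightest_common_type(t1, t2):
    """Return the narrowest type that can hold both inputs, or None."""
    if t1 == t2:
        return t1
    if t1 in _ORDER and t2 in _ORDER:
        # Two numeric types promote to the wider of the two.
        return _ORDER[max(_ORDER.index(t1), _ORDER.index(t2))]
    # No tightest common type without an explicit cast.
    return None

print(find_tightest_common_type("int", "double"))
```

The same lattice walk is what schema merging relies on when two sources disagree on a column's type.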

spark git commit: [SPARK-24603][SQL] Fix findTightestCommonType reference in comments

2018-06-27 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.2 47958270f -> a8537a5ab [SPARK-24603][SQL] Fix findTightestCommonType reference in comments findTightestCommonTypeOfTwo has been renamed to findTightestCommonType ## What changes were proposed in this pull request? (Please fill in

spark git commit: [SPARK-24603][SQL] Fix findTightestCommonType reference in comments

2018-06-27 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.3 6e1f5e018 -> 0f534d3da [SPARK-24603][SQL] Fix findTightestCommonType reference in comments findTightestCommonTypeOfTwo has been renamed to findTightestCommonType ## What changes were proposed in this pull request? (Please fill in

spark git commit: [SPARK-24636][SQL] Type coercion of arrays for array_join function

2018-06-25 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master c7967c604 -> e07aee216 [SPARK-24636][SQL] Type coercion of arrays for array_join function ## What changes were proposed in this pull request? Presto's implementation accepts arbitrary arrays of primitive types as an input: ``` presto>
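The coercion being discussed lets `array_join` accept arrays of any primitive type by casting elements to string. A pure-Python model of the documented semantics (NULL elements are dropped unless a replacement is supplied; this is illustrative, not Spark's code):

```python
def array_join(arr, delimiter, null_replacement=None):
    """Join array elements into a string, modeling array_join semantics."""
    parts = []
    for x in arr:
        if x is None:
            # Without a replacement, NULL elements are silently dropped.
            if null_replacement is not None:
                parts.append(null_replacement)
        else:
            # Non-string elements are coerced to string before joining.
            parts.append(str(x))
    return delimiter.join(parts)

print(array_join([1, None, 3], ",", "?"))  # -> 1,?,3
```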

spark git commit: [SPARK-23776][DOC] Update instructions for running PySpark after building with SBT

2018-06-25 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master d48803bf6 -> 4c059ebc6 [SPARK-23776][DOC] Update instructions for running PySpark after building with SBT ## What changes were proposed in this pull request? This update tells the reader how to build Spark with SBT such that pyspark-sql

spark git commit: Fix minor typo in docs/cloud-integration.md

2018-06-25 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 6e0596e26 -> 8ab8ef773 Fix minor typo in docs/cloud-integration.md ## What changes were proposed in this pull request? Minor typo in docs/cloud-integration.md ## How was this patch tested? This is trivial enough that it should not

spark git commit: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assign result columns by name

2018-06-23 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 98f363b77 -> a5849ad9a [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assign result columns by name ## What changes were proposed in this pull request? Currently, a `pandas_udf` of type `PandasUDFType.GROUPED_MAP` will assign the

spark git commit: [SPARK-24614][PYSPARK] Fix for SyntaxWarning on tests.py

2018-06-21 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 7236e759c -> c0cad596b [SPARK-24614][PYSPARK] Fix for SyntaxWarning on tests.py ## What changes were proposed in this pull request? Fix for SyntaxWarning on tests.py ## How was this patch tested? ./dev/run-tests Author: Rekha Joshi

spark git commit: [SPARK-24574][SQL] array_contains, array_position, array_remove and element_at functions deal with Column type

2018-06-21 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 54fcaafb0 -> 7236e759c [SPARK-24574][SQL] array_contains, array_position, array_remove and element_at functions deal with Column type ## What changes were proposed in this pull request? For the function ```def array_contains(column:

spark git commit: [SPARK-24715][BUILD] Override jline version as 2.14.3 in SBT

2018-07-02 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 8f91c697e -> 8008f9cb8 [SPARK-24715][BUILD] Override jline version as 2.14.3 in SBT ## What changes were proposed in this pull request? During SPARK-24418 (Upgrade Scala to 2.11.12 and 2.12.6), we upgrade `jline` version together. So,

spark git commit: [SPARK-24507][DOCUMENTATION] Update streaming guide

2018-07-02 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.3 3c0af793f -> 1cba0505e [SPARK-24507][DOCUMENTATION] Update streaming guide ## What changes were proposed in this pull request? Updated streaming guide for direct stream and link to integration guide. ## How was this patch tested?

spark git commit: [SPARK-24507][DOCUMENTATION] Update streaming guide

2018-07-02 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 8008f9cb8 -> f599cde69 [SPARK-24507][DOCUMENTATION] Update streaming guide ## What changes were proposed in this pull request? Updated streaming guide for direct stream and link to integration guide. ## How was this patch tested? jekyll

spark git commit: [SPARK-24131][PYSPARK] Add majorMinorVersion API to PySpark for determining Spark versions

2018-05-01 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 6782359a0 -> e15850be6 [SPARK-24131][PYSPARK] Add majorMinorVersion API to PySpark for determining Spark versions ## What changes were proposed in this pull request? We need to determine Spark major and minor versions in PySpark. We can
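Determining the major and minor components from a version string is a small parsing task. A hedged sketch of what such a helper might look like (the real PySpark utility may live in a different module and differ in detail):

```python
import re

def major_minor_version(version):
    """Parse '2.4.0-SNAPSHOT' -> (2, 4); raise on malformed input."""
    m = re.match(r"^(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError("invalid Spark version string: %s" % version)
    return int(m.group(1)), int(m.group(2))

print(major_minor_version("2.4.0-SNAPSHOT"))  # -> (2, 4)
```

Callers can then branch on `(major, minor)` tuples instead of comparing raw version strings lexically.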

spark git commit: [SPARK-23853][PYSPARK][TEST] Run Hive-related PySpark tests only for `-Phive`

2018-04-30 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 007ae6878 -> b857fb549 [SPARK-23853][PYSPARK][TEST] Run Hive-related PySpark tests only for `-Phive` ## What changes were proposed in this pull request? When `PyArrow` or `Pandas` are not available, the corresponding PySpark tests are

spark git commit: [MINOR][DOCS] Fix a broken link for Arrow's supported types in the programming guide

2018-04-29 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master bd14da6fd -> 56f501e1c [MINOR][DOCS] Fix a broken link for Arrow's supported types in the programming guide ## What changes were proposed in this pull request? This PR fixes a broken link for Arrow's supported types in the programming

spark git commit: [MINOR][DOCS] Fix a broken link for Arrow's supported types in the programming guide

2018-04-29 Thread gurwls223
Repository: spark Updated Branches: refs/heads/branch-2.3 df45ddb9d -> 235ec9ee7 [MINOR][DOCS] Fix a broken link for Arrow's supported types in the programming guide ## What changes were proposed in this pull request? This PR fixes a broken link for Arrow's supported types in the

spark git commit: [SPARK-23846][SQL] The samplingRatio option for CSV datasource

2018-04-29 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 56f501e1c -> 3121b411f [SPARK-23846][SQL] The samplingRatio option for CSV datasource ## What changes were proposed in this pull request? I propose to support the `samplingRatio` option for schema inferring of CSV datasource similar to
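The `samplingRatio` idea is to guess column types from only a fraction of the rows instead of a full scan. A toy model of sampling-based inference (names and type lattice are illustrative, not the datasource's implementation):

```python
import random

def infer_type(values):
    """Guess the narrowest of int/double/string that fits all values."""
    def kind(v):
        try:
            int(v)
            return "int"
        except ValueError:
            try:
                float(v)
                return "double"
            except ValueError:
                return "string"
    kinds = {kind(v) for v in values}
    if kinds == {"int"}:
        return "int"
    if kinds <= {"int", "double"}:
        return "double"
    return "string"

def infer_schema(rows, sampling_ratio=1.0, seed=0):
    # Keep each row with probability sampling_ratio; always keep one row.
    rng = random.Random(seed)
    sample = [r for r in rows if rng.random() < sampling_ratio] or rows[:1]
    return [infer_type(col) for col in zip(*sample)]

rows = [("1", "2.5"), ("2", "3"), ("x", "4.0")]
schema = infer_schema(rows)  # full scan sees the non-numeric "x"
print(schema)
```

The trade-off is the usual one: a low ratio is faster but can miss a stray value (like the `"x"` above) and infer too narrow a type.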

spark git commit: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support custom encoding for json files

2018-04-28 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master 4df51361a -> bd14da6fd [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support custom encoding for json files ## What changes were proposed in this pull request? I propose new option for JSON datasource which allows to specify encoding
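The core of an explicit-encoding option is simply decoding each line with the caller's charset instead of assuming UTF-8. A self-contained sketch (the reader function is illustrative; only the option's intent mirrors the PR):

```python
import json
import os
import tempfile

def read_json_lines(path, encoding="utf-8"):
    """Parse one JSON document per line, decoding with the given charset."""
    with open(path, "r", encoding=encoding) as f:
        return [json.loads(line) for line in f if line.strip()]

# Write a UTF-16 file, then read it back with the matching encoding option.
with tempfile.NamedTemporaryFile("w", encoding="utf-16", suffix=".json",
                                 delete=False) as f:
    f.write('{"name": "caf\u00e9"}\n')
    path = f.name

records = read_json_lines(path, encoding="utf-16")
os.remove(path)
print(records)
```

Reading the same file with the default UTF-8 would fail to decode, which is exactly the situation the new option addresses.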

spark git commit: [SPARK-23715][SQL] the input of to/from_utc_timestamp can not have timezone

2018-05-03 Thread gurwls223
Repository: spark Updated Branches: refs/heads/master c9bfd1c6f -> 417ad9250 [SPARK-23715][SQL] the input of to/from_utc_timestamp can not have timezone ## What changes were proposed in this pull request? `from_utc_timestamp` assumes its input is in UTC timezone and shifts it to the
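The constraint in the title can be modeled directly: treat a *naive* timestamp as UTC wall-clock time, shift it into the target zone, and reject zone-aware input. A hedged sketch of that semantics (not Spark's implementation):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

def from_utc_timestamp(ts, tz_name):
    """Interpret a naive timestamp as UTC and return the naive local time
    in tz_name; zone-aware input is rejected, per the PR's constraint."""
    if ts.tzinfo is not None:
        raise ValueError("input timestamp must not carry a timezone")
    shifted = ts.replace(tzinfo=timezone.utc).astimezone(ZoneInfo(tz_name))
    return shifted.replace(tzinfo=None)

# 12:00 UTC is 21:00 in Asia/Seoul (UTC+9, no DST).
print(from_utc_timestamp(datetime(2018, 5, 3, 12, 0), "Asia/Seoul"))
```

Stripping the `tzinfo` on the way out keeps the function closed over naive timestamps, which is what makes the "no timezone on input" rule enforceable.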
