This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new a5427a0  [MINOR][SQL][DOCS] Reformat the tables in SQL migration guide
a5427a0 is described below

commit a5427a0067504484b0eb5e5f87b2658566aee324
Author: Hyukjin Kwon <gurwls...@apache.org>
AuthorDate: Sat Feb 2 23:45:46 2019 +0800

    [MINOR][SQL][DOCS] Reformat the tables in SQL migration guide

    ## What changes were proposed in this pull request?

    1. Reformat the tables to be located with a proper indentation under the corresponding item to be consistent.
    2. Fix **Table 2**'s contents to be more readable with code blocks.

    ### Table 1

    **Before:**

    ![screen shot 2019-02-02 at 11 37 30 am](https://user-images.githubusercontent.com/6477701/52159396-f1a18380-26de-11e9-9dca-f56b19f22bb4.png)

    **After:**

    ![screen shot 2019-02-02 at 11 32 39 am](https://user-images.githubusercontent.com/6477701/52159370-7d66e000-26de-11e9-9e6d-81cf73691c05.png)

    ### Table 2

    **Before:**

    ![screen shot 2019-02-02 at 11 35 51 am](https://user-images.githubusercontent.com/6477701/52159401-0ed65200-26df-11e9-8b0e-86d005c233b5.png)

    **After:**

    ![screen shot 2019-02-02 at 11 32 44 am](https://user-images.githubusercontent.com/6477701/52159372-7f30a380-26de-11e9-8c04-a88c74b78cff.png)

    ## How was this patch tested?

    Manually built the doc.

    Closes #23723 from HyukjinKwon/minor-doc-fix.
Authored-by: Hyukjin Kwon <gurwls...@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 docs/sql-migration-guide-upgrade.md | 138 ++++++++++++++++++------------------
 1 file changed, 69 insertions(+), 69 deletions(-)

diff --git a/docs/sql-migration-guide-upgrade.md b/docs/sql-migration-guide-upgrade.md
index dbf9df0..1ae26e6 100644
--- a/docs/sql-migration-guide-upgrade.md
+++ b/docs/sql-migration-guide-upgrade.md
@@ -38,7 +38,7 @@ displayTitle: Spark SQL Upgrading Guide
 - Since Spark 3.0, JSON datasource and JSON function `schema_of_json` infer TimestampType from string values if they match to the pattern defined by the JSON option `timestampFormat`. Set JSON option `inferTimestamp` to `false` to disable such type inferring.

 - In PySpark, when Arrow optimization is enabled, if Arrow version is higher than 0.11.0, Arrow can perform safe type conversion when converting Pandas.Series to Arrow array during serialization. Arrow will raise errors when detecting unsafe type conversion like overflow. Setting `spark.sql.execution.pandas.arrowSafeTypeConversion` to true can enable it. The default setting is false.
   PySpark's behavior for Arrow versions is illustrated in the table below:
-<table class="table">
+  <table class="table">
   <tr>
     <th>
       <b>PyArrow version</b>
@@ -51,39 +51,39 @@ displayTitle: Spark SQL Upgrading Guide
     </th>
   </tr>
   <tr>
-    <th>
-      <b>version < 0.11.0</b>
-    </th>
-    <th>
-      <b>Raise error</b>
-    </th>
-    <th>
-      <b>Silently allows</b>
-    </th>
+    <td>
+      version < 0.11.0
+    </td>
+    <td>
+      Raise error
+    </td>
+    <td>
+      Silently allows
+    </td>
   </tr>
   <tr>
-    <th>
-      <b>version > 0.11.0, arrowSafeTypeConversion=false</b>
-    </th>
-    <th>
-      <b>Silent overflow</b>
-    </th>
-    <th>
-      <b>Silently allows</b>
-    </th>
+    <td>
+      version > 0.11.0, arrowSafeTypeConversion=false
+    </td>
+    <td>
+      Silent overflow
+    </td>
+    <td>
+      Silently allows
+    </td>
   </tr>
   <tr>
-    <th>
-      <b>version > 0.11.0, arrowSafeTypeConversion=true</b>
-    </th>
-    <th>
-      <b>Raise error</b>
-    </th>
-    <th>
-      <b>Raise error</b>
-    </th>
+    <td>
+      version > 0.11.0, arrowSafeTypeConversion=true
+    </td>
+    <td>
+      Raise error
+    </td>
+    <td>
+      Raise error
+    </td>
   </tr>
-</table>
+  </table>

 - In Spark version 2.4 and earlier, if `org.apache.spark.sql.functions.udf(Any, DataType)` gets a Scala closure with primitive-type argument, the returned UDF will return null if the input values is null. Since Spark 3.0, the UDF will return the default value of the Java type if the input value is null. For example, `val f = udf((x: Int) => x, IntegerType)`, `f($"x")` will return null in Spark 2.4 and earlier if column `x` is null, and return 0 in Spark 3.0. This behavior change is int [...]
@@ -100,64 +100,64 @@ displayTitle: Spark SQL Upgrading Guide

 ## Upgrading From Spark SQL 2.3 to 2.4

 - In Spark version 2.3 and earlier, the second parameter to array_contains function is implicitly promoted to the element type of first array type parameter. This type promotion can be lossy and may cause `array_contains` function to return wrong result. This problem has been addressed in 2.4 by employing a safer type promotion mechanism.
   This can cause some change in behavior and are illustrated in the table below.
-<table class="table">
+  <table class="table">
   <tr>
     <th>
       <b>Query</b>
    </th>
    <th>
-      <b>Result Spark 2.3 or Prior</b>
+      <b>Spark 2.3 or Prior</b>
    </th>
    <th>
-      <b>Result Spark 2.4</b>
+      <b>Spark 2.4</b>
    </th>
    <th>
      <b>Remarks</b>
    </th>
   </tr>
   <tr>
-    <th>
-      <b>SELECT <br> array_contains(array(1), 1.34D);</b>
-    </th>
-    <th>
-      <b>true</b>
-    </th>
-    <th>
-      <b>false</b>
-    </th>
-    <th>
-      <b>In Spark 2.4, left and right parameters are promoted to array(double) and double type respectively.</b>
-    </th>
+    <td>
+      <code>SELECT array_contains(array(1), 1.34D);</code>
+    </td>
+    <td>
+      <code>true</code>
+    </td>
+    <td>
+      <code>false</code>
+    </td>
+    <td>
+      In Spark 2.4, left and right parameters are promoted to array type of double type and double type respectively.
+    </td>
   </tr>
   <tr>
-    <th>
-      <b>SELECT <br> array_contains(array(1), '1');</b>
-    </th>
-    <th>
-      <b>true</b>
-    </th>
-    <th>
-      <b>AnalysisException is thrown since integer type can not be promoted to string type in a loss-less manner.</b>
-    </th>
-    <th>
-      <b>Users can use explicit cast</b>
-    </th>
+    <td>
+      <code>SELECT array_contains(array(1), '1');</code>
+    </td>
+    <td>
+      <code>true</code>
+    </td>
+    <td>
+      <code>AnalysisException</code> is thrown.
+    </td>
+    <td>
+      Explicit cast can be used in arguments to avoid the exception. In Spark 2.4, <code>AnalysisException</code> is thrown since integer type can not be promoted to string type in a loss-less manner.
+    </td>
   </tr>
   <tr>
-    <th>
-      <b>SELECT <br> array_contains(array(1), 'anystring');</b>
-    </th>
-    <th>
-      <b>null</b>
-    </th>
-    <th>
-      <b>AnalysisException is thrown since integer type can not be promoted to string type in a loss-less manner.</b>
-    </th>
-    <th>
-      <b>Users can use explicit cast</b>
-    </th>
+    <td>
+      <code>SELECT array_contains(array(1), 'anystring');</code>
+    </td>
+    <td>
+      <code>null</code>
+    </td>
+    <td>
+      <code>AnalysisException</code> is thrown.
+    </td>
+    <td>
+      Explicit cast can be used in arguments to avoid the exception. In Spark 2.4, <code>AnalysisException</code> is thrown since integer type can not be promoted to string type in a loss-less manner.
+    </td>
   </tr>
-</table>
+  </table>

 - Since Spark 2.4, when there is a struct field in front of the IN operator before a subquery, the inner query must contain a struct field as well. In previous versions, instead, the fields of the struct were compared to the output of the inner query. Eg. if `a` is a `struct(a string, b int)`, in Spark 2.4 `a in (select (1 as a, 'a' as b) from range(1))` is a valid query, while `a in (select 1, 'a' from range(1))` is not. In previous version it was the opposite.

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
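The Remarks column of Table 2 says an explicit cast avoids the `AnalysisException` in Spark 2.4. A sketch of what that could look like, assuming the queries below (they are illustrations in the spirit of the table, not taken from the patch):

```sql
-- Fails in Spark 2.4: integer cannot be promoted to string in a loss-less manner.
-- SELECT array_contains(array(1), '1');

-- Casting explicitly so both arguments agree on a type avoids the exception:
SELECT array_contains(array(CAST(1 AS STRING)), '1');
SELECT array_contains(array(1), CAST('1' AS INT));
```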
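As context for Table 1 in the patched guide: the difference between "Silent overflow" and "Raise error" can be illustrated outside Spark. The sketch below is plain Python, not the PySpark or Arrow API; the two function names are hypothetical and only simulate, for an int8 target, the contrast between an unchecked downcast and the safe conversion that `spark.sql.execution.pandas.arrowSafeTypeConversion=true` asks Arrow to perform.

```python
# Hypothetical sketch (plain Python, not Arrow): contrasts a silent-overflow
# downcast with a "safe" cast that raises, analogous to the two rows of
# Table 1 for PyArrow > 0.11.0 with arrowSafeTypeConversion off vs. on.

def unsafe_cast_to_int8(value: int) -> int:
    # Wraps around modulo 256, like an unchecked downcast: silent overflow.
    return ((value + 128) % 256) - 128

def safe_cast_to_int8(value: int) -> int:
    # Refuses to lose information, like Arrow's safe conversion: raises instead.
    if not -128 <= value <= 127:
        raise OverflowError(f"{value} does not fit in int8")
    return value

print(unsafe_cast_to_int8(300))   # silent overflow: 44
try:
    safe_cast_to_int8(300)
except OverflowError as exc:
    print(f"safe cast raised: {exc}")
```

With the safe variant, out-of-range values surface as an error at conversion time rather than as silently corrupted data downstream, which is the trade-off the guide's table documents.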