GitHub user sureshthalamati opened a pull request: https://github.com/apache/spark/pull/16209
[WIP][SPARK-10849][SQL] Adds option to the JDBC data source for user to specify database column type for the create table

## What changes were proposed in this pull request?

Currently the JDBC data source creates tables in the target database using the default type mapping and the JDBC dialect mechanism. If users want to specify a different database data type for only some of the columns, there is no option available. In scenarios where the default mapping does not work, users are forced to create tables on the target database before writing. This workaround is probably not acceptable from a usability point of view. This PR provides a user-defined type mapping for specific columns.

The solution is to allow users to specify the database column data type for the create table as a JDBC data source option (createTableColumnTypes) on write. Data type information can be specified as key (column name) / value (data type) pairs in JSON (e.g. {"name":"varchar(128)", "comments":"clob(20k)"}). Users can use org.apache.spark.sql.types.MetadataBuilder to build the metadata and generate the JSON string required for this option.

Example:

```Scala
import org.apache.spark.sql.types.MetadataBuilder

// Map each column name to the database type to use in CREATE TABLE.
val mdb = new MetadataBuilder()
mdb.putString("name", "VARCHAR(128)")
mdb.putString("comments", "CLOB(20K)")
// Serialize the mapping to the JSON string the option expects.
val createTableColTypes = mdb.build().json
df.write.option("createTableColumnTypes", createTableColTypes).jdbc(url, "TEST.DBCOLTYPETEST", properties)
```

An alternative approach is to add a new column metadata property to the JDBC data source so that users can specify the database column type through the column metadata.

TODO: Case-insensitive column name lookup based on the spark.sql.caseSensitive property value.

## How was this patch tested?

Added a new test case to the JDBCWriteSuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sureshthalamati/spark jdbc_custom_dbtype_option_json-spark-10849

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16209.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16209

----
commit 6eec6ca63c5641d1bbbbc9958bdd300ac079d5cf
Author: sureshthalamati <suresh.thalam...@gmail.com>
Date:   2016-12-02T23:22:17Z

    Adding new option to the jdbc to allow users to specify create table column types when table is created on write

----
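
For reviewers who want a concrete picture of the per-column override described in this pull request, including the case-insensitive lookup listed as a TODO, here is a minimal sketch in plain Scala. It is not the code in this patch: `Column`, `buildColumnDefs`, and the `caseSensitive` parameter are hypothetical names chosen for illustration, the default types are made up, and the user-supplied mapping is assumed to have already been parsed from the JSON option into a `Map`.

```Scala
// Hypothetical sketch, not the implementation in this pull request.
object CreateTableColumnTypesSketch {
  // A column name paired with the type the default mapping would choose.
  final case class Column(name: String, defaultDbType: String)

  // Builds the column list of a CREATE TABLE statement. A user-supplied
  // override replaces the default type only for the columns it names.
  def buildColumnDefs(
      columns: Seq[Column],
      overrides: Map[String, String],
      caseSensitive: Boolean): String = {
    // For case-insensitive matching, compare lower-cased names on both sides.
    def normalize(s: String): String =
      if (caseSensitive) s else s.toLowerCase(java.util.Locale.ROOT)

    val normalized = overrides.map { case (k, v) => normalize(k) -> v }

    columns.map { c =>
      val dbType = normalized.getOrElse(normalize(c.name), c.defaultDbType)
      s"${c.name} $dbType"
    }.mkString(", ")
  }

  def main(args: Array[String]): Unit = {
    // Illustrative defaults only; real defaults come from the JDBC dialect.
    val columns = Seq(
      Column("id", "INTEGER"),
      Column("name", "TEXT"),
      Column("comments", "TEXT"))
    val overrides = Map("NAME" -> "VARCHAR(128)", "comments" -> "CLOB(20K)")

    // prints: id INTEGER, name VARCHAR(128), comments CLOB(20K)
    println(buildColumnDefs(columns, overrides, caseSensitive = false))
  }
}
```

With `caseSensitive = true`, the `NAME` override in this sketch no longer matches the `name` column, which therefore keeps its default type. That is the behaviour the TODO would tie to spark.sql.caseSensitive.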