[GitHub] [spark] EnricoMi opened a new pull request, #41518: [SPARK-19335][SQL] Add upserts for writing to JDBC

via GitHub Thu, 08 Jun 2023 08:03:13 -0700


EnricoMi opened a new pull request, #41518:
URL: https://github.com/apache/spark/pull/41518


   ### What changes were proposed in this pull request?
   This is a follow-up on #16685 and #16692.
   
   Implements upsert mode for `SaveMode.Append` of the MySQL, MS SQL Server, 
and PostgreSQL JDBC sources.
   
   ### Why are the changes needed?
   The JDBC writer only supports either truncating the existing table or 
inserting. Duplicates, i.e. rows with identical values in the primary or unique 
index columns, cause an exception, which prevents updating existing rows while 
inserting new ones.
   
   Re-evaluating a partition after an executor loss re-inserts rows that were 
already inserted in an earlier attempt, which fails the entire Spark job.
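
The retry problem described above can be reproduced outside Spark. The following is a minimal sketch, not code from this PR: it uses Python's built-in `sqlite3` module as a stand-in for the target databases, and the table `events` and its columns are hypothetical. It assumes SQLite 3.24 or newer for the `ON CONFLICT ... DO UPDATE` clause; MySQL, MS SQL Server, and PostgreSQL each have their own upsert syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a primary key, standing in for the JDBC target.
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, value TEXT)")

rows = [(1, "a"), (2, "b")]

# First attempt of the partition: plain inserts succeed.
conn.executemany("INSERT INTO events (id, value) VALUES (?, ?)", rows)

# A retried attempt sends the same rows again; a plain INSERT now
# violates the primary key and raises an exception.
try:
    conn.executemany("INSERT INTO events (id, value) VALUES (?, ?)", rows)
    plain_insert_failed = False
except sqlite3.IntegrityError:
    plain_insert_failed = True

# An upsert statement updates rows whose key already exists and
# inserts the remaining ones, so replaying rows is harmless.
retry_rows = [(1, "a"), (2, "b2"), (3, "c")]
conn.executemany(
    "INSERT INTO events (id, value) VALUES (?, ?) "
    "ON CONFLICT(id) DO UPDATE SET value = excluded.value",
    retry_rows,
)

result = conn.execute("SELECT id, value FROM events ORDER BY id").fetchall()
```

After the upsert, `result` holds one row per key, with the replayed rows updated in place instead of aborting the write.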
   
   ### Does this PR introduce _any_ user-facing change?
   This adds `upsert` and `upsertKeyColumns` options for `SaveMode.Append` of 
the JDBC source.
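
As a rough illustration of how these options might be used from PySpark, here is a hedged sketch. Only the option names `upsert` and `upsertKeyColumns` come from this PR; the helper `write_with_upsert`, the string value `"true"`, and the comma-separated format for the key columns are assumptions made for illustration and may differ from the final implementation.

```python
def write_with_upsert(df, url, table, key_columns):
    """Append `df` to a JDBC table with the proposed upsert options set.

    `df` is expected to be a Spark DataFrame. The value formats used for
    the two upsert options are assumptions, not confirmed by the PR text.
    """
    return (
        df.write.format("jdbc")
        .option("url", url)
        .option("dbtable", table)
        # Option names from the PR description; values are assumed.
        .option("upsert", "true")
        .option("upsertKeyColumns", ",".join(key_columns))
        # The PR describes upserts only for SaveMode.Append.
        .mode("append")
        .save()
    )
```

A call could look like `write_with_upsert(df, "jdbc:postgresql://host/db", "events", ["id"])`, where the URL and table name are placeholders.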
   
   ### How was this patch tested?
   Tests in `JdbcSuite` and integration suites.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


