Hello Spark users,

Spark-postgres is designed for reliable and performant ETL in big-data workloads and offers read, write, and SCD capabilities. Version 3 introduces a datasource API and simplifies usage. It outperforms Sqoop by a factor of 8 and the Apache Spark core JDBC by infinity.
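For readers unfamiliar with SCD type 1 (slowly changing dimension, overwrite in place), here is a minimal sketch of the idea in plain Python. It is not the spark-postgres implementation and does not use its API: the function name `scd1_diff`, the row layout, and the sample data are all hypothetical, chosen only to show how incoming rows split into inserts and updates.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def scd1_diff(current: List[Row], incoming: List[Row], key: str) -> Tuple[List[Row], List[Row]]:
    """Split incoming rows into (inserts, updates) for an SCD type 1 load.

    Hypothetical helper for illustration only, not part of spark-postgres.
    """
    # Index the rows already in the target table by their business key.
    current_by_key = {row[key]: row for row in current}
    inserts: List[Row] = []
    updates: List[Row] = []
    for row in incoming:
        existing = current_by_key.get(row[key])
        if existing is None:
            # Unknown key: the row is new and must be inserted.
            inserts.append(row)
        elif existing != row:
            # Known key with changed attributes: overwrite, no history kept.
            updates.append(row)
        # Known key with identical attributes: nothing to do.
    return inserts, updates


# Made-up sample data.
current = [{"id": 1, "city": "Paris"}, {"id": 2, "city": "Lyon"}]
incoming = [
    {"id": 2, "city": "Nice"},
    {"id": 3, "city": "Lille"},
    {"id": 1, "city": "Paris"},
]
inserts, updates = scd1_diff(current, incoming, "id")
```

Computing this split before touching the database means only the changed and new rows have to be written, which is the general motivation for doing the comparison on the compute side.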
Features:
- use of pg COPY statements
- parallel reads/writes
- use of hdfs to store intermediary csv
- reindex after bulk-loading
- SCD1 computations done on the spark side
- use unlogged tables when needed
- handle arrays and multiline string columns
- useful jdbc functions (ddl, updates...)

The official repository:
https://framagit.org/parisni/spark-etl/tree/master/spark-postgres

And its mirror on microsoft github:
https://github.com/EDS-APHP/spark-etl/tree/master/spark-postgres

--
nicolas