PostgreSQL connector for direct export with pg_bulkload
-------------------------------------------------------
Key: SQOOP-390
URL: https://issues.apache.org/jira/browse/SQOOP-390
Project: Sqoop
Issue Type: New Feature
Reporter: Masatake Iwasaki
h1. Features
* Fast data export with pg_bulkload.
** http://pgbulkload.projects.postgresql.org/index.html
* User can get benefit of functionality of pg_bulkload such as
** bypassing shared bufferes and WAL,
** ignoring data causing parse error,
** ETL feature with filter functions.
h1. Implementation
* Each map tasks create their own staging table with names based on task
attempt id. Then export data with pg_bulkload invoked as outer process.
* Staging tables are erased on
** Successful completion of export,
** or Exception in map task.
* Reduce task migrate data from staging tables into destination table
** Number of reduce tasks is internally set to 1 .
h1. Requirements
* pg_bulkload must be installed on DB server and all slave nodes.
* Also PostgreSQL JDBC is required on client. (Same as PostgresqlManager)
* Superuser role of PostgreSQL database is required for pg_bulkload.
h1. Usage
Currently there is no Factory class. Specify connection manager class name with
--connection-manager option to use.
{noformat}
sqoop export --connect jdbc:postgresql://localhost:5432/test \
--connection-manager com.cloudera.sqoop.manager.PGBulkloadManager \
--table test --username postgres --export-dir=/test -m 1
{noformat}
You can also specify pg_bulkload configuration with Hadoop Configuration
properties.
{noformat}
-Dpgbulkload.bin="/usr/bin/pg_bulkload"
-Dpgbulkload.input.field.delim=$'\t'
-Dpgbulkload.check.constraints="YES"
-Dpgbulkload.parse.errors="INFINITE"
-Dpgbulkload.duplicate.errors="INFINITE"
{noformat}
h1. Test
There is test class named PGBulkloadManagerTest extending TestExport.
{noformat}
ant -Dtestcase=PGBulkloadManagerTest test
{noformat}
This test requires
* PostgreSQL running on localhost:5432,
* database named sqooptest,
* super user role named sqooptest with no password, or .pgpass created
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira