Hi all,
I've used Sqoop 1 to integrate custom pre-processing code when performing a
Sqoop import from a relational database into HDFS. Basically, I used the
"codegen" command to create the object-relational mapping class, and then
modified that class source code to embed my custom pre-processing code.
Specifically, I modified the readFields() method to process field values (in
this case, encrypting sensitive fields) as they are read from the JDBC result
set and before they are set on the object instance. I then supplied this
modified ORM class when performing the Sqoop
import operation. The end result was that certain fields in my data were
encrypted by my custom code before being written into HDFS.
For example, here is the modified ORM class:
public class Customer extends SqoopRecord implements DBWritable, Writable {
  ...
  public void readFields(ResultSet __dbResults) throws SQLException {
    this.__cur_result_set = __dbResults;
    this.id = JdbcWritableBridge.readInteger(1, __dbResults);
    this.last_name = JdbcWritableBridge.readString(2, __dbResults);
    this.first_name = JdbcWritableBridge.readString(3, __dbResults);
    // encrypt cc (credit card) field, before setting value in object
    this.cc = encrypt(JdbcWritableBridge.readString(4, __dbResults));
  }
  ...
}
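The encrypt() call above is custom code, not something Sqoop provides. As a
rough sketch of what such a helper could look like, assuming AES-GCM with
Base64 text output: the class name FieldCrypto, the newKey() method, and the
explicit key parameter are all illustrative assumptions, and how the key
reaches the map tasks is left out entirely.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper class (not part of Sqoop): encrypts one string field.
final class FieldCrypto {
    private static final int IV_BYTES = 12;   // recommended IV size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    private FieldCrypto() {}

    // Generates a throwaway AES key. A real job would load a shared key
    // instead, so that every map task encrypts with the same one.
    public static SecretKey newKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("key generation failed", e);
        }
    }

    // Returns Base64(IV || ciphertext || tag), or null for a SQL NULL value.
    public static String encrypt(SecretKey key, String plaintext) {
        if (plaintext == null) {
            return null;
        }
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv); // fresh IV for every value
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] out = ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            // Unchecked, so it can be called from readFields(), which only
            // declares SQLException.
            throw new IllegalStateException("encryption failed", e);
        }
    }

    // Inverse of encrypt(); mainly useful for checking the round trip.
    public static String decrypt(SecretKey key, String encoded) {
        if (encoded == null) {
            return null;
        }
        try {
            byte[] in = Base64.getDecoder().decode(encoded);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] pt = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }
}
```

A one-argument encrypt(String) like the one called in readFields() would then
just wrap FieldCrypto.encrypt(key, value) with whatever key the job has loaded.
Because each value gets a random IV, encrypting the same card number twice
gives different output, so the encrypted column cannot be used for joins or
equality lookups.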
This approach works fine in Sqoop 1.
But I don't see any way to integrate such custom pre-processing code in Sqoop
2, which has no "codegen" command or equivalent option. Is there a UDF, custom
connector, or other approach in Sqoop 2 that can process fields during an
import job? If so, can you point me at some examples or docs showing how that
works?
Thanks!
--Joe Achett