Hi,

I'm writing a custom Storage Handler and need to run some custom code
at the end of an INSERT query.

I can easily do that by providing a custom OutputCommitter class and
overriding the commitJob() method. However, that only works for the "mr"
execution engine, as the "commitJob()" method is never called when using
Tez.

With Tez, I managed to get it to work partially by providing a custom
HiveMetaHook class and overriding the commitInsertTable() method. However,
that method only gets called at the end of an "INSERT INTO TABLE" query. It
never gets called at the end of an "INSERT INTO TABLE PARTITION (...)" query.

After doing a bit of troubleshooting, it looks like Tez uses the "DDLTask"
class (which later calls the commitInsertTable() method) only for an "INSERT
INTO TABLE" query. When inserting into a specific partition, the "DDLTask"
class doesn't seem to be used at all.

Is there a way for me to override some type of Tez hook to run custom code
at the end of an "INSERT INTO TABLE PARTITION (...)" query? Maybe by somehow
hooking into the TezTask or TezWork classes?

Any tips would be very welcome.

Thanks!

Julien
