Siddhi Mehta created PHOENIX-2367:
-------------------------------------

             Summary: Change PhoenixRecordWriter to use execute instead of 
executeBatch
                 Key: PHOENIX-2367
                 URL: https://issues.apache.org/jira/browse/PHOENIX-2367
             Project: Phoenix
          Issue Type: Improvement
            Reporter: Siddhi Mehta
            Assignee: Siddhi Mehta


Hey All,

I wanted to add a notion of skipping invalid rows to PhoenixHbaseStorage, 
similar to how the CSVBulkLoad tool has an option to ignore bad rows. I did 
some work on the Apache Pig code that allows Storers to have a notion of 
customizable/configurable error handling (PIG-4704).

I wanted to plug this behavior into PhoenixHbaseStorage and propose the 
following changes.

Current Behavior/Problem:

PhoenixRecordWriter uses executeBatch() to process rows once the batch size 
is reached. If there are any client-side validation/syntactic errors, such as 
data not fitting the column size, executeBatch() throws an exception and there 
is no way to retrieve the valid rows from the batch and retry them. We either 
discard the whole batch or fail the job, with no error handling.

With auto-commit set to false, execute() also serves the purpose of not making 
any RPC calls: it performs client-side validation and adds the row to the 
client-side mutation cache.

The RPC call happens only on conn.commit().

Proposed Change

To be able to use configurable error handling and discard only the failed 
records instead of the whole batch, I want to propose changing the behavior 
in PhoenixRecordWriter from executeBatch() to execute(), or adding a 
configuration option to toggle between the two behaviors.
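A minimal sketch of the proposed per-row path (the names writeSkippingBadRows, upsert, and maxErrors are hypothetical, not Phoenix API): because execute() with auto-commit off only validates and buffers the mutation client-side, a per-row failure can be caught and the bad record skipped, while the valid rows in the same batch survive and are committed later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class SkipBadRows {
    // Write rows one at a time (the execute() path) instead of executeBatch().
    // A client-side validation failure is caught per row, so the other rows
    // in the batch are not lost. Returns the rows that failed; rethrows once
    // the configured error budget is exhausted.
    public static <T> List<T> writeSkippingBadRows(List<T> rows,
                                                   Consumer<T> upsert,
                                                   int maxErrors) {
        List<T> failed = new ArrayList<>();
        for (T row : rows) {
            try {
                // With auto-commit off, this stands in for execute():
                // client-side validation plus buffering in the mutation
                // cache; the RPC would happen on conn.commit().
                upsert.accept(row);
            } catch (RuntimeException e) {
                failed.add(row);
                if (failed.size() > maxErrors) {
                    throw e; // too many bad rows: fail the task
                }
            }
        }
        return failed;
    }
}
```

The failed list could then be handed to the configurable error handler from PIG-4704 (logged, counted, or written to a side output) instead of aborting the whole batch.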




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
