Hi, I'm looking for a little help understanding ROW_TIMESTAMP.
Some background on the problem (simplified): I'm working on a project that needs to create an "enriched" replica of an RDBMS table based on a stream of CDC changes from that table. Each CDC event contains the timestamp of the change plus all the column values 'before' and 'after' the change, and each event is pushed to a Kafka topic. Because of certain "non-negotiable" design decisions, Kafka guarantees delivering each event at least once, but doesn't guarantee ordering for changes to the same row in the source table.

The final step of the Kafka-based flow is sinking the information into HBase/Phoenix. As I cannot get an in-order delivery guarantee from Kafka, I need to use the CDC event timestamp to ensure that HBase keeps the latest change to a row. This fits perfectly well with an HBase table design with VERSIONS=1 that uses the source event timestamp as the HBase row/cell timestamp.

The thing is that I cannot find a way to set the timestamp of the HBase cell from a Phoenix UPSERT. I came across the ROW_TIMESTAMP functionality, but I've just found (I'm devastated now) that ROW_TIMESTAMP columns store the date in both the HBase cell timestamp and the primary key, meaning that I cannot leverage that functionality to keep only the latest change.

Is there a way of defining the HBase row timestamp when doing the UPSERT, even by setting it through some obscure hidden JDBC property? I want to avoid doing a checkAndPut by all means, as the volume of changes is going to be quite big.

--
Regards,
Pedro Boado
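P.S. To make the behaviour I'm after concrete, here is a minimal sketch in plain Java of the "newest timestamp wins" semantics described above. It is only an in-memory model, not HBase or Phoenix API: the class `LatestChangeStore` and its methods are hypothetical names made up for illustration, and resolving timestamp ties in favour of the incoming event is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory model of "keep only the newest version per row" (hypothetical, not HBase/Phoenix API). */
class LatestChangeStore {

    /** A value together with the CDC event timestamp that produced it. */
    static final class Versioned {
        final long timestamp;
        final String value;

        Versioned(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    private final Map<String, Versioned> rows = new HashMap<>();

    /**
     * Applies a change event. The event replaces the stored version only if its
     * timestamp is not older, so late and duplicate deliveries are harmless.
     * Ties go to the incoming event (an assumption of this sketch).
     */
    void apply(String rowKey, long eventTimestamp, String value) {
        rows.merge(rowKey, new Versioned(eventTimestamp, value),
                (current, incoming) -> incoming.timestamp >= current.timestamp ? incoming : current);
    }

    /** Returns the latest known value for the row, or null if the row is unknown. */
    String get(String rowKey) {
        Versioned v = rows.get(rowKey);
        return v == null ? null : v.value;
    }

    public static void main(String[] args) {
        LatestChangeStore store = new LatestChangeStore();
        store.apply("row1", 200L, "second change"); // newer event arrives first
        store.apply("row1", 100L, "first change");  // older event arrives late and is ignored
        store.apply("row1", 200L, "second change"); // duplicate (at-least-once) delivery
        System.out.println(store.get("row1"));      // prints "second change"
    }
}
```

With VERSIONS=1 and the event timestamp as the cell timestamp, I would expect HBase to give me this outcome on the server side without any read-before-write, which is why I'd like to avoid checkAndPut.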
