Hi,

We have a problem when writing large numbers of records to HBase. We do not specify timestamps explicitly, so multiple records can end up being written in the same millisecond. Unfortunately, when records for the same row and column are written with the same timestamp, the later writes are treated as updates of the earlier records rather than as separate records. We want them to be kept as separate records, so we need a way to guarantee that writes are never treated as overwrites (unless we explicitly make them so).
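To illustrate what we are seeing, here is a toy model of the behaviour. It is only a sketch: the class TimestampCollisionDemo and its methods are hypothetical names made up for this example, it uses an in-memory map rather than the HBase client API, and it assumes that a cell is identified by its row, column family, column qualifier and timestamp taken together.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: a hypothetical in-memory stand-in, not the HBase client API.
final class TimestampCollisionDemo {
    // One value per full coordinate (row, family, qualifier, timestamp).
    private final Map<List<Object>, String> cells = new HashMap<>();

    void put(String row, String family, String qualifier, long timestamp, String value) {
        // A second put with an identical coordinate replaces the first value.
        cells.put(Arrays.<Object>asList(row, family, qualifier, timestamp), value);
    }

    int versionCount() {
        return cells.size();
    }

    // Writes two values to the same row and column and reports how many survive.
    static int versionsAfterTwoPuts(long firstTs, long secondTs) {
        TimestampCollisionDemo store = new TimestampCollisionDemo();
        store.put("row1", "cf", "q", firstTs, "first");
        store.put("row1", "cf", "q", secondTs, "second");
        return store.versionCount();
    }

    public static void main(String[] args) {
        System.out.println(versionsAfterTwoPuts(1000L, 1000L)); // same millisecond: prints 1
        System.out.println(versionsAfterTwoPuts(1000L, 1001L)); // different milliseconds: prints 2
    }
}
```

In the first call the two writes share a timestamp and only one value survives, which is the lost write we want to avoid.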
As I understand it, there are (at least) two different ways to proceed.

The first approach is to increase the resolution of the timestamp, for example by using something like java.lang.System.nanoTime(). However, although this seems to ameliorate the problem, it seems to introduce other problems. Also, ideally we would like something that guarantees that we don't lose writes, rather than something that just makes lost writes less likely.

The second approach is to write a prePut co-processor. In the prePut I can do a read using the same rowkey, column family and column qualifier, and omit the timestamp. As I understand it, this will return the cell with the latest timestamp. I can then adjust the timestamp that I am about to write, if necessary, to make sure that it is always unique. In this way I can guarantee that none of my writes are accidentally turned into updates.

However, this approach seems expensive. I have to do a read before each write, and although (I believe) the read will be served by the same region server, it is still going to slow things down a lot. I am also assuming that the prePut co-processor is executed inside a record lock, so that I don't have to worry about synchronization. Is this true?

Is there a better way? Maybe there is an existing implementation of this that I can pick up? Maybe there is some way that I can implement this more efficiently? It seems to me that this might be better handled at compaction. Shouldn't there be some way to mark a write with some sort of special timestamp value meaning that the write should never be considered an update, but always a separate write?

Any advice gratefully received.

Peter Marron
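
P.S. To make the "adjust the timestamp if necessary" idea concrete, here is a minimal sketch of just the uniqueness logic. It is not the prePut co-processor itself: the class UniqueTimestamps and its next method are hypothetical names made up for this example, and it remembers the last timestamp in memory instead of reading it back from the table. That means it only guarantees uniqueness among writes that go through a single instance in a single JVM, and the issued values can run ahead of the wall clock if more than one write per millisecond is sustained.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: hands out strictly increasing timestamps.
final class UniqueTimestamps {
    // Last timestamp handed out; starts below any real clock value.
    private final AtomicLong last = new AtomicLong(Long.MIN_VALUE);

    // Returns wallClockMillis if it is later than every timestamp issued so far,
    // otherwise the previous timestamp plus one. Safe to call from multiple threads.
    long next(long wallClockMillis) {
        return last.updateAndGet(prev -> Math.max(prev + 1, wallClockMillis));
    }

    public static void main(String[] args) {
        UniqueTimestamps timestamps = new UniqueTimestamps();
        // Three writes arriving in the same millisecond get three distinct values.
        System.out.println(timestamps.next(1000L)); // prints 1000
        System.out.println(timestamps.next(1000L)); // prints 1001
        System.out.println(timestamps.next(1000L)); // prints 1002
        // The clock has only reached 1001, which was already used, so we keep counting up.
        System.out.println(timestamps.next(1001L)); // prints 1003
    }
}
```

In real use the argument would be System.currentTimeMillis() and the result would be supplied as the explicit timestamp of the write. With several independent writers this is not sufficient on its own, which is why I was looking at doing the check on the server side.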