Hi, I have a question about how to check for the existence of a record in HBase. I went through some threads discussing the various techniques, mainly row locks and checkAndPut().
My row key schema looks like this:

<prefix>-<event_type>-<yyyy-mm-dd>-<eventid>

I add the prefix to avoid hotspotting caused by monotonically increasing time-series data (I have tested this on a test cluster and it works well for my use case). I decided to go with the prefix approach after much discussion on the mailing list and experimentation with the test cluster. The prefix is assigned as (some value % num_machines).

I only want to insert a record if it does not already exist. Because the prefix is assigned randomly, the same record could end up under different row keys, so I cannot simply use checkAndPut(). This leaves me with two options:

1) Compute the prefix as prefix = hash(rowID) % numMachines, so that the prefix is predictable for a given row and I can use checkAndPut(). I could perhaps use SHA-256 as the hash.

2) When a record comes in for insert, do the following:

   lock(rowIDWithoutPrefix)
   rowIDWithPrefix = prefix + rowIDWithoutPrefix
   insert(rowIDWithPrefix)
   unlock(rowIDWithoutPrefix)

   Every client takes a lock on the row ID without the prefix, but adds the prefix when writing ...

Does anyone have experience implementing either of these techniques? I understand that row locks can lead to deadlocks, but I am not too concerned about that: I would never have many clients trying to grab the lock on the same row key (the probability of this happening is very slim). I am only worried about write performance.

Thank you,
Sam
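
P.S. To make option 1 concrete, here is a minimal sketch of a deterministic prefix, assuming SHA-256 as the hash and using only the Java standard library. The class name SaltedKey, the helpers prefixFor and saltedRowKey, and the sample row ID are hypothetical names made up for illustration; they are not part of HBase.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derives a stable prefix from the unprefixed row ID.
class SaltedKey {

    // Returns a bucket in [0, numBuckets) that is always the same for the same row ID.
    static int prefixFor(String rowIdWithoutPrefix, int numBuckets) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(rowIdWithoutPrefix.getBytes(StandardCharsets.UTF_8));
            // Fold the first four digest bytes into an int.
            int h = ((digest[0] & 0xff) << 24)
                  | ((digest[1] & 0xff) << 16)
                  | ((digest[2] & 0xff) << 8)
                  |  (digest[3] & 0xff);
            // floorMod keeps the result non-negative even when h is negative.
            return Math.floorMod(h, numBuckets);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Builds <prefix>-<rowIdWithoutPrefix>, matching the schema described above.
    static String saltedRowKey(String rowIdWithoutPrefix, int numBuckets) {
        return prefixFor(rowIdWithoutPrefix, numBuckets) + "-" + rowIdWithoutPrefix;
    }

    public static void main(String[] args) {
        // Made-up row ID in the <event_type>-<yyyy-mm-dd>-<eventid> shape.
        String rowId = "click-2012-01-15-12345";
        System.out.println(saltedRowKey(rowId, 8));
    }
}
```

Since every client would compute the same key for the same record, the resulting row key could then be passed to checkAndPut() with a null expected value, which as far as I know makes the put succeed only if the checked column does not exist yet.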
