Hi,

I have a question about how to check for the existence of a record in HBase. I
went through some threads discussing the various techniques, mainly row locks
and checkAndPut().

My row key looks like the following:

<prefix>-<event_type>-<yyyy-mm-dd>-<eventid>

The reason I am adding the prefix is to avoid hot spotting caused by
monotonically increasing time-series keys (I have tested this on a test
cluster and it seems to work pretty well for my use case). I decided to go
with the prefix approach after much discussion on the mailing list and
experimentation with the test cluster. The prefix is assigned as
(some value % num_machines).
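A minimal Java sketch of the row key layout above. It assumes `someValue` is whatever number the prefix is derived from, which the email leaves unspecified. `buildRowKey` is a hypothetical helper, not part of the HBase API:

```java
import java.time.LocalDate;

class RowKeys {
    // Builds <prefix>-<event_type>-<yyyy-mm-dd>-<eventid>, where
    // prefix = someValue % numMachines (floorMod keeps it non-negative).
    static String buildRowKey(long someValue, int numMachines,
                              String eventType, LocalDate date, String eventId) {
        long prefix = Math.floorMod(someValue, (long) numMachines);
        // LocalDate.toString() renders ISO-8601, i.e. yyyy-mm-dd
        return prefix + "-" + eventType + "-" + date + "-" + eventId;
    }

    public static void main(String[] args) {
        // prints: 3-click-2011-05-03-abc123
        System.out.println(buildRowKey(7, 4, "click", LocalDate.of(2011, 5, 3), "abc123"));
    }
}
```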

I only want to insert the record if it does not already exist. Because the
prefix is assigned randomly, the same record can end up under a different row
key each time it arrives, so I cannot simply use checkAndPut(), which only
checks a single row.
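The following small in-memory simulation illustrates the problem. It assumes a `ConcurrentHashMap` stands in for the table and uses `putIfAbsent` as a stand-in for a per-row "insert only if absent" check such as checkAndPut(). When the same event arrives twice with two different prefixes, it passes the check both times:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class RandomPrefixDuplicate {
    // Stand-in for a single-row conditional write: succeeds only if rowKey is absent.
    static boolean insertIfAbsent(Map<String, String> table, String rowKey, String value) {
        return table.putIfAbsent(rowKey, value) == null;
    }

    public static void main(String[] args) {
        Map<String, String> table = new ConcurrentHashMap<>();
        String rowIdWithoutPrefix = "click-2011-05-03-abc123";
        // The same event arrives twice and is given two different random prefixes.
        boolean first = insertIfAbsent(table, "1-" + rowIdWithoutPrefix, "payload");
        boolean second = insertIfAbsent(table, "2-" + rowIdWithoutPrefix, "payload");
        System.out.println(first + " " + second + " rows=" + table.size()); // prints: true true rows=2
    }
}
```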

This leaves me with two options:

1)
Make the prefix deterministic, e.g. prefix = hash(rowID) % numMachines, so
that the prefix is predictable for a given row and I can use checkAndPut().

I could use SHA-256 as the hash function here.
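Here is a sketch of option 1. It assumes the prefix is taken from the first four bytes of a SHA-256 digest of the unprefixed row ID. `prefixFor`, `rowKeyFor`, and `insertIfAbsent` are hypothetical names, and the map again only stands in for the HBase table:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class HashedPrefix {
    // prefix = hash(rowID) % numMachines, using the first 4 bytes of SHA-256.
    static int prefixFor(String rowIdWithoutPrefix, int numMachines) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(rowIdWithoutPrefix.getBytes(StandardCharsets.UTF_8));
            int h = ((digest[0] & 0xff) << 24) | ((digest[1] & 0xff) << 16)
                    | ((digest[2] & 0xff) << 8) | (digest[3] & 0xff);
            return Math.floorMod(h, numMachines);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    static String rowKeyFor(String rowIdWithoutPrefix, int numMachines) {
        return prefixFor(rowIdWithoutPrefix, numMachines) + "-" + rowIdWithoutPrefix;
    }

    // Stand-in for checkAndPut() with "row must not exist" as the condition.
    static boolean insertIfAbsent(Map<String, String> table, String rowIdWithoutPrefix,
                                  int numMachines, String value) {
        return table.putIfAbsent(rowKeyFor(rowIdWithoutPrefix, numMachines), value) == null;
    }

    public static void main(String[] args) {
        Map<String, String> table = new ConcurrentHashMap<>();
        String id = "click-2011-05-03-abc123";
        System.out.println(insertIfAbsent(table, id, 4, "payload")); // true
        System.out.println(insertIfAbsent(table, id, 4, "payload")); // false: same row key
    }
}
```

Because the prefix now depends only on the row ID, a duplicate always maps to the same row and the conditional write rejects it. Against a real cluster, the client call would be along the lines of `HTable.checkAndPut(row, family, qualifier, null, put)`. In that call, a `null` expected value means the put is applied only if that column does not exist yet. Check the javadoc for the HBase version in use. The hash covers the whole row ID, including the event ID, so consecutive time-series keys should still be spread across prefixes.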

2)
When a record comes in for insert, do the following:

lock(rowIDWithoutPrefix)
if no row exists for rowIDWithoutPrefix under any prefix:
    rowIDWithPrefix = prefix + rowIDWithoutPrefix
    insert(rowIDWithPrefix)
unlock(rowIDWithoutPrefix)

Every client takes a lock on the row ID without the prefix, but adds the
prefix when writing.
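Here is a sketch of option 2 under the same in-memory assumptions. A `ReentrantLock` per unprefixed row ID stands in for a lock that all clients share, for example a row lock on that ID. `LockThenInsert` is hypothetical. With a random prefix, the existence check has to look under every possible prefix:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

class LockThenInsert {
    // In-memory stand-in for the HBase table: full row key -> value.
    private final Map<String, String> table = new ConcurrentHashMap<>();
    // One lock per row ID *without* the prefix (never cleaned up in this sketch).
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final int numMachines;

    LockThenInsert(int numMachines) { this.numMachines = numMachines; }

    boolean insertIfAbsent(String rowIdWithoutPrefix, int prefix, String value) {
        if (prefix < 0 || prefix >= numMachines) {
            throw new IllegalArgumentException("prefix must be in [0, numMachines)");
        }
        ReentrantLock lock = locks.computeIfAbsent(rowIdWithoutPrefix, k -> new ReentrantLock());
        lock.lock();
        try {
            // The record could be stored under any prefix, so the
            // existence check costs up to numMachines reads.
            for (int p = 0; p < numMachines; p++) {
                if (table.containsKey(p + "-" + rowIdWithoutPrefix)) {
                    return false;
                }
            }
            table.put(prefix + "-" + rowIdWithoutPrefix, value);
            return true;
        } finally {
            lock.unlock();
        }
    }

    int size() { return table.size(); }

    public static void main(String[] args) {
        LockThenInsert store = new LockThenInsert(4);
        String id = "click-2011-05-03-abc123";
        System.out.println(store.insertIfAbsent(id, 1, "payload")); // true
        System.out.println(store.insertIfAbsent(id, 2, "payload")); // false: found under prefix 1
    }
}
```

The trade-off shows up directly here. Option 1 needs a single conditional write per insert. Option 2 needs a lock acquire, up to numMachines existence reads, a put, and a release. Since write performance is the main concern, that extra work per insert is worth weighing.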

Does anyone have experience implementing either of these techniques? I
understand that row locks can lead to deadlocks, but I am not too concerned
about that: I would never have many clients trying to grab the lock on the
same row key (the probability of this happening is very slim).

I am only worried about write performance.

Thank you,

Sam
