Re: Avoiding duplicate writes

Ted Yu Thu, 11 Jan 2018 07:33:44 -0800

Peter:
Normally java.lang.System.nanoTime() is used for measuring elapsed time, not
for producing timestamps.


See also
https://www.javacodegeeks.com/2012/02/what-is-behind-systemnanotime.html
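
To illustrate the point about nanoTime(), here is a minimal sketch of how it
is typically used. The class and method names (ElapsedTimeSketch, timeNanos)
are made up for this example. The value returned by nanoTime() has an
arbitrary origin, so only the difference between two readings is meaningful.

```java
import java.util.concurrent.TimeUnit;

final class ElapsedTimeSketch {

    /** Runs the task and returns how long it took, in nanoseconds. */
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        // Always subtract two readings; a single nanoTime() value is not
        // related to wall-clock time and may even be negative.
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long elapsed = timeNanos(() -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("elapsed ms: " + TimeUnit.NANOSECONDS.toMillis(elapsed));

        // Wall-clock time in epoch milliseconds, shown for contrast.
        System.out.println("epoch ms:   " + System.currentTimeMillis());
    }
}
```

Since the origin of nanoTime() can differ between JVMs, values taken on
different hosts are not comparable with each other.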

bq. the prePut co-processor is executed inside a record lock

The prePut hook is called with a read lock on the underlying region, not a
per-record lock.


Have you heard of HLC (Hybrid Logical Clock)? See HBASE-14070.

The work hasn't been active recently.
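
For a rough idea of what HLC does, here is a single-process sketch of its
local-event rule: keep the largest physical time seen so far and bump a
logical counter whenever the physical clock has not advanced. This is not the
HBASE-14070 code. The class name HybridClockSketch and its method are
hypothetical, and the rule for merging timestamps received from other nodes
is left out.

```java
final class HybridClockSketch {
    private long lastPhysical = 0L; // largest wall-clock value seen so far (ms)
    private long logical = 0L;      // tie-breaker within one physical value

    /**
     * Returns {physical, logical} for a new local event. Successive results
     * are strictly increasing when compared by physical, then by logical,
     * even if wallClockMillis repeats or goes backwards.
     */
    synchronized long[] next(long wallClockMillis) {
        if (wallClockMillis > lastPhysical) {
            // Physical clock moved forward: adopt it and reset the counter.
            lastPhysical = wallClockMillis;
            logical = 0L;
        } else {
            // Same millisecond (or clock went backwards): bump the counter.
            logical++;
        }
        return new long[] { lastPhysical, logical };
    }

    public static void main(String[] args) {
        HybridClockSketch clock = new HybridClockSketch();
        for (int i = 0; i < 3; i++) {
            long[] ts = clock.next(System.currentTimeMillis());
            System.out.println(ts[0] + "." + ts[1]);
        }
    }
}
```

Two events in the same millisecond get the same physical part but different
logical parts, so they never collide.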

FYI

On Thu, Jan 11, 2018 at 2:16 AM, Peter Marron <[email protected]>
wrote:

> Hi,
>
> We have a problem when we are writing lots of records to HBase.
> We are not specifying timestamps explicitly and so the situation arises
> where multiple records are being written in the same millisecond.
> Unfortunately when the records are written and the timestamps are the same
> then later writes are treated as updates of the previous records and not
> separate records, which is what we want.
> So we want to be able to guarantee that records are not treated as
> overwrites (unless we explicitly make them so).
>
> As I understand it there are (at least) two different ways to proceed.
>
> The first approach is to increase the resolution of the timestamp.
> So we could use something like java.lang.System.nanoTime()
> However although this seems to ameliorate the problem it seems to
> introduce other problems.
> Also ideally we would like something that guarantees that we don't lose
> writes rather than making them more unlikely.
>
> The second approach is to write a prePut co-processor.
> In the prePut I can do a read using the same rowkey, column family and
> column qualifier and omit the timestamp.
> As I understand it this will return me the latest timestamp.
> Then I can update the timestamp that I am going to write, if necessary, to
> make sure that the timestamp is always unique.
> In this way I can guarantee that none of my writes are accidentally turned
> into updates.
>
> However this approach seems to be expensive.
> I have to do a read before each write, and although (I believe) it will be
> on the same region server, it's still going to slow things down a lot.
> Also I am assuming that the prePut co-processor is executed inside a
> record lock so that I don't have to worry about synchronization.
> Is this true?
>
> Is there a better way?
>
> Maybe there is some implementation of this already that I can pick up?
>
> Maybe there is some way that I can implement this more efficiently?
>
> It seems to me that this might be better handled at compaction.
> Shouldn't there be some way that I can mark writes with some sort of
> special value of timestamp that means that this write should never be
> considered as an update but always as a separate write?
>
> Any advice gratefully received.
>
> Peter Marron
>
