We're using FAST_DIFF for the block encoding. For completeness here's the rest:

{TABLE_ATTRIBUTES => {coprocessor$1 => 's3://optix.rtb16/root/lib/geomesa-hbase-distributed-runtime_2.11-2.1.0.jar|org.locationtech.geomesa.hbase.coprocessor.GeoMesaCoprocessor|1073741823|'}, {NAME => 'd', BLOOMFILTER => 'NONE', DATA_BLOCK_ENCODING => 'FAST_DIFF', COMPRESSION => 'SNAPPY'}

Thanks,
Austin

On 3/20/19 23:01, Reid Chan wrote:
Does your table have `DATA_BLOCK_ENCODING` set?





--------------------------

Best regards,
R.C



________________________________________
From: aheyne <ahe...@ccri.com>
Sent: 21 March 2019 10:06
To: Sean Busbey
Cc: user@hbase.apache.org
Subject: Re: Bits getting flipped in record value

Correct, no records will ever be updated.

We do have a custom coprocessor loaded but before I mention that you
should have more context about what we're actually doing. We're running
GeoMesa [1] on top of HBase. Practically what this means is that for
every record we write, we write twice, once to a spatio-temporal indexed
table and another to an attribute indexed table. What we have seen is
that the value in one of the tables, but not the other, is becoming
corrupt. As far as we can tell, no custom read path code is involved
since we've validated the raw binary values using direct HBase API
access and since the write path has been verified with a second bulk
load. Additionally, since the same code writes the values for
each index, I feel confident ruling out the write path. As I understand
it so far the only manipulation of the values is happening during
compactions.

The coprocessor we're using is available here [2]. It's not doing
anything too crazy, just filtering and, depending on the query type,
deserializing the row values and/or rolling up some aggregations.

Thanks,
Austin

[1] https://www.geomesa.org/
[2]
https://github.com/locationtech/geomesa/blob/master/geomesa-hbase/geomesa-hbase-datastore/src/main/scala/org/locationtech/geomesa/hbase/coprocessor/GeoMesaCoprocessor.scala

On 2019-03-20 21:34, Sean Busbey wrote:
So you're saying no records should ever be updated, right?

Do you have any coprocessors loaded?

On Wed, Mar 20, 2019, 20:32 aheyne <ahe...@ccri.com> wrote:

I don't have the WALs but due to the nature of the data each record/key
is unique. The keys for the data are generated using spatial-temporal
dimensions of the observation.

-Austin

On 2019-03-20 21:25, Sean Busbey wrote:
Have you examined the WALs for writes to the impacted cells to verify an
update wasn't written with the change to the value?

On Wed, Mar 20, 2019, 17:47 Austin Heyne <ahe...@ccri.com> wrote:

Hey all,

We're running HBase 1.4.8 on EMR 5.20 backed by S3 and we're seeing a
bit get flipped in some record values.

We've performed a bulk ingest and bulk load of a large chunk of data and
then pointed a live ingest feed to that table. After a period of time we
found that a few records in the table had been corrupted and were one
bit different from their original value. Since we saved the output of
the bulk ingest we re-loaded those files and verified that at the time
of bulk load the record was correct. This seems to us to indicate that
at some point during the live ingest writes the record was corrupted.
I've verified that the region that the record is in has never been split
but it has received over 2 million write requests so there very likely
could have been some minor compactions there.
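
For anyone wanting to reproduce the comparison, here is a rough sketch of
how a one-bit difference between a saved value and the value read back can
be located. It is not the code we actually ran; the function name
flipped_bits and the sample bytes are made up for illustration, and it
assumes both values have already been fetched as raw bytes.

```python
from typing import List, Tuple


def flipped_bits(original: bytes, observed: bytes) -> List[Tuple[int, int]]:
    """Return (byte_offset, bit_index) pairs where the two values differ.

    bit_index 0 is the least significant bit of the byte.
    """
    if len(original) != len(observed):
        raise ValueError("values differ in length; not a pure bit flip")
    diffs = []
    for offset, (a, b) in enumerate(zip(original, observed)):
        x = a ^ b  # set bits in the XOR mark the positions that differ
        bit = 0
        while x:
            if x & 1:
                diffs.append((offset, bit))
            x >>= 1
            bit += 1
    return diffs


# Hypothetical values for illustration only, not taken from the affected table.
good = bytes([0x10, 0x20, 0x30])
bad = bytes([0x10, 0x24, 0x30])
print(flipped_bits(good, bad))  # [(1, 2)]
```

A result with exactly one entry means the two values differ by a single
bit, which matches what we are seeing.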

Has anyone seen anything like this before?

Thanks,
Austin

--
Austin L. Heyne


