Re: Bits getting flipped in record value

Austin Heyne Thu, 21 Mar 2019 08:29:35 -0700

We're using FAST_DIFF for the block encoding. For completeness, here's the rest:

{TABLE_ATTRIBUTES => {coprocessor$1 => 's3://optix.rtb16/root/lib/geomesa-hbase-distributed-runtime_2.11-2.1.0.jar|org.locationtech.geomesa.hbase.coprocessor.GeoMesaCoprocessor|1073741823|'}, {NAME => 'd', BLOOMFILTER => 'NONE', DATA_BLOCK_ENCODING => 'FAST_DIFF', COMPRESSION => 'SNAPPY'}


Thanks,
Austin

On 3/20/19 23:01, Reid Chan wrote:

Does your table have `DATA_BLOCK_ENCODING` set?





--------------------------

Best regards,
R.C



________________________________________
From: aheyne <[email protected]>
Sent: 21 March 2019 10:06
To: Sean Busbey
Cc: [email protected]
Subject: Re: Bits getting flipped in record value

Correct, no records will ever be updated.

We do have a custom coprocessor loaded but before I mention that you
should have more context about what we're actually doing. We're running
GeoMesa [1] on top of HBase. Practically what this means is that for
every record we write, we write twice, once to a spatio-temporal indexed
table and another to an attribute indexed table. What we have seen is
that the value in one of the tables, but not the other, is becoming
corrupt. As far as we can tell, no custom read path code is involved
since we've validated the raw binary values using direct HBase API
access and since the write path has been verified with a second bulk
load. Additionally, since the same code writes the values for both
indexes, I feel confident ruling out the write path. As I understand
it so far, the only manipulation of the values happens during
compactions.
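
The cross-table comparison described above can be sketched roughly as follows. This is only an illustration, not GeoMesa or HBase code: it assumes the raw values have already been fetched from both index tables into plain dictionaries keyed by a shared feature id, and that both tables are expected to hold byte-identical values for the same record. The names `hamming_distance` and `cross_check` are hypothetical.

```python
from typing import Dict, List, Tuple


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the differing bits between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("values must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def cross_check(
    st_values: Dict[str, bytes],    # assumed: values read from the spatio-temporal index table
    attr_values: Dict[str, bytes],  # assumed: values read from the attribute index table
) -> List[Tuple[str, int]]:
    """Return (feature id, differing bit count) for every mismatched record.

    A count of -1 marks values whose lengths differ, which would point to
    something other than a simple bit flip.
    """
    suspects = []
    for fid, st_val in st_values.items():
        attr_val = attr_values.get(fid)
        if attr_val is None or attr_val == st_val:
            continue  # missing in the other table, or identical: nothing to report
        if len(attr_val) != len(st_val):
            suspects.append((fid, -1))
        else:
            suspects.append((fid, hamming_distance(st_val, attr_val)))
    return suspects
```

A mismatch with a count of exactly 1 would match the single-bit corruption described in this thread; larger counts or length differences would suggest a different failure.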

The coprocessor we're using is available here [2]. It's not doing
anything too crazy, just filtering and, depending on the query type,
deserializing the row values and/or rolling up some aggregations.

Thanks,
Austin

[1] https://www.geomesa.org/
[2]
https://github.com/locationtech/geomesa/blob/master/geomesa-hbase/geomesa-hbase-datastore/src/main/scala/org/locationtech/geomesa/hbase/coprocessor/GeoMesaCoprocessor.scala

On 2019-03-20 21:34, Sean Busbey wrote:

So you're saying no records should ever be updated, right?

Do you have any coprocessors loaded?

On Wed, Mar 20, 2019, 20:32 aheyne <[email protected]> wrote:

I don't have the WALs but due to the nature of the data each record/key
is unique. The keys for the data are generated using spatial-temporal
dimensions of the observation.

-Austin

On 2019-03-20 21:25, Sean Busbey wrote:

Have you examined the wals for writes to the impacted cells to verify an
update wasn't written with the change to the value?

On Wed, Mar 20, 2019, 17:47 Austin Heyne <[email protected]> wrote:

Hey all,

We're running HBase 1.4.8 on EMR 5.20 backed by S3 and we're seeing a
bit get flipped in some record values.

We've performed a bulk ingest and bulk load of a large chunk of data and
then pointed a live ingest feed to that table. After a period of time we
found that a few records in the table had been corrupted and were one
bit different from their original value. Since we saved the output of
the bulk ingest we re-loaded those files and verified that at the time
of bulk load the record was correct. This seems to us to indicate that
at some point during the live ingest writes the record was corrupted.
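
To pin down which bit differs between the saved bulk-ingest value and the value currently read back, a byte-wise XOR is enough. The following is a minimal sketch under the assumption that both values are available as raw bytes of equal length; `flipped_bits` is a hypothetical helper, not part of HBase or GeoMesa.

```python
from typing import List, Tuple


def flipped_bits(original: bytes, observed: bytes) -> List[Tuple[int, int]]:
    """Return (byte offset, bit index) for every bit that differs.

    Bit index 0 is the least significant bit of the byte.
    """
    if len(original) != len(observed):
        raise ValueError("values must have the same length")
    positions = []
    for offset, (o, c) in enumerate(zip(original, observed)):
        diff = o ^ c  # set bits mark positions where the two bytes disagree
        for bit in range(8):
            if diff & (1 << bit):
                positions.append((offset, bit))
    return positions


# 0x42 and 0x46 differ only in bit 2 of the second byte.
print(flipped_bits(b"\x41\x42", b"\x41\x46"))  # [(1, 2)]
```

Collecting the offsets across all corrupted records may show whether the flips cluster at a particular position in the value, which could help narrow down where the corruption is introduced.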

I've verified that the region that the record is in has never been split
but it has received over 2 million write requests so there very likely
could have been some minor compactions there.

Has anyone seen anything like this before?

Thanks,
Austin

--
Austin L. Heyne

