> Can you share code that generates HFiles with delete markers?

Here <https://gist.github.com/bharathv/1bc499e717366ea71940ab6df6a98d33>
you go. You might want to use table.getDescriptor() and build the column
families from that descriptor; I just hardcoded everything for my simple
table. You may have to change a few things depending on your table
properties. Also, the code works against the master branch, so certain APIs
may differ on older branches. I was actually thinking about a more elegant
way of doing this, so I was mucking with the importtsv tool to generate
deletes rather than puts from an input TSV file. I gave up after some time,
but that is also worth trying, or maybe others know a better way of doing
this.

> But what would happen after major-compaction?

It's like any regular major compaction: in this case it cleans up all of
the history, because the two edits cancel each other out.

hbase(main):006:0> scan 't1', {RAW=>true}
ROW                                   COLUMN+CELL
 row1                                 column=f:a, timestamp=9223372036854775807, type=Delete
 row1                                 column=f:a, timestamp=9223372036854775807, value=val
1 row(s)
Took 0.0193 seconds

hbase(main):007:0> major_compact 't1'
Took 0.0479 seconds

hbase(main):008:0> scan 't1', {RAW=>true}
ROW                                   COLUMN+CELL
0 row(s)
Took 0.0026 seconds

> Maybe the TS corruption issue is somehow linked with another issue that we
hit - https://issues.apache.org/jira/browse/HBASE-22862

We are running into this too. Our current theory is that it is caused by
Phoenix indexes and that an upgrade may fix it, but I don't know if or how
these two issues are linked.

On Thu, May 14, 2020 at 7:22 AM Alexander Batyrshin <0x62...@gmail.com>
wrote:

> Thank you for this idea. It looks promising.
>
> Can you share code that generates HFiles with delete markers?
>
> As I see, the delete markers were inserted correctly.
> But what would happen after major-compaction?
>
> On 13 May 2020, at 08:32, Bharath Vissapragada <bhara...@apache.org> wrote:
> >
> > Interesting behavior, I just tried it out on my local setup (master/HEAD)
> > out of curiosity, to check if we can trick HBase into deleting this bad row,
> > and the following worked for me. I don't know how you ended up with that
> > row, though (bad bulk load? just guessing).
> >
> > To have a table with the Long.MAX timestamp, I commented out some pieces of
> > HBase code so that it doesn't override the timestamp with the current
> > millis on the region server (otherwise, I just see the expected behavior of
> > current ms).
> >
> > *Step 1: Create a table and generate the problematic row*
> >
> > hbase(main):002:0> create 't1', 'f'
> > Created table t1
> >
> > -- patch hbase to accept Long.MAX_VALUE ts ---
> >
> > hbase(main):005:0> put 't1', 'row1', 'f:a', 'val', 9223372036854775807
> > Took 0.0054 seconds
> >
> > -- make sure the put with the ts is present --
> > hbase(main):006:0> scan 't1'
> > ROW                                  COLUMN+CELL
> > row1                                column=f:a, timestamp=*9223372036854775807*, value=val
> > 1 row(s)
> > Took 0.0226 seconds
> >
> > *Step 2: Hand craft an HFile with the delete marker*
> >
> > ...with this row/col/max ts. [Let me know if you want the code, I can put
> > it somewhere; I just used the StoreFileWriter utility.]
> >
> > -- dump the contents of hfile using the utility ---
> >
> > $ bin/hbase hfile -f file:///tmp/hfiles/f/bf84f424544f4675880494e09b750ce8 -p
> > ......
> > Scanned kv count -> 1
> > K: row1/f:a/LATEST_TIMESTAMP/Delete/vlen=0/seqid=0 V:  <==== Delete marker
> >
> > *Step 3: Bulk load this HFile with the delete marker*
> >
> > bin/hbase completebulkload file:///tmp/hfiles t1
> >
> > *Step 4: Make sure the delete marker is inserted correctly.*
> >
> > hbase(main):001:0> scan 't1'
> > ......
> >
> > 0 row(s)
> > Took 0.1387 seconds
> >
> > -- Raw scan to make sure the delete marker is inserted and nothing funky is
> > happening ---
> >
> > hbase(main):003:0> scan 't1', {RAW=>true}
> > ROW                                          COLUMN+CELL
> > row1                                        column=f:a, timestamp=9223372036854775807, type=Delete
> > row1                                        column=f:a, timestamp=9223372036854775807, value=val
> > 1 row(s)
> > Took 0.0044 seconds
> >
> > Thoughts?
> >
> > On Tue, May 12, 2020 at 2:00 PM Alexander Batyrshin <0x62...@gmail.com> wrote:
> >
> >> Table is ~10 TB of SNAPPY data. I don't have such a big time window on
> >> production for re-inserting all the data.
> >>
> >> I don't know how we got those cells. I can only assume that it is
> >> Phoenix and/or WAL replay after a region server crash.
> >>
> >> On 12 May 2020, at 18:25, Wellington Chevreuil <wellington.chevre...@gmail.com> wrote:
> >>>
> >>> How large is this table? Can you afford to re-insert all current data into
> >>> a new, temp table? If so, you could write a mapreduce job that scans this
> >>> table and rewrites all its cells to this new, temp table. I had verified
> >>> that 1.4.10 does have the timestamp-replacing logic here:
> >>> https://github.com/apache/hbase/blob/branch-1.4/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L3395
> >>>
> >>>
> >>> So if you re-insert all of this table's cells into a new one, the timestamps
> >>> would be inserted correctly, and you would then be able to delete those rows.
> >>> Now, how did those cells manage to get inserted with the max timestamp? Was
> >>> this cluster running on an old version that then got upgraded to 1.4.10?
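> >>>
> >>> A rough sketch of such a rewrite job (untested; it assumes the stock
> >>> TableMapper/TableMapReduceUtil APIs, and the class name and TRACET_TMP
> >>> target table are illustrative only):
> >>>
> >>> import java.io.IOException;
> >>> import org.apache.hadoop.hbase.Cell;
> >>> import org.apache.hadoop.hbase.CellUtil;
> >>> import org.apache.hadoop.hbase.HBaseConfiguration;
> >>> import org.apache.hadoop.hbase.client.Put;
> >>> import org.apache.hadoop.hbase.client.Result;
> >>> import org.apache.hadoop.hbase.client.Scan;
> >>> import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
> >>> import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
> >>> import org.apache.hadoop.hbase.mapreduce.TableMapper;
> >>> import org.apache.hadoop.mapreduce.Job;
> >>>
> >>> public class RewriteTable extends TableMapper<ImmutableBytesWritable, Put> {
> >>>   @Override
> >>>   protected void map(ImmutableBytesWritable row, Result result, Context ctx)
> >>>       throws IOException, InterruptedException {
> >>>     Put put = new Put(row.get());
> >>>     for (Cell cell : result.rawCells()) {
> >>>       // Keep each cell's own timestamp: the server-side logic linked
> >>>       // above only rewrites LATEST_TIMESTAMP (Long.MAX_VALUE) cells,
> >>>       // replacing them with the current time on insert.
> >>>       put.addColumn(CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell),
> >>>           cell.getTimestamp(), CellUtil.cloneValue(cell));
> >>>     }
> >>>     ctx.write(row, put);
> >>>   }
> >>>
> >>>   public static void main(String[] args) throws Exception {
> >>>     Job job = Job.getInstance(HBaseConfiguration.create(), "rewrite-table");
> >>>     job.setJarByClass(RewriteTable.class);
> >>>     Scan scan = new Scan();
> >>>     scan.setMaxVersions();      // copy every version, not just the latest
> >>>     scan.setCacheBlocks(false); // don't pollute the block cache
> >>>     TableMapReduceUtil.initTableMapperJob("TRACET", scan, RewriteTable.class,
> >>>         ImmutableBytesWritable.class, Put.class, job);
> >>>     // null reducer + zero reduce tasks: map output goes straight to the
> >>>     // target table via TableOutputFormat
> >>>     TableMapReduceUtil.initTableReducerJob("TRACET_TMP", null, job);
> >>>     job.setNumReduceTasks(0);
> >>>     System.exit(job.waitForCompletion(true) ? 0 : 1);
> >>>   }
> >>> }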
> >>>
> >>>
> >>> On Tue, May 12, 2020 at 1:49 PM, Alexander Batyrshin <0x62...@gmail.com> wrote:
> >>>
> >>>> Any ideas how to delete these rows?
> >>>>
> >>>> The only way I see is:
> >>>> - back up the data from the region that contains the “damaged” rows
> >>>> - close the region
> >>>> - remove the region files from HDFS
> >>>> - assign the region
> >>>> - copy the needed rows from the backup into the recreated region
> >>>>
> >>>> On 30 Apr 2020, at 21:00, Alexander Batyrshin <0x62...@gmail.com> wrote:
> >>>>>
> >>>>> The same effect for CF:
> >>>>>
> >>>>> d = org.apache.hadoop.hbase.client.Delete.new("\x0439d58wj434dd".to_s.to_java_bytes)
> >>>>> d.deleteFamily("d".to_s.to_java_bytes, 9223372036854775807.to_java(Java::long))
> >>>>> table.delete(d)
> >>>>>
> >>>>> ROW                COLUMN+CELL
> >>>>> \x0439d58wj434dd   column=d:, timestamp=1588269277879, type=DeleteFamily
> >>>>>
> >>>>>
> >>>>> On 29 Apr 2020, at 18:30, Wellington Chevreuil <wellington.chevre...@gmail.com> wrote:
> >>>>>>
> >>>>>> Well, it's weird that puts with such TS values were allowed, according
> >>>>>> to the current state of the code. Can you afford to delete the whole CF
> >>>>>> for those rows?
> >>>>>>
> >>>>>> On Wed, Apr 29, 2020 at 2:41 PM, junhyeok park <runnerren...@gmail.com> wrote:
> >>>>>>
> >>>>>>> I've been through the same thing. I use 2.2.0
> >>>>>>>
> >>>>>>> On Wed, Apr 29, 2020 at 10:32 PM, Alexander Batyrshin <0x62...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> As you can see in the example, I already tried the DELETE operation
> >>>>>>>> with timestamp = Long.MAX_VALUE, without any success.
> >>>>>>>>
> >>>>>>>> On 29 Apr 2020, at 12:41, Wellington Chevreuil <wellington.chevre...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> That's expected behaviour [1]. If you are "travelling to the future",
> >>>>>>>>> you need to do a delete specifying the Long.MAX_VALUE timestamp as the
> >>>>>>>>> optional timestamp parameter of the delete operation [2]. If you don't
> >>>>>>>>> specify a timestamp on the delete, it will assume the current time for
> >>>>>>>>> the delete marker, which will be smaller than the Long.MAX_VALUE set
> >>>>>>>>> on your cells, so scans wouldn't filter them.
> >>>>>>>>>
> >>>>>>>>> [1] https://hbase.apache.org/book.html#version.delete
> >>>>>>>>> [2] https://github.com/apache/hbase/blob/branch-1.4/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Delete.java#L98
> >>>>>>>>>
> >>>>>>>>> On Wed, Apr 29, 2020 at 8:57 AM, Alexander Batyrshin <0x62...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hello all,
> >>>>>>>>>> We have run into a strange situation: a table has rows with a
> >>>>>>>>>> Long.MAX_VALUE timestamp. These rows are impossible to delete,
> >>>>>>>>>> because the DELETE mutation uses a System.currentTimeMillis()
> >>>>>>>>>> timestamp. Is there any way to delete these rows?
> >>>>>>>>>> We use HBase 1.4.10.
> >>>>>>>>>>
> >>>>>>>>>> Example:
> >>>>>>>>>>
> >>>>>>>>>> hbase(main):037:0> scan 'TRACET', { ROWPREFIXFILTER => "\x0439d58wj434dd", RAW=>true, VERSIONS=>10}
> >>>>>>>>>> ROW                COLUMN+CELL
> >>>>>>>>>> \x0439d58wj434dd   column=d:_0, timestamp=9223372036854775807, value=x
> >>>>>>>>>>
> >>>>>>>>>> hbase(main):045:0* delete 'TRACET', "\x0439d58wj434dd", "d:_0"
> >>>>>>>>>> 0 row(s) in 0.0120 seconds
> >>>>>>>>>>
> >>>>>>>>>> hbase(main):046:0> scan 'TRACET', { ROWPREFIXFILTER => "\x0439d58wj434dd", RAW=>true, VERSIONS=>10}
> >>>>>>>>>> ROW                COLUMN+CELL
> >>>>>>>>>> \x0439d58wj434dd   column=d:_0, timestamp=9223372036854775807, value=x
> >>>>>>>>>> \x0439d58wj434dd   column=d:_0, timestamp=1588146570005, type=Delete
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> hbase(main):047:0> delete 'TRACET', "\x0439d58wj434dd", "d:_0", 9223372036854775807
> >>>>>>>>>> 0 row(s) in 0.0110 seconds
> >>>>>>>>>>
> >>>>>>>>>> hbase(main):048:0> scan 'TRACET', { ROWPREFIXFILTER => "\x0439d58wj434dd", RAW=>true, VERSIONS=>10}
> >>>>>>>>>> ROW                COLUMN+CELL
> >>>>>>>>>> \x0439d58wj434dd   column=d:_0, timestamp=9223372036854775807, value=x
> >>>>>>>>>> \x0439d58wj434dd   column=d:_0, timestamp=1588146678086, type=Delete
> >>>>>>>>>> \x0439d58wj434dd   column=d:_0, timestamp=1588146570005, type=Delete
>
>
