anisms and will disappear
after that happens. Doing the GC during minor compactions as well as major
ones would change that visibility window, but doesn't seem to change that
odd behavior that is there to begin with.
On Wed, Jun 14, 2017 at 5:51 PM, Dave Latham wrote:
> What cells, if any, are removed during minor compactions?
What cells, if any, are removed during minor compactions?
Cells that
(a) are beyond the TTL?
(b) are shadowed by a delete marker? (from the files compacted)
(c) are shadowed by newer versions? (assuming numVersions configured < num
versions of the cell found)
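The three rules above can be modeled in a few lines of Python (a toy sketch of the retention logic, not HBase's actual compaction code; the cell tuples and parameter names are invented for illustration):

```python
def surviving_cells(cells, deletes, now, ttl, max_versions):
    """Toy model of which cells a compaction retains from the files compacted.

    cells:   list of (row, qualifier, timestamp) tuples.
    deletes: set of (row, qualifier, ts) delete markers covering ts <= marker ts.
    """
    kept = []
    versions_seen = {}
    # Walk cells newest-first within each column.
    for row, qual, ts in sorted(cells, key=lambda c: (c[0], c[1], -c[2])):
        if now - ts > ttl:                         # (a) beyond the TTL
            continue
        if any(r == row and q == qual and ts <= dts
               for r, q, dts in deletes):          # (b) shadowed by a delete marker
            continue
        n = versions_seen.get((row, qual), 0)
        if n >= max_versions:                      # (c) shadowed by newer versions
            continue
        versions_seen[(row, qual)] = n + 1
        kept.append((row, qual, ts))
    return kept
```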
Do you have compression enabled, and is your data highly compressible?
On Mon, Mar 27, 2017 at 6:26 AM, Hef wrote:
> Hi,
> Does anyone have an idea why most of my 128MB memstore flushed files are
> only several MBs?
>
> There are a lot of logs that look like the ones below:
>
> 2017-03-27 13:10:25,064 INFO
> or
If you truly have no way to predict anything about the distribution of your
data across the row key space, then you are correct that there is no way to
presplit your regions in an effective way. Either you need to make some
starting guess, such as a small number of uniform splits, or wait until yo
Hi Zheng,
Your intuition is correct. If the client does not specify a timestamp for
writes, then the region server will use the system clock to do so. If you
send a Put to a region hosted by a server with a clock that is 50 seconds
slow, and that region has existing Cell(s) with the same row & c
What if someone doesn't know the distribution of their row keys?
HBase should be able to handle this case.
On Wed, Mar 16, 2016 at 7:18 AM, Jean-Marc Spaggiari <
jean-m...@spaggiari.org> wrote:
> Balancer is not moving regions that are compacting, right? He is just
> pushing too much load on a non
ying to fix it.
>
> JMS
>
> 2016-03-16 10:54 GMT-04:00 Dave Latham :
>
> > What if someone doesn't know the distribution of their row keys?
> > HBase should be able to handle this case.
> >
> > On Wed, Mar 16, 2016 at 7:18 AM, Jean-Marc Spaggiari <
>
Don't think that's correct. If you look at
StoreFileScanner.shouldUseScanner you can see that it will skip entire
store files if the time range for a scan does not intersect with the time
range of data in the store file. However, without tiered compaction there
is nothing built in to optimize gro
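The check is essentially an interval-intersection test; a minimal sketch of the idea (illustrative only, not the actual StoreFileScanner code):

```python
def should_use_scanner(scan_range, file_range):
    """Return False when a store file can be skipped entirely because the
    scan's time range does not intersect the time range of data in the file.
    Ranges are (min_ts, max_ts) tuples, inclusive."""
    scan_min, scan_max = scan_range
    file_min, file_max = file_range
    # Two closed intervals intersect iff each starts before the other ends.
    return scan_min <= file_max and file_min <= scan_max
```

A scan over only the last day would therefore skip any file whose newest cell is older than that cutoff.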
I have not tried out stripe compaction and don't see how it would help here.
On Mon, Aug 3, 2015 at 12:16 PM, Ted Yu wrote:
> bq. revive some notion of tiered compaction
>
> Did you have a chance to try out Stripe compaction ?
>
> Thanks
>
> On Mon, Aug 3, 2015 at 11
case) to guide whether the remaining column families should be
> loaded.
> > To be specific, if outside the TimeRange you specify (last day), your
> > filter returns ReturnCode.INCLUDE_AND_SEEK_NEXT_ROW.
> >
> > What do you think ?
> >
> > Cheers
> >
>
> > branch-1 for upcoming unscheduled minor release line 1.3. Would that
> work?
> > Or would this change need to go further back?
> >
> > Maybe someone else has another suggestion.
> >
> >
> > On Sat, Aug 1, 2015 at 7:17 AM, Dave Latham wrote:
>
n you achieve your goal with two scans ?
> The first scan specifies TimeRange corresponding to last day. This scan
> returns both column families.
> The other scan specifies TimeRange excluding last day. This scan returns
> column family A.
>
> Cheers
>
> On Sat, Aug 1, 2015 at
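The two-scan suggestion can be modeled over an in-memory list of cells (a toy model; the cutoff timestamp and cell layout are invented for illustration):

```python
def split_scans(cells, cutoff_ts):
    """Toy model of the two complementary TimeRange scans.

    Scan 1: TimeRange [cutoff_ts, +inf)  -> returns both column families.
    Scan 2: TimeRange (-inf, cutoff_ts)  -> returns column family A only.
    cells: list of (family, qualifier, ts, value) tuples.
    """
    recent_both = [c for c in cells if c[2] >= cutoff_ts]
    older_a_only = [c for c in cells if c[2] < cutoff_ts and c[0] == "A"]
    return recent_both, older_a_only
```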
nly family A is returned.
>
> Cheers
>
> On Sat, Aug 1, 2015 at 7:17 AM, Dave Latham wrote:
>
> > I have a table with 2 column families, call them A and B, with new data
> > regularly being added. They are very different sizes: B is 100x the size
> of
> > A. Amon
I have a table with 2 column families, call them A and B, with new data
regularly being added. They are very different sizes: B is 100x the size of
A. Among other uses for this data, I have a MapReduce job that needs to
read all of A, but only recent data from B (e.g. last day). Here are some
met
For #1, as Ted mentioned HDFS replication will work just fine with bulk
loads. What you may have read is that bulk loaded data won't be picked up
by HBase replication. If you are using HBase replication to send data to
another cluster, then you need to also manage getting the bulk loaded data
to
What JDK are you using? I've seen such behavior when a machine was
swapping. Can you tell if there was any swap in use?
On Mon, Jul 13, 2015 at 3:24 AM, Ankit Singhal
wrote:
> Hi Team,
>
> We are seeing regionservers getting down whenever major compaction is
> triggered on table(8.5TB size).
>
will need to implement on top of
>> ResultScanner - ThrottledResultScanner.
>>
>> Good idea for improvement, actually.
>>
>> -Vlad
>>
>> On Wed, Jun 10, 2015 at 7:30 PM, Louis Hust wrote:
>>
>> > hi, Dave,
>> >
>> > For now we
I'm not aware of anything in version 0.96 that will limit the scan for
you - you may have to do it in your client yourself. If you're
willing to upgrade, do check out the throttling available in HBase
1.1:
https://blogs.apache.org/hbase/entry/the_hbase_request_throttling_feature
On Wed, Jun 10,
On Wed, May 27, 2015 at 11:17 AM, wrote:
> Thanks! I want to make sure I've got it right:
>
> When I import the 0.92 data into 0.98, the columns are defined properly
> in the 0.98 table, but I cannot perform a scan with a column filter in
> the shell as the shell interprets the second ':' in the
).toInt'] }
Note that you can specify a FORMATTER by column only (cf:qualifer).
You cannot specify
a FORMATTER for all columns of a column family.
On Wed, May 27, 2015 at 10:23 AM, wrote:
> On Wed, May 27, 2015, at 11:35 AM, Dave Latham wrote:
>> Sounds like quite a puzzle.
>>
>
Sounds like quite a puzzle.
You mentioned that you can read data written through manual Puts from
the shell - but not data from the Import. There must be something
different about the data itself once it's in the table. Can you
compare a row that was imported to a row that was manually written -
Major compactions will fix locality, so long as there is space on the
local data nodes and they actually happen. Also, if there is already
only a single HFile in a store, major compaction may be skipped.
Newer versions of hbase have a parameter
hbase.hstore.min.locality.to.skip.major.compact that
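In those newer versions the threshold is set in hbase-site.xml; a hypothetical fragment (the 0.95 cutoff is just an example value, not a recommendation):

```xml
<property>
  <!-- Skip major compacting a store whose HDFS block locality is already
       at or above this ratio (example value; tune for your cluster). -->
  <name>hbase.hstore.min.locality.to.skip.major.compact</name>
  <value>0.95</value>
</property>
```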
> I believe toStringBinary does all ascii if input ascii-only.
Right, but it will also mix ascii range characters with binary.
> Our output was in part shaped by ruby binary String representation. It was
> thought useful that you could copy from shell and find in UI, and
> vice-versa. I don't thi
Wish I had started this conversation 5 years ago...
When we're using binary data, especially in row keys (and therefore
region boundaries) the output created by toStringBinary is very
painful to use:
- mix of ascii / hex representation is trouble
- it's quite long (4 characters per binary byte)
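A rough Python approximation of what toStringBinary produces shows both complaints (HBase's exact escaping rules may differ slightly; this is a sketch):

```python
def to_string_binary(data: bytes) -> str:
    """Approximation of Bytes.toStringBinary: printable ASCII passes
    through unchanged, everything else becomes a 4-character \\xHH escape,
    so output mixes ascii and hex and inflates binary bytes 4x."""
    out = []
    for b in data:
        if 32 <= b < 127 and b != ord('\\'):
            out.append(chr(b))
        else:
            out.append('\\x%02X' % b)
    return ''.join(out)
```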
If you haven't already seen it - take a look at the bridge at
https://issues.apache.org/jira/browse/HBASE-12814
We're using it to go through the process now.
Dave
On Wed, Mar 18, 2015 at 5:46 PM, Bryan Beaudreault wrote:
> My only complaint about this poll is the labels: "0.94.x - I like stable
That's not possible with HBase today. The simplest thing may be to set
your Scan time range to include both today's and yesterday's data and then
filter down to only the data you want inside your map task. Other
possibilities would be creating a custom filter to do the filtering on the
server sid
What a milestone! Congratulations to the HBase developer community and
everyone who worked to make this happen. HBase has come a long way over
the years.
On Tue, Feb 24, 2015 at 12:28 AM, Enis Söztutar wrote:
> The HBase Team is pleased to announce the immediate release of HBase 1.0.0.
> Downl
Hi Hongbin,
The WAL class is used internally to the region server. Typically an HBase
write operation will first call WAL.append() with the data, then later,
after releasing locks, call WAL.sync() to ensure that the data for that
write has been synced to be durable before returning to the client
There's also a patch at https://issues.apache.org/jira/browse/HBASE-12814
to allow you to run replication between a 0.94 cluster and a 0.98 cluster.
With that you can get the data setup in both 0.94 clusters, then upgrade
one at a time.
Dave
On Mon, Feb 9, 2015 at 4:46 AM, Hayden Marchant wrote:
"hbase book"?
>
> We are on CDH4.4 - HBase 0.94.6, so I think we are good there.
>
> Thanks for your time Dave.
>
> Harshad
>
>
> On Thu, Oct 17, 2013 at 2:39 PM, Dave Latham wrote:
>
> > We're running HBase replication successfully on a 500 TB (c
We're running HBase replication successfully on a 500 TB (compressed - raw
is about 2PB) cluster over a 60ms link across the country. I'd give it a
thumbs up for dealing with loss of a cluster and being able to run
applications in two places that can tolerate inconsistency from the
asynchronous na
What about having all columns in the column family use the same qualifier
and then setting the max versions for that column family to limit it?
http://hbase.apache.org/book.html#schema.versions
It would only work if you didn't need to do updates to the cell without
knowing its timestamp or having
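The effect of that schema trick can be modeled simply: with a single qualifier, the column family's VERSIONS setting acts as a size cap, and older cells age out on their own (a toy model, not HBase code):

```python
def retained_versions(cells, max_versions):
    """cells: list of (ts, value) writes to one row/family/qualifier.
    HBase keeps only the newest max_versions cells per column, so a single
    shared qualifier turns VERSIONS into a bounded 'most recent N' list."""
    return sorted(cells, key=lambda c: -c[0])[:max_versions]
```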
Major compactions can still be useful to improve locality - could we add a
condition to check for that too?
On Mon, Sep 9, 2013 at 10:41 PM, lars hofhansl wrote:
> Interesting. I guess we could add a check to avoid major compactions if
> (1) no TTL is set or we can show that all data is newer a
On Mon, Jul 1, 2013 at 4:52 PM, Azuryy Yu wrote:
> how to enable "sync on block close" in HDFS?
>
Set dfs.datanode.synconclose to true
See https://issues.apache.org/jira/browse/HDFS-1539
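That is an hdfs-site.xml property, for example:

```xml
<property>
  <!-- Ask datanodes to fsync block files to disk when a block is closed. -->
  <name>dfs.datanode.synconclose</name>
  <value>true</value>
</property>
```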
sh Srinivas wrote:
> Yes this is a known issue.
>
> The HDFS part of this was addressed in
> https://issues.apache.org/jira/browse/HDFS-744 for 2.0.2-alpha and is not
> available in 1.x release. I think HBase does not use this API yet.
>
>
> On Mon, Jul 1, 2013 at 3:00 PM, Da
On Tue, Feb 26, 2013 at 4:23 PM, Jean-Daniel Cryans wrote:
> Well the rest of the logic is part of the replication code, so
> logically I think it needs to be disabled too if you kill replication.
> It leaves us with the choice of keeping the logs around or not. If you
> think the former is danger
me you should never have to stay on stop_replication
> more than a few minutes, either you'll continue replicating, you drop
> the peer, or you disable that peer.
>
> FWIW setting hbase.replication to true with no peers should achieve
> what you want, no need to call stop_replica
We have been preparing to enable replication between two large clusters.
For the past couple of weeks, replication has been enabled via
hbase-site.xml, but the replication state has been false (set false by
issuing a stop_replication command).
The master is no longer cleaning any logs from /hbase/
We recently saw some of these warnings in a cluster we were setting up.
These warnings mean there are rows in the META table that are missing one
of the expected columns. In our case, we verified that these regions
didn't appear to exist in HDFS either and the table itself showed no holes
or probl
This fork looks a bit more up to date:
https://github.com/ndimiduk/orderly
On Mon, Nov 5, 2012 at 4:26 PM, Dave Latham wrote:
> Here's a project to deal with this issue specifically. I'm not sure of
> its status:
> https://github.com/conikeec/orderly
>
>
> On
Here's a project to deal with this issue specifically. I'm not sure of
its status:
https://github.com/conikeec/orderly
On Mon, Nov 5, 2012 at 4:01 PM, lars hofhansl wrote:
> Have a look at the lily library. It has code to encode Longs/Doubles into
> bytes such that resulting bytes sort as expe
On Fri, Oct 19, 2012 at 5:22 PM, Amandeep Khurana wrote:
> Answers inline
>
> On Fri, Oct 19, 2012 at 4:31 PM, Dave Latham wrote:
>
>> I need to scale an internal service / datastore that is currently hosted on
>> an HBase cluster and wanted to ask for advice from
antages when it came to SSDs?
>
> If your data lookups exhibit temporal locality, external, client-side cache
> pools may help.
>
> My 2c,
> Abhishek
>
>
> -----Original Message-----
> From: ddlat...@gmail.com [mailto:ddlat...@gmail.com] On Behalf Of Dave Latham
> S
Woohoo! Many thanks to everyone who contributed to this big release. One
of HBase's biggest strengths is its community.
Stack, the link to the upgrade guide doesn't seem to be working, and I
don't see any information on the page about upgrading to 0.92.
Dave
On Mon, Jan 23, 2012 at 3:57 PM, St
We just hit the same issue. I attached log snippets from the regionserver
and master into https://issues.apache.org/jira/browse/HBASE-4107
I was able to get the log file out of hdfs. Is there a location I can put
it back in to have it picked up?
Dave
On Fri, Jul 15, 2011 at 12:23 PM, Andy Saut
I believe this is what Eran is suggesting:
Table A
---
Row1 (has joinVal_1)
Row2 (has joinVal_2)
Row3 (has joinVal_1)
Table B
---
Row4 (has joinVal_1)
Row5 (has joinVal_3)
Row6 (has joinVal_2)
Mapper receives a list of input rows (union of both input tables in any
order), and produces (=
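This reads as a standard reduce-side join; a toy in-memory sketch of the idea (the emitted (join_val, source, row) shape is my guess at the elided part, not necessarily Eran's exact proposal):

```python
from collections import defaultdict

def reduce_side_join(table_a, table_b):
    """table_a / table_b: lists of (row_id, join_val) pairs.
    Map phase: tag each row with its source and key it by join_val.
    Reduce phase: for each join_val, pair every A row with every B row."""
    groups = defaultdict(lambda: ([], []))
    for row, jv in table_a:
        groups[jv][0].append(row)
    for row, jv in table_b:
        groups[jv][1].append(row)
    joined = []
    for jv, (a_rows, b_rows) in sorted(groups.items()):
        for a in a_rows:
            for b in b_rows:
                joined.append((jv, a, b))
    return joined
```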
I'd recommend adding -Dlog4j.debug to the JVM args for any JVM that's not
giving you what you expect. In this case, if it's the map/reduce tasks, add
it to mapred.child.java.opts in mapred-site.xml. It should show you what
configuration log4j is actually picking up.
Dave
On Wed, May 25, 2011 at
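For reference, adding that flag to the task JVMs looks roughly like this in mapred-site.xml (the heap setting here is a placeholder for whatever opts you already have):

```xml
<property>
  <name>mapred.child.java.opts</name>
  <!-- Keep your existing task JVM opts and append -Dlog4j.debug -->
  <value>-Xmx512m -Dlog4j.debug</value>
</property>
```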
Are you using TableInputFormat? If so, if you turn on DEBUG level logging
for hbase (or just org.apache.hadoop.hbase.mapreduce.TableInputFormatBase)
you should see lines like this, giving the map task number, region location,
start row, and end row:
getSplits: split -> 0 ->
hslave107:,@G\xA0\xFB\
Are you using the graceful_stop script?
In 0.90.3 the bin/graceful_stop.sh script was updated to disable the
master's balancer. However, it doesn't seem that anything re-enables it, so
if you're using it you need to re-enable it on your own. See the book for
more details:
http://hbase.apache.org
The HBase book ( http://hbase.apache.org/book/upgrading.html ) states,
> This version of 0.90.x HBase can be started on data written by HBase 0.20.x
> or HBase 0.89.x. There is no
> need of a migration step. HBase 0.89.x and 0.90.x does write out the name of
> region directories differently --
>
If the ordering of the row ids is the same in both tables and both are of
the same order of magnitude of size, I would recommend opening scanners on
both tables, then compare the current row in each scanner, and advance
whichever scanner is behind. Whenever you hit a match, you output it and
advan