Hi,
I have set up HBase in pseudo-distributed mode.
It was working fine for 6 days, but this morning both the HMaster
and HRegionServer processes went down.
I checked the logs of both Hadoop and HBase.
Please help here.
Here are the snippets:
*Datanode logs:*
2013-06-05 05:12:51,436 INFO
org.apach
Sorry, forgot to mention that I added the log statements to the method
readBlock in HFileReaderV2.java. I'm on hbase 0.94.2.
On Tue, Jun 4, 2013 at 11:16 PM, Pankaj Gupta wrote:
> Some context on how I observed bloom filters being loaded constantly. I
> added the following logging statements to
Some context on how I observed bloom filters being loaded constantly. I
added the following logging statements to HFileReaderV2.java:
    }
    if (!useLock) {
      // check cache again with lock
      useLock = true;
      continue;
    }
    // Load block from filesystem.
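The retry-with-lock shape of that snippet (probe the cache without a lock, then once more under a lock before paying for a filesystem read) can be sketched in plain Java. This is an illustration of the pattern only, with invented names, not the actual HFileReaderV2 code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BlockCacheSketch {
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    private final Object loadLock = new Object();

    // Probe the cache twice: first without the lock (cheap, may race),
    // then again while holding the lock, and only then load from disk.
    public byte[] readBlock(String key) {
        boolean useLock = false;
        while (true) {
            if (useLock) {
                synchronized (loadLock) {
                    byte[] cached = cache.get(key);
                    if (cached != null) {
                        return cached; // another thread loaded it meanwhile
                    }
                    byte[] loaded = loadFromFilesystem(key);
                    cache.put(key, loaded);
                    return loaded;
                }
            }
            byte[] cached = cache.get(key);
            if (cached != null) {
                return cached; // fast path: no lock taken
            }
            // check cache again with lock
            useLock = true;
        }
    }

    // Stand-in for the real filesystem read.
    private byte[] loadFromFilesystem(String key) {
        return key.getBytes();
    }

    public static void main(String[] args) {
        BlockCacheSketch sketch = new BlockCacheSketch();
        System.out.println(new String(sketch.readBlock("block-1")));
    }
}
```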
Thanks for the approach you suggested, Asaf. This is definitely very promising.
Our use case is that, we have a raw stream of events which may have duplicates.
After our HBase + MR processing, we would emit a de-duped stream (which would
have duplicates eliminated) for later processing. Let me se
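The de-dup step described above can be sketched independently of HBase/MR as a first-seen filter; event IDs and names below are invented for illustration:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupSketch {
    // Emit each event the first time it is seen, drop later duplicates.
    public static List<String> dedup(List<String> rawEvents) {
        Set<String> seen = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (String event : rawEvents) {
            if (seen.add(event)) { // add() returns false for duplicates
                out.add(event);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(dedup(List.of("e1", "e2", "e1", "e3", "e2")));
        // prints [e1, e2, e3]
    }
}
```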
From what I read about HFileV2, and looking at the performance in my cluster,
it seems that bloom filter and index blocks are loaded on demand as blocks
are accessed. Isn't that the case? I see that bloom filters are being
loaded all the time when I run scans and not just once.
On Tue, Jun 4, 2013
Whenever a region is opened, all of its bloom filter metadata is loaded
into memory. I think his concern is that every time, all the store files are
read and loaded into memory, and he wants some faster way of doing it.
Asaf you are right.
Regards
Ram
On Wed, Jun 5, 2013 at 11:22 AM, Asaf Mesik
Thanks for that confirmation. This is what we hypothesized as well.
So, if we depend on time-range scans, do we need to completely avoid major
compaction and rely only on minor compactions? Is there any downside? We do
have a TTL set on all the rows in the table.
~Rahul.
When you do the first read of this region, wouldn't this load all bloom
filters?
On Wed, Jun 5, 2013 at 8:43 AM, ramkrishna vasudevan <
ramkrishna.s.vasude...@gmail.com> wrote:
> for the question whether you will be able to do a warm up for the bloom and
> block cache i don't think it is possib
On Tuesday, June 4, 2013, Rahul Ravindran wrote:
> Hi,
>
> We are relatively new to Hbase, and we are hitting a roadblock on our scan
> performance. I searched through the email archives and applied a bunch of
> the recommendations there, but they did not improve much. So, I am hoping I
> am missi
For the question of whether you will be able to do a warm-up for the bloom and
block cache: I don't think it is possible now.
Regards
Ram
On Wed, Jun 5, 2013 at 10:57 AM, Asaf Mesika wrote:
> If you will read HFile v2 document on HBase site you will understand
> completely how the search for a rec
When you set a time range on a Scan, some files can get skipped based on the
max/min timestamp values in each file. That said, once you major compact and
then scan based on a time range, I don't think you will get much advantage.
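The skip check boils down to a range-overlap test against each file's min/max timestamps; a sketch in plain Java (helper and names invented, not HBase internals). After a major compaction everything lands in one file whose range spans all the data, so nothing can be skipped:

```java
public class TimeRangeSkipSketch {
    // A store file can be skipped when its [minTs, maxTs] range does not
    // overlap the scan's [scanStart, scanEnd) range at all.
    public static boolean canSkipFile(long fileMinTs, long fileMaxTs,
                                      long scanStart, long scanEnd) {
        return fileMaxTs < scanStart || fileMinTs >= scanEnd;
    }

    public static void main(String[] args) {
        // Three files from minor compactions: only the middle one overlaps
        // a scan for timestamps [150, 250).
        System.out.println(canSkipFile(0, 100, 150, 250));   // true
        System.out.println(canSkipFile(100, 200, 150, 250)); // false
        // One big file after major compaction: never skippable.
        System.out.println(canSkipFile(0, 300, 150, 250));   // false
    }
}
```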
-Anoop-
On Wed, Jun 5, 2013 at 10:11 AM, Rahul Ravindran wrote:
> Our row-keys do
If you read the HFile v2 document on the HBase site, you will understand
completely how the search for a record works and why there is a linear search
within the block but a binary search to get to the right block.
Also bear in mind that the number of keys in a block is not big, since a block
in HFile by default is
>4. This one is related to what I read in the HBase definitive guide
bloom filter section
Given a random row key you are looking for, it is very likely that this
key will fall in between two block start keys. The only way for HBase to
figure out if the key actually exists is by loading
Thanks for the replies. I'll take a look at
src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java.
@ramkrishna: I do want to have bloom filter and block index all the time.
For good read performance they're critical in my workflow. The worry is
that when HBase is restarted it
Our row keys do not contain time. By time-based scans I mean an MR over the
HBase table where the Scan object has no startRow or endRow but has a startTime
and endTime.
Our row key format is +UUID, so we expect good distribution. We
have pre-split initially to prevent any initial hotspotting.
Yes, replication can be specified at the CF level. You have used
HCD#setScope(), right?
> S => '3', BLOCKSIZE => '65536'}, {*NAME => 'cf2', REPLICATION_SCOPE =>
'2'*,
You set the scope as 2? You have to set one CF to be replicated to one cluster
and another to another cluster. I don't think it
On Tue, Jun 4, 2013 at 11:48 AM, Rahul Ravindran wrote:
> Hi,
>
> We are relatively new to Hbase, and we are hitting a roadblock on our scan
> performance. I searched through the email archives and applied a bunch of
> the recommendations there, but they did not improve much. So, I am hoping I
>
bq. But I am not very sure if we can control the files getting selected for
compaction in the older versions.
Same mechanism is available in 0.94
Take a look
at src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java
where you would find the following methods (and more):
publ
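As an illustration of what such a hook lets you do, the selection logic itself (drop files newer than a cutoff from the compaction candidate list) can be sketched in plain Java; the class and field names here are invented stand-ins, not the BaseRegionObserver API:

```java
import java.util.ArrayList;
import java.util.List;

public class CompactionSelectionSketch {
    // Minimal stand-in for a store file with its newest entry's timestamp.
    public static class FileInfo {
        public final String name;
        public final long maxTs;
        public FileInfo(String name, long maxTs) {
            this.name = name;
            this.maxTs = maxTs;
        }
    }

    // Keep only candidates whose newest entry is older than the cutoff,
    // mimicking a selection hook that protects recent files from compaction.
    public static List<String> selectOlderThan(List<FileInfo> candidates,
                                               long cutoffTs) {
        List<String> selected = new ArrayList<>();
        for (FileInfo f : candidates) {
            if (f.maxTs < cutoffTs) {
                selected.add(f.name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<FileInfo> files = List.of(
                new FileInfo("hfile-a", 100),
                new FileInfo("hfile-b", 500),
                new FileInfo("hfile-c", 900));
        System.out.println(selectOlderThan(files, 600)); // [hfile-a, hfile-b]
    }
}
```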
bq. I found this jira: https://issues.apache.org/jira/browse/HBASE-5199 but
I don't know if the
compaction being talked about there is minor or major.
The optimization above applies to minor compaction selection.
Cheers
On Tue, Jun 4, 2013 at 7:15 PM, Pankaj Gupta wrote:
> Hi,
>
> I have
>>Does Minor compaction remove HFiles in which all entries are out of
TTL or does only Major compaction do that
Yes, it applies to minor compactions as well.
>>Is there a way of configuring major compaction to compact only files
older than a certain time or to compress all the files except the latest
Hi,
I have a few small questions regarding HBase. I've searched the forum but
couldn't find clear answers, hence asking them here:
1. Does Minor compaction remove HFiles in which all entries are out of
TTL or does only Major compaction do that? I found this jira:
https://issues.apache.or
Haven't had a chance to write a JIRA yet, but I thought I'd pop in here
with an update in the meantime.
I tried a number of different approaches to eliminate latency and
"bubbles" in the scan pipeline, and eventually arrived at adding a
streaming scan API to the region server, along with refactori
On Tue, Jun 4, 2013 at 6:48 PM, Jean-Daniel Cryans wrote:
> Replication doesn't need to know about compression at the RPC level so
> it won't refer to it and as far as I can tell you need to set
> compression only on the master cluster and the slave will figure it
> out.
>
> Looking at the code th
On Tue, Jun 4, 2013 at 9:58 PM, Rob Verkuylen wrote:
> Finally fixed this, my code was at fault.
>
> Protobufs require a builder object which was a (non static) protected
> object in an abstract class all parsers extend. The mapper calls a parser
> factory depending on the input record. Because w
hi, folks,
By reading several documents, I always had the impression that *
"Replication* works at the table-*column*-*family level*". However, when I
set up a table with two column families and replicated them to two
different slaves, the whole table was replicated. Is this a bug? Thanks
He
Finally fixed this, my code was at fault.
Protobufs require a builder object, which was a (non-static) protected object in
an abstract class that all parsers extend. The mapper calls a parser factory
depending on the input record. Because we designed the parser instances as
singletons, the builder ob
Hi,
We are relatively new to HBase, and we are hitting a roadblock on our scan
performance. I searched through the email archives and applied a bunch of the
recommendations there, but they did not improve much. So, I am hoping I am
missing something which you could guide me towards. Thanks in a
Ok...
A little bit more detail...
First, it's possible to store your data in multiple tables, each with a
different key.
Not a good idea, for some very obvious reasons.
You could however create a secondary table which is an inverted table where the
rowkey of the index is the value in the base
Rams - you might enjoy this blog post from HBase committer Jesse Yates (from
last summer):
http://jyates.github.io/2012/07/09/consistent-enough-secondary-indexes.html
Secondary Indexing doesn't exist in HBase core today, but there are various
proposals and early implementations of it in flight.
Hi Michel,
If you don't mind, can you please help explain in detail?
Also, can you please let me know whether we have secondary indexes in HBase?
regards,
Rams
On Tue, Jun 4, 2013 at 1:13 PM, Michel Segel wrote:
> Quick and dirty...
>
> Create an inverted table for each index
> Then you can t
Quick and dirty...
Create an inverted table for each index
Then you can take the intersection of the result set(s) to get your list of
rows for further filtering.
There is obviously more to this, but it's the core idea...
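In miniature: each inverted table maps a value to the set of base-table row keys containing it, and a multi-index query intersects the result sets. A plain-Java sketch, with maps standing in for the index tables (all names invented):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class InvertedIndexSketch {
    // value -> set of base-table row keys (one map per indexed column)
    private final Map<String, Set<String>> index = new HashMap<>();

    public void put(String value, String baseRowKey) {
        index.computeIfAbsent(value, v -> new HashSet<>()).add(baseRowKey);
    }

    public Set<String> lookup(String value) {
        return index.getOrDefault(value, Set.of());
    }

    // Intersect result sets from two indexes to get rows matching both.
    public static Set<String> intersect(Set<String> a, Set<String> b) {
        Set<String> out = new HashSet<>(a);
        out.retainAll(b);
        return out;
    }

    public static void main(String[] args) {
        InvertedIndexSketch byCity = new InvertedIndexSketch();
        InvertedIndexSketch byPlan = new InvertedIndexSketch();
        byCity.put("nyc", "row1");
        byCity.put("nyc", "row2");
        byPlan.put("gold", "row2");
        byPlan.put("gold", "row3");
        // Rows that are both in nyc and on the gold plan:
        System.out.println(intersect(byCity.lookup("nyc"), byPlan.lookup("gold")));
        // prints [row2]
    }
}
```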
Sent from a remote device. Please excuse any typos...
Mike Segel
On
Hi,
The read pattern differs for each application.
Is the below approach fine?
Create one HBase table with a unique rowkey and put all 200 columns into
it...
Then create multiple small HBase tables that hold the read-access-pattern
columns plus the rowkey mapping back to the master table...
e.g
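A sketch of that two-step read in plain Java, with maps standing in for the tables (all table, key, and column names are invented): look up the small access-pattern table first, get the master rowkey, then fetch the full row from the master table.

```java
import java.util.HashMap;
import java.util.Map;

public class LookupTableSketch {
    // Two-step read: small table first, then the master row it points to.
    public static Map<String, String> readViaIndex(
            Map<String, Map<String, String>> master,
            Map<String, String> smallTable,
            String accessKey) {
        String masterKey = smallTable.get(accessKey);
        return master.get(masterKey);
    }

    public static void main(String[] args) {
        // Master table: unique rowkey -> all 200 columns (abbreviated here).
        Map<String, Map<String, String>> master = new HashMap<>();
        master.put("uuid-42", Map.of("col1", "a", "col2", "b"));

        // Small table keyed by a read-access pattern, storing the master rowkey.
        Map<String, String> byEmail = new HashMap<>();
        byEmail.put("user@example.com", "uuid-42");

        Map<String, String> row = readViaIndex(master, byEmail, "user@example.com");
        System.out.println(row.get("col1"));
    }
}
```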
Thanks Enis, I'll see if I can backport this patch - it is exactly what I was
going to try. This should solve my scan performance problems if I can get it to
work.
On May 29, 2013, at 1:29 PM, Enis Söztutar wrote:
> Hi,
>
> Regarding running raw scans on top of Hfiles, you can try a version o
Just a quick thought: why don't you create different tables and duplicate
data, i.e. go for denormalization and data redundancy? Are all the read
access patterns that require the 70 columns incorporated into one
application/client, or will it be a bunch of different clients/applications?
If that
Replication doesn't need to know about compression at the RPC level so
it won't refer to it and as far as I can tell you need to set
compression only on the master cluster and the slave will figure it
out.
Looking at the code tho, I'm not sure it works the same way it used to
work before everythin
Hi,
In an HBase table, there are 200 columns, and the read pattern for different
systems involves 70 columns...
In that case, we cannot have 70 columns in the rowkey, which would not
be a good design...
Can you please suggest how to handle this problem?
Also, can we do indexing in HBase apart from
No logs there either (in fact no logs are written in any log file when I
execute the request)
Simon
On Tue, Jun 4, 2013 at 5:42 PM, Ted Yu wrote:
> Can you check region server log around that time ?
>
> Thanks
>
> On Jun 4, 2013, at 8:37 AM, Simon Majou wrote:
>
> > Hello,
> >
> > I am using
Can you check region server log around that time ?
Thanks
On Jun 4, 2013, at 8:37 AM, Simon Majou wrote:
> Hello,
>
> I am using thrift & thrift2 interfaces (thrift for DDL & thrift2 for the
> rest), my requests work with thrift but with thrift2 I got a error 400.
>
> Here is my code (coffees
Hello,
I am using the thrift & thrift2 interfaces (thrift for DDL & thrift2 for the
rest). My requests work with thrift, but with thrift2 I get an error 400.
Here is my code (coffeescript) :
colValue = new types2.TColumnValue family: 'cf', qualifier:'col',
value:'yoo'
put = new types2.TPut(row:'ro
If RPC has compression abilities, how come Replication, which also works over
RPC, does not get it automatically?
On Tue, Jun 4, 2013 at 12:34 PM, Anoop John wrote:
> > 0.96 will support HBase RPC compression
> Yes
>
> > Replication between master and slave
> will enjoy it as well (important since
What's your blockCacheHitCachingRatio? It would tell you about the ratio
of scans requested from cache (default) to the scans actually served from
the block cache. You can get that from the RS web ui. What you are seeing
can almost map to anything, for example: is scanner caching (client side)
ena
> 0.96 will support HBase RPC compression
Yes
> Replication between master and slave
will enjoy it as well (important since bandwidth between geographically
distant data centers is scarce and more expensive)
But I cannot see it being utilized in replication. Maybe we can do
improvements in t
Hi,
Just wanted to make sure I read the internet correctly: 0.96 will
support HBase RPC compression, thus Replication between master and slave
will enjoy it as well (important since bandwidth between geographically
distant data centers is scarce and more expensive).
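If that holds, enabling it would presumably be an hbase-site.xml fragment along these lines. The property name and codec below are my recollection of the 0.96-era client RPC compression setting, so treat them as assumptions and verify against the release docs before relying on them:

```xml
<!-- Assumed 0.96-era setting for compressing client/server RPC traffic;
     verify the exact property name against the release documentation. -->
<property>
  <name>hbase.client.rpc.compressor</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
```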
What do you mean by indirect blocks?
On Tue, Jun 4, 2013 at 7:22 AM, Mohit Anchlia wrote:
> Better approach would be to break the data in chunks and create a behaviour
> similar to indirect blocks.
>
> On Mon, Jun 3, 2013 at 9:12 PM, Asaf Mesika wrote:
>
> > I guess one can hack opening a socke