HMaster and HRegionServer going down

2013-06-04 Thread Vimal Jain
Hi, I have set up HBase in pseudo-distributed mode. It was working fine for 6 days, but suddenly this morning both the HMaster and HRegionServer processes went down. I checked the logs of both Hadoop and HBase. Please help here. Here are the snippets :- *Datanode logs:* 2013-06-05 05:12:51,436 INFO org.apach

Re: Questions about HBase

2013-06-04 Thread Pankaj Gupta
Sorry, forgot to mention that I added the log statements to the method readBlock in HFileReaderV2.java. I'm on HBase 0.94.2. On Tue, Jun 4, 2013 at 11:16 PM, Pankaj Gupta wrote: > Some context on how I observed bloom filters being loaded constantly. I > added the following logging statements to

Re: Questions about HBase

2013-06-04 Thread Pankaj Gupta
Some context on how I observed bloom filters being loaded constantly. I added the following logging statements to HFileReaderV2.java:

      }
      if (!useLock) {
        // check cache again with lock
        useLock = true;
        continue;
      }
      // Load block from filesystem.

Re: Scan + Gets are disk bound

2013-06-04 Thread Rahul Ravindran
Thanks for the approach you suggested Asaf. This is definitely very promising. Our use case is that we have a raw stream of events which may contain duplicates. After our HBase + MR processing, we would emit a de-duplicated stream for later processing. Let me se

Re: Questions about HBase

2013-06-04 Thread Pankaj Gupta
From what I read about HFileV2 and looking at the performance in my cluster, it seems that bloom filter and index blocks are loaded on demand as blocks are accessed. Isn't that the case? I see that bloom filters are being loaded all the time when I run scans, and not just once. On Tue, Jun 4, 2013

Re: Questions about HBase

2013-06-04 Thread ramkrishna vasudevan
Whenever the region is opened, all the bloom filter metadata is loaded into memory. I think his concern is that every time, all the store files are read and the metadata is loaded into memory again, and he wants some faster way of doing it. Asaf, you are right. Regards Ram On Wed, Jun 5, 2013 at 11:22 AM, Asaf Mesik

Re: Scan + Gets are disk bound

2013-06-04 Thread Rahul Ravindran
Thanks for that confirmation. This is what we hypothesized as well. So, if we are dependent on time-range scans, we need to completely avoid major compaction and depend only on minor compactions? Is there any downside? We do have a TTL set on all the rows in the table. ~Rahul.
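As an illustration of that approach (a sketch, not the thread's confirmed recipe): in the 0.94 era the periodic major-compaction timer could be disabled in the server-side hbase-site.xml, leaving only minor compactions and any manually triggered major compactions:

    <!-- hbase-site.xml (region servers). A value of 0 disables time-based
         major compactions; minor compactions still run as usual. -->
    <property>
      <name>hbase.hregion.majorcompaction</name>
      <value>0</value>
    </property>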

Re: Questions about HBase

2013-06-04 Thread Asaf Mesika
When you do the first read of this region, wouldn't this load all the bloom filters? On Wed, Jun 5, 2013 at 8:43 AM, ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com> wrote: > for the question whether you will be able to do a warm up for the bloom and > block cache i don't think it is possib

Re: Scan + Gets are disk bound

2013-06-04 Thread Asaf Mesika
On Tuesday, June 4, 2013, Rahul Ravindran wrote: > Hi, > > We are relatively new to Hbase, and we are hitting a roadblock on our scan > performance. I searched through the email archives and applied a bunch of > the recommendations there, but they did not improve much. So, I am hoping I > am missi

Re: Questions about HBase

2013-06-04 Thread ramkrishna vasudevan
For the question of whether you will be able to do a warm-up for the bloom filters and block cache, I don't think it is possible now. Regards Ram On Wed, Jun 5, 2013 at 10:57 AM, Asaf Mesika wrote: > If you will read HFile v2 document on HBase site you will understand > completely how the search for a rec

Re: Scan + Gets are disk bound

2013-06-04 Thread Anoop John
When you set a time range on a Scan, some files can get skipped based on the max/min timestamp values in each file. That said, when you do a major compaction and then scan based on a time range, I don't think you will get that advantage. -Anoop- On Wed, Jun 5, 2013 at 10:11 AM, Rahul Ravindran wrote: > Our row-keys do
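A minimal Java sketch of the time-range scan under discussion (table and column names are hypothetical, 0.94-era client API):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class TimeRangeScanSketch {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "events"); // hypothetical table
        long end = System.currentTimeMillis();
        long start = end - 3600 * 1000L; // last hour
        Scan scan = new Scan();
        // Store files whose [min ts, max ts] range lies entirely outside
        // [start, end) can be skipped. After a major compaction the region
        // has a single file spanning all timestamps, so nothing is skipped.
        scan.setTimeRange(start, end);
        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result r : scanner) {
            // process r
          }
        } finally {
          scanner.close();
          table.close();
        }
      }
    }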

Re: Questions about HBase

2013-06-04 Thread Asaf Mesika
If you read the HFile v2 document on the HBase site, you will understand completely how the search for a record works and why there is a linear search within the block but a binary search to get to the right block. Also bear in mind that the number of keys in a block is not big, since a block in an HFile by default is

Re: Questions about HBase

2013-06-04 Thread Anoop John
>4. This one is related to what I read in the HBase definitive guide bloom filter section: Given a random row key you are looking for, it is very likely that this key will fall in between two block start keys. The only way for HBase to figure out if the key actually exists is by loading

Re: Questions about HBase

2013-06-04 Thread Pankaj Gupta
Thanks for the replies. I'll take a look at src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java. @ramkrishna: I do want to have the bloom filter and block index available all the time. For good read performance they're critical in my workflow. The worry is that when HBase is restarted it

Re: Scan + Gets are disk bound

2013-06-04 Thread Rahul Ravindran
Our row-keys do not contain time. By time-based scans, I mean an MR job over the HBase table where the scan object has no startRow or endRow but has a startTime and endTime. Our row key format is +UUID, so we expect good distribution. We have pre-split initially to prevent any initial hotspotting
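For illustration, pre-splitting at table-creation time might look like this 0.94-style sketch (table name, family, and split points are hypothetical, assuming a uniformly distributed binary row key):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class PreSplitSketch {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor desc = new HTableDescriptor("events"); // hypothetical
        desc.addFamily(new HColumnDescriptor("cf"));
        // 16 regions, split on the first byte of the key (HBase compares
        // row keys as unsigned bytes).
        byte[][] splits = new byte[15][];
        for (int i = 1; i <= 15; i++) {
          splits[i - 1] = new byte[] { (byte) (i * 16) }; // 0x10, 0x20, ... 0xF0
        }
        admin.createTable(desc, splits);
        admin.close();
      }
    }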

Re: Replication is on columnfamily level or table level?

2013-06-04 Thread Anoop John
Yes, the replication can be specified at the CF level. You have used HCD#setScope() right? > S => '3', BLOCKSIZE => '65536'}, {*NAME => 'cf2', REPLICATION_SCOPE => '2'*, You set scope as 2?? You have to set one CF to be replicated to one cluster and another to another cluster. I don't think it
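In Java terms, scoping a single family for replication might look like the following sketch (0.94-era API; table and family names are hypothetical). In this era the scope is effectively 0 (local) or 1 (replicate); routing different families to different clusters is not expressed through scope values like 2:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HConstants;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class ReplicationScopeSketch {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        admin.disableTable("t1");                          // hypothetical table
        HColumnDescriptor cf1 = new HColumnDescriptor("cf1");
        cf1.setScope(HConstants.REPLICATION_SCOPE_GLOBAL); // 1 = ship edits to peers
        admin.modifyColumn("t1", cf1);                     // cf2 keeps scope 0
        admin.enableTable("t1");
        admin.close();
      }
    }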

Re: Scan + Gets are disk bound

2013-06-04 Thread anil gupta
On Tue, Jun 4, 2013 at 11:48 AM, Rahul Ravindran wrote: > Hi, > > We are relatively new to Hbase, and we are hitting a roadblock on our scan > performance. I searched through the email archives and applied a bunch of > the recommendations there, but they did not improve much. So, I am hoping I >

Re: Questions about HBase

2013-06-04 Thread Ted Yu
bq. But i am not very sure if we can control the files getting selected for compaction in the older versions. The same mechanism is available in 0.94. Take a look at src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java where you would find the following methods (and more): publ
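A minimal sketch of hooking compaction selection through that observer (the override matches the 0.94 BaseRegionObserver signature; the selection policy itself is left open):

    import java.util.List;
    import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.regionserver.Store;
    import org.apache.hadoop.hbase.regionserver.StoreFile;

    public class CompactionSelectionObserver extends BaseRegionObserver {
      @Override
      public void preCompactSelection(ObserverContext<RegionCoprocessorEnvironment> c,
          Store store, List<StoreFile> candidates) {
        // Store files removed from 'candidates' here are excluded from the
        // compaction being scheduled, letting an application keep selected
        // files (for example, recent ones) out of large rewrites.
      }
    }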

Re: Questions about HBase

2013-06-04 Thread Ted Yu
bq. I found this jira: https://issues.apache.org/jira/browse/HBASE-5199 but I don't know if the compaction being talked about there is minor or major. The optimization above applies to minor compaction selection. Cheers On Tue, Jun 4, 2013 at 7:15 PM, Pankaj Gupta wrote: > Hi, > > I have

Re: Questions about HBase

2013-06-04 Thread ramkrishna vasudevan
>>Does Minor compaction remove HFiles in which all entries are out of TTL or does only Major compaction do that Yes, it applies to minor compactions. >>Is there a way of configuring major compaction to compact only files older than a certain time or to compress all the files except the latest
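For reference, TTL is configured per column family, in seconds; a sketch (names are illustrative, 0.94-era API):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class TtlSketch {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTableDescriptor desc = new HTableDescriptor("events"); // hypothetical
        HColumnDescriptor cf = new HColumnDescriptor("cf");
        // Cells older than 7 days become eligible for removal at compaction
        // time; a store file whose entries have all expired can be dropped.
        cf.setTimeToLive(7 * 24 * 60 * 60); // TTL is in seconds
        desc.addFamily(cf);
        HBaseAdmin admin = new HBaseAdmin(conf);
        admin.createTable(desc);
        admin.close();
      }
    }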

Questions about HBase

2013-06-04 Thread Pankaj Gupta
Hi, I have a few small questions regarding HBase. I've searched the forum but couldn't find clear answers, hence I am asking them here: 1. Does Minor compaction remove HFiles in which all entries are out of TTL, or does only Major compaction do that? I found this jira: https://issues.apache.or

Re: Poor HBase map-reduce scan performance

2013-06-04 Thread Sandy Pratt
Haven't had a chance to write a JIRA yet, but I thought I'd pop in here with an update in the meantime. I tried a number of different approaches to eliminate latency and "bubbles" in the scan pipeline, and eventually arrived at adding a streaming scan API to the region server, along with refactori

Re: RPC Replication Compression

2013-06-04 Thread Stack
On Tue, Jun 4, 2013 at 6:48 PM, Jean-Daniel Cryans wrote: > Replication doesn't need to know about compression at the RPC level so > it won't refer to it and as far as I can tell you need to set > compression only on the master cluster and the slave will figure it > out. > > Looking at the code th

Re: Explosion in datasize using HBase as a MR sink

2013-06-04 Thread Stack
On Tue, Jun 4, 2013 at 9:58 PM, Rob Verkuylen wrote: > Finally fixed this, my code was at fault. > > Protobufs require a builder object which was a (non static) protected > object in an abstract class all parsers extend. The mapper calls a parser > factory depending on the input record. Because w

Replication is on columnfamily level or table level?

2013-06-04 Thread N Dm
hi, folks, By reading several documents, I have the impression that *"Replication* works at the table-*column*-*family level*". However, when I set up a table with two column families and replicate them to two different slaves, the whole table is replicated. Is this a bug? Thanks He

Re: Explosion in datasize using HBase as a MR sink

2013-06-04 Thread Rob Verkuylen
Finally fixed this; my code was at fault. Protobufs require a builder object, which was a (non-static) protected object in an abstract class that all parsers extend. The mapper calls a parser factory depending on the input record. Because we designed the parser instances as singletons, the builder ob
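The failure mode described reads like the following stand-alone reconstruction (a plain list stands in for the protobuf builder; this is not the poster's actual code):

    import java.util.ArrayList;
    import java.util.List;

    public class SharedBuilderBug {
      // BUGGY: singleton parsers share one mutable "builder", so repeated
      // fields accumulate across records and every emitted row grows.
      static final List<String> sharedBuilder = new ArrayList<String>();

      static List<String> parseBuggy(String record) {
        sharedBuilder.add(record);
        return new ArrayList<String>(sharedBuilder); // embeds all prior records
      }

      // FIX: create a fresh builder per record.
      static List<String> parseFixed(String record) {
        List<String> builder = new ArrayList<String>();
        builder.add(record);
        return builder;
      }

      public static void main(String[] args) {
        System.out.println(parseBuggy("a")); // [a]
        System.out.println(parseBuggy("b")); // [a, b]  <- data-size explosion
        System.out.println(parseFixed("c")); // [c]
      }
    }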

Scan + Gets are disk bound

2013-06-04 Thread Rahul Ravindran
Hi, We are relatively new to HBase, and we are hitting a roadblock with our scan performance. I searched through the email archives and applied a bunch of the recommendations there, but they did not improve things much. So, I am hoping I am missing something which you could guide me towards. Thanks in a

Re: Regarding Indexing columns in HBASE

2013-06-04 Thread Michael Segel
Ok... A little bit more detail... First, it's possible to store your data in multiple tables, each with a different key. Not a good idea, for some very obvious reasons. You could however create a secondary table which is an inverted table, where the rowkey of the index is the value in the base

Re: Regarding Indexing columns in HBASE

2013-06-04 Thread Ian Varley
Rams - you might enjoy this blog post from HBase committer Jesse Yates (from last summer): http://jyates.github.io/2012/07/09/consistent-enough-secondary-indexes.html Secondary Indexing doesn't exist in HBase core today, but there are various proposals and early implementations of it in flight.

Re: Regarding Indexing columns in HBASE

2013-06-04 Thread Ramasubramanian Narayanan
Hi Michel, If you don't mind, can you please help explain in detail ... Also can you please let me know whether we have secondary indexes in HBase? regards, Rams On Tue, Jun 4, 2013 at 1:13 PM, Michel Segel wrote: > Quick and dirty... > > Create an inverted table for each index > Then you can t

Re: Regarding Indexing columns in HBASE

2013-06-04 Thread Michel Segel
Quick and dirty... Create an inverted table for each index. Then you can take the intersection of the result set(s) to get your list of rows for further filtering. There is obviously more to this, but it's the core idea... Sent from a remote device. Please excuse any typos... Mike Segel On
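A minimal sketch of the inverted-table write path (table and family names are hypothetical; the index row key is the indexed value, and qualifiers enumerate matching base rows). A query on two indexed columns would then scan both index tables and intersect the returned base row keys before fetching full rows:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HConstants;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class InvertedIndexSketch {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable base = new HTable(conf, "base");           // hypothetical
        HTable index = new HTable(conf, "base_col1_idx"); // hypothetical

        byte[] baseRow = Bytes.toBytes("row-123");
        byte[] value = Bytes.toBytes("some-value");

        Put basePut = new Put(baseRow);
        basePut.add(Bytes.toBytes("cf"), Bytes.toBytes("col1"), value);
        base.put(basePut);

        // Index row key = column value; one qualifier per matching base row.
        Put idxPut = new Put(value);
        idxPut.add(Bytes.toBytes("cf"), baseRow, HConstants.EMPTY_BYTE_ARRAY);
        index.put(idxPut);

        base.close();
        index.close();
      }
    }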

Re: Regarding Indexing columns in HBASE

2013-06-04 Thread Ramasubramanian Narayanan
Hi, The read pattern differs for each application. Is the below approach fine? Create one HBase table with a unique rowkey and put all 200 columns into it... Create multiple small HBase tables where each has the columns for one read access pattern and the rowkey it is mapped to in the master table... e.g

Re: Poor HBase map-reduce scan performance

2013-06-04 Thread Bryan Keller
Thanks Enis, I'll see if I can backport this patch - it is exactly what I was going to try. This should solve my scan performance problems if I can get it to work. On May 29, 2013, at 1:29 PM, Enis Söztutar wrote: > Hi, > > Regarding running raw scans on top of Hfiles, you can try a version o

Re: Regarding Indexing columns in HBASE

2013-06-04 Thread Shahab Yunus
Just a quick thought: why don't you create different tables and duplicate data, i.e. go for denormalization and data redundancy? Are all of your read access patterns that require the 70 columns incorporated into one application/client? Or will it be a bunch of different clients/applications? If that

Re: RPC Replication Compression

2013-06-04 Thread Jean-Daniel Cryans
Replication doesn't need to know about compression at the RPC level, so it won't refer to it, and as far as I can tell you need to set compression only on the master cluster and the slave will figure it out. Looking at the code though, I'm not sure it works the same way it used to work before everythin

Regarding Indexing columns in HBASE

2013-06-04 Thread Ramasubramanian Narayanan
Hi, In an HBase table, there are 200 columns and the read pattern for different systems involves 70 columns... In the above case, we cannot have 70 columns in the rowkey, which would not be a good design... Can you please suggest how to handle this problem? Also can we do indexing in HBase apart from

Re: Using thrift2 interface but getting : 400 Bad Request

2013-06-04 Thread Simon Majou
No logs there either (in fact no logs are written in any log file when I execute the request) Simon On Tue, Jun 4, 2013 at 5:42 PM, Ted Yu wrote: > Can you check region server log around that time ? > > Thanks > > On Jun 4, 2013, at 8:37 AM, Simon Majou wrote: > > > Hello, > > > > I am using

Re: Using thrift2 interface but getting : 400 Bad Request

2013-06-04 Thread Ted Yu
Can you check region server log around that time ? Thanks On Jun 4, 2013, at 8:37 AM, Simon Majou wrote: > Hello, > > I am using thrift & thrift2 interfaces (thrift for DDL & thrift2 for the > rest), my requests work with thrift but with thrift2 I got a error 400. > > Here is my code (coffees

Using thrift2 interface but getting : 400 Bad Request

2013-06-04 Thread Simon Majou
Hello, I am using the thrift & thrift2 interfaces (thrift for DDL & thrift2 for the rest); my requests work with thrift, but with thrift2 I get an error 400. Here is my code (coffeescript):

    colValue = new types2.TColumnValue family: 'cf', qualifier: 'col', value: 'yoo'
    put = new types2.TPut(row:'ro

Re: RPC Replication Compression

2013-06-04 Thread Asaf Mesika
If RPC has compression abilities, how come Replication, which also works over RPC, does not get it automatically? On Tue, Jun 4, 2013 at 12:34 PM, Anoop John wrote: > > 0.96 will support HBase RPC compression > Yes > > > Replication between master and slave > will enjoy it as well (important since

Re: what's the typical scan latency?

2013-06-04 Thread Amit Mor
What's your blockCacheHitCachingRatio? It would tell you the ratio of reads requested from the cache (the default) to reads actually served from the block cache. You can get that from the RS web UI. What you are seeing can map to almost anything; for example: is scanner caching (client side) ena
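For reference, the client-side scanner caching mentioned above is set per Scan; a sketch (values are illustrative):

    import org.apache.hadoop.hbase.client.Scan;

    public class ScanCachingSketch {
      public static Scan makeScan() {
        Scan scan = new Scan();
        scan.setCaching(500);       // rows fetched per RPC round trip (0.94 default: 1)
        scan.setCacheBlocks(false); // keep large scans from churning the block cache
        return scan;
      }
    }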

Re: RPC Replication Compression

2013-06-04 Thread Anoop John
> 0.96 will support HBase RPC compression Yes > Replication between master and slave will enjoy it as well (important since bandwidth between geographically distant data centers is scarce and more expensive) But I cannot see it being utilized in replication. Maybe we can do improvements in t

RPC Replication Compression

2013-06-04 Thread Asaf Mesika
Hi, Just wanted to make sure I read this correctly on the internet: 0.96 will support HBase RPC compression, thus Replication between master and slave will enjoy it as well (important since bandwidth between geographically distant data centers is scarce and more expensive).

Re: HTable and streaming

2013-06-04 Thread Asaf Mesika
What do you mean by indirect blocks? On Tue, Jun 4, 2013 at 7:22 AM, Mohit Anchlia wrote: > Better approach would be to break the data in chunks and create a behaviour > similar to indirect blocks. > > On Mon, Jun 3, 2013 at 9:12 PM, Asaf Mesika wrote: > > > I guess one can hack opening a socke
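One way to read the chunking suggestion, as a hypothetical sketch: an indirect-block-style layout with one row per blob and one qualifier per fixed-size chunk:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ChunkedBlobSketch {
      static final int CHUNK = 1024 * 1024; // 1 MB chunks (illustrative)

      public static void main(String[] args) throws IOException {
        byte[] blob = new byte[5 * CHUNK + 123]; // stand-in payload
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "blobs"); // hypothetical table
        Put put = new Put(Bytes.toBytes("blob-42"));
        for (int off = 0, n = 0; off < blob.length; off += CHUNK, n++) {
          int len = Math.min(CHUNK, blob.length - off);
          byte[] chunk = new byte[len];
          System.arraycopy(blob, off, chunk, 0, len);
          // Zero-padded qualifiers keep chunks in order under a scan.
          put.add(Bytes.toBytes("d"), Bytes.toBytes(String.format("c%06d", n)), chunk);
        }
        table.put(put);
        table.close();
      }
    }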