Thanks for pointing out setCacheBlocks().
Its HBase default value will provide better performance for the filter scans
that follow, as well as for Kevin's multiple-facet search.
-Alok
On Fri, Apr 20, 2012 at 7:02 AM, Kevin M wrote:
> Thanks for pointing me towards setCacheBlocks() and explaining the
> difference between those two types of caching in HBase.
Thanks for pointing me towards setCacheBlocks() and explaining the
difference between those two types of caching in HBase.
According to the API documentation, setCacheBlocks defaults to true, so it
looks like HBase will take care of what I am looking for automatically.
Thanks so much for your answer.
Regarding caching during scans, there are two types of caches:
* caching (buffering) the records before returning them to the client,
enabled via scan.setCaching(numRows)
* block cache on a regionserver, enabled via setCacheBlocks(true)
The latter one (block cache) is what you are looking for.
No
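To make the distinction concrete, here is a minimal Java sketch showing both settings on a client-side Scan (the table name "mytable" and family "cf" are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanCachingExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");        // hypothetical table

    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("cf"));               // hypothetical column family
    scan.setCaching(500);       // client-side buffering: fetch 500 rows per RPC round trip
    scan.setCacheBlocks(true);  // server-side: keep scanned HFile blocks in the block cache (the default)

    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // process r
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}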
A good way of doing that is to start replicating to the new cluster using HBase
replication.
Then *after* replication has been set up and enabled you would issue a CopyTable
M/R job for each table.
After the CopyTable jobs are finished you have a backup cluster that is behind by
only "a few seconds"
(however lo
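For what it's worth, a rough sketch of what those two steps can look like (the cluster addresses and table name are placeholders, the exact add_peer syntax depends on the HBase version, and the column families to be replicated also need REPLICATION_SCOPE => 1):

# 1. In the hbase shell on the source cluster, add the new cluster as a replication peer
hbase> add_peer '1', 'backupzk1,backupzk2,backupzk3:2181:/hbase'

# 2. Once replication is flowing, copy the pre-existing data with a CopyTable M/R job per table
$ hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
    --peer.adr=backupzk1,backupzk2,backupzk3:2181:/hbase mytable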
Would it be possible for you to pastebin a much bigger portion of the
hbase log?
Thx,
J-D
On Tue, Apr 17, 2012 at 10:35 AM, Xin Liu wrote:
> Hi there,
>
> I set up hadoop and hbase on top of EC2 in pseudo-distributed mode. I
> can use hbase shell to connect. However, when I use java client to
>
No problem.
One of the hardest things to do is to try to be open to other design ideas and
not become wedded to one.
I think once you get that working you can start to look at your cluster.
On Apr 19, 2012, at 1:26 PM, Narendra yadala wrote:
> Michael,
>
> I will do the redesign and build the index.
Michael,
I will do the redesign and build the index. Thanks a lot for the insights.
Narendra
On Thu, Apr 19, 2012 at 9:56 PM, Michael Segel wrote:
> Narendra,
>
> I think you are still missing the point.
> 130 seconds to scan the table per iteration.
> Even if you have 10K rows
> 130 * 10^4 or
Thanks for the reply.
I see. Would HBase cache the results of the first scan so it wouldn't take as
long to collect the results? Say there were 5 facets selected one after another.
A new scan would take place with stricter filtering each time on the whole
table, rather than using the results of the previous scan.
Tom,
The overall tradeoff with "table vs prefix" is that the former adds some
(small) amount of cluster management overhead for each new table, whereas the
latter adds runtime overhead (memory, cpu, disk, etc) on every operation. In
your case, since you're just talking about ~3 tables vs 1, my
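(If a concrete picture helps, a tiny sketch of the "prefix" variant — the entity prefixes, family, and column below are all made up:)

import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PrefixedKeys {
  // One shared table: every row key carries a short type prefix
  // (e.g. "u:" for users, "m:" for messages) instead of using separate tables.
  static void putUser(HTableInterface table, String userId, String name) throws Exception {
    byte[] rowKey = Bytes.add(Bytes.toBytes("u:"), Bytes.toBytes(userId));
    Put put = new Put(rowKey);
    put.add(Bytes.toBytes("d"), Bytes.toBytes("name"), Bytes.toBytes(name));
    table.put(put);
  }
}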
Narendra,
Since I didn't see the client logs, a full GC is one probable cause I suspect,
no matter whether it happens on the client side or the server side. So I suggest
checking the GC logs (enable GC logging on both the server and the client) to see
whether full GCs happen with a high frequency, and check the
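In case it helps with actually looking at the GC logs, a minimal sketch of enabling GC logging for the HBase JVMs via conf/hbase-env.sh (the log path is a placeholder; do the analogous thing for the client JVM):

# add to conf/hbase-env.sh on the servers
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc-hbase.log"

With that in place, full GC pauses show up in the log as "Full GC" entries together with their durations.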
Narendra,
I think you are still missing the point.
130 seconds to scan the table per iteration.
Even if you have only 10K rows, that's 130 * 10^4 = 1.3*10^6 seconds, roughly 361 hours.
Compare that to 10K rows where you then select a single row in your sub-select
that has a list of all of the associated rows.
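To make the suggested redesign concrete, a rough Java sketch (the index table "url_index", family "ids", and key layout are all made up, not an actual schema): replace the inner full-table scan with a single Get against an index table keyed by URL, whose row lists the ids of all tweets that share that URL.

import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class UrlIndexLookup {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable index = new HTable(conf, "url_index");   // hypothetical index table, row key = url

    // One Get instead of a full scan of the tweets table per outer row.
    Result r = index.get(new Get(Bytes.toBytes("http://example.com/some/link")));
    // Each qualifier in the "ids" family is assumed to be a tweet id that used this url.
    Map<byte[], byte[]> tweetIds = r.getFamilyMap(Bytes.toBytes("ids"));
    if (tweetIds != null) {
      for (byte[] tweetId : tweetIds.keySet()) {
        // fetch / process the associated tweet row by id
      }
    }
    index.close();
  }
}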
Hi Jieshan,
HBase version: 0.90.4-cdh3u3
The size of a KeyValue pair should not be more than 2 KB.
I changed the GC parameters on the server side. I have not looked into the GC
logs yet, but I have noticed that it pauses the batch process every now and
then. How do I look at the server GC logs?
Thanks
Are you sure you need to do table.close() after each put? Looks incorrect.
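For reference, a minimal sketch of the usual pattern (table, family, and qualifier names are made up): create the HTable once, reuse it for every put, and close it once at the end.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PutLoop {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "testtable");   // created once, not per put
    try {
      for (int i = 0; i < 1000; i++) {
        Put put = new Put(Bytes.toBytes("row-" + i));
        put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
        table.put(put);
      }
    } finally {
      table.close();   // closed once, after all puts
    }
  }
}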
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase
On Thu, Apr 19, 2012 at 2:48 AM, Marcin Cylke wrote:
> On 17/04/12 18:45, Alex Baranau wrote:
> > I don't think that your error
Michael,
Thanks for the response. This is a real problem and not a class project.
The boxes themselves cost 9k ;)
I think there is some difference in understanding of the problem. The table
has 2m rows, but I am looking at the latest 10k rows only in the outer for
loop. Only in the inner for loop I am t
On 17/04/12 18:45, Alex Baranau wrote:
> I don't think that your error is related to CPs stuff. What lib versions do
> you use? Can you compare with those of the HBaseHUT pom?
Ok, I've managed to track down the source of my error. If I do normal
Put modifications in my prePut/postPut method everyt
Hi Narendra,
I have a few doubts:
1. Which version are you using?
2. What's the size of each KeyValue?
3. Did you change the GC parameters on the client side or the server side? After
changing the GC parameters, did you keep an eye on the GC logs?
Thank you.
Regards,
Jieshan
Narendra,
Are you trying to solve a real problem, or is this a class project?
Your solution doesn't scale. It's a non-starter. 130 seconds for each iteration
times 1 million iterations is how long? 130 million seconds, which is ~36000 hours,
or over 4 years to complete.
(the numbers are rough but
Hi Michael,
Yes, that is exactly what I do in step 2. I am aware of the reason for the
scanner timeout exceptions: it is the time between two consecutive
invocations of next() on a specific scanner object. I increased the
scanner timeout to 10 min on the region server and still I keep seeing
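(For reference, the region server setting being described is, assuming the 0.90-era property name, something like this in hbase-site.xml on the region servers:)

<property>
  <name>hbase.regionserver.lease.period</name>
  <!-- scanner lease timeout in milliseconds; 600000 = 10 minutes -->
  <value>600000</value>
</property>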
So in your step 2 you have the following:
FOREACH row IN TABLE alpha:
    SELECT something
    FROM TABLE alpha
    WHERE alpha.url = row.url
Right?
And you are wondering why you are getting timeouts?
...
...
And how long does it take to do a full table scan? ;-)
(there's more, but that's the
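For readers following along, a rough Java sketch of the pattern being described (the family and qualifier names are made up): the inner loop re-scans the whole table once per outer row, and the outer scanner's lease keeps ticking while that happens.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class NestedScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable tweets = new HTable(conf, "tweets");

    ResultScanner outer = tweets.getScanner(new Scan());
    for (Result row : outer) {                       // outer loop over rows
      byte[] url = row.getValue(Bytes.toBytes("t"), Bytes.toBytes("url"));

      // Inner loop: a *full* scan of the same table for every outer row --
      // this is what blows past the scanner lease on the outer scanner.
      ResultScanner inner = tweets.getScanner(new Scan());
      for (Result other : inner) {
        byte[] otherUrl = other.getValue(Bytes.toBytes("t"), Bytes.toBytes("url"));
        if (Bytes.equals(url, otherUrl)) {
          // "SELECT something" -- process the matching row
        }
      }
      inner.close();
    }
    outer.close();
    tweets.close();
  }
}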
I have an issue with my HBase cluster. We have a 4 node HBase/Hadoop (4*32
GB RAM and 4*6 TB disk space) cluster. We are using the Cloudera distribution
to maintain our cluster. I have a single tweets table in which we store
the tweets, one tweet per row (it has millions of rows currently).
Now I
Hi Ian,
Thank you very much, that pretty much answers it.
Best regards,
Andre Medeiros
From: Ian Varley [ivar...@salesforce.com]
Sent: Wednesday, April 18, 2012 17:11
To: user@hbase.apache.org
Subject: Re: Performance issues of prepending a table
I would
I see, thanks to all~~
Hi,
fwiw, the "close" method was added in HBaseAdmin for HBase 0.90.5.
N.
On Thu, Apr 19, 2012 at 8:09 AM, Eason Lee wrote:
> I don't think this issue can resolve the problem
> ZKWatcher is removed, but the configuration and HConnectionImplementation
> objects are still in HConnectionManager
>