Thanks for pointing out setCacheBlocks().
Its HBase default value will provide better performance for the filter scans
that follow, as well as for Kevin's multiple-facet search.
-Alok
On Fri, Apr 20, 2012 at 7:02 AM, Kevin M wrote:
> Thanks for pointing me towards setCacheBlocks() and explaining the
> difference between those two types of caching in HBase.
Thanks for pointing me towards setCacheBlocks() and explaining the
difference between those two types of caching in HBase.
According to the API documentation, setCacheBlocks defaults to true, so it
looks like HBase will take care of what I am looking for automatically.
Thanks so much for your answer.
Regarding caching during scans, there are two types of caches:
* caching (buffering) the records before returning them to the client,
enabled via scan.setCaching(numRows)
* block cache on a regionserver, enabled via setCacheBlocks(true)
The latter one (block cache) is what you are looking for.
No
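To make the distinction concrete, here is a minimal Java sketch showing both settings on a client-side Scan (the table name "mytable" and family "cf" are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanCachingExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");        // hypothetical table

    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("cf"));               // hypothetical column family
    scan.setCaching(500);       // client-side buffering: fetch 500 rows per RPC round trip
    scan.setCacheBlocks(true);  // server-side: keep scanned HFile blocks in the block cache (the default)

    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // process r
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}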
A good way of doing that is to start replicating to the new cluster using HBase
replication.
Then *after* replication has been set up and enabled you would issue a CopyTable
M/R job for each table.
After the CopyTable jobs are finished you have a backup cluster that is behind by
only "a few seconds"
(however lo
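For what it's worth, a rough sketch of what those two steps can look like (the cluster addresses and table name are placeholders, the exact add_peer syntax depends on the HBase version, and the column families to be replicated also need REPLICATION_SCOPE => 1):

# 1. In the hbase shell on the source cluster, add the new cluster as a replication peer
hbase> add_peer '1', 'backupzk1,backupzk2,backupzk3:2181:/hbase'

# 2. Once replication is flowing, copy the pre-existing data with a CopyTable M/R job per table
$ hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
    --peer.adr=backupzk1,backupzk2,backupzk3:2181:/hbase mytable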
Would it be possible for you to pastebin a much bigger portion of the
hbase log?
Thx,
J-D
On Tue, Apr 17, 2012 at 10:35 AM, Xin Liu wrote:
> Hi there,
>
> I set up hadoop and hbase on top of EC2 in pseudo-distributed mode. I
> can use hbase shell to connect. However, when I use java client to
>
No problem.
One of the hardest things to do is to try to be open to other design ideas and
not become wedded to one.
I think once you get that working you can start to look at your cluster.
On Apr 19, 2012, at 1:26 PM, Narendra yadala wrote:
> Michael,
>
> I will do the redesign and build the index.
Michael,
I will do the redesign and build the index. Thanks a lot for the insights.
Narendra
On Thu, Apr 19, 2012 at 9:56 PM, Michael Segel wrote:
> Narendra,
>
> I think you are still missing the point.
> 130 seconds to scan the table per iteration.
> Even if you have 10K rows
> 130 * 10^4 or
Thanks for the reply.
I see. Would HBase cache the results of the first scan so it wouldn't take as
long to collect the results? Say there were 5 facets selected one after another.
A new scan would take place with stricter filtering each time on the whole
table, rather than using the results of the previous scan.
Tom,
The overall tradeoff with "table vs prefix" is that the former adds some
(small) amount of cluster management overhead for each new table, whereas the
latter adds runtime overhead (memory, cpu, disk, etc) on every operation. In
your case, since you're just talking about ~3 tables vs 1, my
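(If a concrete picture helps, a tiny sketch of the "prefix" variant — the entity prefixes, family, and column below are all made up:)

import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PrefixedKeys {
  // One shared table: every row key carries a short type prefix
  // (e.g. "u:" for users, "m:" for messages) instead of using separate tables.
  static void putUser(HTableInterface table, String userId, String name) throws Exception {
    byte[] rowKey = Bytes.add(Bytes.toBytes("u:"), Bytes.toBytes(userId));
    Put put = new Put(rowKey);
    put.add(Bytes.toBytes("d"), Bytes.toBytes("name"), Bytes.toBytes(name));
    table.put(put);
  }
}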
Narendra,
Since I didn't see the client logs, a full GC is one probable cause I suspect,
no matter whether it happens on the client side or the server side. So I suggest
checking the GC logs (enable GC logging on both the server and the client) to see
whether full GCs happen with a high frequency, and check the
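In case it helps with actually looking at the GC logs, a minimal sketch of enabling GC logging for the HBase JVMs via conf/hbase-env.sh (the log path is a placeholder; do the analogous thing for the client JVM):

# add to conf/hbase-env.sh on the servers
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc-hbase.log"

With that in place, full GC pauses show up in the log as "Full GC" entries together with their durations.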
Narendra,
I think you are still missing the point.
130 seconds to scan the table per iteration.
Even if you have only 10K rows, that's 130 * 10^4 = 1.3*10^6 seconds, roughly 361 hours.
Compare that to 10K rows where you then select a single row in your sub-select
that has a list of all of the associated rows.
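To make the suggested redesign concrete, a rough Java sketch (the index table "url_index", family "ids", and key layout are all made up, not an actual schema): replace the inner full-table scan with a single Get against an index table keyed by URL, whose row lists the ids of all tweets that share that URL.

import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class UrlIndexLookup {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable index = new HTable(conf, "url_index");   // hypothetical index table, row key = url

    // One Get instead of a full scan of the tweets table per outer row.
    Result r = index.get(new Get(Bytes.toBytes("http://example.com/some/link")));
    // Each qualifier in the "ids" family is assumed to be a tweet id that used this url.
    Map<byte[], byte[]> tweetIds = r.getFamilyMap(Bytes.toBytes("ids"));
    if (tweetIds != null) {
      for (byte[] tweetId : tweetIds.keySet()) {
        // fetch / process the associated tweet row by id
      }
    }
    index.close();
  }
}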
Hi Jieshan,
HBase version: 0.90.4-cdh3u3
The size of a KeyValue pair should not be more than 2 KB.
I changed the GC parameters on the server side. I have not looked into the GC
logs yet, but I have noticed that it pauses the batch process every now and
then. How do I look at the server GC logs?
Thanks
Are you sure you need to do table.close() after each put? Looks incorrect.
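For reference, a minimal sketch of the usual pattern (table, family, and qualifier names are made up): create the HTable once, reuse it for every put, and close it once at the end.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PutLoop {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "testtable");   // created once, not per put
    try {
      for (int i = 0; i < 1000; i++) {
        Put put = new Put(Bytes.toBytes("row-" + i));
        put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
        table.put(put);
      }
    } finally {
      table.close();   // closed once, after all puts
    }
  }
}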
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase
On Thu, Apr 19, 2012 at 2:48 AM, Marcin Cylke wrote:
> On 17/04/12 18:45, Alex Baranau wrote:
> > I don't think that your error
Michael,
Thanks for the response. This is a real problem and not a class project.
The boxes themselves cost 9k ;)
I think there is some difference in understanding of the problem. The table
has 2m rows, but I am looking at the latest 10k rows only in the outer for
loop. Only in the inner for loop I am t
On 17/04/12 18:45, Alex Baranau wrote:
> I don't think that your error is related to CPs stuff. What lib versions do
> you use? Can you compare with those of the HBaseHUT pom?
Ok, I've managed to track down the source of my error. If I do normal
Put modifications in my prePut/postPut method everyt
Hi Narendra,
I have a few doubts:
1. Which version are you using?
2. What's the size of each KeyValue?
3. Did you change the GC parameters on the client side or the server side? After
changing the GC parameters, did you keep an eye on the GC logs?
Thank you.
Regards,
Jieshan
Narendra,
Are you trying to solve a real problem, or is this a class project?
Your solution doesn't scale. It's a non-starter. 130 seconds for each iteration
times 1 million iterations is how long? 130 million seconds, which is ~36000 hours,
or over 4 years to complete.
(the numbers are rough but
Hi Michael,
Yes, that is exactly what I do in step 2. I am aware of the reason for the
scanner timeout exceptions: it is the time between two consecutive
invocations of next() on a specific scanner object. I increased the
scanner timeout to 10 min on the region server and still I keep seeing
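(For reference, the region server setting being described is, assuming the 0.90-era property name, something like this in hbase-site.xml on the region servers:)

<property>
  <name>hbase.regionserver.lease.period</name>
  <!-- scanner lease timeout in milliseconds; 600000 = 10 minutes -->
  <value>600000</value>
</property>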
So in your step 2 you have the following:
FOREACH row IN TABLE alpha:
    SELECT something
    FROM TABLE alpha
    WHERE alpha.url = row.url
Right?
And you are wondering why you are getting timeouts?
...
...
And how long does it take to do a full table scan? ;-)
(there's more, but that's the
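For readers following along, a rough Java sketch of the pattern being described (the family and qualifier names are made up): the inner loop re-scans the whole table once per outer row, and the outer scanner's lease keeps ticking while that happens.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class NestedScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable tweets = new HTable(conf, "tweets");

    ResultScanner outer = tweets.getScanner(new Scan());
    for (Result row : outer) {                       // outer loop over rows
      byte[] url = row.getValue(Bytes.toBytes("t"), Bytes.toBytes("url"));

      // Inner loop: a *full* scan of the same table for every outer row --
      // this is what blows past the scanner lease on the outer scanner.
      ResultScanner inner = tweets.getScanner(new Scan());
      for (Result other : inner) {
        byte[] otherUrl = other.getValue(Bytes.toBytes("t"), Bytes.toBytes("url"));
        if (Bytes.equals(url, otherUrl)) {
          // "SELECT something" -- process the matching row
        }
      }
      inner.close();
    }
    outer.close();
    tweets.close();
  }
}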
I have an issue with my HBase cluster. We have a 4 node HBase/Hadoop (4*32
GB RAM and 4*6 TB disk space) cluster. We are using the Cloudera distribution
to maintain our cluster. I have a single tweets table in which we store
the tweets, one tweet per row (it has millions of rows currently).
Now I
Hi Ian,
Thank you very much, that pretty much answers it.
Best regards,
Andre Medeiros
From: Ian Varley [ivar...@salesforce.com]
Sent: Wednesday, April 18, 2012 17:11
To: user@hbase.apache.org
Subject: Re: Performance issues of prepending a table
I would
I see, thanks to all~~
Hi,
fwiw, the "close" method was added in HBaseAdmin for HBase 0.90.5.
N.
On Thu, Apr 19, 2012 at 8:09 AM, Eason Lee wrote:
> I don't think this issue can resolve the problem
> ZKWatcher is removed, but the configuration and HConnectionImplementation
> objects are still in HConnectionManager
>