I went over related emails in my Inbox.
One previous case was that another task was running on the region server
node, consuming a good portion of CPU. In that case I observed a gap of
activity in the region server log. I can send that snippet after
anonymization, since there were some IP addresses a
Have you looked at
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired , suggested
below ?
Yes I did, but GC is not the issue here.
I think zookeeper timeout should be more closely watched.
What do you mean?
My ZK timeout today is 150 secs, which is very big. However, my
problem
Thanks for sharing.
On Wed, Apr 3, 2013 at 11:21 AM, Pablo Musa wrote:
Hello guys,
I stopped my research on HBase ZK timeout for a while due to
other issues I had to handle, but I am back.
A very weird behavior on which I would like your comments is that my
RegionServers perform better (fewer crashes) under heavy load than
under light load.
That is, if I let HBase alone with
Guys,
thank you very much for the help.
Yesterday I spent 14 hours trying to tune the whole cluster.
The cluster is not ready yet and needs a lot of tuning, but at least it
is working.
My first big problem was namenode + datanode GC. They were not using
CMS and thus were taking "incremental" time to
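For reference, switching the HDFS daemons to CMS is usually done through
the GC options in hadoop-env.sh; a minimal sketch only (heap sizes are
illustrative, and CDH may manage these settings through its own service
configuration rather than this file):

```shell
# hadoop-env.sh -- sketch, adjust heap sizes to your hardware.
# CMS keeps old-generation collections mostly concurrent, avoiding the
# long stop-the-world full GCs seen with the default collector.
export HADOOP_NAMENODE_OPTS="-Xmx4g -XX:+UseConcMarkSweepGC \
  -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=70 \
  $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Xmx1g -XX:+UseConcMarkSweepGC \
  $HADOOP_DATANODE_OPTS"
```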
Be careful with GC tuning: throwing changes at an application without
analyzing what is going on in the heap is shooting in the dark. One
particularly good treatment of the subject is here:
http://java.dzone.com/articles/how-tame-java-gc-pauses
If you have made custom changes to blockcache or me
Pablo,
another question: what's your Java version?
On Mon, Mar 11, 2013 at 10:13 AM, Azuryy Yu wrote:
Hi Pablo,
It's terrible to have such a long minor GC. I don't see any swapping in
your vmstat log.
But I suggest you:
1) add the following JVM options:
-XX:+DisableExplicitGC -XX:+UseCompressedOops -XX:GCTimeRatio=19
-XX:SoftRefLRUPolicyMSPerMB=0 -XX:SurvivorRatio=2
-XX:MaxTenuringThreshold=3 -XX:+
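For context, options like these are typically applied through HBASE_OPTS in
hbase-env.sh; a sketch, not a recommendation (as noted elsewhere in the
thread, values should follow from analysis of the GC logs, and the log path
here is an assumption):

```shell
# hbase-env.sh -- where the JVM options above would go (sketch only).
export HBASE_OPTS="$HBASE_OPTS -XX:+DisableExplicitGC -XX:+UseCompressedOops \
  -XX:GCTimeRatio=19 -XX:SoftRefLRUPolicyMSPerMB=0 -XX:SurvivorRatio=2 \
  -XX:MaxTenuringThreshold=3"
# Enable GC logging so any change can be checked against real pause times:
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps -Xloggc:/var/log/hbase/gc-hbase.log"
```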
You could increase your zookeeper session timeout to 5 minutes while you
are figuring out why you get these long pauses.
http://hbase.apache.org/book.html#zookeeper.session.timeout
Above, there is an outage for almost 5 minutes:
>> We slept 225100ms instead of 3000ms, this is likely due to a long
You have g
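A sketch of the corresponding hbase-site.xml entry (note the effective
timeout is also capped by the ZooKeeper server's own maxSessionTimeout,
so that may need raising as well):

```xml
<!-- hbase-site.xml: raise the ZK session timeout to 5 minutes (sketch). -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>300000</value> <!-- milliseconds -->
</property>
```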
Hi Sreepathi,
as they say in the book (or on the site), we could try it to see whether
it is really a timeout error
or something more. But it is not recommended for production
environments.
I could give it a try if five minutes will assure us that the problem
is the GC or
elsewhere!! Anyway,
Hi Stack/Ted/Pablo,
Should we increase the hbase.rpc.timeout property to 5 minutes ( 300000 ms
) ?
Regards,
- Sreepathi
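If one went that route, the property would be set in hbase-site.xml; a
sketch (5 minutes = 300000 ms):

```xml
<!-- hbase-site.xml: raise the client RPC timeout to 5 minutes (sketch). -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>300000</value> <!-- milliseconds -->
</property>
```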
On Sun, Mar 10, 2013 at 11:59 AM, Pablo Musa wrote:
> That combo should be fine.
Great!!
> If JVM is full GC'ing, the application is stopped.
> The below does not look like a full GC but that is a long pause in system
> time, enough to kill your zk session.
Exactly. This pause is really making ZK expire the RS, which shuts
down (logs
in the
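The arithmetic behind that expiry is simple enough to sketch. A minimal
illustration (not HBase code; the class and method names are made up): a
stop-the-world pause longer than zookeeper.session.timeout means the RS
cannot heartbeat, so ZK expires the session.

```java
// Illustrative sketch only, not HBase code: a regionserver's ZK session
// expires when a stop-the-world pause exceeds the session timeout,
// because the paused JVM cannot send heartbeats.
public class SessionExpiryCheck {
    // True if a pause of pauseMs would outlive the ZK session timeout.
    static boolean expires(long pauseMs, long sessionTimeoutMs) {
        return pauseMs > sessionTimeoutMs;
    }

    public static void main(String[] args) {
        long pauseMs = 225100;           // from the "We slept 225100ms" log line
        long sessionTimeoutMs = 150000;  // the 150s timeout mentioned above
        System.out.println(expires(pauseMs, sessionTimeoutMs)
                ? "session expired" : "session survived");
    }
}
```

With the numbers from this thread (a ~225s pause against a 150s timeout),
the session expires; raising the timeout only masks the symptom, the pause
itself is the root cause.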
On Fri, Mar 8, 2013 at 10:58 AM, Pablo Musa wrote:
0.94 currently doesn't support hadoop 2.0
Can you deploy hadoop 1.1.1 instead ?
I am using cdh4.2.0, which uses this version as the default installation.
I think it will be a problem for me to deploy 1.1.1 because I would need to
"upgrade" the whole cluster with 70TB of data (backup everything, go of
What RAM says.
2013-03-07 17:24:57,887 INFO org.apache.zookeeper.ClientCnxn: Client
session timed out, have not heard from server in 159348ms for sessionid
0x13d3c4bcba600a7, closing socket connection and attempting reconnect
You Full GC'ing around this time?
Put up your configs in a place whe
I think it is down to your GC config. What is your heap size? How much
data do you pump in, and how big is the block cache?
Regards
Ram
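For reference, the block cache fraction Ram is asking about is controlled
by hfile.block.cache.size in hbase-site.xml (in 0.94-era HBase the default
is around 0.25 of the regionserver heap); a sketch:

```xml
<!-- hbase-site.xml: fraction of RS heap given to the block cache (sketch). -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.25</value>
</property>
```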
On Fri, Mar 8, 2013 at 9:31 PM, Ted Yu wrote:
0.94 currently doesn't support hadoop 2.0
Can you deploy hadoop 1.1.1 instead ?
Are you using 0.94.5 ?
Thanks
On Fri, Mar 8, 2013 at 7:44 AM, Pablo Musa wrote:
> Hey guys,
> as I sent in an email a long time ago, the RSs in my cluster did not get
> along
> and crashed 3 times a day. I tried a