Re: RegionServer silently stops (only "issue": CMS-concurrent-mark ~80sec)

2012-05-01 Thread Igal Shilman
Hi Alex, just to rule out the OOM killer, try this: http://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-oom-killer On Mon, Apr 30, 2012 at 10:48 PM, Alex Baranau wrote: > Hello, > > During recent weeks I constantly see some RSs *silently* dying on our HBase > cluster

Re: RegionServer silently stops (only "issue": CMS-concurrent-mark ~80sec)

2012-05-01 Thread N Keywal
Hi Alex, On the same idea, note that hbase is launched with -XX:OnOutOfMemoryError="kill -9 %p". N. On Tue, May 1, 2012 at 10:41 AM, Igal Shilman wrote: > Hi Alex, just to rule out, oom killer, > Try this: > > http://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-

Re: RegionServer silently stops (only "issue": CMS-concurrent-mark ~80sec)

2012-05-01 Thread Dhaval Shah
Not sure if it's related (or even helpful), but we were using cdh3b4 (which is 0.90.1) and we saw similar issues with region servers going down. We didn't look at GC logs, but we had very high zookeeper leases, so it's unlikely that the GC could have caused the issue. This problem went away when

Re: How is reconnection handled?

2012-05-01 Thread Alex Baranau
Unfortunately, HTable instance creation fails (in the ctor) when HBase cannot be reached, even if HBase was there before and ZK is still running (and holds all the info about region locations). During HTable initialization the client tries to locate the first region of this table. This info has to be fetch

Re: RegionServer silently stops (only "issue": CMS-concurrent-mark ~80sec)

2012-05-01 Thread Alex Baranau
Yep, that is what I thought (OOM). Our monitoring tool (sematext.com/spm) says that everything was OK with regard to JVM memory consumption at the time the process stopped. But I found that the OOM killer is what killed it (on this cluster we have HBASE_HEAPSIZE=4000): /var/log/kern.log.1:Apr 30 18:59
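The kern.log check described above can be sketched in Python. This is a minimal sketch, not the tooling used in the thread: the function names `find_oom_kills` and `scan_file` are hypothetical, and the pattern assumes the kernel's usual "Kill process <pid> (<name>)" or "Killed process <pid> (<name>)" wording, which can vary between kernel versions.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: OOM-killer entries contain "Kill process <pid> (<name>)" or
# "Killed process <pid> (<name>)"; the exact wording depends on the kernel.
_OOM_KILL = re.compile(r"\bKill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (pid, process name) for every line that looks like an OOM kill."""
    hits = []
    for line in lines:
        match = _OOM_KILL.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2)))
    return hits


def scan_file(path: str) -> List[Tuple[int, str]]:
    """Scan one log file, e.g. /var/log/kern.log.1 (hypothetical helper)."""
    # errors="replace" keeps the scan going if the log has odd bytes.
    with open(path, errors="replace") as handle:
        return find_oom_kills(handle)
```

Calling `scan_file("/var/log/kern.log.1")` on the affected node and looking for the RegionServer's pid in the result would confirm or rule out the OOM killer. Reading the log usually requires appropriate permissions.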

Re: RegionServer silently stops (only "issue": CMS-concurrent-mark ~80sec)

2012-05-01 Thread Andrew Purtell
Does it make sense to include the below in ./bin/hbase and ./bin/hadoop: echo -17 > /proc/$PID/oom_adj (-17 is OOM_DISABLE: http://linux-mm.org/OOM_Killer). Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) - Original Mes
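For illustration, the same adjustment can be sketched in Python. This is a hedged sketch and not what ./bin/hbase does: `set_oom_adj` and its `proc_root` parameter are hypothetical names, and the code assumes a Linux kernel that still exposes /proc/<pid>/oom_adj (newer kernels prefer oom_score_adj, with a different value range).

```python
from pathlib import Path

# From the thread: -17 is OOM_DISABLE for the oom_adj interface.
OOM_DISABLE = -17


def set_oom_adj(pid: int, value: int = OOM_DISABLE, proc_root: str = "/proc") -> Path:
    """Write an oom_adj value for a process and return the file written.

    proc_root is a hypothetical parameter that exists only so the function
    can be exercised against a scratch directory instead of the real /proc.
    """
    # oom_adj accepts values from -17 (disable) to 15 (most likely to be killed).
    if not -17 <= value <= 15:
        raise ValueError(f"oom_adj must be between -17 and 15, got {value}")
    target = Path(proc_root) / str(pid) / "oom_adj"
    target.write_text(f"{value}\n")
    return target
```

Lowering the value for a running daemon typically requires root privileges, so a call such as `set_oom_adj(pid)` would normally run from the startup script's privileged context.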

Re: RegionServer silently stops (only "issue": CMS-concurrent-mark ~80sec)

2012-05-01 Thread Igal Shilman
This will work, but I don't think that this is a good way to go, since you will be treating the symptom and not the actual problem, which is an overloaded node. Igal. On Tue, May 1, 2012 at 6:57 PM, Andrew Purtell wrote: > Does it make sense to include the below in ./bin/hbase and ./bin/hadoop:

Re: RegionServer silently stops (only "issue": CMS-concurrent-mark ~80sec)

2012-05-01 Thread Andrew Purtell
> This will work, but I don't think that this is a good way to go, since you > will be treating the symptom and not the actual problem, which is an overloaded node. I should have been a bit more specific. One would do this for the HDFS and HBase daemons, and not for other subsystems like MR Chi

Re: HBase Thrift for CDH3U3 leaking file descriptors/socket connections to Zookeeper

2012-05-01 Thread Shrijeet Paliwal
Dhaval was hitting https://issues.apache.org/jira/browse/HBASE-4508; setting hbase.connection.per.config to false fixed it. On Mon, Apr 30, 2012 at 8:17 PM, Joey Echeverria wrote: > > I took a quick look, but didn't find a smoking gun. Can you get a > jstack of the ThriftServer when it has the 8K

Re: HBase Cyclic Replication Issue: some data are missing in the replication for intensive write

2012-05-01 Thread Himanshu Vashishtha
Hello Jerry, Did you try this again? Whenever you try next, can you please share the logs somehow? I tried replicating your scenario today, but no luck. I used the same workload you have copied here; master cluster has 5 nodes and slave has just 2 nodes; and made tiny regions of 8MB (memstore fl

Re: HBase Cyclic Replication Issue: some data are missing in the replication for intensive write

2012-05-01 Thread Jerry Lam
Hi Himanshu: Thanks for following up! I did look at the log and there were some exceptions. I'm not sure if those exceptions contributed to the problem I saw a week ago. I am aware of the latency between the time that the master said "Nothing to replicate" and the actual time it takes to

Re: HBase Cyclic Replication Issue: some data are missing in the replication for intensive write

2012-05-01 Thread Himanshu Vashishtha
Yeah, I should have mentioned that: it's master-master, and on cdh4b1. But replication on that specific slave table is disabled (so, effectively, it's master-slave for this test). Is this the same as yours (replication-config-wise), or shall I enable replication on the destination table too? Thanks, Hi

Re: HBase Cyclic Replication Issue: some data are missing in the replication for intensive write

2012-05-01 Thread Jerry Lam
Hi Himanshu: My team is particularly interested in cyclic replication, so I have enabled master-master replication (so each cluster has the other cluster as its replication peer), although the replication was in one direction (from cluster A to cluster B) in the test. I didn't stop_replicati

Re: HBase logo suitable for print?

2012-05-01 Thread Otis Gospodnetic
That did it, thanks Stackolini! Otis  Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm  > > From: Stack >To: user@hbase.apache.org; Otis Gospodnetic >Sent: Monday, April 30, 2012 5:54 PM >Subject: Re: HBase logo suitable

RE: RegionServer silently stops (only "issue": CMS-concurrent-mark ~80sec)

2012-05-01 Thread Gopinathan A
Hi Alex, You can check the system logs (/var/log/) to see whether the system issued a kill for that process. This can happen if resource utilization by the RS (such as memory usage) stays high for a very long time. Are you using GZIP compression? Thanks & Regards, Gopinathan A

Re: Exceptions with importtsv

2012-05-01 Thread Sambit Tripathy
Thanks Yifeng. Well-thought-out input :) and it works. On Sun, Apr 29, 2012 at 1:43 PM, Yifeng Jiang wrote: > Hi Sambit, > > Are you specifying a local file system path on the command line? > Before invoking importtsv, you will need to copy your tsv files to HDFS at > first. > > -Yifeng > > On Apr 2