Re: Limiting the number of regions per RS

2013-10-15 Thread Eran Kutner
arrieriq.com > e-mail: vrodio...@carrieriq.com > > > From: e...@gigya-inc.com [e...@gigya-inc.com] On Behalf Of Eran Kutner [ > e...@gigya.com] > Sent: Monday, October 14, 2013 12:15 PM > To: user@hbase.apache.org > Subject: Limiting the

Limiting the number of regions per RS

2013-10-14 Thread Eran Kutner
Hi, We have a cluster with unequal machines, some are newer with more disk space, more RAM and stronger CPUs. HDFS and M/R can be tuned to consume fewer resources but I was unable to find a way to cause the HBase balancer to put fewer regions on the weaker servers, so now they have the same number of reg
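For illustration only: the even split the post describes could, in principle, be replaced by per-server targets proportional to a capacity weight. This is a hypothetical sketch, not an HBase balancer API; `WeightedRegionTargets`, the weights, and the largest-remainder rounding are assumptions chosen to show the arithmetic.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: split a total region count across region servers in
 * proportion to a per-server capacity weight. Largest-remainder rounding keeps
 * the targets summing exactly to the total. Not part of HBase.
 */
class WeightedRegionTargets {

    static int[] targets(int totalRegions, double[] weights) {
        double weightSum = 0;
        for (double w : weights) {
            if (w <= 0) throw new IllegalArgumentException("weights must be positive");
            weightSum += w;
        }
        int[] result = new int[weights.length];
        double[] remainders = new double[weights.length];
        int assigned = 0;
        for (int i = 0; i < weights.length; i++) {
            double exact = totalRegions * weights[i] / weightSum;
            result[i] = (int) Math.floor(exact);
            remainders[i] = exact - result[i];
            assigned += result[i];
        }
        // Hand the leftover regions to the servers with the largest fractional parts.
        for (int left = totalRegions - assigned; left > 0; left--) {
            int best = 0;
            for (int i = 1; i < weights.length; i++) {
                if (remainders[i] > remainders[best]) best = i;
            }
            result[best]++;
            remainders[best] = -1; // each server receives at most one extra region
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative: three newer servers weighted 2.0, two older ones weighted 1.0.
        int[] t = targets(1000, new double[] {2, 2, 2, 1, 1});
        System.out.println(Arrays.toString(t));
    }
}
```

With weights 2:2:2:1:1 and 1000 regions, the newer servers would each be targeted at twice the regions of the older ones instead of an equal share.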

Re: Slow log splitting (Hbase 0.94.6)

2013-06-18 Thread Eran Kutner
Yu wrote: > What Hadoop version are you using ? > > Can you check NameNode log to see if lease recovery took long time ? > > Cheers > > On Jun 18, 2013, at 5:11 AM, Eran Kutner wrote: > > > Hi, > > We had a brute force cluster shutdown event that was followed b

Slow log splitting (Hbase 0.94.6)

2013-06-18 Thread Eran Kutner
Hi, We had a brute force cluster shutdown event that was followed by log recovery when the cluster went back online. The cluster took hours to split the logs and recover the regions, all of which might have made sense since we have quite a lot of regions (around 13K) but the weird thing is that the

Re: Howto CopyTable from 0.90 to 0.92 ?

2012-06-27 Thread Eran Kutner
that the RPC protocol changed so it's not > possible to copy between those versions. CopyTable just uses the > TableOutputFormat which is an HBase client. > > You need to do an Export to dump the data on HDFS, distcp if needed, > then run an Import. > > Hope this helps,

Howto CopyTable from 0.90 to 0.92 ?

2012-06-27 Thread Eran Kutner
Hi, I can't figure out what to put in the rs.class and rs.impl in order to get CopyTable to copy from a 0.90 cluster to 0.92. Also, how should I reference the other cluster JAR? should I add it to the classpath? Thanks. -eran

Re: Occasional regionserver crashes following socket errors writing to HDFS

2012-05-24 Thread Eran Kutner
Thanks Stack for noticing the ZooKeeper timeout, I don't know how I could have missed that. After analyzing this for a while it is definitely unrelated to GC. In fact, during the last 4 days no GC operation took more than 2 seconds, and those that got close were all concurrent mark sweeps, so they sh

Re: Occasional regionserver crashes following socket errors writing to HDFS

2012-05-10 Thread Eran Kutner
Michael, I appreciate the feedback but I'd have to disagree. In my case, for example, I need to look at a complete set of data produced by the map phase in order to make a decision and write it to Hbase. So sure, I could write all the mappers' output to hbase then have another map only job to scan the

Re: Occasional regionserver crashes following socket errors writing to HDFS

2012-05-10 Thread Eran Kutner
> Sent from a remote device. Please excuse any typos... > > Mike Segel > > On May 10, 2012, at 6:33 AM, Eran Kutner wrote: > > > Thanks Igal, but we already have that setting. These are the relevant > > settings from hdfs-site.xml : > > > >

Re: Occasional regionserver crashes following socket errors writing to HDFS

2012-05-10 Thread Eran Kutner
, Igal Shilman wrote: > Hi Eran, > Do you have: dfs.datanode.socket.write.timeout set in hdfs-site.xml ? > (We have set this to zero in our cluster, which means waiting as long as > necessary for the write to complete) > > Igal. > > On Thu, May 10, 2012 at 11:17 AM, Eran

Occasional regionserver crashes following socket errors writing to HDFS

2012-05-10 Thread Eran Kutner
Hi, We're seeing occasional regionserver crashes during heavy write operations to Hbase (at the reduce phase of large M/R jobs). I have increased the file descriptors, HDFS xceivers, HDFS threads to the recommended settings and actually way above. Here is an example of the HBase log (showing only

Re: Region server shutting down due to HDFS error

2012-04-05 Thread Eran Kutner
Freudian slip :) -eran On Thu, Apr 5, 2012 at 16:52, Ted Yu wrote: > Thanks for writing back. > > I guess you meant 'things are now operating well', below :-) > > On Thu, Apr 5, 2012 at 6:25 AM, Eran Kutner wrote: > > > As promised I'm writing back t

Re: Region server shutting down due to HDFS error

2012-04-05 Thread Eran Kutner
3 just in case. Now that the log is clean a new exception shows up but I'll open a separate thread about it. Thanks everyone. -eran On Wed, Mar 28, 2012 at 23:06, Eran Kutner wrote: > hmmm... I couldn't find it either, so I've looked at the history of that > file and sur

Re: Region server shutting down due to HDFS error

2012-03-28 Thread Eran Kutner
CLOSI" {} \; -print > zhihyu$ find src/main -name '*.java' -exec grep 'Error getting ' {} \; > -print > > Cheers > > On Wed, Mar 28, 2012 at 9:45 AM, Eran Kutner wrote: > > > I don't see any prior HDFS issues in the 15 minutes before this

Re: Region server shutting down due to HDFS error

2012-03-28 Thread Eran Kutner
J-D > > On Wed, Mar 28, 2012 at 4:28 AM, Eran Kutner wrote: > > Hi, > > > > We have region server sporadically stopping under load due supposedly to > > errors writing to HDFS. Things like: > > > > 2012-03-28 00:37:11,210 WARN org.apache.hadoop.hdf

Re: Region server shutting down due to HDFS error

2012-03-28 Thread Eran Kutner
om https://issues.apache.org/jira/browse/HBASE-4222 > will also be in CDH3u4. > > On Wed, Mar 28, 2012 at 8:39 PM, Eran Kutner wrote: > > Hi Jimmy, > > HBase is built from latest sources of 0.90 branch (0.90.7-SNAPSHOT), I > had > > the same problem with 0.90.4 > > H

Re: Region server shutting down due to HDFS error

2012-03-28 Thread Eran Kutner
> Jimmy > > On Wed, Mar 28, 2012 at 4:28 AM, Eran Kutner wrote: > > Hi, > > > > We have region server sporadically stopping under load due supposedly to > > errors writing to HDFS. Things like: > > > > 2012-03-28 00:37:11,210 WARN org.apache.hadoop.hd

Region server shutting down due to HDFS error

2012-03-28 Thread Eran Kutner
Hi, We have region servers sporadically stopping under load, supposedly due to errors writing to HDFS. Things like: 2012-03-28 00:37:11,210 WARN org.apache.hadoop.hdfs.DFSClient: Error while syncing java.io.IOException: All datanodes 10.1.104.10:50010 are bad. Aborting.. It's happening with a diff

Re: Lease does not exist exceptions

2011-10-20 Thread Eran Kutner
Perfect! Thanks. -eran On Thu, Oct 20, 2011 at 23:27, Jean-Daniel Cryans wrote: > hbase.regionserver.lease.period > > Set it bigger than 6. > > J-D > > On Thu, Oct 20, 2011 at 2:23 PM, Eran Kutner wrote: > > > > Thanks J-D! > > Since my main table

Re: Lease does not exist exceptions

2011-10-20 Thread Eran Kutner
Thanks J-D! Since my main table is expected to continue growing I guess at some point even setting the cache size to 1 will not be enough. Is there a way to configure the lease timeout? -eran On Thu, Oct 20, 2011 at 23:16, Jean-Daniel Cryans wrote: > On Wed, Oct 19, 2011 at 12:51 PM, E
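A back-of-the-envelope check of the relationship discussed in this thread, assuming the mapper spends a roughly constant time per row: with scanner caching N the client goes back to the region server once every N rows, so N times the per-row processing time has to stay under the lease period. `ScannerLeaseBudget`, the safety factor, and the example numbers are hypothetical.

```java
/**
 * Hypothetical helper. Assumes a constant per-row processing time in the mapper.
 * With scanner caching N, the client calls the region server once per N rows,
 * so N * perRowMillis must stay below the lease period or the scanner lease
 * expires between calls.
 */
class ScannerLeaseBudget {

    /** Largest caching value whose batch still fits inside the lease (never below 1). */
    static int maxCaching(long leasePeriodMillis, double perRowMillis, double safetyFactor) {
        if (perRowMillis <= 0) throw new IllegalArgumentException("perRowMillis must be positive");
        if (safetyFactor <= 0 || safetyFactor > 1) throw new IllegalArgumentException("safetyFactor must be in (0, 1]");
        double budget = leasePeriodMillis * safetyFactor; // headroom for GC pauses, slow RPCs
        long n = (long) Math.floor(budget / perRowMillis);
        return (int) Math.max(1, Math.min(n, Integer.MAX_VALUE));
    }

    /** True if processing one batch of `caching` rows would outlive the lease. */
    static boolean leaseAtRisk(int caching, long leasePeriodMillis, double perRowMillis) {
        return caching * perRowMillis >= leasePeriodMillis;
    }

    public static void main(String[] args) {
        long lease = 60_000;   // illustrative lease period in ms
        double perRow = 50.0;  // illustrative: 50 ms of mapper work per row
        System.out.println(maxCaching(lease, perRow, 0.5));
        System.out.println(leaseAtRisk(2_000, lease, perRow));
    }
}
```

When `maxCaching` is already 1 and a single row still outlasts the lease, lowering the caching cannot help; that is the case the question raises, and J-D's reply in this thread points to hbase.regionserver.lease.period for it.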

Re: Lease does not exist exceptions

2011-10-19 Thread Eran Kutner
, but it seems unlikely since what happens in those sync > blocks is fast. > > If you do get some UnknownScannerExceptions, they will show how long you > took before going back to the server by say like 65340ms ms passed since > the > last invocation, timeout is currently set to

Re: Lease does not exist exceptions

2011-10-18 Thread Eran Kutner
> P.S IIRC, J-D tripped over a cause recently but I can't find it at the mo. > > On Tue, Oct 18, 2011 at 10:28 AM, Eran Kutner wrote: > > Hi, > > I'm having a problem when running map/reduce on a table with about 500 > > regions. > > The MR job shows this

Lease does not exist exceptions

2011-10-18 Thread Eran Kutner
Hi, I'm having a problem when running map/reduce on a table with about 500 regions. The MR job shows exceptions like this: 11/10/18 06:03:39 INFO mapred.JobClient: Task Id : attempt_201110030100_0086_m_62_0, Status : FAILED org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hado

Re: Errors after major compaction

2011-07-07 Thread Eran Kutner
> Well, the master doesn't know that s05 has the region open -- thats > why it gives it to s02 -- and then, there is no channel available to > s05 to figure who has what The way I see it, that's the root of the problem. It would probably make sense if the RS could figure this out independently fro

Re: Errors after major compaction

2011-07-05 Thread Eran Kutner
> > > > > On Mon, Jul 4, 2011 at 2:30 AM, Ted Yu wrote: > > > >> Thanks for the understanding. > >> > >> Can you log a JIRA and put your ideas below in it ? > >> > >> > >> > >> On Jul 4, 2011, at 12:42 AM, Eran Ku

Re: Errors after major compaction

2011-07-05 Thread Eran Kutner
anks for the understanding. > > > > Can you log a JIRA and put your ideas below in it ? > > > > > > > > On Jul 4, 2011, at 12:42 AM, Eran Kutner wrote: > > > > > Thanks for the explanation Ted, > > > > > > I will try to apply HBASE-

Re: Errors after major compaction

2011-07-04 Thread Eran Kutner
Sure, I'll do that. -eran On Mon, Jul 4, 2011 at 12:30, Ted Yu wrote: > Thanks for the understanding. > > Can you log a JIRA and put your ideas below in it ? > > > > On Jul 4, 2011, at 12:42 AM, Eran Kutner wrote: > > > Thanks for the explanation Ted, >

Re: Errors after major compaction

2011-07-04 Thread Eran Kutner
k it makes sense to evaluate the effect of HBASE-3789 in 0.90.4 > > BTW were the incorrect region assignments observed for a table with > multiple > initial regions ? > If so, I have HBASE-4010 in TRUNK which speeds up initial region assignment > by about 50%. > > Cheers >

Re: Errors after major compaction

2011-07-03 Thread Eran Kutner
at patch. > > Regards > > On Sun, Jul 3, 2011 at 10:01 AM, Eran Kutner wrote: > > > Thanks Ted, but, as stated before, I'm already using 0.90.3, so either > it's > > not fixed or it's not the same thing. > > > > -eran > > >

Re: Errors after major compaction

2011-07-03 Thread Eran Kutner
though it doesn't directly handle 'PENDING_OPEN for too long' case. > > https://issues.apache.org/jira/browse/HBASE-3741 is in 0.90.3 and actually > close to the symptom you described. > > On Sun, Jul 3, 2011 at 12:00 AM, Eran Kutner wrote: > > > It does

Re: Errors after major compaction

2011-07-03 Thread Eran Kutner
reassigning > > region=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > The double assignment should have been fixed by J-D's recent checkin. > > On Fri, Jul 1, 2011 at 3:14 PM, Stack wrote: > > > Is > > > gs_raw_ev

Re: Errors after major compaction

2011-07-01 Thread Eran Kutner
assignments). > > What version of hbase? > > St.Ack > > > On Thu, Jun 30, 2011 at 3:58 AM, Eran Kutner wrote: > > Hi, > > I have a cluster of 5 nodes with one large table that currently has > around > > 12000 regions. Everything was working fine for rela

Errors after major compaction

2011-06-30 Thread Eran Kutner
Hi, I have a cluster of 5 nodes with one large table that currently has around 12000 regions. Everything was working fine for a relatively long time, until now. Yesterday I significantly reduced the TTL on the table and initiated major compaction. This should have reduced the table size to about 20%

Re: How to efficiently join HBase tables?

2011-06-09 Thread Eran Kutner
> > > > > > > My suggested from earlier in the thread was a variant of nested loops > > by > > > using multi-get in HTable, which would reduce the number of RPC calls. > > So > > > it's a "bulk-select nested loops" of sorts (i.e., as

Re: How to efficiently join HBase tables?

2011-06-08 Thread Eran Kutner
seems like a reasonable goal). > I am not exactly sure what Eran is going for here, but it seems like Eran is > glossing over a piece. If you have two scanners for table A and B, then > table B needs to be rescanned for every unique part of the join condition in > table A. There are certa
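A sketch of the single-pass case, under the assumption that the join key is the row key of both tables: HBase scans return rows in row-key order and row keys are unique within a table, so two scanners can be merged front to back without rescanning B. When the join key is not B's row key this does not apply, and the rescanning described in the reply is needed. Sorted in-memory lists stand in for the scanners; `SortedMergeJoin` and `Row` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class SortedMergeJoin {

    /** Hypothetical stand-in for one HBase row: a row key and a single value. */
    static final class Row {
        final String key;
        final String value;
        Row(String key, String value) { this.key = key; this.value = value; }
    }

    /**
     * Inner join of two inputs sorted by key, where B has at most one row per key
     * (as when the key is B's row key). Each input is read once, front to back.
     * String order matches HBase's byte order only for ASCII keys.
     */
    static List<String> join(List<Row> a, List<Row> b) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).key.compareTo(b.get(j).key);
            if (cmp < 0) {
                i++; // key only in A: advance A
            } else if (cmp > 0) {
                j++; // key only in B: advance B
            } else {
                out.add(a.get(i).key + ":" + a.get(i).value + "," + b.get(j).value);
                i++; // keep j so a following A row with the same key also matches
            }
        }
        return out;
    }
}
```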

Re: How to efficiently join HBase tables?

2011-06-03 Thread Eran Kutner
n addition to lessening the load on the perhaps live RegionServer. > > >> There's no Jira for this, I'm tempted to open one. > > >> > > >> On Tue, May 31, 2011 at 5:18 PM, Jason Rutherglen > > >> wrote: > > >>>> The Hive-HB

Re: How to efficiently join HBase tables?

2011-06-01 Thread Eran Kutner
n operations do not require realtime-ness, and so > > faster batch jobs using Hive -> frozen HBase files in HDFS could be > > the optimal way to go? > > > > On Tue, May 31, 2011 at 1:41 PM, Patrick Angeles > wrote: > >> On Tue, May 31, 2011 at 3:19 PM, Eran Kutne

Re: How to efficiently join HBase tables?

2011-05-31 Thread Eran Kutner
For my needs I don't really need the general case, but even if I did I think it can probably be done more simply. The main problem is getting the data from both tables into the same MR job, without resorting to lookups. So without the theoretical MultiTableInputFormat, I could just copy all the data fro

Re: How to efficiently join HBase tables?

2011-05-31 Thread Eran Kutner
> > > With multi-get in .90.x you could perform some reasonably clever > processing and not do the lookups one-by-one but in batches. > > > > Also, if the other table is "small" you could have the leverage the block > cache on the lookups (i.e., if it's a domain/lookup table). >
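A sketch of the batched variant of the lookup join mentioned in the reply: left-side rows are processed in chunks, and each chunk's foreign keys are fetched in one round trip instead of one get per row. The `fetchBatch` function is a hypothetical stand-in for a multi-get call (for example the list overload `HTable.get(List<Get>)` in the 0.90 client); `BatchedLookupJoin` and its {rowId, foreignKey} row layout are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class BatchedLookupJoin {

    /**
     * Joins each left-side {rowId, foreignKey} pair with the right-side value for
     * foreignKey, fetching right-side rows batchSize keys at a time.
     * fetchBatch models one multi-get round trip; missing keys are simply absent.
     */
    static List<String> join(List<String[]> left, int batchSize,
                             Function<List<String>, Map<String, String>> fetchBatch) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<String> out = new ArrayList<>();
        for (int start = 0; start < left.size(); start += batchSize) {
            List<String[]> chunk = left.subList(start, Math.min(start + batchSize, left.size()));
            List<String> keys = new ArrayList<>();
            for (String[] row : chunk) keys.add(row[1]);
            Map<String, String> found = fetchBatch.apply(keys); // one round trip per chunk
            for (String[] row : chunk) {
                String match = found.get(row[1]);
                if (match != null) out.add(row[0] + "->" + match); // inner join: drop misses
            }
        }
        return out;
    }
}
```

The number of round trips drops from one per left-side row to roughly rows / batchSize; if the right-hand table is small, the block cache point in the reply also applies to these batched reads.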

Re: How to efficiently join HBase tables?

2011-05-31 Thread Eran Kutner
ns for the two > tables, perhaps something similar to Hadoop's MultipleInputs should do the > trick. > > > On 05/31/2011 02:06 PM, Eran Kutner wrote: > >> Hi, >> I need to join two HBase tables. The obvious way is to use a M/R job for >> that. The prob
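The MultipleInputs idea amounts to a reduce-side join: each mapper tags its output with the source table, the shuffle brings records with the same join key together, and the reducer pairs them, so no lookups are needed. Below is an in-memory simulation of those three steps rather than Hadoop code; `ReduceSideJoin`, `Tagged`, and the {joinKey, value} row layout are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory simulation of a reduce-side join; all names are illustrative. */
class ReduceSideJoin {

    /** A record tagged by the "mapper" with the table it came from. */
    static final class Tagged {
        final char source; // 'A' or 'B'
        final String value;
        Tagged(char source, String value) { this.source = source; this.value = value; }
    }

    /** Each input row is {joinKey, value}. */
    static List<String> join(List<String[]> tableA, List<String[]> tableB) {
        // "Map" + "shuffle": tag records and group them by join key (sorted, like MapReduce keys).
        Map<String, List<Tagged>> groups = new TreeMap<>();
        for (String[] r : tableA) groups.computeIfAbsent(r[0], k -> new ArrayList<>()).add(new Tagged('A', r[1]));
        for (String[] r : tableB) groups.computeIfAbsent(r[0], k -> new ArrayList<>()).add(new Tagged('B', r[1]));

        // "Reduce": for each key, emit every pairing of its A records with its B records.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : groups.entrySet()) {
            List<String> as = new ArrayList<>();
            List<String> bs = new ArrayList<>();
            for (Tagged t : e.getValue()) {
                if (t.source == 'A') as.add(t.value); else bs.add(t.value);
            }
            for (String a : as) for (String b : bs) out.add(e.getKey() + ":" + a + "," + b);
        }
        return out;
    }
}
```

The trade-off against the lookup approach is that both tables are shuffled in full, while no per-row reads against the second table are issued.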

How to efficiently join HBase tables?

2011-05-31 Thread Eran Kutner
Hi, I need to join two HBase tables. The obvious way is to use a M/R job for that. The problem is that the few references to that question I found recommend pulling one table to the mapper and then doing a lookup for the referred row in the second table. This sounds like a very inefficient way to do

Re: Performance test results

2011-05-09 Thread Eran Kutner
I tried flushing the table, not a specific region. -eran On Mon, May 9, 2011 at 20:03, Stack wrote: > On Mon, May 9, 2011 at 9:31 AM, Eran Kutner wrote: > > OK, I tried it, truncated the table and ran inserts for about a day. Now > I > > tried flushing the table but I ge

Re: Performance test results

2011-05-09 Thread Eran Kutner
posedly in the offline region (I'm assuming the region name indicates the first key in the region). -eran On Wed, May 4, 2011 at 15:20, Eran Kutner wrote: > J-D, > I'll try what you suggest but it is worth pointing out that my data set has > over 300M rows, however in my

Re: Performance test results

2011-05-04 Thread Eran Kutner
aniel Cryans wrote: > On Tue, May 3, 2011 at 6:20 AM, Eran Kutner wrote: > > Flushing, at least when I try it now, long after I stopped writing, > doesn't > > seem to have any effect. > > Bummer. > > > > > In my

Re: Performance test results

2011-05-03 Thread Eran Kutner
a moving average for > the percentages so at some point those numbers are set in stone... > > The handler config is only good if you are using a ton of clients, > which doesn't seem to be the case (at least now). > > J-D > > On Wed, Apr 27, 2011 at 6:42 AM, Eran Kutner

Re: Performance test results

2011-04-27 Thread Eran Kutner
Since the attachment didn't make it, here it is again: http://shortText.com/jp73moaesx -eran On Wed, Apr 27, 2011 at 16:51, Eran Kutner wrote: > Hi Josh, > > The connection pooling code is attached AS IS (with all the usual legal > disclaimers), note that you will have to m

Re: Performance test results

2011-04-27 Thread Eran Kutner
on't close it in the application code. -eran On Wed, Apr 27, 2011 at 00:30, Josh wrote: > On Tue, Apr 26, 2011 at 3:34 AM, Eran Kutner wrote: > > Hi J-D, > > I don't think it's a Thrift issue. First, I use the TBufferedTransport > > transport, second, I

Re: Performance test results

2011-04-27 Thread Eran Kutner
000B=100MB. This is spread across 5 servers with 16GB of RAM, out of which 12.5GB are allocated to the region servers. -eran On Tue, Apr 26, 2011 at 21:57, Stack wrote: > > On Thu, Apr 21, 2011 at 5:13 AM, Eran Kutner wrote: > > I tested again on a clean table using 100 insert thr

Re: Performance test results

2011-04-26 Thread Eran Kutner
> > As you didn't really debug the speed of Thrift itself in your setup, > this is one more variable in the problem. > > Also you don't really provide metrics about your system apart from > requests/second. Would it be possible for you set them up using this > guide? h

Re: Performance test results

2011-04-21 Thread Eran Kutner
Hi J-D, After stabilizing the configuration, with your great help, I was able to go back to the load tests. I tried using IRC, as you suggested, to continue this discussion but because of the time difference (I'm GMT+3) it is quite difficult to find a time when people are present and I am avail

Re: Cluster crash

2011-04-13 Thread Eran Kutner
what was > > before that? Also it would be nice to have a view for those blocks on > > all the datanodes. > > > > It would be nice to do this debugging on IRC is it can require a lot > > of back and forth. > > > > J-D > > > > On Mon, Apr 11, 2011

Re: Cluster crash

2011-04-11 Thread Eran Kutner
2434963279 Is there a better way to associate a file with a block? -eran On Mon, Apr 11, 2011 at 21:00, Stack wrote: > On Sun, Apr 10, 2011 at 11:30 PM, Eran Kutner wrote: > > Hi St.Ack and J-D, > > Thanks for looking into this. > > > > It can definitely be a configu

Re: Cluster crash

2011-04-10 Thread Eran Kutner
s even with no load > makes me think it's a configuration issue. Can we see your hdfs > config? > > BTW the HBase log was pointing at 10.1.104.1 as the one having an > issue, is that the log we are looking at? (it doesn't seem so) > > Thx, > > J-D > > On Sun

Re: Cluster crash

2011-04-10 Thread Eran Kutner
s hbase and hadoop processes has > indeed the upped limits? > > Yours, > St.Ack > > 1. http://hbase.apache.org/book/notsoquick.html#requirements > > On Sun, Apr 10, 2011 at 8:07 AM, Eran Kutner wrote: >> Hi, >> While doing load testing on HBase the entire cluster

Cluster crash

2011-04-10 Thread Eran Kutner
Hi, While doing load testing on HBase the entire cluster crashed with errors like these in hbase logs: 2011-04-10 10:14:30,844 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_1213779416283711358_54194 bad datanode[0] 10.1.104.1:50010 2011-04-10 10:14:30,844 WARN org.apache.hado

Re: Performance test results

2011-03-31 Thread Eran Kutner
I assume the block cache tuning key you talk about is "hfile.block.cache.size", right? If it is only 20% by default then what is the rest of the heap used for? Since there are no fancy operations like joins and since I'm not using memory tables the only thing I can think of is the memstore, right?
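A rough heap budget for one region server, using the 12.5 GB region server heap mentioned in this thread and the 20% block cache default quoted in the question. The 40% memstore ceiling is an assumption (hbase.regionserver.global.memstore.upperLimit in releases of that period) and should be checked against the hbase-default.xml of the version in use; `RegionServerHeapBudget` is a hypothetical helper.

```java
/**
 * Hypothetical helper. The block cache fraction corresponds to
 * hfile.block.cache.size; the memstore fraction is an assumed global
 * memstore upper limit. Whatever is left serves everything else.
 */
class RegionServerHeapBudget {

    /** Returns {blockCacheGb, memstoreGb, otherGb} for the given heap size. */
    static double[] split(double heapGb, double blockCacheFraction, double memstoreUpperLimit) {
        if (blockCacheFraction + memstoreUpperLimit >= 1.0) {
            throw new IllegalArgumentException("block cache + memstore must leave some heap free");
        }
        double blockCache = heapGb * blockCacheFraction;
        double memstore = heapGb * memstoreUpperLimit;
        // Remainder: e.g. RPC buffers, open scanners, file indexes, GC headroom.
        double other = heapGb - blockCache - memstore;
        return new double[] {blockCache, memstore, other};
    }

    public static void main(String[] args) {
        double[] s = split(12.5, 0.2, 0.4); // 12.5 GB heap from the thread
        System.out.printf("block cache %.1f GB, memstore %.1f GB, other %.1f GB%n", s[0], s[1], s[2]);
    }
}
```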

Re: Region server crashes when using replication

2011-03-29 Thread Eran Kutner
Thanks again J-D. I will avoid using stop_replication from now on. As for the shell, JRuby (or even Java for that matter) is not really our strong suit here, but I'll try to give it a look when I have some time. -eran On Mon, Mar 28, 2011 at 23:43, Jean-Daniel Cryans wrote: > Inline. > >> Tha

Re: Performance test results

2011-03-29 Thread Eran Kutner
-eran On Mon, Mar 28, 2011 at 17:38, Stack wrote: > > On Mon, Mar 28, 2011 at 3:16 AM, Eran Kutner wrote: > > I started with a basic insert operation. Inserting rows with one > > column with 1KB of data each. > > Initially, when the table was empty I was getting around 3

Re: Region server crashes when using replication

2011-03-28 Thread Eran Kutner
he previous day, on the 25th when replication was enabled no > other log with data in it was rolled so none was added to replicate. > > Bottom line, disabling replication is a kill switch and shouldn't only > be used with that functionality in mind. Starting the cluster with >

Performance test results

2011-03-28 Thread Eran Kutner
Hi, I'm running some performance tests on a cluster with 5 member servers (not counting the masters of all kinds), each node running a data node, a region server and a thrift server. Each server has 2 quad core CPUs and 16GB of RAM. The data set I'm using is built of 50 sets of consecutive keys wit

Re: Region server crashes when using replication

2011-03-27 Thread Eran Kutner
ld be missing before? -eran On Fri, Mar 25, 2011 at 21:26, Eran Kutner wrote: > > Thanks, J-D, that managed to solve a part of the problem. The servers > have stopped crashing and the master now properly detects when a RS > goes down, by the way, since the RS does detect this it may be

Re: Region server crashes when using replication

2011-03-25 Thread Eran Kutner
Thanks, J-D, that managed to solve part of the problem. The servers have stopped crashing and the master now properly detects when a RS goes down. By the way, since the RS does detect this, it may be a good idea to stop the server on this event, which is a significant configuration issue. However n

Re: Region server crashes when using replication

2011-03-24 Thread Eran Kutner
Now it doesn't like the email because it was in HTML format... As I said, not a very smart piece of software. On Fri, Mar 25, 2011 at 00:07, Eran Kutner wrote: > > You make it sound like it's a bad thing :) > But seriously, SpamAssassin is really not the brightest anti spam

Re: Region server crashes when using replication

2011-03-22 Thread Eran Kutner
, Mar 22, 2011 at 21:01, Jean-Daniel Cryans wrote: > Inline. > > J-D > > On Tue, Mar 22, 2011 at 11:51 AM, Eran Kutner wrote: >> Thanks, J-D. >> As for the first issue, why does this behavior make sense? What happens when >> the connection between the two cluster fail

Re: Region server crashes when using replication

2011-03-22 Thread Eran Kutner
Thanks, J-D. As for the first issue, why does this behavior make sense? What happens when the connection between the two clusters fails? Will the region servers of the primary fail as well? Or at least won't they be able to start? Seems very radical. Regarding the second issue, I didn't see anything el

Re: CopyTable MR job hangs

2011-03-16 Thread Eran Kutner
ce like the distcp documentation recommends. > > J-D > > On Tue, Mar 15, 2011 at 1:11 AM, Eran Kutner wrote: > > No idea anyone? > > > > -eran > > > > > > > > On Wed, Mar 2, 2011 at 16:40, Eran Kutner wrote: > > > >> Hi, > >>

Re: CopyTable MR job hangs

2011-03-15 Thread Eran Kutner
No idea anyone? -eran On Wed, Mar 2, 2011 at 16:40, Eran Kutner wrote: > Hi, > I'm trying to copy data from an older cluster using 0.89 (CDH3b3) to a new > one using 0.91 (CDH3b4) using the CopyTable MR job but it always hangs on > "map 0% reduce 0%" until even

CopyTable MR job hangs

2011-03-02 Thread Eran Kutner
Hi, I'm trying to copy data from an older cluster using 0.89 (CDH3b3) to a new one using 0.91 (CDH3b4) using the CopyTable MR job but it always hangs on "map 0% reduce 0%" until eventually the job is killed by Hadoop for not responding after 600 seconds. I verified that it works fine when copying f

Re: Which LZO library to use?

2010-08-02 Thread Eran Kutner
are no version compatibility > issues currently. > > Alex K > > On Mon, Aug 2, 2010 at 3:25 AM, Eran Kutner wrote: > > > Hi, > > I want to enable LZO compression on my cluster but see there are a few > > alternatives and the wiki page itself is very confusing s

Which LZO library to use?

2010-08-02 Thread Eran Kutner
Hi, I want to enable LZO compression on my cluster but see there are a few alternatives and the wiki page itself is very confusing so it's not clear what is the right choice. I was looking at this page: http://wiki.apache.org/hadoop/UsingLzoCompression, at the top it recommends using Kevin Weil's v