Re: EC2 + Thrift inserts

2010-04-30 Thread Chris Tarnas
Thanks, that would be great. Actually the code is Perl; I'm using Hadoop Streaming to do the MapReduce (bioinformatics data that we have lots of Perl libraries for). So far on a single thread it works quite well (in house we get ~300 rows/sec, on EC2 maybe half that with indexes), usually with the pe

Re: EC2 + Thrift inserts

2010-04-30 Thread Jean-Daniel Cryans
Not sure why you are going through thrift if you are already using java (you want to test thrift's speed because java isn't your main dev language?) but it will maybe add 1ms or 2, really not that bad. Here at StumbleUpon we use thrift to get our php website to talk to HBase and on average we stay

Re: EC2 + Thrift inserts

2010-04-30 Thread Chris Tarnas
On Apr 30, 2010, at 4:44 PM, Jean-Daniel Cryans wrote: > On Fri, Apr 30, 2010 at 4:32 PM, Chris Tarnas wrote: >> I'm also using thrift to connect and am wondering if that itself puts an >> overall limit on scaling? It does seem that no matter how many more mappers >> and servers I add,

Re: Hbase: GETs are very slow

2010-04-30 Thread Jean-Daniel Cryans
So we chatted a bit on IRC: the reason GETs were slower is that block caching was disabled, so all calls were hitting HDFS. I was confused by the first email, as it seemed that for some time it was still speedy without caching. I wanted to look at the import issue, but logs weren't available. J

Re: EC2 + Thrift inserts

2010-04-30 Thread Jean-Daniel Cryans
On Fri, Apr 30, 2010 at 4:32 PM, Chris Tarnas wrote: > Thank you, it is nice to get this help. > > I definitely understand the overhead of writing the index, although it seems > much worse than just that overhead would indicate. If I understand you > correctly that is because all inserts into an

Re: EC2 + Thrift inserts

2010-04-30 Thread Chris Tarnas
Thank you, it is nice to get this help. I definitely understand the overhead of writing the index, although it seems much worse than just that overhead would indicate. If I understand you correctly that is because all inserts into an IndexedTable are synchronized on one table? If that was swit

Re: HTable checkAndPut equivalent for Deletes

2010-04-30 Thread Michael Dalton
Thanks Ryan and Jonathan, I'll do the check-and-Put approach just to get this application into staging. Then I'll file a JIRA soon and start on adding a generic checkAndMutate to handle Puts/Deletes. Best regards, Mike On Fri, Apr 30, 2010 at 2:57 PM, Ryan Rawson wrote: > Hey, > We do n

Re: Hbase & Hive

2010-04-30 Thread Nick Dimiduk
If by "efficiently", you mean "low latency" then no, you will not get ms-response time for your hive queries over hbase as the hive query planner still results in m/r jobs being run over the cluster. Hope that helps. Cheers, -Nick On Fri, Apr 30, 2010 at 9:55 AM, Jean-Daniel Cryans wrote: > Inl

Re: HTable checkAndPut equivalent for Deletes

2010-04-30 Thread Michael Dalton
Deletes would be fine if I was always comfortable deleting a row, whether or not the row existed. In my application, I'd need to perform a check on a cell which may result in that cell's deletion. So let's say I read in a cell, determine that it's supposed to be deleted, then commit a Delete. I wan

Re: EC2 + Thrift inserts

2010-04-30 Thread Jean-Daniel Cryans
The contrib packages don't get as much love as core HBase, so they tend to be less performant, less reliable, and less well maintained. In this case the issue doesn't seem that bad, since it could just use an HTablePool, but using IndexedTables will definitely be slower than straight insert si

Re: HTable checkAndPut equivalent for Deletes

2010-04-30 Thread Ryan Rawson
Hey, We do need a 'check and delete' but it should really be more like a 'check and mutate' where the mutation could be a delete or a put. As for using explicit locks, the problem with explicit locks is that lock waiters will consume a handler thread (there are only so many of them!) and eventually you
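The 'check and mutate' idea can be sketched with an in-memory model instead of the HBase client: one atomic step compares the current value of a cell with an expected value and, only if they match, applies either a put or a delete. Everything below (`CheckAndMutateSketch`, `Mutation`, `checkAndMutate`, the "row/column" key) is hypothetical and is not the HBase API; `ConcurrentHashMap.compute`, which runs its function atomically per key, stands in for HBase's per-row atomicity.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of "check and mutate"; NOT the HBase client API.
final class CheckAndMutateSketch {

    // A mutation is either a put of a new value or a delete of the cell (hypothetical type).
    record Mutation(boolean isDelete, String newValue) {
        static Mutation put(String v) { return new Mutation(false, v); }
        static Mutation delete() { return new Mutation(true, null); }
    }

    // Cells keyed by "row/column"; a missing key means the cell does not exist.
    private final ConcurrentHashMap<String, String> cells = new ConcurrentHashMap<>();

    String get(String row, String column) {
        return cells.get(row + "/" + column);
    }

    void put(String row, String column, String value) {
        cells.put(row + "/" + column, value);
    }

    // Atomically: if the cell currently equals `expected` (null = absent),
    // apply `m` and return true; otherwise leave the cell untouched and return false.
    boolean checkAndMutate(String row, String column, String expected, Mutation m) {
        boolean[] applied = {false};
        cells.compute(row + "/" + column, (key, current) -> {
            if (!Objects.equals(current, expected)) {
                return current; // check failed: keep whatever is there
            }
            applied[0] = true;
            return m.isDelete() ? null : m.newValue(); // returning null removes the cell
        });
        return applied[0];
    }

    public static void main(String[] args) {
        CheckAndMutateSketch table = new CheckAndMutateSketch();
        table.put("row1", "cf:state", "expired");
        // Delete only if the cell still holds the value we read earlier.
        System.out.println(table.checkAndMutate("row1", "cf:state", "expired", Mutation.delete())); // true
        System.out.println(table.get("row1", "cf:state")); // null
        // A second attempt fails because the cell no longer matches.
        System.out.println(table.checkAndMutate("row1", "cf:state", "expired", Mutation.delete())); // false
    }
}
```

Because the check and the mutation happen inside one atomic call, no lock is held between a client's read and its write, which sidesteps the handler-thread problem Ryan describes with explicit row locks.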

RE: HTable checkAndPut equivalent for Deletes

2010-04-30 Thread Jonathan Gray
One option would be to just do the delete. Deletes are cheap, and nothing bad will happen if you delete data which doesn't exist (unless you delete only the latest version, which does require a value to exist). > -----Original Message----- > From: Michael Dalton [mailto:mwdal...@gmail.com] > Sent:

Re: EC2 + Thrift inserts

2010-04-30 Thread Chris Tarnas
It appears that for multiple simultaneous loads, using IndexedTables is probably not the best choice? -chris On Apr 30, 2010, at 2:39 PM, Jean-Daniel Cryans wrote: > Yeah more handlers won't do it here since there's tons of calls > waiting on a single synchronized method, I guess the IndexedRegion

HTable checkAndPut equivalent for Deletes

2010-04-30 Thread Michael Dalton
Hi everyone, I have a quick question -- I'd like to do a simple atomic check-and-Delete for a row. For Put operations, HTable.checkAndPut appears to allow a simple atomic compare-and-update, which is great. However, there doesn't seem to be an equivalent function for deletes. I was thinking about

Re: EC2 + Thrift inserts

2010-04-30 Thread Jean-Daniel Cryans
Yeah, more handlers won't do it here, since there are tons of calls waiting on a single synchronized method. I guess the IndexedRegion should use a pool of HTables instead of a single one in order to improve indexing throughput. J-D On Fri, Apr 30, 2010 at 2:26 PM, Chris Tarnas wrote: > Here is th
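J-D's suggestion, handing each writer its own table handle from a pool instead of funneling every call through one synchronized method, can be sketched generically. `ClientPool` and its `withClient` method are hypothetical stand-ins for something like the HTablePool mentioned in this thread, not its actual API. A `BlockingQueue` bounds the number of handles, so up to `size` callers write concurrently and `take()` blocks only when every handle is in use.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical pool of non-thread-safe client handles (a stand-in for a pool of HTables).
final class ClientPool<T> {
    private final BlockingQueue<T> idle;

    ClientPool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // pre-create every handle up front
        }
    }

    // Borrow a handle, run the work, and always return the handle.
    // Callers wait only when all handles are busy, instead of all
    // serializing on a single synchronized method.
    void withClient(Consumer<T> work) throws InterruptedException {
        T client = idle.take();
        try {
            work.accept(client);
        } finally {
            idle.put(client);
        }
    }

    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) throws Exception {
        // StringBuilder is not thread-safe, which makes it a handy stand-in for a table handle.
        ClientPool<StringBuilder> pool = new ClientPool<>(4, StringBuilder::new);
        Thread[] writers = new Thread[8];
        for (int i = 0; i < writers.length; i++) {
            final int id = i;
            writers[i] = new Thread(() -> {
                try {
                    pool.withClient(sb -> sb.append("row-").append(id).append(';'));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            writers[i].start();
        }
        for (Thread t : writers) t.join();
        System.out.println(pool.idleCount()); // 4: every handle was returned
    }
}
```

Each handle is used by only one thread at a time, so the handles themselves need no locking; the pool size, rather than a single monitor, sets the concurrency limit.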

Re: EC2 + Thrift inserts

2010-04-30 Thread Chris Tarnas
Here is the thread dump: I cranked up the handlers to 300 just in case and ran 40 mappers that loaded data via thrift. Each node runs its own thrift server. I saw an average of 18 rows/sec/mapper with no node using more than 10% CPU and no IO wait. It seems no matter how many mappers I throw th
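A quick back-of-the-envelope check, using only figures quoted in this thread (40 mappers at about 18 rows/sec each here, roughly 150 rows/sec for a single thread on EC2 with indexes, i.e. half of the ~300 rows/sec seen in house, and J-D's estimate of 1 to 2 ms of Thrift overhead per call), suggests the extra mappers buy very little and that Thrift itself is not the main cost:

```latex
\begin{aligned}
\text{aggregate rate} &\approx 40 \times 18 = 720\ \text{rows/s} \\
\text{speedup over one EC2 thread} &\approx 720 / 150 \approx 4.8 \quad (\text{with } 40 \text{ mappers}) \\
\text{time per row per mapper} &\approx 1000\ \text{ms} / 18 \approx 56\ \text{ms} \\
\text{Thrift share of that time} &\approx (1 \text{ to } 2\ \text{ms}) / 56\ \text{ms} \approx 2\% \text{ to } 4\%
\end{aligned}
```

With no node above 10% CPU and no IO wait, that pattern is consistent with calls queuing on a single synchronized method rather than being limited by hardware or by Thrift.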

Re: Hbase: GETs are very slow

2010-04-30 Thread Ruben Quintero
We're running 0.20.3, and it has a 6 GB heap. With block caching on, it seems we were running out of memory. It would temporarily lose a region server (usually when it attempted to split) and that caused a chain reaction when it attempted to recover. The heap would start to surge and cause a he

Re: Pathological ZK cluster: 1 server verbosely WARN'ing, other 2 servers pegging CPU

2010-04-30 Thread Patrick Hunt
On 04/30/2010 10:16 AM, Aaron Crow wrote: Hi Patrick, thanks for your time and detailed questions. No worries. When we hear about an issue we're very interested to follow up and resolve it, regardless of the source. We take the project goals of high reliability/availability _very_ seriously,

Re: Hbase & Hive

2010-04-30 Thread Jean-Daniel Cryans
Inline (and added hbase-user to the recipients). J-D On Thu, Apr 29, 2010 at 9:23 PM, Amit Kumar wrote: > Hi Everyone, > > I want to ask about Hbase and Hive. > > Q1> Is there any dialect available which can be used with Hibernate to > create persistence with Hbase. Has somebody written one. I c

Re: Unique row ID constraint

2010-04-30 Thread Tatsuya Kawano
Thanks all for your responses; they are very helpful. 4/30/2010 Todd Lipcon : > Note that your solution is not correct in the case of failure, since the > check and put are not atomic with each other. > > If your client or server fails between the ICV and the put, no other clients > will be able t
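Todd's point is that an increment followed by a separate put leaves a window in which a failure strands the ID. One way to close that window is to make the claim itself a single atomic "put only if absent" step; the `HTable.checkAndPut` javadoc describes a null expected value as a check that the column does not exist. The sketch below does not call HBase: it models the idea in memory with `ConcurrentHashMap.putIfAbsent`, and `UniqueRowRegistry`, `claim`, and `ownerOf` are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of claiming a unique row key atomically.
final class UniqueRowRegistry {
    // row key -> owner that claimed it
    private final ConcurrentHashMap<String, String> rows = new ConcurrentHashMap<>();

    // Returns true only for the first caller to claim `rowKey`.
    // The existence check and the write are one atomic step,
    // so no other caller can slip in between them.
    boolean claim(String rowKey, String owner) {
        return rows.putIfAbsent(rowKey, owner) == null;
    }

    String ownerOf(String rowKey) {
        return rows.get(rowKey);
    }

    public static void main(String[] args) {
        UniqueRowRegistry registry = new UniqueRowRegistry();
        System.out.println(registry.claim("user:alice", "client-A")); // true
        System.out.println(registry.claim("user:alice", "client-B")); // false
        System.out.println(registry.ownerOf("user:alice"));           // client-A
    }
}
```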

Re: Hbase: GETs are very slow

2010-04-30 Thread Jean-Daniel Cryans
Which version? How much heap was given to HBase? WRT block caching, I don't see how it could impact uploading in any way; you should enable it. What was the problem inserting 1B rows exactly? How were you running the upload? Are you making sure there's no swap on the machines? That kills java per

Re: EC2 + Thrift inserts

2010-04-30 Thread Chris Tarnas
Thank you, I'll bump the handler higher and run a jstack on the most loaded one. Now I just need more hours in the day to do it! -chris On Apr 29, 2010, at 9:14 PM, Ryan Rawson wrote: > One thing to check is at the peak of your load, run jstack on one of > the regionservers, and look at the han
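Ryan's suggestion, taking a jstack of a busy regionserver at peak load and looking at what the handler threads are doing, can be partly automated by tallying the `java.lang.Thread.State` line of each matching thread. The `JstackSummary` class is hypothetical, and both the `"handler"` name filter and the sample dump text are assumptions made for illustration; check the actual thread names in your own dump before relying on the filter.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: summarize thread states from jstack text output.
final class JstackSummary {

    // Counts Thread.State values for threads whose name contains `nameFilter`.
    static Map<String, Integer> stateCounts(String dump, String nameFilter) {
        Map<String, Integer> counts = new TreeMap<>();
        String currentName = null;
        for (String line : dump.split("\n")) {
            if (line.startsWith("\"")) {
                // Thread header line: "thread name" daemon prio=... tid=...
                int end = line.indexOf('"', 1);
                currentName = end > 0 ? line.substring(1, end) : null;
            } else if (currentName != null && line.trim().startsWith("java.lang.Thread.State:")) {
                String rest = line.trim().substring("java.lang.Thread.State:".length()).trim();
                String state = rest.split(" ")[0]; // e.g. BLOCKED, RUNNABLE, WAITING
                if (currentName.contains(nameFilter)) {
                    counts.merge(state, 1, Integer::sum);
                }
                currentName = null; // at most one state line per thread
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample; real dumps contain stack frames between threads.
        String sample = String.join("\n",
            "\"IPC Server handler 0 on 60020\" daemon prio=10",
            "   java.lang.Thread.State: BLOCKED (on object monitor)",
            "\"IPC Server handler 1 on 60020\" daemon prio=10",
            "   java.lang.Thread.State: BLOCKED (on object monitor)",
            "\"IPC Server handler 2 on 60020\" daemon prio=10",
            "   java.lang.Thread.State: RUNNABLE",
            "\"main\" prio=10",
            "   java.lang.Thread.State: WAITING (parking)");
        System.out.println(stateCounts(sample, "handler")); // {BLOCKED=2, RUNNABLE=1}
    }
}
```

A dump where most handler threads show BLOCKED would match J-D's later diagnosis of calls piling up on one synchronized method.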

Hbase: GETs are very slow

2010-04-30 Thread Ruben Quintero
Hi, I have a hadoop/hbase cluster running on 9 machines (only 8 GB RAM, 1 TB drives), and have recently noticed that Gets from Hbase have slowed down significantly. I'd say at this point I'm not getting more than 100/sec when using the Hbase Java API. DFS-wise, there's plenty of space left (usi

RE: Theoretical question...

2010-04-30 Thread Andrew Purtell
Given your take, I encourage you to check out HBASE-1697. - Andy On Fri Apr 30th, 2010 6:14 AM PDT Michael Segel wrote: > >Andrew, > >Not exactly. > >Within HBase, if you have access, you can do anything to any resource. I don't >believe there's a concept of permissions. (Unless you can use

RE: Theoretical question...

2010-04-30 Thread Michael Segel
Andrew, Not exactly. Within HBase, if you have access, you can do anything to any resource. I don't believe there's a concept of permissions. (Unless you can use the HDFS permissions inside HBase...) So one idea was to isolate the hbase instance within the cloud. Since people talk about isola