Thanks, that would be great.
Actually the code is Perl; I'm using streaming to do the map-reduce
(bioinformatics data that we have lots of Perl libraries for). So far on a
single thread it works quite well (in-house we get ~300 rows/sec, on EC2 maybe
half that with indexes), usually with the pe
Not sure why you are going through Thrift if you are already using
Java (you want to test Thrift's speed because Java isn't your main dev
language?), but it will maybe add 1 ms or 2, really not that bad. Here
at StumbleUpon we use Thrift to get our PHP website to talk to HBase
and on average we stay
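For the Java-side comparison, a minimal sketch of a round trip through the Thrift gateway, assuming the stock 0.20-era generated bindings (Hbase.Client) and a gateway on the default port 9090; the table and row names here are made up:

    import org.apache.hadoop.hbase.thrift.generated.Hbase;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class ThriftRoundTrip {
      public static void main(String[] args) throws Exception {
        // One socket to the local Thrift gateway; this extra hop is
        // where the 1-2 ms of added latency comes from.
        TTransport transport = new TSocket("localhost", 9090);
        transport.open();
        Hbase.Client client = new Hbase.Client(new TBinaryProtocol(transport));

        // A single GET round trip through the gateway.
        client.getRow("mytable".getBytes(), "row1".getBytes());

        transport.close();
      }
    }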
On Apr 30, 2010, at 4:44 PM, Jean-Daniel Cryans wrote:
> On Fri, Apr 30, 2010 at 4:32 PM, Chris Tarnas wrote:
>>
>>
>> I'm also using Thrift to connect and am wondering if that itself puts an
>> overall limit on scaling? It does seem that no matter how many more mappers
>> and servers I add,
So we chatted a bit on IRC; the reason GETs were slower is that
block caching was disabled and all calls were hitting HDFS. I was
confused by the first email, as it seemed that for some time it was
still speedy without caching.
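For anyone hitting the same thing, a minimal sketch of turning the block cache back on for a family, assuming the 0.20-era admin API (the table and family names are made up):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
    HColumnDescriptor family = new HColumnDescriptor("f");
    family.setBlockCacheEnabled(true);  // repeated GETs hit the cache, not HDFS

    // The table has to be offline while its schema is modified.
    admin.disableTable("mytable");
    admin.modifyColumn("mytable", "f", family);
    admin.enableTable("mytable");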
I wanted to look at the import issue, but logs weren't available.
J
On Fri, Apr 30, 2010 at 4:32 PM, Chris Tarnas wrote:
> Thank you, it is nice to get this help.
>
> I definitely understand the overhead of writing the index, although it seems
> much worse than that overhead alone would indicate. If I understand you
> correctly, that is because all inserts into an
Thank you, it is nice to get this help.
I definitely understand the overhead of writing the index, although it seems
much worse than that overhead alone would indicate. If I understand you
correctly, that is because all inserts into an IndexedTable are synchronized on
one table? If that was swit
Thanks Ryan and Jonathan, I'll do the check-and-Put approach just to
get this application into staging. Then I'll file a JIRA soon and start on
adding a generic checkAndMutate to handle Puts/Deletes.
Best regards,
Mike
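For the archives, a minimal sketch of that interim check-and-Put approach against the 0.20 HTable API; the "state" column and its values are just an illustration of the pattern, not anything HBase provides:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    HTable table = new HTable(new HBaseConfiguration(), "mytable");
    byte[] row  = Bytes.toBytes("row1");
    byte[] fam  = Bytes.toBytes("f");
    byte[] qual = Bytes.toBytes("state");

    // Atomically claim the cell: only succeeds if it still reads "live".
    Put claim = new Put(row);
    claim.add(fam, qual, Bytes.toBytes("deleting"));
    boolean won = table.checkAndPut(row, fam, qual, Bytes.toBytes("live"), claim);

    // Only the winner issues the Delete, so two racing clients can't
    // both conclude the row was theirs to remove.
    if (won) {
      table.delete(new Delete(row));
    }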
On Fri, Apr 30, 2010 at 2:57 PM, Ryan Rawson wrote:
> Hey,
>
> We do n
If by "efficiently", you mean "low latency" then no, you will not get
ms-response time for your hive queries over hbase as the hive query planner
still results in m/r jobs being run over the cluster.
Hope that helps.
Cheers,
-Nick
On Fri, Apr 30, 2010 at 9:55 AM, Jean-Daniel Cryans wrote:
> Inl
Deletes would be fine if I were always comfortable deleting a row, whether or
not the row existed. In my application, I'd need to perform a check on a
cell which may result in that cell's deletion. So let's say I read in a
cell, determine that it's supposed to be deleted, then commit a Delete. I
wan
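A sketch of the sequence described above against the 0.20 client API; table, row, fam and qual are assumed to be set up as usual, and shouldDelete() stands in for the application's own check:

    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;

    Result r = table.get(new Get(row));
    if (shouldDelete(r.getValue(fam, qual))) {
      // RACE: another writer can change the cell between the get()
      // above and the delete() below; nothing ties the two together.
      table.delete(new Delete(row));
    }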
The contrib packages don't get as much love as core HBase, so they
tend to be less performant and/or less reliable and/or less maintained,
etc. In this case the issue doesn't seem that bad, since it could just
use an HTablePool, but using IndexedTables will definitely be slower
than straight insert si
Hey,
We do need a 'check and delete', but it should really be more like a
'check and mutate' where the mutation could be a delete or a put.
As for using explicit locks, the problem with explicit locks is that
lock waiters will consume a handler thread (there are only so many of
them!) and eventually you
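For comparison, a sketch of the explicit-lock route this warns about, per the 0.20 HTable row-lock API (imports and variables as in the previous sketch; shouldDelete() again a stand-in):

    import org.apache.hadoop.hbase.HConstants;
    import org.apache.hadoop.hbase.client.RowLock;

    RowLock lock = table.lockRow(row);  // a blocked waiter pins an RPC handler
    try {
      Result r = table.get(new Get(row, lock));
      if (shouldDelete(r.getValue(fam, qual))) {
        table.delete(new Delete(row, HConstants.LATEST_TIMESTAMP, lock));
      }
    } finally {
      table.unlockRow(lock);  // always release, or the row stays locked
    }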
One option would be to just do the delete. Deletes are cheap and nothing bad
will happen if you delete data which doesn't exist (unless you do the
delete-latest-version variant, which does require a value to exist).
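The two flavors differ by one method on Delete in the 0.20 API; a quick sketch:

    Delete d = new Delete(row);
    d.deleteColumns(fam, qual);  // all versions: safe even if the cell never existed
    // d.deleteColumn(fam, qual) // latest version only: needs a value to exist
    table.delete(d);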
> -----Original Message-----
> From: Michael Dalton [mailto:mwdal...@gmail.com]
> Sent:
It appears that for multiple simultaneous loads, using IndexedTables is
probably not the best choice?
-chris
On Apr 30, 2010, at 2:39 PM, Jean-Daniel Cryans wrote:
> Yeah, more handlers won't do it here since there are tons of calls
> waiting on a single synchronized method. I guess the IndexedRegion
Hi everyone,
I have a quick question -- I'd like to do a simple atomic check-and-Delete
for a row. For Put operations, HTable.checkAndPut appears to allow a simple
atomic compare-and-update, which is great. However, there doesn't seem to be
an equivalent function for deletes.
I was thinking about
Yeah, more handlers won't do it here since there are tons of calls
waiting on a single synchronized method. I guess the IndexedRegion
should use a pool of HTables instead of a single one in order to
improve indexing throughput.
J-D
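A minimal sketch of what that pooling could look like with the 0.20-era HTablePool (the pool size and table name are made up):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.HTablePool;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    // One shared pool instead of one synchronized HTable per IndexedRegion.
    HTablePool pool = new HTablePool(new HBaseConfiguration(), 10);
    HTable index = pool.getTable("my_index");
    try {
      Put p = new Put(Bytes.toBytes("index-row"));
      p.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v"));
      index.put(p);
    } finally {
      pool.putTable(index);  // hand the table back for the next caller
    }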
On Fri, Apr 30, 2010 at 2:26 PM, Chris Tarnas wrote:
> Here is th
Here is the thread dump:
I cranked up the handlers to 300 just in case and ran 40 mappers that loaded
data via Thrift. Each node runs its own Thrift server. I saw an average of 18
rows/sec/mapper, with no node using more than 10% CPU and no IO wait. It seems
no matter how many mappers I throw th
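For reference, the knob being raised to 300 here lives in hbase-site.xml:

    <property>
      <name>hbase.regionserver.handler.count</name>
      <value>300</value>
    </property>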
We're running 0.20.3, and it has a 6 GB heap.
With block caching on, it seems we were running out of memory. It would
temporarily lose a region server (usually when it attempted to split), and that
caused a chain reaction when it attempted to recover. The heap would start to
surge and cause a he
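For reference, the heap is typically set in conf/hbase-env.sh (value in MB; 6000 matches the 6 GB mentioned above):

    export HBASE_HEAPSIZE=6000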
On 04/30/2010 10:16 AM, Aaron Crow wrote:
Hi Patrick, thanks for your time and detailed questions.
No worries. When we hear about an issue we're very interested to
follow up and resolve it, regardless of the source. We take the project
goals of high reliability/availability _very_ seriously,
Inline (and added hbase-user to the recipients).
J-D
On Thu, Apr 29, 2010 at 9:23 PM, Amit Kumar wrote:
> Hi Everyone,
>
> I want to ask about HBase and Hive.
>
> Q1> Is there any dialect available which can be used with Hibernate to
> create persistence with HBase? Has somebody written one? I c
Thanks all for your responses; they are very helpful.
On 4/30/2010, Todd Lipcon wrote:
> Note that your solution is not correct in the case of failure, since the
> check and put are not atomic with each other.
>
> If your client or server fails between the ICV and the put, no other clients
> will be able t
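A sketch of the non-atomic pair Todd is pointing at, against the 0.20 client API (the counter row, column names, and value here are made up):

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    byte[] fam = Bytes.toBytes("f");

    // Step 1: grab the next id from a counter cell.
    long id = table.incrementColumnValue(Bytes.toBytes("counter"), fam,
                                         Bytes.toBytes("next"), 1);

    // Step 2: write the row for that id. A crash between step 1 and
    // step 2 leaves the counter bumped but the row missing, and no
    // other client can tell the difference -- the failure case above.
    Put p = new Put(Bytes.toBytes("row-" + id));
    p.add(fam, Bytes.toBytes("data"), Bytes.toBytes("value"));
    table.put(p);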
Which version? How much heap was given to HBase?
WRT block caching, I don't see how it could impact uploading in any
way; you should enable it. What was the problem inserting 1B rows,
exactly? How were you running the upload?
Are you making sure there's no swap on the machines? That kills Java
per
Thank you, I'll bump the handler count higher and run a jstack on the most
loaded one. Now I just need more hours in the day to do it!
-chris
On Apr 29, 2010, at 9:14 PM, Ryan Rawson wrote:
> One thing to check: at the peak of your load, run jstack on one of
> the regionservers and look at the han
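For anyone following along, the capture is just the stock JDK tool pointed at the regionserver's pid:

    jstack <regionserver-pid> > rs-threads.txt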
Hi,
I have a Hadoop/HBase cluster running on 9 machines (only 8 GB RAM, 1 TB
drives), and have recently noticed that Gets from HBase have slowed down
significantly. I'd say at this point I'm not getting more than 100/sec when
using the HBase Java API. DFS-wise, there's plenty of space left (usi
Given your take, I encourage you to check out HBASE-1697.
- Andy
On Fri Apr 30th, 2010 6:14 AM PDT Michael Segel wrote:
>
>Andrew,
>
>Not exactly.
>
>Within HBase, if you have access, you can do anything to any resource. I don't
>believe there's a concept of permissions. (Unless you can use
Andrew,
Not exactly.
Within HBase, if you have access, you can do anything to any resource. I don't
believe there's a concept of permissions. (Unless you can use the HDFS
permissions inside HBase...)
So one idea was to isolate the HBase instance within the cloud.
Since people talk about isola