Re: Considerations for using HBase in User Facing applications

2011-05-05 Thread Xavier Stevens
1.) We use a different cluster for each app. To be honest, I don't know whether this is best practice. We just wanted to isolate downtime and potential damage to each application.
2.) We usually use the HBase APIs directly. Having said that, we recently started working on a new service. We were

HBase 0.90.1 REST API

2011-04-18 Thread Xavier Stevens
Currently we're trying to use the REST API on HBase 0.90.1. I'm getting a 500 response saying "Invalid row key". I am trying to post this from a Python program:
URL: http://ourserver:8080/rest_test/713c5967-b2e9-4f44-b0a1-8b862838f865/metrics:json
Data: {"Row": {"Cell": [{"@column": "metrics:json
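For reference, here is a minimal sketch of building a single-cell JSON body for the HBase REST gateway. It assumes the gateway expects the row key, the column name and the cell value (`$`) to be base64-encoded in JSON, which is how the HBase REST documentation describes its JSON representation. The `@key`/`@column` attribute names mirror the `@column` style visible in the snippet above and are an assumption to check against the REST docs for your version. `b64` and `build_row_payload` are hypothetical helpers. This thread does not confirm what caused the "Invalid row key" error.

```python
import base64
import json


def b64(text: str) -> str:
    """Base64-encode a UTF-8 string (assumed encoding for keys, columns and values)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def build_row_payload(row_key: str, column: str, value: str) -> str:
    """Build a one-row, one-cell JSON body (hypothetical helper; schema assumed)."""
    payload = {
        "Row": {
            "@key": b64(row_key),  # row key carried in the body, base64-encoded
            "Cell": [
                {
                    "@column": b64(column),  # "family:qualifier", base64-encoded
                    "$": b64(value),  # cell value, base64-encoded
                }
            ],
        }
    }
    return json.dumps(payload)


if __name__ == "__main__":
    body = build_row_payload(
        "713c5967-b2e9-4f44-b0a1-8b862838f865", "metrics:json", '{"x": 1}'
    )
    print(body)
```

The body would then be sent with a `Content-Type: application/json` header to the row URL. The HTTP request itself is omitted here.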

Re: estimate HBase DFS filesystem usage

2011-01-24 Thread Xavier Stevens
d like ideally is to get an idea of what the
> fixed cost (in terms of bytes) is for each my tables, and then understand
> how I can calculate a variable bytes/record cost.
>
> Is this feasible?
>
> Norbert
>
> On Mon, Jan 24, 2011 at 1:16 PM, Xavier Stevens wrote:
>
>>

Re: estimate HBase DFS filesystem usage

2011-01-24 Thread Xavier Stevens
Norbert,
It would probably be best if you wrote a quick MapReduce job that iterates over those records and outputs the sum of bytes for each one. You could then use that output to compute some general descriptive statistics.
Cheers,
-Xavier
On 1/24/11 9:37 AM, Norbert Burger wrote:
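The per-record sum that such a job would emit can be approximated from HBase's KeyValue layout. Each cell stores a 4-byte key length, a 4-byte value length, a 2-byte row length, the row, a 1-byte family length, the family, the qualifier, an 8-byte timestamp, a 1-byte key type, and the value. The sketch below applies that formula to cells held in memory, as a stand-in for the map phase, and summarises the per-record totals with Python's `statistics` module.

It ignores compression, block indexes, Bloom filters, HDFS replication and any per-cell fields added by later HBase versions. Treat the result as a rough estimate of uncompressed KeyValue bytes, not of DFS usage. `cell_bytes`, `bytes_per_record` and `describe` are hypothetical names.

```python
import statistics
from typing import Dict, Iterable, List, Tuple

# (row, family, qualifier, value) as raw bytes: a hypothetical in-memory stand-in for a scan
Cell = Tuple[bytes, bytes, bytes, bytes]

# Fixed per-cell overhead in the KeyValue layout:
# key length (4) + value length (4) + row length (2)
# + family length (1) + timestamp (8) + key type (1)
KEYVALUE_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1  # 20 bytes


def cell_bytes(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate serialized size of one cell. The row key is repeated in every cell."""
    return KEYVALUE_OVERHEAD + len(row) + len(family) + len(qualifier) + len(value)


def bytes_per_record(cells: Iterable[Cell]) -> Dict[bytes, int]:
    """Sum cell sizes per row key, i.e. what the suggested job would output."""
    totals: Dict[bytes, int] = {}
    for row, family, qualifier, value in cells:
        totals[row] = totals.get(row, 0) + cell_bytes(row, family, qualifier, value)
    return totals


def describe(sizes: List[int]) -> Dict[str, float]:
    """Basic descriptive statistics over per-record sizes (population stdev)."""
    return {
        "count": len(sizes),
        "mean": statistics.mean(sizes),
        "median": statistics.median(sizes),
        "stdev": statistics.pstdev(sizes),
        "min": min(sizes),
        "max": max(sizes),
    }


if __name__ == "__main__":
    cells = [
        (b"row1", b"cf", b"a", b"hello"),
        (b"row1", b"cf", b"b", b"x"),
        (b"row2", b"cf", b"a", b"hi"),
    ]
    per_row = bytes_per_record(cells)
    print(per_row)
    print(describe(list(per_row.values())))
```

Because the row key, family and qualifier are stored with every cell, short keys and column names reduce the variable bytes/record cost noticeably.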

Re: Java Commited Virtual Memory significally larged then Heap Memory

2011-01-11 Thread Xavier Stevens
Are you using a newer Linux kernel with the new and "improved" memory allocator? If so, try setting this in hadoop-env.sh: export MALLOC_ARENA_MAX= Maybe start by setting it to 4. You can thank Todd Lipcon if this works for you.
Cheers,
-Xavier
On 1/11/11 7:24 AM, Andrey Stepachev wrote:
> N

Re: MR sharded Scans giving poor performance..

2010-07-26 Thread Xavier Stevens
ut why your
> system is better performing? The default TableInputFormat is just
> creating N map tasks, one for each region, which are all roughly the
> same data-size.
>
> What do you do?
> -ryan
>
> On Mon, Jul 26, 2010 at 3:29 PM, Xavier Stevens wrote:
>> We have
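The default behaviour Ryan describes, one map task per region, can be modelled as follows. Each region is a half-open key range [start, end), where an empty key means "unbounded". A split is produced for every region that overlaps the scan's start/stop rows, clipped to the scan range.

This is a simplified illustration that loosely follows the idea behind `TableInputFormatBase.getSplits`. It is not HBase's code, and it ignores region locations and other details. Python's `bytes` comparison is unsigned lexicographic, which matches how HBase orders row keys. `overlaps`, `clip` and `splits_for_scan` are hypothetical names.

```python
from typing import List, Tuple

# A region or split as a half-open key range [start, end); b"" means unbounded.
KeyRange = Tuple[bytes, bytes]


def overlaps(region: KeyRange, scan: KeyRange) -> bool:
    """True if the region shares at least one row key with the scan range."""
    r_start, r_end = region
    s_start, s_stop = scan
    # The region must end after the scan starts (either side unbounded counts) ...
    ends_after_scan_start = r_end == b"" or s_start == b"" or r_end > s_start
    # ... and must start before the scan stops (an unbounded stop always qualifies).
    starts_before_scan_stop = s_stop == b"" or r_start < s_stop
    return ends_after_scan_start and starts_before_scan_stop


def clip(region: KeyRange, scan: KeyRange) -> KeyRange:
    """Narrow a region to the part covered by the scan."""
    r_start, r_end = region
    s_start, s_stop = scan
    start = max(r_start, s_start)  # b"" sorts lowest, so max picks the tighter start
    if r_end == b"":
        end = s_stop
    elif s_stop == b"":
        end = r_end
    else:
        end = min(r_end, s_stop)
    return (start, end)


def splits_for_scan(regions: List[KeyRange], scan: KeyRange) -> List[KeyRange]:
    """One split (map task) per region overlapping the scan."""
    return [clip(r, scan) for r in regions if overlaps(r, scan)]


if __name__ == "__main__":
    regions = [(b"", b"g"), (b"g", b"p"), (b"p", b"")]
    print(splits_for_scan(regions, (b"", b"")))    # full-table scan
    print(splits_for_scan(regions, (b"h", b"q")))  # bounded scan
```

In this model the number of map tasks equals the number of overlapping regions. A scan that touches only a few regions therefore gets only a few mappers, however much data those regions hold.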

Re: MR sharded Scans giving poor performance..

2010-07-26 Thread Xavier Stevens
We have something that might interest you. http://socorro.googlecode.com/svn/trunk/analysis/src/java/org/apache/hadoop/hbase/mapreduce/ We haven't fully tested everything yet, so don't blame us if something goes wrong. It's basically the exact same as TableInputFormat except it takes an array o