Re: How to speed up HBase query throughput

2011-05-18 Thread Stack
On Wed, May 18, 2011 at 5:11 PM, Weihua JIANG wrote: > All the DNs almost have the same number of blocks. Major compaction > makes no difference. > I would expect major compaction to even the number of blocks across the cluster and it'd move the data for each region local to the regionserver. Th

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

2011-05-18 Thread Weishung Chung
I have another question about option 2. It seems like I need to handle the distributed scan differently to read from start row to end row, assuming a 1-byte hash of the original key is used as a prefix, since the order of the original key range differs from the resulting distributed key range. On
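The situation Weishung describes can be sketched in plain Java (this is an illustrative sketch, not the HBaseWD API; the bucket count and hash are assumptions): when rows are stored under `prefix(key) + key`, one logical range scan over original keys must be fanned out into one sub-scan per possible prefix byte, with the results merged by the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of a one-byte-hash key prefix and the per-bucket
// sub-scan ranges a "distributed scan" would need to cover.
public class PrefixedScanSketch {
    static final int BUCKETS = 16; // assumed bucket count

    // One-byte prefix derived from the original key (illustrative hash).
    static byte prefixOf(byte[] originalKey) {
        int h = 0;
        for (byte b : originalKey) h = 31 * h + (b & 0xFF);
        return (byte) ((h & 0x7FFFFFFF) % BUCKETS);
    }

    static byte[] withPrefix(byte prefix, byte[] key) {
        byte[] out = new byte[key.length + 1];
        out[0] = prefix;
        System.arraycopy(key, 0, out, 1, key.length);
        return out;
    }

    // Expand one logical [startRow, stopRow) range into per-bucket ranges;
    // each pair is {prefixedStart, prefixedStop} for one sub-scan.
    static List<byte[][]> subScans(byte[] startRow, byte[] stopRow) {
        List<byte[][]> ranges = new ArrayList<>();
        for (int p = 0; p < BUCKETS; p++) {
            ranges.add(new byte[][] {
                withPrefix((byte) p, startRow),
                withPrefix((byte) p, stopRow)
            });
        }
        return ranges;
    }
}
```

This is why the scan, unlike a plain HBase scan, cannot return rows in original-key order without merging across the sub-scans.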

Re: How to speed up HBase query throughput

2011-05-18 Thread Weihua JIANG
All the DNs almost have the same number of blocks. Major compaction makes no difference. Thanks Weihua 2011/5/18 Stack : > Are there more blocks on these hot DNs than there are on the cool > ones?   If you run a major compaction and then run your tests, does it > make a difference? > St.Ack > > O

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

2011-05-18 Thread Ted Yu
Alex: Can you summarize HBaseWD in your blog, including points 1 and 2 below ? Thanks On Wed, May 18, 2011 at 8:03 AM, Alex Baranau wrote: > There are several options here. E.g.: > > 1) Given that you have "original key" of the record, you can fetch the > stored record key from HBase and use it

RE: HBase Scalability

2011-05-18 Thread Doug Meil
Hi there- Re: " When I started inserting data in the tables it seems that they are always inserting in a single region," You probably want to read this as a general warning... http://hbase.apache.org/book.html#timeseries .. and check this out as a potential solution for bucketing timeseries k
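A minimal sketch of the bucketing idea behind the referenced book section (the class name, bucket count, and key layout here are assumptions for illustration, not HBase APIs): instead of writing monotonically increasing time-based keys, which all land in one region, prepend a small bucket id so writes spread across regions.

```java
import java.nio.ByteBuffer;

// Illustrative bucketed row key for timeseries data:
// [1-byte bucket][8-byte timestamp][metric id]
public class BucketedTimeseriesKey {
    static final int NUM_BUCKETS = 8; // roughly one per expected region

    static byte[] rowKey(long timestamp, byte[] metricId) {
        byte bucket = (byte) (timestamp % NUM_BUCKETS); // cheap spreader
        ByteBuffer buf = ByteBuffer.allocate(1 + 8 + metricId.length);
        buf.put(bucket);
        buf.putLong(timestamp);
        buf.put(metricId);
        return buf.array();
    }
}
```

The trade-off, as with any salting scheme, is that time-range reads must now fan out across all buckets.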

HBase Scalability

2011-05-18 Thread Miguel Costa
Hi, I have three tables and I receive in one 1500 m/s and in the other two about 500 m/s. My row key is based on time in all three tables. When I started inserting data into the tables, it seemed they were always inserting into a single region, which is supposed to be normal given that the key i

Re: Port 0 being used when calling HBaseTestingUtility().startMiniCluster(1) on AWS

2011-05-18 Thread Ted Yu
Ian: Please take a look at https://issues.apache.org/jira/browse/HBASE-3794: +TEST_UTIL.getConfiguration().setInt("hbase.regionserver.port", 0); TestRegionServer rs = new TestRegionServer(TEST_UTIL.getConfiguration()); On Wed, May 18, 2011 at 2:23 PM, Stack wrote: > On Wed, May 18, 201
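The reason setting `hbase.regionserver.port` to 0 helps is a general JVM/OS fact, not anything HBase-specific (this demo is illustrative, not HBase source): binding to port 0 asks the OS for any free ephemeral port, so the mini-cluster cannot collide with a port already in use on the AWS host.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Demonstrates ephemeral-port binding: port 0 means "pick any free port".
public class EphemeralPortDemo {
    static int pickFreePort() {
        try (ServerSocket s = new ServerSocket(0)) {
            return s.getLocalPort(); // the OS-assigned port, always > 0
        } catch (IOException e) {
            return -1; // no port could be bound
        }
    }
}
```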

Re: Port 0 being used when calling HBaseTestingUtility().startMiniCluster(1) on AWS

2011-05-18 Thread Stack
On Wed, May 18, 2011 at 1:50 PM, Ian Stevens wrote: > Hi everyone. We had some tests which were using HBaseTestingUtility to start > a single node cluster. These worked fine on our desktops and testing > environments, but when we switched to running the tests on Amazon Web > Services, startMini

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

2011-05-18 Thread Weishung Chung
Thank you, I like the second option better to avoid the roundtrip to HBase. I am trying it out now. On Wed, May 18, 2011 at 10:03 AM, Alex Baranau wrote: > There are several options here. E.g.: > > 1) Given that you have "original key" of the record, you can fetch the > stored record key from HBa

Port 0 being used when calling HBaseTestingUtility().startMiniCluster(1) on AWS

2011-05-18 Thread Ian Stevens
Hi everyone. We had some tests which were using HBaseTestingUtility to start a single node cluster. These worked fine on our desktops and testing environments, but when we switched to running the tests on Amazon Web Services, startMiniCluster() raised a BindException: > [exec] u.startM

Re: Region count is not consistent between the web UI and LoadBalancer

2011-05-18 Thread Jean-Daniel Cryans
Can you run hbck? J-D 2011/5/17 bijieshan : > Yes, you're right. While counting the .META. table, the result will exclude the -ROOT- > region and the .META. region. Pardon me, I should not have mentioned that. > Maybe the 2 fewer regions is just a coincidence here; I can show another > scenario about this

Re: A few issues we ran into the last couple of weeks.

2011-05-18 Thread Ted Yu
Vidhyashankar: table.getRegionsInfo() is for advanced users (such as you) :-) Anyway, we shouldn't force users to call it. On Wed, May 18, 2011 at 11:12 AM, Vidhyashankar Venkataraman < vidhy...@yahoo-inc.com> wrote: > Thanks Ted! Will do it right away. > > 1. we should provide the following new

Re: A few issues we ran into the last couple of weeks.

2011-05-18 Thread Vidhyashankar Venkataraman
Thanks Ted! Will do it right away. 1. we should provide the following new API where numOfRegions is the expected number of regions to go online: I used table.getRegionsInfo() to make sure all regions were online instead of this function. But that function requires a priori knowledge of the number

Re: A few issues we ran into the last couple of weeks.

2011-05-18 Thread Ted Yu
Vidhyashankar: Please file the following JIRAs: 1. we should provide the following new API where numOfRegions is the expected number of regions to go online: public boolean isTableAvailable(final byte[] tableName, int numOfRegions) throws IOException { 2. HBaseAdmin.createTableAsync() should c
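The semantics the proposed API would need can be sketched generically (hypothetical code, not HBase source; the supplier stands in for however the master reports online regions): rather than returning true when *any* region is online, wait until the online-region count reaches the expected number, with a timeout.

```java
import java.util.function.IntSupplier;

// Generic "wait until N regions are online" polling loop, the behavior the
// proposed isTableAvailable(tableName, numOfRegions) would implement.
public class TableAvailability {
    static boolean isTableAvailable(IntSupplier onlineRegionCount,
                                    int numOfRegions,
                                    long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        do {
            if (onlineRegionCount.getAsInt() >= numOfRegions) {
                return true; // all expected regions are online
            }
            try {
                Thread.sleep(50); // back off before re-checking
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        } while (System.currentTimeMillis() < deadline);
        return false;
    }
}
```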

Re: A few issues we ran into the last couple of weeks.

2011-05-18 Thread Stack
On Tue, May 17, 2011 at 4:25 PM, Vidhyashankar Venkataraman wrote: >   2. The master getting stuck unable to delete a WAL (I have seen this before > on this forum and a related JIRA on this one): We had worked around by > manually deleting a WAL. But during times when the master crashed during

Re: A few issues we ran into the last couple of weeks.

2011-05-18 Thread Vidhyashankar Venkataraman
As in, the use of isTableAvailable there indicates that a bulk load should happen only if all the regions are available. But that may not be the case, since the function returns true if even one region (the regionCount.get()>0 check) is online. V On 5/17/11 7:14 PM, "Ted Yu" wrote: Did you mean

Re: HBase 0.90.2 doesn't start on Ubuntu 11.04

2011-05-18 Thread Stack
Is this the issue: http://search-hadoop.com/m/4uDV51XrPxj/ipv6&subj=Re+HBase+Client+connect+to+remote+HBase? St.Ack On Tue, May 17, 2011 at 7:47 AM, Sergey Bartunov wrote: > I'd just installed new Ubuntu 11.04, downloaded hbase 0.90.2 and run > bin/start-hbase.sh > > It starts but not correctly,
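If this is indeed the IPv6 issue the linked thread discusses, a commonly cited workaround (an assumption here, not confirmed in this snippet) is forcing the JVM onto the IPv4 stack in conf/hbase-env.sh:

```shell
# Hypothetical workaround for IPv6-related connection failures on Ubuntu:
# -Djava.net.preferIPv4Stack=true is a standard JVM system property that
# makes Java sockets use IPv4 even when the host has IPv6 enabled.
export HBASE_OPTS="$HBASE_OPTS -Djava.net.preferIPv4Stack=true"
```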

Re: Mapreduce counters

2011-05-18 Thread Ophir Cohen
For jobs you create, it sounds like a great idea. What about other jobs such as Hive/Pig jobs? Does anyone have any idea how it can be done for all MR jobs in a cluster, no matter what triggers the jobs? Ophir On Wed, May 18, 2011 at 7:09 PM, Joey Echeverria wrote: > Hi Ophir, > > That sounds like

HBase 0.90.2 doesn't start on Ubuntu 11.04

2011-05-18 Thread Sergey Bartunov
I'd just installed a fresh Ubuntu 11.04, downloaded HBase 0.90.2 and ran bin/start-hbase.sh. It starts, but not correctly, i.e. I couldn't create a new table from the shell. Everything had worked on Ubuntu 10.10. The error message from the logs: org.apache.hadoop.hbase.client.RetriesExhaustedException: F

Re: Mapreduce counters

2011-05-18 Thread Joey Echeverria
Hi Ophir, That sounds like a useful feature, maybe file a JIRA? I've never tried to save counters from the MR job into HBase, but you could pull it from the file as you said or from the Job object after waitForCompletion() returns by calling getCounters(). -Joey On Wed, May 18, 2011 at 8:21 AM,
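The idea Joey describes can be sketched in a self-contained way (all names here are illustrative; a real job would iterate `job.getCounters()` after `waitForCompletion()` and write each counter into an HBase table, but a plain Map stands in for both the Hadoop Counters object and the table so the sketch runs on its own):

```java
import java.util.Map;

// Persist counter name -> value pairs keyed by job id, the shape a
// counters-to-HBase sink would take.
public class CounterSink {
    static Map<String, Long> persist(String jobId,
                                     Map<String, Long> counters,
                                     Map<String, Long> table) {
        for (Map.Entry<String, Long> e : counters.entrySet()) {
            // Row key pattern jobId + ":" + counterName is an assumption.
            table.put(jobId + ":" + e.getKey(), e.getValue());
        }
        return table;
    }
}
```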

Mapreduce counters

2011-05-18 Thread Ophir Cohen
Hi All, Currently an MR job spills its counters into a file at the end of the run. Is there any built-in configuration/plug-in to make it store these counters into HBase as well? Sounds to me like a great feature! Did anybody do something similar? If you did, how did you do it? Run on directory an

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

2011-05-18 Thread Alex Baranau
There are several options here. E.g.: 1) Given that you have the "original key" of the record, you can fetch the stored record key from HBase and use it to create a Put with updated (or new) cells. Currently you'll need to use the distributed scan for that; there's no analogue for the Get operation yet (see h

Re: How to speed up HBase query throughput

2011-05-18 Thread Stack
Are there more blocks on these hot DNs than there are on the cool ones? If you run a major compaction and then run your tests, does it make a difference? St.Ack On Tue, May 17, 2011 at 8:03 PM, Weihua JIANG wrote: > -ROOT- and .META. table are not served by these hot region servers. > > I gener

Re: Max Table Count

2011-05-18 Thread Stack
It's not the number of tables that is of import, it's the number of regions. You can have your regions in as many tables as you like. I do not believe there is a cost to having more tables. St.Ack On Wed, May 18, 2011 at 5:54 AM, Wayne wrote: > How many tables can a cluster realistically handle or

Re: HTable.put hangs on bulk loading

2011-05-18 Thread Stan Barton
stack-3 wrote: > > On Mon, May 16, 2011 at 4:55 AM, Stan Barton wrote: >>> Sorry.  How do you enable overcommitment of memory, or do you mean to >>> say that your processes add up to more than the RAM you have? >>> >> >> The memory overcommitment is needed because in order to let java still >>

Max Table Count

2011-05-18 Thread Wayne
How many tables can a cluster realistically handle or how many tables/node can be supported? I am looking for a realistic idea of whether a 10 node cluster can support 100 or even 500 tables. I realize it is recommended to have a few tables at most (and to use the row key to add everything to one t

Re: LZO Compression

2011-05-18 Thread Ferdy Galema
Not out of the box. I use the following resource for packaging LZO with the Cloudera release: https://github.com/toddlipcon/hadoop-lzo-packager On 05/18/2011 08:35 AM, Pete Haidinyak wrote: Does the Cloudera VM have LZO data compression available? If not, since it's a 32-bit system, what's the be