Tweaking HBase splits

2011-05-11 Thread Mayuresh
Hi, I have a question about how splits work in HBase. I have one master, which also acts as a region server, along with three other region servers. I have set the following parameters on all the region servers: hbase.hregion.max.filesize 1048576 Maximum HStoreFile size. If any one
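
As a rough sketch of the setting being discussed (normally it lives in hbase-site.xml on the region servers; the Java form below is only illustrative, and the 1 MB value mirrors the question rather than a recommended size):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class SplitSizeExample {
        public static void main(String[] args) {
            // Client-side view of the property; the authoritative value is whatever
            // each region server reads from its own hbase-site.xml at startup.
            Configuration conf = HBaseConfiguration.create();
            // 1048576 bytes = 1 MB, the value quoted above. A region becomes a split
            // candidate once its largest store file grows beyond this size.
            conf.setLong("hbase.hregion.max.filesize", 1048576L);
            System.out.println(conf.get("hbase.hregion.max.filesize"));
        }
    }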

Re: Very slow Scan performance using Filters

2011-05-11 Thread Ryan Rawson
Scans run serially. To use DB parlance, consider a Scan + filter the moral equivalent of a "SELECT * FROM <> WHERE col='val'" with no index, so a full table scan is engaged. The typical ways to address performance issues are: - arrange your data using the primary key so you can scan the
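
To make the contrast concrete, here is a minimal sketch (table, family, qualifier, and key boundaries are made up): a filter-only scan reads every row server-side, while leading the row key with the queried attribute lets the scan be bounded by start and stop rows.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.CompareFilter;
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScanComparison {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "mytable"); // hypothetical table name

            // Filter-only scan: every row is read and tested, like an unindexed SELECT.
            Scan full = new Scan();
            full.setFilter(new SingleColumnValueFilter(
                Bytes.toBytes("cf"), Bytes.toBytes("col"),
                CompareFilter.CompareOp.EQUAL, Bytes.toBytes("val")));

            // Key-bounded scan: only the contiguous range [start, stop) is touched,
            // which works when the row key leads with the attribute being queried.
            Scan bounded = new Scan(Bytes.toBytes("startRow"), Bytes.toBytes("stopRow"));

            ResultScanner scanner = table.getScanner(bounded);
            for (Result r : scanner) {
                // process each matching row
            }
            scanner.close();
            table.close();
        }
    }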

Re: Very slow Scan performance using Filters

2011-05-11 Thread Connolly Juhani
By naming rows from the timestamp, the row ids are all going to be sequential when inserting, so all new inserts will be going into the same region. When checking the last 30 days you will also be reading from the same region where all the writing is happening, i.e. the one that is already busy writing

Very slow Scan performance using Filters

2011-05-11 Thread Himanish Kushary
Hi, We have a table split across multiple regions (approx 50-60 regions for a 64 MB split size) with a rowid schema of [ReverseTimestamp/itemtimestamp/customerid/itemid]. This stores the activities for an item for a customer. We have lots of data for lots of items per customer in this table. When we try
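
For readers unfamiliar with the ReverseTimestamp part of that key, a common way to build it (an illustrative sketch, not the poster's actual code) is to subtract the timestamp from Long.MAX_VALUE so the newest activity sorts first:

    import org.apache.hadoop.hbase.util.Bytes;

    public class ReverseTimestampKey {
        // Newer events get smaller reverse timestamps and therefore sort to the front.
        // Zero-padding keeps lexicographic (byte) order identical to numeric order.
        static byte[] rowKey(long itemTimestamp, String customerId, String itemId) {
            long reverseTs = Long.MAX_VALUE - itemTimestamp;
            return Bytes.toBytes(String.format("%019d/%d/%s/%s",
                reverseTs, itemTimestamp, customerId, itemId));
        }

        public static void main(String[] args) {
            System.out.println(Bytes.toString(
                rowKey(System.currentTimeMillis(), "cust42", "item7")));
        }
    }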

Hardware configuration for a pure-Hbase cluster

2011-05-11 Thread Miles Spielberg
We're planning out our first HBase cluster, and we'd like to get some feedback on our proposed hardware configuration. We're intending to use this cluster purely for HBase; it will not generally be running MapReduce jobs, nor will we be using HDFS for other storage tasks. In addition, our project

Re: any static column name behavior in hbase? (ie. not storing column name per row)

2011-05-11 Thread Bill Graham
HBase will always need to store the column name in each cell that uses it. The only way to reduce the space taken by storing repeated column names (besides using compression) is to instead store a small pointer to a lookup table that holds the column name. Check out OpenTSDB, which does something similar
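
A toy sketch of that lookup-table idea (names are entirely hypothetical; OpenTSDB's real implementation keeps its name-to-id mappings in a dedicated HBase table so all clients agree): long, human-readable column names are mapped once to short numeric qualifiers, and only the short form is written per cell.

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.hbase.util.Bytes;

    public class QualifierDictionary {
        private final Map<String, byte[]> nameToId = new HashMap<String, byte[]>();
        private final Map<Short, String> idToName = new HashMap<Short, String>();
        private short next = 0;

        // Returns a stable 2-byte qualifier for a long, human-readable column name.
        synchronized byte[] idFor(String columnName) {
            byte[] id = nameToId.get(columnName);
            if (id == null) {
                id = Bytes.toBytes(next);
                nameToId.put(columnName, id);
                idToName.put(next, columnName);
                next++;
            }
            return id;
        }

        // Translates a stored 2-byte qualifier back to the readable column name.
        synchronized String nameFor(byte[] id) {
            return idToName.get(Bytes.toShort(id));
        }
    }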

any static column name behavior in hbase? (ie. not storing column name per row)

2011-05-11 Thread Hiller, Dean x66079
I like how I can have X columns in one row that vary from another row. I am wondering if there is a way to have HBase use "static" column names (for lack of a better term), where the column names don't take up space for each row I add to my database. It just would be nice to have a significantly

Master crash during assignment.

2011-05-11 Thread Vidhyashankar Venkataraman
The master of my HBase instance (0.90.x) crashes each time it is restarted, with the exceptions shown below. Can you let me know what this is usually due to? (I also saw these exceptions in a JIRA, but they were about an uncaught EOF exception.) Only the master dies, while the region servers wait for

Re: Data retention in HBase

2011-05-11 Thread Ophir Cohen
Thanks for the comments. Going to work on it tomorrow - I'll keep you updated. Ophir On Wed, May 11, 2011 at 8:01 PM, Stack wrote: > On Wed, May 11, 2011 at 6:14 AM, Ophir Cohen wrote: > > My results from today's research: > > > > I tried to delete a region as Stack suggested: > > > > 1. *cl

txntype:unknown reqpath:n/a Error Path:/hbase Error:KeeperErrorCode = NodeExists for /hbase

2011-05-11 Thread Άρμεν Αρσακιάν
I installed HBase on my Mac OS 10.6 machine, and when I try to run hbase master start I get the following error. My error is similar to, if not the same as, the one in the following thread: http://article.gmane.org/gmane.comp.java.hadoop.hbase.user/17432/match=got+user+level+keeperexception+processing+sessionid

Re: Error of "Got error in response to OP_READ_BLOCK for file"

2011-05-11 Thread Stanley Xu
And another question: should I use HBase 0.20.6 if I use the append branch of Hadoop? On 2011-5-11 at 12:51 AM, "Jean-Daniel Cryans" wrote: > Data cannot be corrupted at all, since the files in HDFS are immutable > and CRC'ed (unless you are able to lose all 3 copies of every block). > > Corruption would h

Re: HBase filtered scan problem

2011-05-11 Thread Stack
On Wed, May 11, 2011 at 2:05 AM, Iulia Zidaru wrote: > Hi, > I'll try to rephrase the problem... > We have a table where we add an empty value. (The same thing happens also if > we have a value.) > Afterwards we put a value inside (same put, just another value). When scanning > for empty values (first

Re: Data retention in HBase

2011-05-11 Thread Stack
On Wed, May 11, 2011 at 6:14 AM, Ophir Cohen wrote: > My results from today's research: > > I tried to delete a region as Stack suggested: > >   1. *close_region* >   2. Remove files from the file system. >   3. *assign* the region again. > Try inserting something into that region and then getting it

Re: ArrayIndexOutOfBoundsException in FSOutputSummer.write()

2011-05-11 Thread Stack
I have not seen this before. You are failing because of java.lang.ArrayIndexOutOfBoundsException in org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:83). Tell us more about your context. Are you using compression? What kind of hardware, operating system (I'm trying to figure what is

Re: Error of "Got error in response to OP_READ_BLOCK for file"

2011-05-11 Thread Stanley Xu
Dear all, I just checked our log today and found the following entries: 2011-05-11 16:46:06,258 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_7212216405058183301_3974453 src: /10.0.2.39:60393 dest: /10.0.2.39:50010 2011-05-11 16:46:14,716 INFO org.apache.hadoop.hdfs.serv

Re: What is the recommended number of zookeeper server on 11 nodes cluster

2011-05-11 Thread Bennett Andrews
Running 2 ZooKeepers isn't a good idea, as it doesn't tolerate any server failure: ZooKeeper needs a majority of nodes in the ensemble to be available to handle failures. So 1, 3, or 5 are better choices. See: http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A7 http://zookeeper.apache.org/doc/r3.3.3/zookeepe

RE: What is the recommended number of zookeeper server on 11 nodes cluster

2011-05-11 Thread Doug Meil
See the performance section of the HBase book. http://hbase.apache.org/book.html#performance -Original Message- From: Ferdy Galema [mailto:ferdy.gal...@kalooga.com] Sent: Wednesday, May 11, 2011 10:25 AM To: user@hbase.apache.org Cc: byambajargal; cdh-u...@cloudera.org Subject: Re: What

How could I make sure the famous "xceiver" parameters works in the data node?

2011-05-11 Thread Stanley Xu
Dear all, We are using Hadoop 0.20.2 with a couple of patches, and HBase 0.20.6. When we run a MapReduce job that makes a lot of random accesses to an HBase table, we see many log entries like the following at the same time on the region server and data node: For RegionServer: "INFO org.ap
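
For context, the setting usually discussed in these threads is dfs.datanode.max.xcievers (note the historical misspelling); it must be set in each DataNode's own hdfs-site.xml and the DataNodes restarted. The snippet below only illustrates the key name and a commonly suggested value; setting it in a client-side Configuration does not change a running DataNode.

    import org.apache.hadoop.conf.Configuration;

    public class XceiverSetting {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Often raised well above the old default (256) so region servers holding
            // many open store files do not exhaust the DataNode's transceiver threads.
            conf.setInt("dfs.datanode.max.xcievers", 4096);
            System.out.println("dfs.datanode.max.xcievers = "
                + conf.getInt("dfs.datanode.max.xcievers", 256));
        }
    }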

Re: What is the recommended number of zookeeper server on 11 nodes cluster

2011-05-11 Thread Ferdy Galema
A rowcounter is a scan job, so you should use hbase.client.scanner.caching for better scan performance. (Depending on your value sizes, set it to 1000 or something like that.) For us, one ZooKeeper is able to manage our 15-node cluster perfectly fine. On 05/11/2011 02:40 PM, byambajargal wrote: Hel
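
A minimal sketch of that suggestion (the table name is made up): the caching value can be set globally through hbase.client.scanner.caching or per scan on the Scan object, so each scanner RPC fetches a batch of rows instead of a single one.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class CachedRowCount {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            // Global default for every scanner created by this client.
            conf.setInt("hbase.client.scanner.caching", 1000);

            HTable table = new HTable(conf, "mytable"); // hypothetical table name
            Scan scan = new Scan();
            scan.setCaching(1000); // per-scan override of the global default

            long rows = 0;
            ResultScanner scanner = table.getScanner(scan);
            for (Result r : scanner) {
                rows++;
            }
            scanner.close();
            table.close();
            System.out.println("rows = " + rows);
        }
    }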

Re: Scans on salted rowkeys

2011-05-11 Thread Ted Yu
See '[ANN]: HBaseWD: Distribute Sequential Writes in HBase' thread. https://github.com/sematext/HBaseWD On Wed, May 11, 2011 at 2:21 AM, Felix Sprick wrote: > Hi guys, > > I am using rowkeys with a pattern like [minute]_[timestamp] because my > main use case is to read time ranges over a couple

ArrayIndexOutOfBoundsException in FSOutputSummer.write()

2011-05-11 Thread Chris Bohme
Dear community, We are doing a test on a 5 node cluster with a table of about 50 million rows (writes and reads). At some point we end up getting the following exception on 2 of the region servers: 2011-05-11 14:18:28,660 INFO org.apache.hadoop.hbase.regionserver.Store: Started compaction o

Re: Data retention in HBase

2011-05-11 Thread Ophir Cohen
My results from today's research: I tried to delete a region as Stack suggested: 1. *close_region* 2. Remove files from the file system. 3. *assign* the region again. It looks like it works! The region still exists but it's empty. Looks good, but definitely not the end of the road. In order t

What is the recommended number of zookeeper server on 11 nodes cluster

2011-05-11 Thread byambajargal
Hello everybody, I run a cluster of 11 nodes (HBase CDH3u0) and I have 2 ZooKeeper servers in my cluster. It seems very slow when I run the rowcounter example. My question is: what is the recommended number of ZooKeeper servers I should run for an 11-node cluster? Cheers, Byambajargal

Re: Mapping "Object-HBase data" Framework!

2011-05-11 Thread Frédéric Fondement
On 10/05/11 11:34, Kobla Gbenyo wrote: Hello, I am new to this list and I have started testing HBase. I downloaded and installed HBase successfully, and now I am looking for a framework which can help me perform CRUD operations (create, read, update and delete). Through my research, I found JDO but

Aw: Re: Lost hbase table after restart

2011-05-11 Thread Frank Kloeker
Hi Andrew, You're right. I'll try to upgrade to the latest version. Frank - Original Message From: Andrew Purtell To: user@hbase.apache.org Date: 11.05.2011 11:10 Subject: Re: Lost hbase table after restart > Hi, > > HBase 0.20.4 is very much out of date. It was released o

Scans on salted rowkeys

2011-05-11 Thread Felix Sprick
Hi guys, I am using rowkeys with a pattern like [minute]_[timestamp] because my main use case is to read time ranges over a couple of hours, and I want to read in parallel from as many nodes in the cluster as possible, thus distributing the data in minute buckets across the cluster. The problem now is
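
To make the scheme concrete (an illustrative sketch with an assumed key layout and table name, not the poster's code): writes are spread by prefixing the key with the minute bucket, and a time-range read becomes one bounded scan per bucket, which can then be issued in parallel.

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BucketedTimeScan {
        // Row key: [two-digit minute bucket]_[zero-padded epoch milliseconds]
        static byte[] key(int minuteBucket, long ts) {
            return Bytes.toBytes(String.format("%02d_%013d", minuteBucket, ts));
        }

        public static void main(String[] args) throws IOException {
            HTable table = new HTable(HBaseConfiguration.create(), "events"); // hypothetical
            long to = System.currentTimeMillis();
            long from = to - 2 * 3600 * 1000L; // the last two hours

            // One scan per bucket; in practice these would run on a thread pool.
            for (int bucket = 0; bucket < 60; bucket++) {
                Scan scan = new Scan(key(bucket, from), key(bucket, to));
                ResultScanner scanner = table.getScanner(scan);
                for (Result r : scanner) {
                    // merge or process results from this bucket
                }
                scanner.close();
            }
            table.close();
        }
    }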

Re: Lost hbase table after restart

2011-05-11 Thread Andrew Purtell
Furthermore, be sure to read about what HBase 0.90.x requires: http://hbase.apache.org/notsoquick.html#requirements Best regards, - Andy --- On Wed, 5/11/11, Andrew Purtell wrote: > From: Andrew Purtell > Subject: Re: Lost hbase table after restart > To: user@hbase.apache.org > Date: Wed

Re: Lost hbase table after restart

2011-05-11 Thread Andrew Purtell
Hi, HBase 0.20.4 is very much out of date. It was released on 10 May 2010. The current release is 0.90.2, released on 11 April 2011. Why are you using such an out of date version? For many many reasons you should be using the latest 0.90.x version. Best regards, - Andy --- On Wed, 5/11/1

Lost hbase table after restart

2011-05-11 Thread Frank Kloeker
Hadoop: 0.20.2 HBase: 0.20.4, r941076 Hi, I've been running an HBase table on 4 region servers with 4 datanodes on the same machines. After restarting HBase and Hadoop I've lost the one and only HBase table. The hbase-master log says: 2011-04-16 15:45:03,411 INFO org.apache.hadoop.hbase.mas

Re: HBase filtered scan problem

2011-05-11 Thread Iulia Zidaru
Hi, I'll try to rephrase the problem... We have a table where we add an empty value. (The same thing happens also if we have a value.) Afterwards we put a value inside (same put, just another value). When scanning for empty values (the values inserted first), the result is wrong because the filter gets

Re: any performance results of transferring tera bytes from db to hbase?

2011-05-11 Thread Todd Lipcon
Last week I loaded ~1TB onto a 100-node cluster in about 6 hours. In this case the dataset was made of rows each with 50 columns of about 12 bytes each (12-byte qualifier, empty value). This was not using the bulk load API, which in my experience is at least 10x faster than using the normal API. The
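
For readers who have not used it, the bulk load path Todd mentions roughly looks like the sketch below (0.90-era API; the mapper setup is omitted and the table name and paths are made up): a MapReduce job writes HFiles directly, and a second step moves them into the serving regions.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class BulkLoadSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "mytable"); // hypothetical target table

            // Phase 1: a MapReduce job writes HFiles instead of issuing individual Puts.
            // (Input path, mapper, and reducer configuration are omitted from this sketch.)
            Job job = new Job(conf, "bulk-load-prepare");
            HFileOutputFormat.configureIncrementalLoad(job, table);
            FileOutputFormat.setOutputPath(job, new Path("/tmp/hfiles"));
            job.waitForCompletion(true);

            // Phase 2: hand the finished HFiles to the regions that own their key ranges.
            new LoadIncrementalHFiles(conf).doBulkLoad(new Path("/tmp/hfiles"), table);
            table.close();
        }
    }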

re: A question about client

2011-05-11 Thread Gaojinchao
Sorry, here is some other information: YCSB doesn't share the HTables; each thread has its own HTable instance. -Original Message- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: May 11, 2011 10:31 To: user@hbase.apache.org Subject: Re: A question about client I think the second explanation is plausible. From http://dow