bulkloader zookeeper connectString

2011-05-24 Thread Geoff Hendrey
Hi - How can I pass the zookeeper connectString to the completebulkload utility so that it will not try to use the default "localhost"? -geof

Re: About RegionServer checkin

2011-05-24 Thread Stack
On Tue, May 24, 2011 at 9:27 PM, Gaojinchao wrote: > If a region server checks in after the call to this.serverManager.waitForRegionServers(), > it seems that regionServerReport shouldn't add the region server to onlineServers. > Otherwise the region may be opened again. > I don't follow? Usually what happens

About RegionServer checkin

2011-05-24 Thread Gaojinchao
If a region server checks in after the call to this.serverManager.waitForRegionServers(), it seems that regionServerReport shouldn't add the region server to onlineServers. Otherwise the region may be opened again. In my cluster: 2011-05-23 10:56:30,726 INFO org.apache.hadoop.hbase.master.HMaster: Master start

Re: HBase Transaction per second in Map-Reduce

2011-05-24 Thread Michel Segel
You could, but you don't really need to do that. Of course the size of your cache is tunable and based on how much memory you have. On a side note... You said you're on CDH3; it doesn't have co-processor support, unless they snuck it in at the last minute. What I think you need to do is to look

Re: One map task to two HFiles

2011-05-24 Thread Stack
On Tue, May 24, 2011 at 9:04 PM, Stack wrote: > Nothing fancier than open on map task init, append, append, append, > inside in the map, and then be sure to close it in the map close. > Study HFileOutputFormat I'd say. > But then also study the HFile loader script, LoadIncrementalHFiles. Notice h

Re: One map task to two HFiles

2011-05-24 Thread Stack
On Tue, May 24, 2011 at 2:26 PM, Jon Stewart wrote: > I have enough control over the keys for the primary table that the map > task could write rows to the primary table in order, making it > map-side only (assuming one HFile per task). The map task could then > emit KeyValue objects for the secon

Re: a question storefileIndexSize

2011-05-24 Thread Stack
Oh, I forgot about this suggestion: http://hbase.apache.org/book.html#keysize I mention it because it cites a study done by Marc Limotte where he had a similar relatively big storefile index and he dug in. You might be interested in how he did his research. St.Ack On Tue, May 24, 2011 at 8:57 PM

Re: a question storefileIndexSize

2011-05-24 Thread Stack
2011/5/24 Gaojinchao : > Stack, Thanks for your reply. > block size is default. > My Key length is 26 bytes and value is 300~400 bytes. > Is it big keys and small values ? > Looks like you have 'small' keys. It looks like the index is about 1MB per storefile (storefiles=3103, storefileIndexSize=3
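The "about 1MB per storefile" estimate can be checked with simple arithmetic. The following is only a rough sketch, not HBase code: it uses the storefiles=3103 and storefileIndexSize=3717 values from the region server metric posted in this thread, and assumes the index size is reported in MB (which matches Stack's reading). The `indexEntries` helper and the 1 GB file size are hypothetical, and assume one index entry per HFile block, to illustrate why a larger block size would shrink the index.

```java
// Rough sketch only; class and method names are hypothetical, not HBase API.
final class StorefileIndexEstimate {

    // Average index size per storefile, in MB (assumes the metric is in MB).
    static double indexMbPerStorefile(long totalIndexMb, long storefiles) {
        return (double) totalIndexMb / storefiles;
    }

    // Number of index entries for a file, assuming one entry per block.
    static long indexEntries(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes; // round up
    }

    public static void main(String[] args) {
        // Values taken from the region server metric in this thread.
        double perFile = indexMbPerStorefile(3717, 3103);
        System.out.printf("index per storefile: %.2f MB%n", perFile);

        // Hypothetical 1 GB storefile: doubling the block size halves the entries.
        long oneGb = 1L << 30;
        System.out.println("entries at 64k blocks:  " + indexEntries(oneGb, 64 * 1024));
        System.out.println("entries at 128k blocks: " + indexEntries(oneGb, 128 * 1024));
    }
}
```

Each index entry also carries a key, so shorter keys reduce the index size as well, which is the other suggestion made in this thread.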

Re: 0.90.3

2011-05-24 Thread Stack
It's different in 0.92.0, Jack. We'll use whatever the master tells us our name is, not what the regionserver finds for its name. St.Ack On Tue, May 24, 2011 at 7:03 PM, Jack Levin wrote: > "HBase uses the local hostname to self-report it's IP address." > > using 'hostname' as authoritative name f

Fw: SingleColumnValueFilter

2011-05-24 Thread hmchiud
Cool! Thank you very much. I forgot to use regex | (or) new RegexStringComparator("^AA|^BB|^CC")) ; Fleming Chiu(邱宏明) Ext: 707-2260 Be Veg, Go Green, Save the Planet! - Forwarded by HMCHIUD(Fleming Chiu 邱宏明)/TWN on 2011/05/25 11:01 AM -
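To see what the alternation in "^AA|^BB|^CC" accepts, here is a small sketch using only java.util.regex, not the HBase classes. It assumes the comparator applies the pattern with find-style matching (a match anywhere in the value, with each alternative anchored to the start by its own ^); the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Illustrative only: plain java.util.regex, no HBase dependency.
final class PrefixAlternation {

    // Same expression as in the message: value starts with AA, BB or CC.
    private static final Pattern PREFIXES = Pattern.compile("^AA|^BB|^CC");

    // True when the value begins with any one of the three prefixes.
    static boolean startsWithAnyPrefix(String value) {
        return PREFIXES.matcher(value).find();
    }

    public static void main(String[] args) {
        for (String v : new String[] {"AA123", "BB", "CCxyz", "XAA", "abc"}) {
            System.out.println(v + " -> " + startsWithAnyPrefix(v));
        }
    }
}
```

A single filter with an "or" regex avoids combining several filters. If separate filters are preferred, FilterList.Operator.MUST_PASS_ONE gives "OR" semantics, whereas MUST_PASS_ALL is "AND".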

Re: SingleColumnValueFilter

2011-05-24 Thread 梁景明
FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL, fs); You can define a list like this. On May 25, 2011 at 10:39 AM, wrote: > > > Hi there, > > Can I set multiple SingleColumnValueFilter for a qualifier like the following > sample code? > The scan result is nothing, so I wonder whether the filt

SingleColumnValueFilter

2011-05-24 Thread hmchiud
Hi there, Can I set multiple SingleColumnValueFilter for a qualifier like the following sample code? The scan result is nothing, so I wonder whether the filters can be combined with "OR" instead of "AND". Any ideas? //==

Re: 0.90.3

2011-05-24 Thread Jack Levin
like 19:09:23 208.94.1.52 jack@zero:~ $ host 38.99.76.204 204.76.99.38.in-addr.arpa domain name pointer img646.imageshack.us. 19:10:26 208.94.1.52 jack@zero:~ $ This is the name I wanted it to use. It appears that with current setup, we can't change hostnames. -Jack On Tue, May 24, 2011 at 7:0

Re: 0.90.3

2011-05-24 Thread Jack Levin
"HBase uses the local hostname to self-report it's IP address." using 'hostname' as authoritative name for regionserver is what caused all of the confusion, hostname usually not governed by name resolution (/etc/hosts, dns), some users may call their servers something other than whats in dns, so

Re: a question storefileIndexSize

2011-05-24 Thread Gaojinchao
Stack, Thanks for your reply. block size is default. My Key length is 26 bytes and value is 300~400 bytes. Is it big keys and small values ? -Original Message- From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack Sent: May 25, 2011 1:01 To: user@hbase.apache.org Subject: Re: a question storefileI

Re: 0.90.3

2011-05-24 Thread Jean-Daniel Cryans
Zookeeper doesn't query addresses, it's all done in HBase which in turn stores it in ZK. Also http://hbase.apache.org/book.html#dns J-D On Tue, May 24, 2011 at 4:37 PM, Jack Levin wrote: > figured it out... the /etc/hosts file has ip to name, was used by > zookeeper was *.prod.imageshack.com, w

Re: 0.90.3

2011-05-24 Thread Jack Levin
Then I recommend scratching hostname use in lieu of reverse lookup only -Jack On May 24, 2011, at 5:45 PM, Andrew Purtell wrote: >> From: Jack Levin >> figured it out... the /etc/hosts file has ip to name, was used by >> zookeeper was *.prod.imageshack.com, while hostname was >> imgXX.imagesha

Re: 0.90.3

2011-05-24 Thread Andrew Purtell
> From: Jack Levin > figured it out... the /etc/hosts file has ip to name, was used by > zookeeper was *.prod.imageshack.com, while hostname was > imgXX.imageshack.us... use by Regionserver/Master - Ideally, all > three components should source hostnames from the same place, whether it's > hostname or

Re: Any trigger like facility for HBase tables

2011-05-24 Thread Andrew Purtell
For coprocessors you need to use trunk. - Andy --- On Tue, 5/24/11, Ted Yu wrote: > From: Ted Yu > Subject: Re: Any trigger like facility for HBase tables > To: user@hbase.apache.org > Cc: billgra...@gmail.com > Date: Tuesday, May 24, 2011, 1:48 PM > I don't think so. > > On Tue, May 24, 2

Re: 0.90.3

2011-05-24 Thread Jack Levin
figured it out... the /etc/hosts file has ip to name, was used by zookeeper was *.prod.imageshack.com, while hostname was imgXX.imageshack.us... use by Regionserver/Master - Ideally, all three components should source hostnames from the same place, whether it's hostname or /etc/hosts (or dns), etc... i

Re: HBase Transaction per second in Map-Reduce

2011-05-24 Thread Himanish Kushary
To understand whether our HBase instance is slow or is on par with industry standards/known implementations, I was looking for some article, stats or paper on HBase TPS and performance. Hence the initial question :-( It seems to be slow as per our SLA of processing the data we get. On Tue, May 24

Re: 0.90.3

2011-05-24 Thread Jack Levin
img645.prod.imageshack.us and img645.imageshack.us both point to the same IP. -Jack On Tue, May 24, 2011 at 3:50 PM, Jack Levin wrote: > looks like our balancer is on: > > hbase(main):001:0> balance_switch true > true > 0 row(s) in 0.3700 seconds > > I simply kill PID for RS, and it stays on

Re: 0.90.3

2011-05-24 Thread Jack Levin
looks like our balancer is on: hbase(main):001:0> balance_switch true true 0 row(s) in 0.3700 seconds I simply kill PID for RS, and it stays on the list with regions assigned, and master does not know about it. So it still does not work. -Jack On Tue, May 24, 2011 at 3:43 PM, Dave Latham wrot

Re: 0.90.3

2011-05-24 Thread Dave Latham
Are you using the graceful_stop script? In 0.90.3 the bin/graceful_stop.sh script was updated to disable the master's balancer. However, it doesn't seem that anything re-enables it, so if you're using it you need to re-enable it on your own. See the book for more details: http://hbase.apache.org

Re: 0.90.3

2011-05-24 Thread Ted Yu
What's the relationship between img645.imageshack.us and img645.prod.imageshack.com ? On Tue, May 24, 2011 at 3:33 PM, Jack Levin wrote: > just put new hbase version on our test cluster. and been testing it... > so far if I shutdown an RS, master does not reassi

0.90.3

2011-05-24 Thread Jack Levin
just put new hbase version on our test cluster. and been testing it... so far if I shutdown an RS, master does not reassign its regions, and we remain inconsistent forever, likewise when new RS is up, it does not get regions assigned to it, this is the master log: 2011-05-24 15:30:57,724 DEBUG o

One map task to two HFiles

2011-05-24 Thread Jon Stewart
I have a map task that's extracting documents from a flat file and writing them into an HBase table as individual records; the key is based off the path of the file (idempotent) but balances key-space distribution with locality of reference. Additionally, I have a secondary table where the key is t

Re: HBase Transaction per second in Map-Reduce

2011-05-24 Thread Stack
Figure first what is slow before adding yet more stuff On May 24, 2011, at 14:06, Himanish Kushary wrote: > Don't worry..its ok...i am going through one of those days for the last few > days :-) > > Jokes apart, when you talk about Caching, could we put something like > ehCache in-front of hb

Re: HBase Transaction per second in Map-Reduce

2011-05-24 Thread Himanish Kushary
Don't worry..its ok...i am going through one of those days for the last few days :-) Jokes apart, when you talk about Caching, could we put something like ehCache in-front of hbase integrated like a level-2 cache on top of the already provided block cache ? - Himanish On Tue, May 24, 2011 at 3:1

Re: Any trigger like facility for HBase tables

2011-05-24 Thread Ted Yu
I don't think so. On Tue, May 24, 2011 at 1:45 PM, Himanish Kushary wrote: > Thanks Ted and Bill. Will take a look into both of these. I am using CDH3, > does it have co-processors ? > > On Tue, May 24, 2011 at 3:24 PM, Bill Graham wrote: > > > As well as > > http://www.lilyproject.org/lily/abou

Re: Any trigger like facility for HBase tables

2011-05-24 Thread Himanish Kushary
Thanks Ted and Bill. Will take a look into both of these. I am using CDH3, does it have co-processors ? On Tue, May 24, 2011 at 3:24 PM, Bill Graham wrote: > As well as > http://www.lilyproject.org/lily/about/playground/hbaserowlog.html > > I'd like to hear if anyone has had good or bad experien

Re: Any trigger like facility for HBase tables

2011-05-24 Thread Bill Graham
As well as http://www.lilyproject.org/lily/about/playground/hbaserowlog.html I'd like to hear if anyone has had good or bad experiences using either of these techniques, as we'll soon have a need to implement update notifications as well. On Tue, May 24, 2011 at 11:31 AM, Ted Yu wrote: > Tak

RE: HBase Transaction per second in Map-Reduce

2011-05-24 Thread Michael Segel
Sorry, it's been one of those days. > From: michael_se...@hotmail.com > To: user@hbase.apache.org > Subject: RE: HBase Transaction per second in Map-Reduce > Date: Tue, 24 May 2011 14:18:28 -0500 > > > > Himanish, > > Are we talking about an African or Eur

RE: HBase Transaction per second in Map-Reduce

2011-05-24 Thread Michael Segel
Himanish, Are we talking about an African or European Swallow? (Sorry, it's a reference to the Monty Python movie scene where they cross the bridge after being asked 3 questions which they must answer correctly. [What's the forward air speed velocity of an unladen swallow?]) The point is that y

Re: HBase Transaction per second in Map-Reduce

2011-05-24 Thread Stack
See http://hbase.apache.org/book.html#performance St.Ack On Tue, May 24, 2011 at 11:31 AM, Himanish Kushary wrote: > Hi, > > Could anybody please point to some article or paper which can give an > understanding of the transaction per second (both read and write) that is > supported or seen to be

HBase Transaction per second in Map-Reduce

2011-05-24 Thread Himanish Kushary
Hi, Could anybody please point to some article or paper which can give an understanding of the transactions per second (both read and write) that are supported or seen to be accomplished using HBase Map-Reduce. We have written a few HBase Map-Reduce jobs which are not giving us the desired/expected perfo

Re: Any trigger like facility for HBase tables

2011-05-24 Thread Ted Yu
Take a look at http://hbaseblog.com/2010/11/30/hbase-coprocessors/ On Tue, May 24, 2011 at 11:28 AM, Himanish Kushary wrote: > Hi, > > Is there any trigger-like facility in HBase? I would like to get notified > about any data update/insert into an HBase table and fire up a map-reduce > based on th

Any trigger like facility for HBase tables

2011-05-24 Thread Himanish Kushary
Hi, Is there any trigger-like facility in HBase? I would like to get notified about any data update/insert into an HBase table and fire up a map-reduce based on that update/insert event on the newly inserted or updated data. Any framework which supports this for HBase ? Could somebody please sugg

Modeling suggestions

2011-05-24 Thread James Pettyjohn
I am planning out a central database for contact information, invoices and a bunch of other domain specific information that will be coming from hundreds of geographically disparate locations. With the requirements of having every change ever made kept forever I wanted this in Hadoop/HBASE but am no

Re: HBase jmx stats

2011-05-24 Thread Ted Dunning
In case anybody wants estimates of medians, Mahout has some easily extractable code to compute medians and first and third quartiles without keeping lots of data around. As a side effect, it computes averages and standard deviations as well. I don't think that such a small thing as this warrants
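As a point of reference for the "averages and standard deviations" part, a running mean and standard deviation can be kept in constant memory with Welford's method. The sketch below is not the Mahout code mentioned above and does not estimate medians or quartiles; it only illustrates the idea of summarizing a stream without storing the samples. The class name is made up.

```java
// Illustrative sketch (Welford's online algorithm); not the Mahout implementation.
final class RunningStats {
    private long count;
    private double mean;
    private double sumSquaredDiffs; // sum of squared deviations from the running mean

    // Fold one sample into the summary; no samples are retained.
    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        sumSquaredDiffs += delta * (x - mean);
    }

    long count() { return count; }

    double mean() { return mean; }

    // Sample standard deviation (n - 1 in the denominator); 0 for fewer than 2 samples.
    double stdDev() {
        return count > 1 ? Math.sqrt(sumSquaredDiffs / (count - 1)) : 0.0;
    }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        for (double latencyMs : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            stats.add(latencyMs);
        }
        System.out.println("n=" + stats.count() + " mean=" + stats.mean() + " sd=" + stats.stdDev());
    }
}
```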

Re: HBase jmx stats

2011-05-24 Thread Stack
Tim: You should get a copy of Lars' book. Then you'd know what the below are (smile). Quoting: "A commonly used metric in HBase is called time varying rate, which not only tracks the number of events, but also how long each event took to complete. A TVR exposes four values. [Below] shows the

Re: HBase Not Starting after improper shutdown

2011-05-24 Thread Stack
On Tue, May 24, 2011 at 7:19 AM, Himanish Kushary wrote: > The Region Server logs also show the same -ROOT- Region not online error. > The above does not give us any information that we can use to help us diagnose your issue. Can you pastebin the master log? Yours, St.Ack

Re: Region split behavior

2011-05-24 Thread Stack
Upgrade your hbase to 0.90.3 (And your CDH to the released version). The issue 'HBASE-3586 Improve the selection of regions to balance' should help. It does a more random assignment which should help undo some of the table clumping you are seeing. That said, others have been observing that the

Re: a question storefileIndexSize

2011-05-24 Thread Stack
What Ted says or you could change the hfile block size; currently it's 64k. Make it bigger? Do you have big keys and small values? If so, can you make do with smaller keys? That would help with index size too. St.Ack On Tue, May 24, 2011 at 5:29 AM, Gaojinchao wrote: > My observation is that

Re: a question storefileIndexSize

2011-05-24 Thread Ted Yu
See https://issues.apache.org/jira/browse/HBASE-3857 and https://issues.apache.org/jira/browse/HBASE-3856 Cheers On Tue, May 24, 2011 at 5:29 AM, Gaojinchao wrote: > My observation is that storefileIndexSize is large. > Is there a way to reduce it ? > > Region server metric: > requests=11447, r

Re: about TestRollingRestart

2011-05-24 Thread Stack
Gao: Check out our hudson build. Look there for TestRollingRestart failures. Usually it passes. See here https://builds.apache.org/hudson/job/HBase-TRUNK/ for TRUNK. And here for branch: https://builds.apache.org/hudson/view/G-L/view/HBase/job/hbase-0.90/. Can you figure what it's waiting on

RE: How to compile simple Hbase code ?

2011-05-24 Thread Buttler, David
You should use a tool like ant or maven for configuring your classpath and compiling code. Ant allows you to specify a directory hierarchy -- like ${HADOOP_HOME}/lib Dave -Original Message- From: praveenesh kumar [mailto:praveen...@gmail.com] Sent: Tuesday, May 24, 2011 3:57 AM To: us

Re: hbase handle requests based on geo locations?

2011-05-24 Thread Stack
There is nothing built-in. You will have to add it yourself. Some folks have had some success building geohash-based apps on hbase (purportedly). St.Ack On Mon, May 23, 2011 at 11:13 PM, elton sky wrote: >  I wonder if hbase has some inbuilt mechanism to handle request based on > geographic of u
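For readers unfamiliar with the geohash idea mentioned above, here is a minimal encoder sketch. It is not part of HBase and not taken from any of the apps referred to; it is the standard public geohash scheme (bits alternate between longitude and latitude, five bits per base-32 character). Using the result as a row-key prefix is an assumption about how such an app might be laid out.

```java
// Minimal geohash encoder sketch; class name is hypothetical, not an HBase API.
final class Geohash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Encode a latitude/longitude pair into a geohash of the given length.
    static String encode(double lat, double lon, int length) {
        double latLo = -90, latHi = 90, lonLo = -180, lonHi = 180;
        StringBuilder hash = new StringBuilder(length);
        boolean lonTurn = true; // bits alternate, starting with longitude
        int bitCount = 0;
        int value = 0;
        while (hash.length() < length) {
            if (lonTurn) {
                double mid = (lonLo + lonHi) / 2;
                if (lon >= mid) { value = (value << 1) | 1; lonLo = mid; }
                else            { value <<= 1;              lonHi = mid; }
            } else {
                double mid = (latLo + latHi) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latLo = mid; }
                else            { value <<= 1;              latHi = mid; }
            }
            lonTurn = !lonTurn;
            if (++bitCount == 5) { // five bits make one base-32 character
                hash.append(BASE32.charAt(value));
                bitCount = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        System.out.println(Geohash.encode(42.605, -5.603, 5));
    }
}
```

Nearby points tend to share a leading prefix, so a prefix scan over keys that start with the geohash can approximate a bounding-box lookup. Points on either side of a cell boundary can still have different prefixes, so a real app would also check neighbouring cells.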

Region split behavior

2011-05-24 Thread Kleegrewe, Christian
Dear all, We have a small test cluster with 5 nodes, 1 master and 4 datanodes. The nodes are installed with Ubuntu desktop 10.10, hadoop version 'Hadoop 0.20.2-CDH3B4' and hbase version 0.90.1-CDH3B4. The hbase database is well balanced and contains one table (TAB_1) containing 270.000.000 data

Re: HBase Not Starting after improper shutdown

2011-05-24 Thread Himanish Kushary
The Region Server logs also show the same -ROOT- Region not online error. On Mon, May 23, 2011 at 1:10 PM, Bill Graham wrote: > Is there anything meaningful in the RS logs? I've seen situations like this > where a RS is failing to start due to issues reading the WAL. If this is > the > case it

Log4j changes not working inside static mapper and reducer classes

2011-05-24 Thread Himanish Kushary
Hi, I have enabled debug for my Map-Reduce package inside the log4j.properties under the $HADOOP_HOME/conf directory (using CDH3). log4j.logger.com.himanish.analytics.mapreduce=DEBUG The logging messages are getting logged for the main enclosing Map-Reduce job class but not for the static Mapper

a question storefileIndexSize

2011-05-24 Thread Gaojinchao
My observation is that storefileIndexSize is large. Is there a way to reduce it ? Region server metric: requests=11447, regions=10394, stores=10394, storefiles=3103, storefileIndexSize=3717, memstoreSize=1002, compactionQueueSize=1234, flushQueueSize=0, usedHeap=6916, maxHeap=8165, blockCacheSize

Re: How to compile simple Hbase code ?

2011-05-24 Thread praveenesh kumar
Okie what i did is added the folder containing my .class file in the classpath, along with commons-logging-1.0.4.jar and log4j-1.2.15.jar in my classpath: so now Myclasspath variable looks like : MYCLASSPATH="/usr/local/hadoop/hadoop/hadoop-0.20.2-core.jar:/usr/local/hadoop/hbase/hbase/hbase-0.

Re: How to compile HBase code ?

2011-05-24 Thread Harsh J
Praveenesh, HBase has their own user mailing lists where such queries ought to go. Am moving the discussion to user@hbase.apache.org and bcc-ing common-user@ here. Also added you to cc. Regarding your first error, going forward you can use the useful `hbase classpath` to generate a HBase-provided

Re: How to compile simple Hbase code ?

2011-05-24 Thread praveenesh kumar
which class is not in classpath ??? I already put hadoop-core.jar , hbase-core.jar and zookeeper.jar in my classpaths... What else I need to put in my classpath ??? On Tue, May 24, 2011 at 3:48 PM, Vivek Mishra wrote: > It means Class is not in your classpath. > > -Original Message- > Fro

RE: How to compile simple Hbase code ?

2011-05-24 Thread Vivek Mishra
It means Class is not in your classpath. -Original Message- From: praveenesh kumar [mailto:praveen...@gmail.com] Sent: Tuesday, May 24, 2011 3:31 PM To: user@hbase.apache.org Subject: How to compile simple Hbase code ? I am simply using HBase API, not doing any Map-reduce work on it. Fol

How to compile simple Hbase code ?

2011-05-24 Thread praveenesh kumar
I am simply using HBase API, not doing any Map-reduce work on it. Following is the code I have written , simply creating the file on HBase: import java.io.IOException; import org.apache.hadoop.hbase.HBaseConfiguration; import org.apache.hadoop.hbase.HColumnDescriptor; import org.apache.hadoop.hba

about TestRollingRestart

2011-05-24 Thread Gaojinchao
hbase.master.assignment.timeoutmonitor.timeout should be set higher in the TestRollingRestart case. It is killed sometimes when we run all cases. This is my analysis. Has anyone else encountered this? Logs: // reassigned root and meta when regionserver has shut down 2011-05-24 09:09:32,989 DEBUG [MASTER