RE: Why does the default hbase.hstore.compactionThreshold is 3?

2010-04-06 Thread Jonathan Gray
Shen, You are right. Currently the default flush size is 64MB, the compactionThreshold is 3, and the splitSize/max.filesize is 256MB. So we end up compacting into a 192MB file when filling an empty region. Take a look at HBASE-2375 (https://issues.apache.org/jira/browse/HBASE-2375). That is
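The arithmetic in this reply can be sketched as a tiny simulation (a sketch of the default sizes quoted above, not HBase's actual compaction code):

```python
# Simulate filling an empty region with the 2010-era defaults quoted above.
FLUSH_SIZE_MB = 64          # memstore flush size
COMPACTION_THRESHOLD = 3    # hbase.hstore.compactionThreshold
MAX_FILESIZE_MB = 256       # splitSize / max.filesize

# Each memstore flush adds one 64MB store file.
store_files = [FLUSH_SIZE_MB] * COMPACTION_THRESHOLD

# Once the threshold is reached, the files are compacted into one.
compacted_mb = sum(store_files)
print(compacted_mb)                 # 192: still below the 256MB split size
print(compacted_mb < MAX_FILESIZE_MB)
```

So with the defaults, three flushes compact into a 192MB file, which does not yet trigger a split at 256MB, which is the mismatch HBASE-2375 discusses.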

Re: Why does the default hbase.hstore.compactionThreshold is 3?

2010-04-06 Thread Jean-Daniel Cryans
It does incremental compacting since you don't want to spend too much time doing the compactions, and you don't want to compact very large store files with much smaller ones (that would result in rewriting the same data x times per day). Looking at Store.compact, you can see this comment:

Why does the default hbase.hstore.compactionThreshold is 3?

2010-04-06 Thread ChingShen
Hi, My understanding is that when the memstore reaches a configurable size (64MB), it's flushed to HDFS, creating a new StoreFile; then, when there are more than 3 StoreFiles, they are compacted into a single StoreFile. But, if the default hbase.hstore.compactionThreshold is 3, does it mean that a compa

Re: About test/production server configuration

2010-04-06 Thread Imran M Yousuf
Hi Jonathan, Thanks for your reply. Please find my replies inline. On Wed, Apr 7, 2010 at 4:04 AM, Jonathan Gray wrote: > Or if you have a budget in mind, we can help you determine what would be the > best way to allocate those dollars. > That would be just great. Budget provisioned for the wh

Re: how can I check the I/O influence HBase to HDFS

2010-04-06 Thread steven zhuang
hi, Jonathan, On Wed, Apr 7, 2010 at 6:15 AM, Jonathan Gray wrote: > Can you explain more about what information you are trying to find out? > > You had an existing HDFS and you want to measure the additional impact > adding HBase is? Is that in terms of reads/writes/iops or data size? >

Re: enabling hbase metrics on a running instance

2010-04-06 Thread Stack
Also need to do configuration in hbase/conf/hadoop-metrics.xml (yes, that's hadoop-metrics, not hbase-metrics) which I believe is only read on restart. So double-no. St.Ack On Tue, Apr 6, 2010 at 4:18 PM, Jean-Daniel Cryans wrote: > This boils down to the question: can you enable JMX while the J

Re: enabling hbase metrics on a running instance

2010-04-06 Thread Jean-Daniel Cryans
This boils down to the question: can you enable JMX while the JVM is running? The answer is no (afaik). More doc here http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html J-D On Tue, Apr 6, 2010 at 4:12 PM, Igor Ranitovic wrote: > Is it possible to enable the hbase metrics
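For reference, JMX is enabled through JVM system properties set at startup, which is why a restart is unavoidable. A sketch of what that might look like in conf/hbase-env.sh (the property names come from the standard JVM management agent linked above; the port number and the disabled auth/SSL settings are illustrative choices):

```shell
# Hypothetical hbase-env.sh fragment: enable the JMX agent at JVM startup.
# These flags only take effect when the JVM starts, hence the restart.
export HBASE_OPTS="$HBASE_OPTS \
  -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=10101 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false"
```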

enabling hbase metrics on a running instance

2010-04-06 Thread Igor Ranitovic
Is it possible to enable the hbase metrics without a restart? Thanks. i.

RE: About test/production server configuration

2010-04-06 Thread Jonathan Gray
Or if you have a budget in mind, we can help you determine what would be the best way to allocate those dollars. > -Original Message- > From: Jonathan Gray [mailto:jg...@facebook.com] > Sent: Tuesday, April 06, 2010 3:11 PM > To: hbase-user@hadoop.apache.org > Subject: RE: About test/prod

RE: how can I check the I/O influence HBase to HDFS

2010-04-06 Thread Jonathan Gray
Can you explain more about what information you are trying to find out? You had an existing HDFS and you want to measure the additional impact adding HBase is? Is that in terms of reads/writes/iops or data size? If you have a steady-state set of metrics for HDFS w/o HBase, can you not just mon
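The suggestion here amounts to a simple baseline subtraction; a minimal sketch (the metric names are hypothetical, not actual HDFS metric keys):

```python
def hbase_io_delta(current, baseline):
    """Estimate HBase's share of HDFS I/O by subtracting the steady-state
    baseline measured before HBase was added. Metric names are illustrative."""
    return {k: current[k] - baseline.get(k, 0) for k in current}

print(hbase_io_delta({"reads": 1200, "writes": 800},
                     {"reads": 1000, "writes": 500}))
# {'reads': 200, 'writes': 300}
```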

RE: About test/production server configuration

2010-04-06 Thread Jonathan Gray
Imran, Have you run Solr atop HDFS? I doubt this will be performant. Also, to properly scope your cluster, you need to come up with actual number targets if you want to be able to accurately provision hardware. "not much" data now, but "lots" of data later could mean anything. Decide what yo

Re: hbase mapreduce scan

2010-04-06 Thread Jürgen Jakobitsch
hi, thanks for your inputs, i was asking with respect to doing sparql queries over hbase tables. i have read that yahoo and others use hbase or bigtable for their search results and so i'm thinking of how to apply a sparql query - which is nothing else than a normal query - to hbase. openrdf's sail-

Re: hbase mapreduce scan

2010-04-06 Thread Jean-Daniel Cryans
Or put it in MySQL, or in S3, or...or... so my point was that you need a recipient that transcends the JVMs ;) So it is doable and pretty normal to output in tables the result of MRs that map other tables, we have dozens of those here at StumbleUpon. But if it fits in a single HashMap in a single

Re: DFS too busy/down? while writing back to HDFS.

2010-04-06 Thread Jean-Daniel Cryans
From DataXceiver's javadoc: /** * Thread for processing incoming/outgoing data stream. */ So it's a bit different from the handlers AFAIK. J-D On Mon, Apr 5, 2010 at 10:57 PM, steven zhuang wrote: > thanks, J.D. >          my cluster has the first problem. BTW, dfs.datanode.max.xcievers > mean
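For context, the xceiver cap is a DataNode setting in hdfs-site.xml. A sketch of the fragment (note the property name really is spelled "xcievers" in this Hadoop era; 4096 is a commonly suggested value, not an official default):

```xml
<!-- hdfs-site.xml: raise the cap on concurrent DataXceiver threads
     per DataNode. HBase region servers keep many streams open, so the
     stock limit is often too low. -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```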

how can I check the I/O influence HBase to HDFS

2010-04-06 Thread steven zhuang
hi there, I have the problem of checking the influence HBase brings to HDFS. I have a Hadoop cluster with 30+ data nodes, and an HBase cluster based on it, with 18 regionservers residing on 18 datanodes. we have observed the HDFS IO has increased a l

RE: hbase mapreduce scan

2010-04-06 Thread Michael Segel
J-D, There's an alternative... He could write a M/R that takes the input from a scan() , do something, reduce() and then output the reduced set back to hbase in the form of a temp table. (Even an in memory temp table) and then at the end pull the data out in to a hash table? In theory this
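The flow described here can be sketched without any HBase API at all (a toy, in-memory stand-in for the scan / map / reduce / temp-table / final-hash pipeline):

```python
# Toy sketch: scan -> map -> reduce -> temp "table" -> pull into a hash.
# Plain Python dicts stand in for HBase tables; data is made up.
rows = [("r1", 2), ("r2", 3), ("r1", 5)]           # pretend scan() output

mapped = [(key, value) for key, value in rows]     # map(): identity here

temp_table = {}                                    # stand-in for a temp table
for key, value in mapped:                          # reduce(): sum per key
    temp_table[key] = temp_table.get(key, 0) + value

result = dict(temp_table)                          # final pull into a hash
print(result)   # {'r1': 7, 'r2': 3}
```

As J-D notes above, the real question is whether the result fits in one JVM's memory; if it does, a temp table is unnecessary overhead.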

Re: More about LogFlusher

2010-04-06 Thread ChingShen
Thanks, Stack. But I think I want to know what's the difference between SequenceFile.Writer.sync() and syncFs() (HDFS-200)? Could someone tell me what the HLog syncs with? Shen On Sat, Apr 3, 2010 at 3:13 AM, Stack wrote: > On Fri, Apr 2, 2010 at 10:59 AM, ChingShen > wrote: > >