RE: HBase performance tuning

2008-03-27 Thread Goel, Ankur
Thanks for the explanation, Stack. Using my threaded client I got a throughput of 6000 inserts/sec. Let me use and modify the code you posted on the wiki to see if I can get better throughput. I'll write to the list again once I have some performance data. -Ankur

Lost tables

2008-03-27 Thread cure
Hi, I use HBase 0.2 development code from trunk and Hadoop 0.17, and I lost all tables after restarting HBase. I did the following: 1) start Hadoop DFS 2) start HBase 3) create table X 4) insert into table X 5) select from X; the data inserted in step 4 is there (everything is OK) 6) stop HBase 7) stop

Re: HBase performance tuning

2008-03-27 Thread stack
Looks like you are crawling the web. What crawler are you using? Could you write directly into HBase from the crawler? St.Ack

Re: Lost tables

2008-03-27 Thread Bryan Duxbury
The pattern of events as you list them is the correct way to bring up and down an HBase cluster. Is this being run on a single node or multiple machines? What command are you using to start HBase? (bin/start-hbase.sh is what I use) Is there anything interesting in the HBase logs for either

RE: HBase performance tuning

2008-03-27 Thread Goel, Ankur
I am indeed crawling the web, but only the sites present in my seed list. The crawler used here is Heritrix 2.0 - http://webteam.archive.org/confluence/display/Heritrix/2.0.0. I developed a Heritrix-specific HBase writer that can be integrated with Heritrix to write the crawled content

Re: HBase performance tuning

2008-03-27 Thread stack
I have some familiarity with that crawler. Tell us more about your writer. Is it proprietary? If not, can we put it somewhere others could use it if they want? Thanks, St.Ack

RE: HBase Sample Schemas

2008-03-27 Thread Goel, Ankur
Hi Bryan, here is the sample schema I have (it looks closer to an RDBMS schema, I know):
TABLE: seed_list
DESCRIPTION: Used to store seed URLs (both old and newly discovered). Initially populated with some seed URLs. The crawl controller picks up the seeds from