RE: Split META manually

2010-03-11 Thread Jonathan Gray
Fleming, We're looking at a few different ideas for this problem right now. One is to make an efficient method for warming up a client's META cache by issuing a META scan for a single table or all tables. This will be significantly faster than lots of gets. The other bigger change is that META

Re: when Hbase open a region, what does it do? problem with super big row(1.5GB data in one row)

2010-03-11 Thread Stack
On Thu, Mar 11, 2010 at 10:18 PM, Yi Liang wrote: > Hi St.Ack, > > Can hbase-1537 be applied to 0.20.3? It should be very useful, but the patch > can't compile Scan.java and HRegion.java for me. > I took a quick look. It looks like it wouldn't take too much massaging getting the patch to apply to t

Re: Split META manually

2010-03-11 Thread y_823910
It doesn't work! I would like the META table to be split by table name. If I have 10 tables, there would be 10 regions in META that can be dispatched to different region servers, so I can start many clients concurrently reading different tables' data without the META bottleneck. I had started 2000 concurrent hbase

Re: when Hbase open a region, what does it do? problem with super big row(1.5GB data in one row)

2010-03-11 Thread Yi Liang
Hi St.Ack, Can hbase-1537 be applied to 0.20.3? It should be very useful, but the patch can't compile Scan.java and HRegion.java for me. Thanks, Yi On Tue, Mar 9, 2010 at 2:26 PM, Stack wrote: > On Mon, Mar 8, 2010 at 6:58 PM, William Kang > wrote: > > Hi, > > Can you give me some more details ab

Re: Split META manually

2010-03-11 Thread Stack
Why split .META.? I'm not sure it works properly so would advise against it (We don't have tests in place for that... we've not been too concerned about it up to this since your install would have to be massive for .META. to split). St.Ack 2010/3/11 : > Hi there, > > I want to split META table m

Re: lucene index{reader|writer} on hbase (GSOC idea?)

2010-03-11 Thread Kay Kay
On 03/11/2010 07:56 AM, Stack wrote: What's this look like Kay Kay? Implementations of IndexReader and IndexWriter? Yes, true to begin with, primarily applicable for the indexing phase, and an hbase uri that can be exported to katta, eventually (like the hdfs uri, that is being currently

Re: importing into hbase 0.20.4

2010-03-11 Thread Jean-Daniel Cryans
Well it's hard to go through this path since it's not released yet ;) So we didn't change the file format in 0.20.4 (and probably won't for 0.21 too FWIW), so the only thing you have to do is to deploy the release somewhere and replace hbase-site.xml + hbase-env.sh, then you can restart HBase and

Split META manually

2010-03-11 Thread y_823910
Hi there, I want to split META table manually but I wonder how to set the optional Region Key in the webpage. (using the value like BIG_TABLE,FRPFXRD_NF61904-0 1.001.Main.0,1268309701214) BIG_TABLE,FRPFXRD_NF61904-0 column=info:regioninfo, timestamp=1268309711446, value=REGION => {NAME => 'BIG_TA

importing into hbase 0.20.4

2010-03-11 Thread Ted Yu
Hi, We may upgrade to hbase 0.20.4 after it is released. This means we will have to export from hbase 0.20.1 and import into hbase 0.20.4 Has anybody gone through this path ? Thanks

RE: Live table switching

2010-03-11 Thread Rodrick Megraw
Makes sense. No magic, but not hard to implement. For now I'm only dealing with a single client, so I dodge the issue you point out about many simultaneous client hits. Thanks much. > Date: Thu, 11 Mar 2010 14:23:04 -0800 > Subject: Re: Live table switching > From: st...@duboce.net > To: hbase-

RE: region server appearing twice on HBase Master page

2010-03-11 Thread Michael Segel
> Date: Thu, 11 Mar 2010 13:55:20 -0800 > Subject: Re: region server appearing twice on HBase Master page > From: st...@duboce.net > To: hbase-user@hadoop.apache.org > > On Thu, Mar 11, 2010 at 1:48 PM, Michael Segel > wrote: > > > > Hey! > > The patch appears to be working, but can anyone giv

RE: [databasepro-48] HUG9

2010-03-11 Thread Jonathan Gray
Pardon the link vomit, hopefully this comes across okay... HBase Project Update by Jonathan Gray http://wiki.apache.org/hadoop/HBase/HBasePresentations?action=AttachFile&do=get&target=HUG9_HBaseUpdate_JonathanGray.pdf HBase and HDFS by Todd Lipcon of Cloudera http://wiki.apache.org/hadoop/HBas

FW: [databasepro-48] HUG9

2010-03-11 Thread Jonathan Gray
For anyone not in the bay area, we had HUG9 last night. Links to the presentations below. JG From: databasepro-48-annou...@meetup.com [mailto:databasepro-48-annou...@meetup.com] On Behalf Of Jonathan Gray Sent: Thursday, March 11, 2010 1:57 PM To: databasepro-48-annou...@meetup.com Subject

Re: Live table switching

2010-03-11 Thread Stack
On Thu, Mar 11, 2010 at 2:10 PM, Rodrick Megraw wrote: > > I am building a web service that looks up data > from several HBase tables and returns a result. There will be an hourly > batch process that generates new versions of the tables, and, when > they’re available, the web service should switc

Re: TableMapReduceUtil class question

2010-03-11 Thread vipul sharma
Thanks! I was looking in the wrong place. It is still active in Hbase.Mapreduce. On Thu, Mar 11, 2010 at 2:14 PM, Jean-Daniel Cryans wrote: > The whole mapred package was deprecated just like in hadoop 0.20 and > the new package to use is now mapreduce, although I would recommend > still using it

Re: TableMapReduceUtil class question

2010-03-11 Thread Jean-Daniel Cryans
The whole mapred package was deprecated just like in hadoop 0.20 and the new package to use is now mapreduce, although I would recommend still using it as we don't plan on removing it for 0.21 J-D On Thu, Mar 11, 2010 at 2:12 PM, vipul sharma wrote: > TableMapReduceUtil seems to have been deprec

TableMapReduceUtil class question

2010-03-11 Thread vipul sharma
TableMapReduceUtil seems to have been deprecated. What is its replacement? I want to use it to set a mapper like this: TableMapReduceUtil.initTableMapperJob(HTable table,Get get, Mapper.class, Text.class, Text.class, job); -- Vipul Sharma sharmavipul AT gmail DOT com

Live table switching

2010-03-11 Thread Rodrick Megraw
Hi, I am building a web service that looks up data from several HBase tables and returns a result. There will be an hourly batch process that generates new versions of the tables, and, when they’re available, the web service should switch over to using them. The switchover cannot introduce any

RE: Trying to understand the results from the status command in hbase shell

2010-03-11 Thread Michael Segel
Ah, OK. Said that way, it makes sense. Thx > Date: Thu, 11 Mar 2010 13:56:28 -0800 > Subject: Re: Trying to understand the results from the status command in hbase shell > From: jdcry...@apache.org > To: hbase-user@hadoop.apache.org > > That's the average region load, in this case you h

Re: Trying to understand the results from the status command in hbase shell

2010-03-11 Thread Jean-Daniel Cryans
That's the average region load, in this case you have exactly 75 regions. J-D On Thu, Mar 11, 2010 at 1:55 PM, Michael Segel wrote: > > Hi, > > Here's something that is puzzling me. > > We have a small cluster where there is literally no activity. > I open up an hbase shell and I type in  status

Re: region server appearing twice on HBase Master page

2010-03-11 Thread Stack
On Thu, Mar 11, 2010 at 1:48 PM, Michael Segel wrote: > > Hey! > The patch appears to be working, but can anyone give any more information on > what could be causing the DNS 'hiccup'? > Isn't this a question for your ops team? Why a lookup gives different answers at different times (IIUC)? >

Trying to understand the results from the status command in hbase shell

2010-03-11 Thread Michael Segel
Hi, Here's something that is puzzling me. We have a small cluster where there is literally no activity. I open up an hbase shell and I type in status. I get back the following: hbase(main):003:0> status 3 servers, 0 dead, 25. average load Ok, so what does the 25.00 average load mean if the

RE: region server appearing twice on HBase Master page

2010-03-11 Thread Michael Segel
> Date: Thu, 11 Mar 2010 12:08:34 -0800 > Subject: Re: region server appearing twice on HBase Master page > From: st...@duboce.net > To: hbase-user@hadoop.apache.org > > I just applied hbase-2174 to branch and trunk. > St.Ack > > On Thu, Mar 11, 2010 at 10:19 AM, Jean-Daniel Cryans > wrote: >

Re: random access and hotspots

2010-03-11 Thread TuX RaceR
Hi Alex Thanks again for your detailed answer. Alex Baranov wrote: So, 2 to 50 columns in each row. In case the single row size (in bytes) is not large then if requests load (number of concurrent clients which perform described queries) is heavy, then you probably should consider simple dat

Re: region server appearing twice on HBase Master page

2010-03-11 Thread Stack
I just applied hbase-2174 to branch and trunk. St.Ack On Thu, Mar 11, 2010 at 10:19 AM, Jean-Daniel Cryans wrote: > Bringing the discussion in hbase-user > > That usually happens after a DNS hiccup. There's a fix for that in > https://issues.apache.org/jira/browse/HBASE-2174 > > J-D > > On Wed, M

Re: random access and hotspots

2010-03-11 Thread Alex Baranov
> > How many columns Random table would have? > > few 10 of millions (10^7) > What is the row size? > > Rows will contain from two to 50 columns You probably meant "few 10 of millions (10^7)" is a row count. So, 2 to 50 columns in each row. In case the single row size (in bytes) is not large the

Re: region server appearing twice on HBase Master page

2010-03-11 Thread Jean-Daniel Cryans
Yes, servers and clients will all need to be on 0.20.4... that's the tradeoff for more flexibility in the future (see http://issues.apache.org/jira/browse/HBASE-2219). J-D On Thu, Mar 11, 2010 at 10:52 AM, Ted Yu wrote: > That makes sense. > This means we have to replace hbase client when we upg

Re: region server appearing twice on HBase Master page

2010-03-11 Thread Jean-Daniel Cryans
We are about to commit it to 0.20.4 since we voted on it breaking RPC compatibility. In this case the DNS lookup gives a different address, so the master treats it as a different region server and gives it a new startcode. In the web ui the two lines should have the same address, but not the same

Re: region server appearing twice on HBase Master page

2010-03-11 Thread Ted Yu
0.20.5 seems a bit far in the future :-) What I couldn't explain is why serversToServerInfo, backed by ConcurrentHashMap, would contain two entries with the same key - X.com. On Thu, Mar 11, 2010 at 10:19 AM, Jean-Daniel Cryans wrote: > Bringing the discussion in hbase-user > > That usually hap

Re: region server appearing twice on HBase Master page

2010-03-11 Thread Jean-Daniel Cryans
Bringing the discussion in hbase-user That usually happens after a DNS hiccup. There's a fix for that in https://issues.apache.org/jira/browse/HBASE-2174 J-D On Wed, Mar 10, 2010 at 1:41 PM, Ted Yu wrote: > I noticed two lines for the same region server on HBase Master page: > X.com:60030    12

Re: Table left unresponsive after Thrift socket timeout

2010-03-11 Thread Jean-Daniel Cryans
Joe, We'll need to learn what happened to that region, they usually don't throw up after a few inserts ;) So in that region server's log, before you tried disabling that table, do you see anything wrong (exceptions probably)? If you have a web server, it would be nice to drop the full RS log and

Re: random access and hotspots

2010-03-11 Thread TuX RaceR
Hello Alex, Thank you for your mail. Alex Baranov wrote: How many columns Random table would have? few 10 of millions (10^7) What is the row size? Rows will contain from two to 50 columns How many rows are you going to fetch at one time (I assume just for displaying one page with 10, 20, 1

Re: lucene index{reader|writer} on hbase (GSOC idea?)

2010-03-11 Thread Stack
What's this look like Kay Kay? Implementations of IndexReader and IndexWriter? St.Ack On Wed, Mar 10, 2010 at 10:05 PM, Kay Kay wrote: > Hi - > I had initially forked out the component, of indexing a specific column > (family) contents in HBase, that was already present and moved it to github.

Re: Tables Miss after Adding New Slave

2010-03-11 Thread Stack
No need to restart hbase. Just start up pertinent services on added node. For example start datanode and then start regionserver daemons as follows: > ${HADOOP_HOME}/bin/hadoop-daemon.sh start datanode > ${HBASE_HOME}/bin/hbase-daemon.sh start regionserver St.Ack On Wed, Mar 10, 2010 at

Re: when Hbase open a region, what does it do? problem with super big row(1.5GB data in one row)

2010-03-11 Thread Stack
Yes. Specify a column family or a column family + column qualifier to load less than total row. St.Ack On Wed, Mar 10, 2010 at 11:36 PM, William Kang wrote: > Hi, > I have another question. If we do things like following: > > "Get g = new Get(Bytes.toBytes("rowname")); > > > Result r = table.ge

Re: random access and hotspots

2010-03-11 Thread Alex Baranov
How many columns Random table would have? What is the row size? How many rows are you going to fetch at one time (I assume just for displaying one page with 10, 20, 100 records?)? How big is your data (estimated rows count)? How many different types of "indexes" are you planning to have? > ...I n

Call for presentations - Berlin Buzzwords - Summer 2010

2010-03-11 Thread Isabel Drost
Call for Presentations Berlin Buzzwords http://buzzwordsberlin.de Berlin Buzzwords 2010 - Search, Store, Scale 7/8 June 2010 This is to announce the Berlin Buzzwords 2010. The first conference on scalable and open search, data process

Re: random access and hotspots

2010-03-11 Thread TuX RaceR
Thanks Alex for your answer. I am not yet at a stage where I can measure the performance (I am still at the db design stage, initial population) but my understanding was that randomizing the keys was a way of avoiding key hotspots. To simplify let's assume that have documents attached to use

Re: random access and hotspots

2010-03-11 Thread Alex Baranov
Hello Tux, Accessing a table in "random access" manner is not the reason for randomizing keys. You will likely need to randomize your keys only for better performance when importing an existing large dataset into HBase. Otherwise if you don't have insertion rate bigger than 20K records/sec I wouldn'

random access and hotspots

2010-03-11 Thread TuX RaceR
Hello List, I'll be accessing a table mainly in random access and I am looking for an efficient way of randomizing the keys. I thought about an MD5 hash of the ID of the record, but as MD5 returns a string of chars [0-9A-F] I was wondering if there was a better method to use. Thanks TuX
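
One possible answer to the question above, as a minimal sketch only: an MD5 digest is really 16 raw bytes, and the [0-9A-F] string is just its hex rendering. Since HBase row keys are byte arrays, the raw digest could be used directly as a key prefix, which spreads keys over all 256 values per byte instead of 16 characters. The class name HashedRowKey, its method names, and the "digest followed by ID" key layout are assumptions made up for this illustration; they are not part of HBase or of anything proposed in the thread. Only the standard java.security.MessageDigest API is used.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of HBase: builds a row key whose leading
// bytes are the raw MD5 digest of the record ID.
final class HashedRowKey {

    private HashedRowKey() {}

    // Returns the 16 raw digest bytes (values 0x00-0xFF), not a hex string.
    static byte[] md5(String recordId) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return md.digest(recordId.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // Assumed layout: 16-byte digest followed by the original ID bytes,
    // so the ID can still be read back from the key.
    static byte[] rowKey(String recordId) {
        byte[] digest = md5(recordId);
        byte[] id = recordId.getBytes(StandardCharsets.UTF_8);
        byte[] key = new byte[digest.length + id.length];
        System.arraycopy(digest, 0, key, 0, digest.length);
        System.arraycopy(id, 0, key, digest.length, id.length);
        return key;
    }

    public static void main(String[] args) {
        byte[] key = rowKey("record-42");
        System.out.println("key length in bytes: " + key.length);
    }
}
```

The resulting byte array could then be passed wherever a row key is expected. Note that hashed keys give up meaningful range scans over the original IDs, so this only suits tables that are read by exact key.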