Re: why are the files of one table so big?

2011-03-31 Thread 陈加俊
I want to copy the files of one table from one cluster to another cluster, so I do it in steps: 1. bin/hadoop fs -copyToLocal at A 2. scp the files from A to B 3. bin/hadoop fs -copyFromLocal at B. I scan the table and save the data to a file, using a file format I defined before yesterday
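
An alternative sketch that skips the local-disk staging entirely, copying the table directory straight between the two HDFS instances. The namenode URIs and the /hbase/mytable path are placeholders, and the table should not be taking writes during the copy:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class CopyTableFiles {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Source and destination namenode URIs are placeholders.
        FileSystem srcFs = FileSystem.get(URI.create("hdfs://namenodeA:9000"), conf);
        FileSystem dstFs = FileSystem.get(URI.create("hdfs://namenodeB:9000"), conf);
        // Recursive copy of the table directory; 'false' keeps the source.
        FileUtil.copy(srcFs, new Path("/hbase/mytable"),
                      dstFs, new Path("/hbase/mytable"), false, conf);
      }
    }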

Re: HBase Case Studies

2011-03-31 Thread Ted Dunning
Do you mean 100MB rows? That seems pretty fast. On Thu, Mar 31, 2011 at 5:29 PM, Jean-Daniel Cryans wrote: > Sub-second responses for 100MB files? You sure that's right? > > Regarding proper case studies, I don't think a single one exists. > You'll find presentation decks about some use cases

Re: Modelling threaded messages

2011-03-31 Thread Ted Dunning
Solr/Elasticsearch is a fine solution, but probably won't be quite as fast as a well-tuned HBase solution. One key assumption you seem to be making is that you will store messages only once. If you are willing to make multiple updates to tables, then you can arrange the natural ordering of the t
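
A hypothetical sketch of that store-it-more-than-once idea: each message is written under two composite row keys, so "a user's threads, newest first" and "one thread's messages, in order" are both sequential scans. Table name, column family, and key layout are all assumptions, not from this thread:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MessageWriter {
      public static void store(String userId, String threadId,
                               long ts, byte[] message) throws Exception {
        HTable threads = new HTable(HBaseConfiguration.create(), "threads");
        // Key 1: user + reversed timestamp, so a scan from the user's
        // prefix returns the most recently updated threads first.
        Put byUser = new Put(Bytes.toBytes(userId + "/" + (Long.MAX_VALUE - ts)));
        byUser.add(Bytes.toBytes("m"), Bytes.toBytes(threadId), message);
        threads.put(byUser);
        // Key 2: thread + arrival timestamp, for rendering one thread in order.
        Put byThread = new Put(Bytes.toBytes(threadId + "/" + ts));
        byThread.add(Bytes.toBytes("m"), Bytes.toBytes("body"), message);
        threads.put(byThread);
        threads.close();
      }
    }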

Re: HBase strangeness and double deletes of HDFS blocks and writing to closed blocks

2011-03-31 Thread Chris Tarnas
Thanks for your help J.D., answers inline: On Mar 31, 2011, at 8:00 PM, Jean-Daniel Cryans wrote: > I wouldn't worry too much at the moment about what seems to be double > deletes of blocks; I'd like to concentrate on the state of your > cluster first. > > So if you run hbck, do you see any incons

RE: A lot of data is lost when the name node crashes

2011-03-31 Thread Gaojinchao
Thanks, please submit a patch and I can try to test it. The JIRA is: https://issues.apache.org/jira/browse/HBASE-3722 -Original Message- From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans Sent: April 1, 2011 1:20 To: Gaojinchao; user@hbase.apache.org Subject: Re: A lot of data is lost when

Re: hadoop.log.file

2011-03-31 Thread Jean-Daniel Cryans
This is all determined in hbase-daemon.sh: https://github.com/apache/hbase/blob/trunk/bin/hbase-daemon.sh#L117 The log4j file sets default values just in case the processes are started in another way (as far as I understand it). J-D On Thu, Mar 31, 2011 at 5:58 PM, Geoff Hendrey wrote: > whoop

Re: HBase strangeness and double deletes of HDFS blocks and writing to closed blocks

2011-03-31 Thread Jean-Daniel Cryans
I wouldn't worry too much at the moment about what seems to be double deletes of blocks; I'd like to concentrate on the state of your cluster first. So if you run hbck, do you see any inconsistencies? In the datanode logs, do you see any exceptions regarding xcievers (just in case). In the region

RE: hadoop.log.file

2011-03-31 Thread Geoff Hendrey
whoops, yep that's the one. Just trying to understand how it relates to the master logfile, the regionserver logfile, and the zookeeper logfile. -geoff -Original Message- From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans Sent: Thursday, March 31, 2011 5:54 P

Modelling threaded messages

2011-03-31 Thread Mark Jarecki
Hi all, I'm modelling a schema for storing and retrieving threaded messages, where, for planning purposes: - there are many millions of users. - a user might have up to 1000 threads. - each thread might have up to 5 messages (with some threads being sparse with only

Re: hadoop.log.file

2011-03-31 Thread Jean-Daniel Cryans
The HBase log4j.properties doesn't have that, but it has hbase.log.file https://github.com/apache/hbase/blob/trunk/conf/log4j.properties Is that what you're talking about? Thx, J-D On Thu, Mar 31, 2011 at 5:48 PM, Geoff Hendrey wrote: > it is in log4j.properties (/conf). > > -geoff > > -Ori

RE: hadoop.log.file

2011-03-31 Thread Geoff Hendrey
it is in log4j.properties (/conf). -geoff -Original Message- From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans Sent: Thursday, March 31, 2011 5:26 PM To: user@hbase.apache.org Subject: Re: hadoop.log.file Where is that "hadoop.log.file" you're talking a

Re: HBase Case Studies

2011-03-31 Thread Jean-Daniel Cryans
Sub-second responses for 100MB files? You sure that's right? Regarding proper case studies, I don't think a single one exists. You'll find presentation decks about some use cases if you google a bit tho. J-D On Thu, Mar 31, 2011 at 12:20 PM, Shantian Purkad wrote: > Hello, > > Does anyone kno

Re: hadoop.log.file

2011-03-31 Thread Jean-Daniel Cryans
Where is that "hadoop.log.file" you're talking about? J-D On Thu, Mar 31, 2011 at 3:22 PM, Geoff Hendrey wrote: > Hi - > > > > I was wondering where I can find an explanation of what hbase logs to > hadoop.log.file. This file is defined in log4j.properties. I see > DFSClient logging to it, but I

RE: ScannerTimeoutException when a scan enables caching, no exception when it doesn't

2011-03-31 Thread Buttler, David
I think this is expected. The caching means that you only get blocks of 2000 rows. And if you go for longer than 60 seconds between blocks, then the scanner will time out. You could try tuning your caching down to 100 to see if that works for a bit (although, due to variance in the time you t
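
The two knobs that come out of that explanation, sketched against the 0.90-era client with illustrative values:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Scan;

    public class ScanTuning {
      public static void main(String[] args) {
        // Knob 1 (client side): fetch fewer rows per RPC so each batch
        // is worked through well within the lease window.
        Scan scan = new Scan();
        scan.setCaching(100);
        // Knob 2 (server side): raise the scanner lease itself. The
        // region servers read this key at startup, so in practice it
        // belongs in their hbase-site.xml; shown here only to name the
        // property and its unit (milliseconds).
        Configuration conf = HBaseConfiguration.create();
        conf.setLong("hbase.regionserver.lease.period", 120000);
      }
    }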

Re: ScannerTimeoutException when a scan enables caching, no exception when it doesn't

2011-03-31 Thread Jean-Daniel Cryans
That's the correct guess. J-D On Thu, Mar 31, 2011 at 4:59 PM, Joseph Boyd wrote: > We're using hbase 0.90.0 here, and I'm seeing a curious behavior with my > scans. > > I have some code that does a scan over a table, and for each row > returned some work to verify the data... > > I set the sca

HBase strangeness and double deletes of HDFS blocks and writing to closed blocks

2011-03-31 Thread Christopher Tarnas
I've been trying to track down some hbase strangeness from what looks to be lost hbase puts: in one thrift put we insert data into two different column families at different rowkeys, but only one of the rows is there. There were no errors to the client or the thrift log, which is a little disturbi

ScannerTimeoutException when a scan enables caching, no exception when it doesn't

2011-03-31 Thread Joseph Boyd
We're using hbase 0.90.0 here, and I'm seeing a curious behavior with my scans. I have some code that does a scan over a table and, for each row returned, does some work to verify the data... I set the scan up like so: byte[] family = Bytes.toBytes("mytable"); Scan scan = new Scan(); scan.setCac
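
A self-contained sketch of such a scan with caching enabled and the timeout surfaced, against the 0.90 client; the table name and the per-row verification are placeholders:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.ScannerTimeoutException;

    public class VerifyScan {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(HBaseConfiguration.create(), "mytable");
        Scan scan = new Scan();
        scan.setCaching(2000); // rows fetched per round trip
        ResultScanner scanner = table.getScanner(scan);
        try {
          Result row;
          while ((row = scanner.next()) != null) {
            // per-row verification work goes here; if working through one
            // cached batch takes longer than the scanner lease (60s by
            // default), the next() that fetches the following batch fails
          }
        } catch (ScannerTimeoutException e) {
          // lease expired between fetches: lower setCaching or raise the lease
        } finally {
          scanner.close();
          table.close();
        }
      }
    }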

hadoop.log.file

2011-03-31 Thread Geoff Hendrey
Hi - I was wondering where I can find an explanation of what hbase logs to hadoop.log.file. The file is defined in log4j.properties. I see DFSClient logging to it, but I can't locate a doc describing exactly how HBase uses hadoop.log.file. -geoff

Re: Speeding up LoadIncrementalHFiles?

2011-03-31 Thread Adam Phelps
On 3/31/11 12:41 PM, Ted Yu wrote: Adam: I logged https://issues.apache.org/jira/browse/HBASE-3721 Thanks for opening that. I haven't delved much into the HBase code previously, but I may take a look into this since it is causing us some trouble currently. - Adam

Re: Speeding up LoadIncrementalHFiles?

2011-03-31 Thread Ted Yu
Adam: I logged https://issues.apache.org/jira/browse/HBASE-3721 Feel free to comment on that JIRA. On Thu, Mar 31, 2011 at 11:14 AM, Adam Phelps wrote: > On 3/30/11 8:39 PM, Stack wrote: > >> What is slow? The running of the LoadIncrementHFiles or the copy? >> > > Its the LoadIncrementHFiles p

HBase Case Studies

2011-03-31 Thread Shantian Purkad
Hello, Does anyone know of any case studies where HBase is used in production for large data volumes (including big files/documents, on the scale of a few KBs to 100MBs, stored in rows) while giving sub-second responses to online queries? Thanks and Regards, Shantian

Re: Speeding up LoadIncrementalHFiles?

2011-03-31 Thread Adam Phelps
On 3/30/11 8:39 PM, Stack wrote: What is slow? The running of the LoadIncrementalHFiles or the copy? It's the LoadIncrementalHFiles portion. If the former, is it because the table it's loading into has different boundaries than those of the HFiles so the HFiles have to be split? I'm sure that co
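
Not something proposed in the thread, but if the splitting is indeed the cost, one way to sidestep it is to generate the HFiles against the live table's region boundaries to begin with. A sketch against the 0.90 API, with a placeholder table name:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.mapreduce.Job;

    public class BulkLoadPrepare {
      public static void main(String[] args) throws Exception {
        Job job = new Job(HBaseConfiguration.create(), "bulkload-prepare");
        HTable table = new HTable(job.getConfiguration(), "mytable");
        // Reads the table's current region boundaries into the job's
        // TotalOrderPartitioner, so the generated HFiles line up with
        // the regions and don't need splitting at load time.
        HFileOutputFormat.configureIncrementalLoad(job, table);
      }
    }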

Re: why are the files of one table so big?

2011-03-31 Thread Jean-Daniel Cryans
Depends what you're trying to do? Like I said, you didn't give us a lot of information, so we're pretty much in the dark regarding what you're trying to achieve. At first you asked why the files were so big; I don't see the relation with the log files. Also I'm not sure why you referred to the numbe

Re: Performance test results

2011-03-31 Thread Jean-Daniel Cryans
Inline. J-D > I assume the block cache tuning key you talk about is > "hfile.block.cache.size", right? If it is only 20% by default then > what is the rest of the heap used for? Since there are no fancy > operations like joins and since I'm not using in-memory tables the only > thing I can think of

Re: A lot of data is lost when the name node crashes

2011-03-31 Thread Jean-Daniel Cryans
(sending back to the list; please don't reply directly to the sender, always send back to the mailing list) MasterFileSystem has most of the DFS interactions; it seems that checkFileSystem is never called (it should be) and splitLog catches the ERROR when splitting but doesn't abort. Would you mi

Re: Changing Zookeeper address programmatically for reduces

2011-03-31 Thread Jean-Daniel Cryans
So you tried to write to another cluster instead? Because it says you didn't specify the other cluster correctly; CopyTable's help describes how that value should be constructed. J-D On Thu, Mar 31, 2011 at 2:23 AM, Stuart Scott wrote: > Hi J-D, > > Thanks for the info. > I tried t

Re: Performance test results

2011-03-31 Thread Eran Kutner
I assume the block cache tuning key you talk about is "hfile.block.cache.size", right? If it is only 20% by default then what is the rest of the heap used for? Since there are no fancy operations like joins and since I'm not using in-memory tables, the only thing I can think of is the memstore, right?
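
For reference, the key under discussion, shown programmatically; the 0.4 value is an arbitrary example, and in practice this setting lives in the region servers' hbase-site.xml, since they are what read it:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class CacheSizing {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Fraction of the region server heap given to the block cache
        // (the ~20% default discussed above); illustrative value only.
        conf.setFloat("hfile.block.cache.size", 0.4f); // 40% of the RS heap
      }
    }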

Re: hole in META

2011-03-31 Thread Venkatesh
Yeah... excise_regions seems to work but plug_hole doesn't plug the hole... it thinks the region still exists in META. Maybe the issue is with excise_regions & it doesn't cleanly remove it... I also tried /hbase org.apache.hadoop.hbase.util.Merge. That doesn't work for me in 0.20.6... What are the r

RE: How to copy an HTable from one cluster to another cluster?

2011-03-31 Thread Buttler, David
You will have to make sure you are not writing to the table that you are copying to the local disk. It seems reasonable to me, but I would suggest trying it out with a small data set to make sure you get the process down. Dave -Original Message- From: 陈加俊 [mailto:cjjvict...@gmail.com]

Re: why are the files of one table so big?

2011-03-31 Thread Stack
If you skip the log files, you are likely dropping data. St.Ack On Thu, Mar 31, 2011 at 12:27 AM, 陈加俊 wrote: > Can I skip the log files? > > On Thu, Mar 31, 2011 at 2:17 PM, 陈加俊 wrote: > >> I found there are so many log files under the table folder and it is very >> big! >> >> On Thu, Mar 31,

Re: what does this mean?

2011-03-31 Thread Stack
The session with zookeeper expired, so the regionserver shut itself down, probably because of a long GC pause. Please upgrade to 0.90.x. St.Ack On Thu, Mar 31, 2011 at 2:50 AM, 陈加俊 wrote: > 2011-03-30 20:25:12,798 WARN org.apache.zookeeper.ClientCnxn: Exception > closing session 0x932ed83611540001 to sun
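
The fix here is the upgrade Stack recommends; purely as an illustration of the knob involved, the session timeout HBase requests from ZooKeeper is configurable (120s is an arbitrary example, and the ZooKeeper ensemble's own min/max session bounds cap what it will grant):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class SessionTimeout {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Longer sessions tolerate longer GC pauses before the region
        // server is declared dead; requires restarting the HBase
        // processes to take effect.
        conf.setInt("zookeeper.session.timeout", 120000); // 2 minutes, in ms
      }
    }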

what does this mean?

2011-03-31 Thread 陈加俊
2011-03-30 20:25:12,798 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x932ed83611540001 to sun.nio.ch.SelectionKeyImpl@54e184e6 java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4] at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCn

RE: Changing Zookeeper address programmatically for reduces

2011-03-31 Thread Stuart Scott
Hi J-D, Thanks for the info. I tried this but ended up with the following error. Any ideas? Exception in thread "main" java.io.IOException: Please specify the peer cluster as hbase.zookeeper.quorum:zookeeper.znode.parent at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableR
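
A guess at the shape of the fix, based on that error message and the 0.90 TableMapReduceUtil API: the peer cluster goes in the quorumAddress argument as quorum:port:znode-parent. Hostnames and table name are placeholders; check CopyTable's help for the exact format in your version:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.mapreduce.IdentityTableReducer;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.mapreduce.Job;

    public class WriteToPeer {
      public static void main(String[] args) throws Exception {
        Job job = new Job(HBaseConfiguration.create(), "write-to-peer");
        // Fifth argument names the peer cluster: its ZooKeeper quorum,
        // client port, and znode parent, colon-separated.
        TableMapReduceUtil.initTableReducerJob(
            "target_table", IdentityTableReducer.class, job, null,
            "zkB1,zkB2,zkB3:2181:/hbase", null, null);
      }
    }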

Re: why are the files of one table so big?

2011-03-31 Thread 陈加俊
Can I skip the log files? On Thu, Mar 31, 2011 at 2:17 PM, 陈加俊 wrote: > I found there are so many log files under the table folder and it is very > big! > > On Thu, Mar 31, 2011 at 2:16 PM, 陈加俊 wrote: > >> I found there are so many log files under the table folder and it is very >> big! >> >>