Re: Loading text files from local file system

2013-04-16 Thread Surendra , Manchikanti
Hi Joshi, You can use Flume + AsyncHBaseSink / HBaseSink to move data from the local file system to HBase. Thanks, Surendra M -- Surendra Manchikanti On Wed, Apr 17, 2013 at 10:01 AM, Omkar Joshi wrote: > The background thread is here : > > > http://mail-archives.apache.org/mod_mbox/hbase-user/2
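
A minimal Flume agent configuration along these lines (a spooling-directory source feeding the AsyncHBaseSink) might look like the sketch below; the agent name, directory, table, and column family are hypothetical placeholders, not values from this thread:

```
# Hypothetical Flume agent: spool a local directory into HBase
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

# Spooling-directory source: picks up completed files dropped locally
agent1.sources.src1.type = spooldir
agent1.sources.src1.spoolDir = /var/data/incoming
agent1.sources.src1.channels = ch1

agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000

# AsyncHBaseSink writes events into the given table/column family
agent1.sinks.sink1.type = org.apache.flume.sink.hbase.AsyncHBaseSink
agent1.sinks.sink1.table = mytable
agent1.sinks.sink1.columnFamily = cf1
agent1.sinks.sink1.channel = ch1
```

The memory channel here is only for illustration; a file channel would survive agent restarts.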

Loading text files from local file system

2013-04-16 Thread Omkar Joshi
The background thread is here : http://mail-archives.apache.org/mod_mbox/hbase-user/201304.mbox/%3ce689a42b73c5a545ad77332a4fc75d8c1efbe84...@vshinmsmbx01.vshodc.lntinfotech.com%3E Following are the commands that I'm using to load files onto HBase : HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase clas

Re: how to evaluate the up-limit number of Regions?

2013-04-16 Thread Ted Yu
I am not sure I understand your question completely. Were you asking the upper bound of number of regions, given certain hardware resources ? Can you outline your expectation for throughput / latency ? I guess answers you may get would vary, depending on type of application, etc. On Tue, Apr 16,

how to evaluate the up-limit number of Regions?

2013-04-16 Thread Bing Jiang
hi, all I want to know whether there is a criterion or bible for measuring the capacity of an hbase cluster. From my view, it depends on: 1. hdfs volume 2. system memory setting 3. Network IO, etc. However, with the increase in the number of tables and regions, how to evaluate the ability of service is not enough,

RE: Data not loaded in table via ImportTSV

2013-04-16 Thread Omkar Joshi
Hi Anoop, Actually, I got confused after reading the doc. - I thought a simple importtsv command (which also takes the table name as an argument) would suffice. But as you pointed out, completebulkload is required. HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar
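
For reference, the two-step bulk load described here usually looks something like the following sketch; the jar name, paths, table name, and column mapping are placeholders (and these commands require a running cluster):

```
# Step 1: have importtsv write HFiles instead of doing Puts
HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` \
  ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-VERSION.jar importtsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf1:col1 \
  -Dimporttsv.bulk.output=/tmp/hfiles mytable /input/data.tsv

# Step 2: completebulkload moves the generated HFiles into the live table
HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` \
  ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-VERSION.jar \
  completebulkload /tmp/hfiles mytable
```

Without `-Dimporttsv.bulk.output`, importtsv writes directly via Puts and no completebulkload step is needed, which is the source of the confusion above.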

Renaming ".snapshot" directory used by HBase snapshots

2013-04-16 Thread Ted Yu
Hi, In the first half of this email, let me summarize our findings: Yesterday afternoon Huned and I discovered an issue while playing with HBase Snapshots on top of Hadoop's Snapshot branch ( http://svn.apache.org/viewvc/hadoop/common/branches/HDFS-2802/). HDFS (built from the HDFS-2802 branch) doesn'

Re: Region has been CLOSING for too long, this should eventually complete or the server will expire, send RPC again

2013-04-16 Thread rajesh babu chintaguntla
Hi Dylan, $HBASE_HOME/bin/hbase hbck -fix The above command can bring the cluster back to a normal state. If the master restarted while a ROOT/META region was closing (during a move or balance), the problem you have reported can easily occur. Thanks for the detailed logs. Raised an issue for this. You

How to remove all traces of a dropped table.

2013-04-16 Thread David Koch
Hello, We had problems with not being able to scan over a large (~8k regions) table so we disabled and dropped it and decided to re-import data from scratch into a table with the SAME name. This never worked and I list some log extracts below. The only way to make the import go through was to imp

Re: Reply: HBase random read performance

2013-04-16 Thread lars hofhansl
This is fundamentally different, though. A scanner by default scans all regions serially, because it promises to return all rows in sort order. A multi get is already parallelized across regions (and hence across region servers). Before we do a lot of work here we should first make sure that nothi
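
To illustrate the point being made, the sketch below shows how a multi-get can fan out one batched request per region server while a sorted scan must proceed serially. This is a self-contained toy, not the HBase client API: the key-to-server mapping and the fetch function are stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

def server_for(key):
    # Stand-in: maps a row key to its hosting region server.
    return "rs%d" % (hash(key) % 3)

def fetch_batch(server, keys):
    # Stand-in for one batched Get RPC to a single region server.
    return {k: "value-of-%s" % k for k in keys}

def multi_get(keys):
    # Group keys by region server (as the client library does),
    # then issue the per-server batches in parallel.
    by_server = {}
    for k in keys:
        by_server.setdefault(server_for(k), []).append(k)
    results = {}
    with ThreadPoolExecutor(max_workers=len(by_server)) as pool:
        futures = [pool.submit(fetch_batch, s, ks)
                   for s, ks in by_server.items()]
        for f in futures:
            results.update(f.result())
    return results

rows = multi_get(["r1", "r2", "r3", "r4"])
```

A scanner, by contrast, would walk the regions one after another to preserve sort order, which is why the two are not comparable benchmarks.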

Re: schema design: rows vs wide columns

2013-04-16 Thread Michael Segel
I think the important thing about Column Families is understanding how to use them properly in a design. Sparse data may make sense. It depends on the use case and an understanding of the trade-offs. It all depends on how the data breaks down into specific use cases. Keeping CFs

Re: schema design: rows vs wide columns

2013-04-16 Thread Ted Yu
bq. Maybe we can explain why there is some impacts, or what to consider? The above would be covered in the JIRA. Thanks On Tue, Apr 16, 2013 at 7:04 AM, Jean-Marc Spaggiari < jean-m...@spaggiari.org> wrote: > Can we add more details than just changing the maximum CF number? Maybe we > can expla

Re: schema design: rows vs wide columns

2013-04-16 Thread Jean-Marc Spaggiari
Can we add more details than just changing the maximum CF number? Maybe we can explain why there are some impacts, or what to consider? JM 2013/4/16 Ted Yu > If there is no objection, I will create a JIRA to increase the maximum > number of column families described here: > > http://hbase.apache

Re: schema design: rows vs wide columns

2013-04-16 Thread Ted Yu
If there is no objection, I will create a JIRA to increase the maximum number of column families described here: http://hbase.apache.org/book.html#number.of.cfs Cheers On Mon, Apr 8, 2013 at 7:21 AM, Doug Meil wrote: > > > For the record, the refGuide mentions potential issues of CF lumpiness >

Re: How practical is it to add a timestamp oracle on Zookeeper

2013-04-16 Thread Jean-Marc Spaggiari
Hi Yun, Attachments are not working on the mailing list. However, everyone using HBase should have the book on their desk, and I do ;) In figure 8-11, you can see that the client will contact ZK to know where the root region is, then the root region to find the meta, and so on. BUT This will

Re: How practical is it to add a timestamp oracle on Zookeeper

2013-04-16 Thread yun peng
Hi, Jean and Jieshan, Are you saying the client can directly contact region servers? Maybe I overlooked something, but I think the client may need to look up regions by first contacting ZK as in figure 8-11 from the definitive book (as attached)... Nevertheless, if it is the case, to assign a global timestamp, what is t

Re: How practical is it to add a timestamp oracle on Zookeeper

2013-04-16 Thread Ted Yu
Have you looked at https://github.com/yahoo/omid/wiki ? The Status Oracle implementation may give you some clue. Cheers On Apr 16, 2013, at 5:14 AM, yun peng wrote: > Hi, All, > I'd like to add a global timestamp oracle on Zookeep to assign globally > unique timestamp for each Put/Get issued

RE: How practical is it to add a timestamp oracle on Zookeeper

2013-04-16 Thread Bijieshan
Yes, Jean-Marc Spaggiari is right. Performance is the big problem of this approach, though zookeeper can help you implement this. Regards, Jieshan -Original Message- From: Jean-Marc Spaggiari [mailto:jean-m...@spaggiari.org] Sent: Tuesday, April 16, 2013 8:20 PM To: user@hbase.apache.or

Re: How practical is it to add a timestamp oracle on Zookeeper

2013-04-16 Thread Jean-Marc Spaggiari
Hi Yun, If I understand you correctly, that means each time you are going to do a put or a get you will need to call ZK first? Since ZK has only one master active, that means this ZK master will be called for each HBase get/put? You are going to create a bottleneck there. I don't know h

How practical is it to add a timestamp oracle on Zookeeper

2013-04-16 Thread yun peng
Hi, All, I'd like to add a global timestamp oracle on Zookeeper to assign a globally unique timestamp to each Put/Get issued from the HBase cluster. The reason I put it on Zookeeper is that each Put/Get needs to go through it, and a unique timestamp needs some global centralised facility to do it. But I am a
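
One common way to soften the bottleneck raised in the replies is for the oracle to hand out timestamps in blocks, so each client pays the central round trip once per block rather than once per operation. The sketch below is a minimal in-process illustration of that idea (a lock stands in for the ZooKeeper round trip); it is not a ZooKeeper implementation, and all names are hypothetical.

```python
import threading

class TimestampOracle:
    """Central allocator that hands out blocks of timestamps.

    Clients take a whole block at a time, so the shared lock (standing
    in for a round trip to a central service) is hit only once per
    block_size timestamps.
    """
    def __init__(self, block_size=1000):
        self._lock = threading.Lock()
        self._next = 0
        self.block_size = block_size

    def allocate_block(self):
        with self._lock:
            start = self._next
            self._next += self.block_size
            return start, start + self.block_size

class Client:
    def __init__(self, oracle):
        self.oracle = oracle
        self.cur = self.limit = 0

    def next_timestamp(self):
        # Refill from the oracle only when the local block is exhausted.
        if self.cur == self.limit:
            self.cur, self.limit = self.oracle.allocate_block()
        ts, self.cur = self.cur, self.cur + 1
        return ts

oracle = TimestampOracle(block_size=3)
c1, c2 = Client(oracle), Client(oracle)
stamps = [c1.next_timestamp(), c1.next_timestamp(), c2.next_timestamp()]
```

Timestamps are globally unique but only coarsely ordered across clients, which may or may not be acceptable depending on what the timestamps are for.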

Re: Reply: HBase random read performance

2013-04-16 Thread Jean-Marc Spaggiari
Hi Nicolas, I think it might be good to create a JIRA for that anyway, since it seems that some users are expecting this behaviour. My 2¢ ;) JM 2013/4/16 Nicolas Liochon > I think there is something in the middle that could be done. It was > discussed here a while ago, but without any JIRA create

Re: Reply: HBase random read performance

2013-04-16 Thread Nicolas Liochon
I think there is something in the middle that could be done. It was discussed here a while ago, but without any JIRA created. See thread: http://mail-archives.apache.org/mod_mbox/hbase-user/201302.mbox/%3CCAKxWWm19OC+dePTK60bMmcecv=7tc+3t4-bq6fdqeppix_e...@mail.gmail.com%3E If someone can spend s

RE: Reply: HBase random read performance

2013-04-16 Thread Liu, Raymond
So what is lacking here? The action should also be parallel inside the RS for each region, instead of just parallel at the RS level? Seems this will be rather difficult to implement, and for Get, might not be worth it? > > I looked > at src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.ja