HDFS footprint of a table

2012-09-11 Thread Lin Ma
Hi guys, Suppose I have a table in HBase: how can I estimate its storage footprint? Thanks. regards, Lin
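A rough way to reason about this question: each cell is stored as a KeyValue that carries its full row key, family, qualifier, timestamp and type alongside the value, and HDFS then replicates every HFile block (3 copies by default). The Java sketch below is a back-of-the-envelope model under those assumptions only. It ignores compression, data block encoding, block indexes, bloom filters, the WAL, and deleted or overwritten cells that linger until a major compaction. The table dimensions in `main` are made up. Running `hadoop fs -du` on the table's directory under the HBase root directory gives the measured number.

```java
import java.nio.charset.StandardCharsets;

/**
 * Rough, uncompressed estimate of a table's HDFS footprint, assuming every
 * cell is stored as a KeyValue and every block is replicated by HDFS.
 * Ignores compression, block indexes, bloom filters and the WAL.
 */
class FootprintEstimate {

    // Fixed KeyValue overhead: key length (4) + value length (4)
    // + row length (2) + family length (1) + timestamp (8) + key type (1).
    static final int KEYVALUE_FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

    static long cellBytes(String row, String family, String qualifier, int valueLength) {
        return KEYVALUE_FIXED_OVERHEAD
                + row.getBytes(StandardCharsets.UTF_8).length
                + family.getBytes(StandardCharsets.UTF_8).length
                + qualifier.getBytes(StandardCharsets.UTF_8).length
                + valueLength;
    }

    /** Raw bytes on HDFS = number of cells * bytes per cell * replication factor. */
    static long tableBytes(long cellCount, long bytesPerCell, int replication) {
        return cellCount * bytesPerCell * replication;
    }

    public static void main(String[] args) {
        // Hypothetical table: 1,000,000 rows x 5 columns, 10-byte row keys,
        // family "f", 4-byte qualifiers, 100-byte values, replication 3.
        long perCell = cellBytes("row0000001", "f", "col1", 100);
        long total = tableBytes(1_000_000L * 5, perCell, 3);
        System.out.println("bytes per cell: " + perCell);          // bytes per cell: 135
        System.out.println("estimated raw HDFS bytes: " + total);  // estimated raw HDFS bytes: 2025000000
    }
}
```

Because the row key, family and qualifier are repeated in every cell, short family and qualifier names shrink the estimate noticeably for tables with small values.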

Re: batch update question

2012-09-07 Thread Lin Ma
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[],%20boolean%29 > > Because the Hbase client talks directly to each RS, it has to know the > region boundaries. > > > > From: Lin Ma > Date: Thursday, September 6, 20

Re: batch update question

2012-09-06 Thread Lin Ma
6, 2012 at 12:59 AM, Doug Meil wrote: > > Hi there, if you look in the source code for HTable there is a list of Put > objects. That's the buffer, and it's a client-side buffer. > > > > > > On 9/5/12 12:04 PM, "Lin Ma" wrote: > > >Thank
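The client-side buffer Doug describes can be modeled in a few lines. This is a hypothetical Java sketch, not HTable's code: `WriteBufferSketch` and `SimplePut` are made-up names, and the "RPC batches" are only counted, not sent. In the 0.94-era client the corresponding knobs are `HTable.setAutoFlush(false)`, `HTable.setWriteBufferSize(long)` (default taken from `hbase.client.write.buffer`, 2 MB) and `HTable.flushCommits()`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of a client-side write buffer: puts are queued locally
 * and only sent when the buffered size reaches a threshold or flush() is called.
 */
class WriteBufferSketch {

    /** Stand-in for a Put: a row key plus an approximate size in bytes. */
    static final class SimplePut {
        final String row;
        final long heapSize;
        SimplePut(String row, long heapSize) {
            this.row = row;
            this.heapSize = heapSize;
        }
    }

    private final long writeBufferSize;              // flush threshold in bytes
    private final List<SimplePut> buffer = new ArrayList<>();
    private long currentSize = 0;
    private int rpcBatches = 0;                      // number of simulated flushes

    WriteBufferSketch(long writeBufferSize) {
        this.writeBufferSize = writeBufferSize;
    }

    void put(SimplePut p) {
        buffer.add(p);
        currentSize += p.heapSize;
        if (currentSize >= writeBufferSize) {
            flush();
        }
    }

    /** Sends everything buffered as one simulated batch and clears the buffer. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // A real client would group the puts by region server here and
        // issue one multi-put request per server.
        rpcBatches++;
        buffer.clear();
        currentSize = 0;
    }

    int rpcBatches() { return rpcBatches; }
    int pending() { return buffer.size(); }

    public static void main(String[] args) {
        WriteBufferSketch table = new WriteBufferSketch(1000);
        for (int i = 0; i < 25; i++) {
            table.put(new SimplePut("row" + i, 100));     // 100 bytes each
        }
        System.out.println("batches sent: " + table.rpcBatches());  // batches sent: 2
        System.out.println("still buffered: " + table.pending());   // still buffered: 5
        table.flush();
        System.out.println("batches sent: " + table.rpcBatches());  // batches sent: 3
    }
}
```

Nothing leaves the client until the threshold is hit, so puts still sitting in the buffer are lost if the client dies; an explicit flush at the end of a job matters.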

Re: confused by two add method of Put class

2012-09-06 Thread Lin Ma
There are use cases, where either one is easier to use to achieve things. > > Cheers > > Julian > > 2012/9/2 Lin Ma > > > Hello HBase masters, > > > > For the two add methods of Put class, > > > > > > > http://hbase.apache.org/apidocs/or

Re: batch update question

2012-09-05 Thread Lin Ma
Sun, Sep 2, 2012 at 2:13 AM, Lin Ma wrote: > > Hello guys, > > > > I am reading the book "HBase, the definitive guide", at the beginning of > > chapter 3, it is mentioned in order to reduce performance impact for > > clients to update the same row (lock

Re: batch update question

2012-09-04 Thread Lin Ma
see > https://github.com/sematext/HBaseHUT > > > > ---------- > Lin Ma schrieb am So., 2. Sep 2012 11:13 MESZ: > > >Hello guys, > > > >I am reading the book "HBase, the definitive guide", at the beginning of > >chapter 3, it i

confused by two add method of Put class

2012-09-02 Thread Lin Ma
Hello HBase masters, For the two add methods of Put class, http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html#add%28org.apache.hadoop.hbase.KeyValue%29 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html#add%28byte[],%20byte[],%20long,%20byte[]%29 I thin

batch update question

2012-09-02 Thread Lin Ma
Hello guys, I am reading the book "HBase: The Definitive Guide". At the beginning of chapter 3, it is mentioned that, in order to reduce the performance impact when clients update the same row (lock contention issues for automatic write), batch update is preferred. My question is, for an MR job, what are th

Re: client cache for all region server information?

2012-08-28 Thread Lin Ma
am wrong. regards, Lin On Tue, Aug 28, 2012 at 2:41 PM, Harsh J wrote: > Lin, > > On Tue, Aug 28, 2012 at 9:09 AM, Lin Ma wrote: > > Thanks Harsh, > > > > A two more comments / thoughts, > > > > 1. For mapper: mapper normally runs on the same regional ser

Re: client cache for all region server information?

2012-08-27 Thread Lin Ma
nt to the RS. Whatever is accumulated > as the result of the Scan operation (server-side) is accumulated in > sizes of 500 rows and returned in one Scanner.next() call from the > client. > > Does this clear it up Lin? > > On Mon, Aug 27, 2012 at 8:40 PM, Lin Ma wrote: >

Re: client cache for all region server information?

2012-08-27 Thread Lin Ma
this value to 500, for example, will transfer 500 rows at a time to the client to be processed." regards, Lin On Thu, Aug 23, 2012 at 11:37 PM, Harsh J wrote: > Hi Lin, > > On Thu, Aug 23, 2012 at 7:56 PM, Lin Ma wrote: > > Harsh, thanks for the detailed information.
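Scanner caching as described in this thread (rows shipped to the client in chunks of N, one chunk per RPC, then handed out locally by each `next()` call) can be sketched as below. The class is hypothetical. In the real client the chunk size comes from `Scan.setCaching(int)` or `hbase.client.scanner.caching`, and the real scanner also makes calls to open and close the scanner, which this model leaves out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical model of scanner caching: next() is served from a local
 * queue, and a new "RPC" fetching up to `caching` rows is made only when
 * the queue runs dry.
 */
class ScannerCachingSketch {
    private final List<String> serverRows;      // rows held by the region server
    private final int caching;                  // rows transferred per RPC
    private final Deque<String> local = new ArrayDeque<>();
    private int position = 0;
    private int rpcCount = 0;

    ScannerCachingSketch(List<String> serverRows, int caching) {
        this.serverRows = serverRows;
        this.caching = caching;
    }

    /** Returns the next row, or null when the scan is exhausted. */
    String next() {
        if (local.isEmpty() && position < serverRows.size()) {
            int end = Math.min(position + caching, serverRows.size());
            local.addAll(serverRows.subList(position, end));   // one round trip
            position = end;
            rpcCount++;
        }
        return local.poll();
    }

    int rpcCount() { return rpcCount; }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 1200; i++) rows.add("row" + i);
        ScannerCachingSketch scanner = new ScannerCachingSketch(rows, 500);
        int seen = 0;
        while (scanner.next() != null) seen++;
        System.out.println(seen + " rows in " + scanner.rpcCount() + " RPCs");  // 1200 rows in 3 RPCs
    }
}
```

With `caching` set to 1 the same scan would take 1200 round trips. Larger values mean fewer RPCs, at the cost of more client memory and a longer wait per call.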

Re: how client location a region/tablet?

2012-08-23 Thread Lin Ma
le and Hbase. > > Thanks, > Abhishek > > > -Original Message- > From: Lin Ma [mailto:lin...@gmail.com] > Sent: Thursday, August 23, 2012 9:41 AM > To: user@hbase.apache.org; ha...@cloudera.com > Cc: doug.m...@explorysmedical.com > Subject: Re: how client loca

Re: how client location a region/tablet?

2012-08-23 Thread Lin Ma
utilizing one room for living before having more children. :-) regards, Lin On Fri, Aug 24, 2012 at 12:46 AM, Harsh J wrote: > Lin, > > On Thu, Aug 23, 2012 at 10:10 PM, Lin Ma wrote: > > Thanks, Harsh! > > > > - "HBase currently keeps a single META region (Doesn'

Re: how client location a region/tablet?

2012-08-23 Thread Lin Ma
regards, Lin On Thu, Aug 23, 2012 at 11:48 PM, Harsh J wrote: > HBase currently keeps a single META region (Doesn't split it). ROOT > holds META region location, and META has a few rows in it, a few of > them for each table. See also the class MetaScanner. > > On Thu, Aug

Re: how client location a region/tablet?

2012-08-23 Thread Lin Ma
out which META region server to access. Not sure if I got the point. Please feel free to correct me. regards, Lin On Thu, Aug 23, 2012 at 11:15 PM, Lin Ma wrote: > Doug, very informative document. Thanks a lot! > > I read through it and have some thoughts, > > - Supposing at the

Re: how client location a region/tablet?

2012-08-23 Thread Lin Ma
On Thu, Aug 23, 2012 at 8:21 PM, Doug Meil wrote: > > For further information about the catalog tables and region-regionserver > assignment, see this… > > http://hbase.apache.org/book.html#arch.catalog > > > > > > > On 8/19/12 7:36 AM, "Lin Ma" w

Re: client cache for all region server information?

2012-08-23 Thread Lin Ma
you could point me to some more detailed information. regards, Lin On Thu, Aug 23, 2012 at 9:35 PM, Harsh J wrote: > Hi Lin, > > On Thu, Aug 23, 2012 at 4:31 PM, Lin Ma wrote: > > Thank you Abhishek, > > > > Two more comments, > > > > -- "Client onl

Re: client cache for all region server information?

2012-08-23 Thread Lin Ma
server. > > Btw,Client only caches information as needed for its queries and not > necessarily for 'all' region servers. > > Abhishek > > > i Sent from my iPad with iMstakes > > On Aug 22, 2012, at 23:31, "Lin Ma" wrote: > > > Hello
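The point that the client caches locations only as needed can be modeled as a lazy cache keyed by region start key. In this hypothetical Java sketch the `catalogLookup` function stands in for the META read discussed in these threads, and `invalidate` models dropping a stale entry after a request fails because a region moved. None of these names are HBase APIs, and plain `String` comparison stands in for HBase's byte-wise row comparison.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/**
 * Hypothetical client-side region cache: a location is fetched from the
 * catalog only the first time a row in that region is needed, and dropped
 * again when a request to the cached server fails.
 */
class RegionCacheSketch {

    /** A region's row range [startKey, endKey); an empty endKey means "to the end". */
    static final class RegionLocation {
        final String startKey, endKey, server;
        RegionLocation(String startKey, String endKey, String server) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.server = server;
        }
        boolean contains(String row) {
            return row.compareTo(startKey) >= 0
                    && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    private final TreeMap<String, RegionLocation> cache = new TreeMap<>();  // by start key
    private final Function<String, RegionLocation> catalogLookup;
    private int catalogReads = 0;

    RegionCacheSketch(Function<String, RegionLocation> catalogLookup) {
        this.catalogLookup = catalogLookup;
    }

    RegionLocation locate(String row) {
        // The only cached region that can hold `row` is the one with the
        // greatest start key <= row.
        Map.Entry<String, RegionLocation> e = cache.floorEntry(row);
        if (e != null && e.getValue().contains(row)) {
            return e.getValue();                        // cache hit
        }
        RegionLocation loc = catalogLookup.apply(row);  // cache miss: ask the catalog
        catalogReads++;
        cache.put(loc.startKey, loc);
        return loc;
    }

    /** Called when a request to the cached server fails (e.g. the region moved). */
    void invalidate(String row) {
        Map.Entry<String, RegionLocation> e = cache.floorEntry(row);
        if (e != null && e.getValue().contains(row)) {
            cache.remove(e.getKey());
        }
    }

    int catalogReads() { return catalogReads; }
    int cachedRegions() { return cache.size(); }

    public static void main(String[] args) {
        // Fake catalog with two regions: ["", "m") on rs1 and ["m", end) on rs2.
        RegionCacheSketch client = new RegionCacheSketch(row ->
                row.compareTo("m") < 0
                        ? new RegionLocation("", "m", "rs1")
                        : new RegionLocation("m", "", "rs2"));
        client.locate("apple");
        client.locate("banana");     // same region: served from the cache
        System.out.println(client.catalogReads() + " " + client.cachedRegions());  // 1 1
        client.locate("zebra");
        client.invalidate("zebra");
        client.locate("zebra");      // re-read after invalidation
        System.out.println(client.catalogReads() + " " + client.cachedRegions());  // 3 2
    }
}
```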

client cache for all region server information?

2012-08-22 Thread Lin Ma
Hello HBase masters, I am wondering whether, in the current implementation, each HBase client caches all region server information, for example, where each region server is (the physical machine hosting it), and also the row-key range managed by each region server. If so, two more questions,

Re: Using HBase serving to replace memcached

2012-08-22 Thread Lin Ma
e such that 'x' comes in between and Hbase > will load that block. So usage of blooms can avoid this IO. Hope this is > clear for you now. > > -Anoop- > > From: Lin Ma [lin...@gmail.com] > Sent: Wednesday, August 22, 2012 5:41 PM > To: J Mohamed Zahoor;
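The explanation above (a bloom filter lets HBase skip loading a block that cannot contain the row) rests on one property: a Bloom filter can return false positives but never false negatives. The sketch below is a generic Bloom filter in Java using double hashing, not HBase's own implementation; the hash function, bit count and number of hash functions are arbitrary choices for illustration. In HBase itself, bloom filters are enabled per column family (`BLOOMFILTER` set to `ROW` or `ROWCOL`).

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/**
 * Minimal, generic Bloom filter sketch (not HBase's implementation).
 * mightContain() == false means the key is definitely absent, so a block
 * read can be skipped; true means "maybe present" and the block is read.
 */
class BloomSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BloomSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 32-bit FNV-1a hash with a seed folded into the offset basis.
    private static int fnv1a(byte[] data, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    // Double hashing: position_i = h1 + i * h2 (mod numBits).
    private int[] positions(String rowKey) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(key, 0);
        int h2 = fnv1a(key, 0x5BD1E995);
        int[] out = new int[numHashes];
        for (int i = 0; i < numHashes; i++) {
            out[i] = Math.floorMod(h1 + i * h2, numBits);
        }
        return out;
    }

    void add(String rowKey) {
        for (int p : positions(rowKey)) bits.set(p);
    }

    boolean mightContain(String rowKey) {
        for (int p : positions(rowKey)) {
            if (!bits.get(p)) return false;   // a bit is unset: definitely absent
        }
        return true;                          // all bits set: possibly present
    }

    public static void main(String[] args) {
        BloomSketch bloom = new BloomSketch(1024, 3);
        for (String row : new String[] {"row-a", "row-c", "row-e"}) bloom.add(row);
        System.out.println(bloom.mightContain("row-c"));  // true: no false negatives
        // A false result here would let a reader skip the block entirely.
        System.out.println(bloom.mightContain("row-x"));
    }
}
```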

Re: Using HBase serving to replace memcached

2012-08-22 Thread Lin Ma
Thanks Zahoor, I read through the document you referred to. I am confused about what leaf-level index, intermediate-level index and root-level index mean. I would appreciate it if you could give more details about what they are, or point me to the related documents. BTW: the document you pointed me to is very

Re: Using HBase serving to replace memcached

2012-08-21 Thread Lin Ma
Thanks Zahoor, > If there is no bloom... you have to load every block and scan to find if the row exists.. I could be wrong. I think the HFile index block (which is located at the end of the HFile) is a binary search tree containing all row-key values of the HFile. Searching a
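For comparison with the description above: the HFile block index keeps one entry per data block (the block's first key) rather than one per row, so a binary search over it can only name the single block that might hold a row. That block still has to be read to know whether the row exists, which is the I/O a bloom filter can avoid. HFile v2 splits this index into root, intermediate and leaf levels when it grows large. The Java sketch below models only a single-level index, with made-up keys and a hypothetical class name.

```java
import java.util.Arrays;

/**
 * Sketch of a single-level block index lookup: the index holds the first
 * key of each sorted data block, and a binary search picks the one block
 * that could contain a row. Whether the row exists is only known after
 * reading that block.
 */
class BlockIndexSketch {
    private final String[] firstKeys;      // first row key of each block, sorted

    BlockIndexSketch(String[] firstKeys) {
        this.firstKeys = firstKeys.clone();
    }

    /** Index of the block that may hold row, or -1 if row sorts before every block. */
    int candidateBlock(String row) {
        int i = Arrays.binarySearch(firstKeys, row);
        if (i >= 0) return i;              // row is exactly a block's first key
        int insertionPoint = -i - 1;       // first block whose first key > row
        return insertionPoint - 1;         // so the previous block is the candidate
    }

    public static void main(String[] args) {
        BlockIndexSketch index = new BlockIndexSketch(new String[] {"aaa", "fff", "mmm", "ttt"});
        System.out.println(index.candidateBlock("ggg"));  // 1 (block starting at "fff")
        System.out.println(index.candidateBlock("ttt"));  // 3
        System.out.println(index.candidateBlock("000"));  // -1 (before the first block)
    }
}
```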

Re: Using HBase serving to replace memcached

2012-08-21 Thread Lin Ma
Thank you Zahoor, two more comments: 1. After reading the materials you sent me, I am confused about how a Bloom filter could save I/O during random reads. Suppose I am not using a Bloom filter; in order to find whether a row (or row key) exists, we need to scan the index block, which is at the end part

Re: Using HBase serving to replace memcached

2012-08-21 Thread Lin Ma
e are some requests > > which will be handled incorrectly. > > > > Memcached is great but also look at Guava cache for similar use cases. > > > > Asif Ali > > > > > > On Mon, Aug 20, 2012 at 9:09 AM, Lin Ma wrote: > > > > > Thank you D

Re: Using HBase serving to replace memcached

2012-08-21 Thread Lin Ma
reat but also look at Guava cache for similar use cases. > > Asif Ali > > > On Mon, Aug 20, 2012 at 9:09 AM, Lin Ma wrote: > > > Thank you Drew. I like your reply, especially blocking cache nature > > provided by HBase. A quick question, for traditional memcached,

Re: Using HBase serving to replace memcached

2012-08-20 Thread Lin Ma
odel is persist things in HBase and then cache things > with memcached just as you would with any other data store. If you're > looking for a spiffy memcached replacement I'd recommend checking out > Redis. > > > On Sat, Aug 18, 2012 at 3:12 AM, Lin Ma wrote: >
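The pattern recommended here (persist in HBase, cache reads in memcached as with any other data store) is usually called cache-aside. A minimal Java sketch, assuming an in-process LRU map as a stand-in for memcached or a Guava cache and a `HashMap` as a stand-in for the HBase table; the class name is hypothetical, and a real deployment would talk to those systems over the network and would also need expiry.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Cache-aside sketch: reads check a small LRU cache first and fall back to
 * the persistent store on a miss; writes go to the store and invalidate the
 * cached copy.
 */
class CacheAsideSketch {
    private final Map<String, String> store = new HashMap<>();   // stands in for HBase
    private final Map<String, String> cache;                     // stands in for memcached
    private int storeReads = 0;

    CacheAsideSketch(int capacity) {
        // accessOrder=true keeps least-recently-used entries first, and
        // removeEldestEntry evicts once the capacity is exceeded.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        };
    }

    void write(String key, String value) {
        store.put(key, value);   // persist first
        cache.remove(key);       // then drop any stale cached copy
    }

    String read(String key) {
        String v = cache.get(key);
        if (v == null) {
            storeReads++;        // cache miss: go to the store
            v = store.get(key);
            if (v != null) cache.put(key, v);
        }
        return v;
    }

    int storeReads() { return storeReads; }

    public static void main(String[] args) {
        CacheAsideSketch db = new CacheAsideSketch(2);
        db.write("a", "1");
        db.write("b", "2");
        db.write("c", "3");
        db.read("a"); db.read("a");   // miss, then hit
        db.read("b"); db.read("c");   // cache now holds b and c ("a" evicted)
        db.read("a");                 // miss again
        System.out.println(db.storeReads());  // 4
    }
}
```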

Re: how client location a region/tablet?

2012-08-19 Thread Lin Ma
hich is region / physical server mapping data. Why do you say it is not data (do you mean the real content in each region)? regards, Lin On Sun, Aug 19, 2012 at 12:40 PM, Stack wrote: > On Sat, Aug 18, 2012 at 2:13 AM, Lin Ma wrote: > > Hello guys, > > > > I am referencing the Big T

how client location a region/tablet?

2012-08-18 Thread Lin Ma
Hello guys, I am referencing the Bigtable paper about how a client locates a tablet. In section 5.1, Tablet Location, it is mentioned that the client caches all tablet locations. I think it means the client will cache the root tablet of the METADATA table, and all other tablets of the METADATA table (which means

Re: column based or row based storage for HBase?

2012-08-12 Thread Lin Ma
llowing link comparing "traditional" columnar databases against > HBase/BigTable interesting: > > > http://dbmsmusings.blogspot.com/2010/03/distinguishing-two-major-types-of_29.html > > -Jason > > On Sun, Aug 5, 2012 at 8:03 PM, Lin Ma wrote: > > > Thank

Re: consistency, availability and partition pattern of HBase

2012-08-09 Thread Lin Ma
region servers serve a single region gets you into the land > of maintaining consistency across copies, which is challenging. It might be > doable but that's not the design choice Bigtable (and hence HBase) made > initially. > > On Thu, Aug 9, 2012 at 11:04 AM, Lin Ma wrote:

Re: consistency, availability and partition pattern of HBase

2012-08-09 Thread Lin Ma
:41 PM, Amandeep Khurana wrote: > HDFS also chooses to degrade availability in the face of partitions. > > > On Thu, Aug 9, 2012 at 11:08 AM, Lin Ma wrote: > >> Amandeep, thanks for your comments, and I will definitely read the paper >> you suggested. >> >

Re: consistency, availability and partition pattern of HBase

2012-08-08 Thread Lin Ma
will be visible to all > clients. There is no concept of multiple different versions that the > clients need to reconcile between. When you read, you always get the same > version of the row you are reading. In other words, HBase is strongly > consistent. > > Hope that clears things u

Re: consistency, availability and partition pattern of HBase

2012-08-08 Thread Lin Ma
> On Wed, Aug 8, 2012 at 10:32 PM, Lin Ma wrote: > > > Thank you Lars. > > > > Is the same data store duplicated copy across region server? If so, if > one > > primary server for the region dies, client just need to read from the > > secondary server for the

Re: consistency, availability and partition pattern of HBase

2012-08-08 Thread Lin Ma
> > - Original Message - > From: Lin Ma > To: user@hbase.apache.org > Cc: > Sent: Wednesday, August 8, 2012 8:47 AM > Subject: Re: consistency, availability and partition pattern of HBase > > And consistency is not sacrificed? i.e. all distributed clients' upda

Re: consistency, availability and partition pattern of HBase

2012-08-08 Thread Lin Ma
ility is sacrificed in the sense that if region server > fails clients will have data inaccessible for the time region comes up on > some other server, not to confuse with data loss. > > Sent from my iPad > > On Aug 7, 2012, at 11:56 PM, Lin Ma wrote: > > > Thank

Re: consistency, availability and partition pattern of HBase

2012-08-07 Thread Lin Ma
> Wei Tan > Research Staff Member > IBM T. J. Watson Research Center > 19 Skyline Dr, Hawthorne, NY 10532 > w...@us.ibm.com; 914-784-6752 > > > > From: Lin Ma > To: user@hbase.apache.org, > Date: 08/07/2012 09:30 PM > Subject: consistency, availa

consistency, availability and partition pattern of HBase

2012-08-07 Thread Lin Ma
Hello guys, According to the notes by Werner, "He presented the CAP theorem, which states that of three properties of shared-data systems—data consistency, system availability, and tolerance to network partition—only two can be achieved at any given time." => http://www.allthingsdistributed.com/

Re: column based or row based storage for HBase?

2012-08-06 Thread Lin Ma
you'd better take a look > at the content of HFile. > > regards! > > Yong > > On Mon, Aug 6, 2012 at 5:03 AM, Lin Ma wrote: > > Thank you for the informative reply, Mohit! > > > > Some more comments, > > > > 1. actually my confusion about co

Re: column based or row based storage for HBase?

2012-08-06 Thread Lin Ma
2, region B has row3. > A region is shard of a table based on the row key and just > > #1 above means that HBase will never place key value for "row1" in > different regions. > #2 means you very efficiently locate specific keys, as they are always > stored sorted. >
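The two points in this reply (a row never spans regions, and keys are always stored sorted) follow from how cells are ordered. The Java sketch below assumes the KeyValue sort order of row, family and qualifier ascending, then timestamp descending so the newest version comes first; plain `String` comparison stands in for HBase's byte-wise comparison, and `Cell` is a made-up class. It also shows the sparse-storage point raised elsewhere in the thread: row1 simply has no cell for qualifier b, so the missing column costs nothing.

```java
import java.util.Comparator;
import java.util.TreeSet;

/**
 * Sketch of HBase-style cell ordering: row, family, qualifier ascending,
 * then timestamp descending. Only written cells exist (sparse storage).
 */
class CellOrderSketch {
    static final class Cell {
        final String row, family, qualifier, value;
        final long timestamp;
        Cell(String row, String family, String qualifier, long timestamp, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }
        @Override
        public String toString() {
            return row + "/" + family + ":" + qualifier + "/" + timestamp + "=" + value;
        }
    }

    static final Comparator<Cell> ORDER = Comparator
            .comparing((Cell c) -> c.row)
            .thenComparing(c -> c.family)
            .thenComparing(c -> c.qualifier)
            .thenComparing(Comparator.comparingLong((Cell c) -> c.timestamp).reversed());

    public static void main(String[] args) {
        TreeSet<Cell> store = new TreeSet<>(ORDER);
        store.add(new Cell("row2", "cf", "b", 1L, "x"));
        store.add(new Cell("row1", "cf", "a", 1L, "old"));
        store.add(new Cell("row1", "cf", "a", 2L, "new"));
        store.add(new Cell("row1", "cf", "c", 1L, "y"));   // row1 has no cell for "b"
        store.forEach(System.out::println);
        // row1/cf:a/2=new
        // row1/cf:a/1=old
        // row1/cf:c/1=y
        // row2/cf:b/1=x
    }
}
```

Since all cells of a row are contiguous in this order, splitting a table only between rows keeps every row inside a single region.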

Re: column based or row based storage for HBase?

2012-08-05 Thread Lin Ma
> I tried to write this up a while back: > http://hadoop-hbase.blogspot.com/2011/12/introduction-to-hbase.html > > -- Lars > > > > - Original Message - > From: Lin Ma > To: user@hbase.apache.org > Cc: > Sent: Sunday, August 5, 2012 6:04 AM > Subject: colu

Re: column based or row based storage for HBase?

2012-08-05 Thread Lin Ma
is used to describe the pattern of storing a sparse, large number of columns (with NULLs for free). Any comments? regards, Lin On Mon, Aug 6, 2012 at 12:08 AM, Mohit Anchlia wrote: > On Sun, Aug 5, 2012 at 6:04 AM, Lin Ma wrote: > > > Hi guys, > > > > I am wondering whether H

column based or row based storage for HBase?

2012-08-05 Thread Lin Ma
Hi guys, I am wondering whether HBase uses column-based or row-based storage. - I read some technical documents which mention that an advantage of HBase is its column-based storage, which stores similar data together to foster compression. So it means the same columns of different rows are