HDFS footprint of a table

2012-09-11 Thread Lin Ma
Hi guys, Supposing I have a table in HBase, how can I estimate its storage footprint? Thanks. regards, Lin

Re: batch update question

2012-09-07 Thread Lin Ma
/client/HTable.html#getRegionLocation%28byte[],%20boolean%29 Because the HBase client talks directly to each region server (RS), it has to know the region boundaries. From: Lin Ma lin...@gmail.com Date: Thursday, September 6, 2012 11:54 AM To: user@hbase.apache.org user@hbase.apache.org, Doug Meil
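The point about region boundaries can be sketched in a few lines. This is not HBase client code: `RegionLocator`, the start keys, and the server names below are hypothetical. It only illustrates how a client that knows the sorted region start keys can work out for itself which server owns a row key.

```python
import bisect


class RegionLocator:
    """Toy client-side lookup from row key to hosting server (hypothetical)."""

    def __init__(self, regions):
        # regions: list of (start_key, server); the first start key is b"",
        # which stands for an open lower bound.
        self._regions = sorted(regions)
        self._start_keys = [start for start, _ in self._regions]

    def locate(self, row_key):
        # The owning region is the last one whose start key is <= row_key.
        idx = bisect.bisect_right(self._start_keys, row_key) - 1
        return self._regions[idx][1]


locator = RegionLocator([(b"", "rs1"), (b"g", "rs2"), (b"p", "rs3")])
server_for_apple = locator.locate(b"apple")
server_for_zebra = locator.locate(b"zebra")
```

Keeping the start keys sorted makes each lookup a binary search, so the client can route a request without asking a central server every time.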

Re: confused by two add method of Put class

2012-09-06 Thread Lin Ma
to use to achieve things. Cheers Julian 2012/9/2 Lin Ma lin...@gmail.com Hello HBase masters, For the two add methods of Put class, http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html#add%28org.apache.hadoop.hbase.KeyValue%29 http://hbase.apache.org

Re: batch update question

2012-09-06 Thread Lin Ma
6, 2012 at 12:59 AM, Doug Meil doug.m...@explorysmedical.com wrote: Hi there, if you look in the source code for HTable there is a list of Put objects. That's the buffer, and it's a client-side buffer. On 9/5/12 12:04 PM, Lin Ma lin...@gmail.com wrote: Thank you Stack for the details
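The client-side buffer described above can be sketched roughly as follows. This is a toy model, not the HTable source: `BufferedWriter`, `send_batch`, and `flush_size` are hypothetical names, and the sketch flushes by number of puts only to keep the example short.

```python
class BufferedWriter:
    """Toy client-side write buffer: collect puts, send them in batches."""

    def __init__(self, send_batch, flush_size=3):
        self._send_batch = send_batch  # callable that ships one batch
        self._flush_size = flush_size  # hypothetical threshold (put count)
        self._buffer = []              # the list of pending puts

    def put(self, row, value):
        self._buffer.append((row, value))
        if len(self._buffer) >= self._flush_size:
            self.flush()

    def flush(self):
        # Ship whatever is pending as one batch, then empty the buffer.
        if self._buffer:
            self._send_batch(list(self._buffer))
            self._buffer.clear()


sent = []  # stands in for the server side: one entry per batch received
writer = BufferedWriter(sent.append, flush_size=3)
for i in range(7):
    writer.put(f"row{i}", i)
writer.flush()  # push the final partial batch
batch_sizes = [len(batch) for batch in sent]
```

Seven puts reach the "server" as three batches instead of seven separate calls, which is the effect the buffer is meant to have.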

Re: batch update question

2012-09-05 Thread Lin Ma
2, 2012 at 2:13 AM, Lin Ma lin...@gmail.com wrote: Hello guys, I am reading the book HBase: The Definitive Guide. At the beginning of chapter 3, it is mentioned that, in order to reduce the performance impact of clients updating the same row (lock contention issues for atomic writes), batch

Re: batch update question

2012-09-04 Thread Lin Ma
) see https://github.com/sematext/HBaseHUT -- Lin Ma wrote on Sun, 2 Sep 2012 11:13 CEST: Hello guys, I am reading the book HBase: The Definitive Guide. At the beginning of chapter 3, it is mentioned that, in order to reduce the performance impact of clients

batch update question

2012-09-02 Thread Lin Ma
Hello guys, I am reading the book HBase: The Definitive Guide. At the beginning of chapter 3, it is mentioned that, in order to reduce the performance impact of clients updating the same row (lock contention issues for atomic writes), batch update is preferred. My question is, for an MR job, what are the

confused by two add method of Put class

2012-09-02 Thread Lin Ma
Hello HBase masters, For the two add methods of Put class, http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html#add%28org.apache.hadoop.hbase.KeyValue%29 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html#add%28byte[],%20byte[],%20long,%20byte[]%29 I

Re: client cache for all region server information?

2012-08-28 Thread Lin Ma
wrong. regards, Lin On Tue, Aug 28, 2012 at 2:41 PM, Harsh J ha...@cloudera.com wrote: Lin, On Tue, Aug 28, 2012 at 9:09 AM, Lin Ma lin...@gmail.com wrote: Thanks Harsh, Two more comments / thoughts, 1. For mapper: the mapper normally runs on the same region server which owns

Re: client cache for all region server information?

2012-08-27 Thread Lin Ma
this value to 500, for example, will transfer 500 rows at a time to the client to be processed.* regards, Lin On Thu, Aug 23, 2012 at 11:37 PM, Harsh J ha...@cloudera.com wrote: Hi Lin, On Thu, Aug 23, 2012 at 7:56 PM, Lin Ma lin...@gmail.com wrote: Harsh, thanks for the detailed information

Re: client cache for all region server information?

2012-08-27 Thread Lin Ma
. Whatever is accumulated as the result of the Scan operation (server-side) is accumulated in sizes of 500 rows and returned in one Scanner.next() call from the client. Does this clear it up Lin? On Mon, Aug 27, 2012 at 8:40 PM, Lin Ma lin...@gmail.com wrote: Hi Harsh, I read through
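The scanner caching behaviour discussed here can be modelled with a short sketch. `ToyScanner` and its `round_trips` counter are hypothetical and do not mirror the real client classes; the sketch only shows that the client still consumes rows one at a time while each simulated trip to the server brings back up to `caching` rows.

```python
class ToyScanner:
    """Toy scanner: rows are handed out one by one, fetched in batches."""

    def __init__(self, server_rows, caching):
        self._rows = server_rows   # stands in for data held server-side
        self._caching = caching    # rows transferred per round trip
        self._pos = 0
        self._cache = []
        self.round_trips = 0

    def next(self):
        # Refill the local cache only when it is empty and rows remain.
        if not self._cache and self._pos < len(self._rows):
            batch = self._rows[self._pos:self._pos + self._caching]
            self._pos += len(batch)
            self._cache = list(batch)
            self.round_trips += 1
        return self._cache.pop(0) if self._cache else None


rows = list(range(1200))
scanner = ToyScanner(rows, caching=500)
seen = []
while (row := scanner.next()) is not None:
    seen.append(row)
```

With 1200 rows and a caching value of 500, the loop above needs three simulated round trips (500, 500, and 200 rows) rather than 1200.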

client cache for all region server information?

2012-08-23 Thread Lin Ma
Hello HBase masters, I am wondering whether, in the current implementation, each HBase client caches all region server information, for example where each region server is (its physical host machine) and the row-key range managed by each region server. If so, two more

Re: client cache for all region server information?

2012-08-23 Thread Lin Ma
caches information as needed for its queries and not necessarily for 'all' region servers. Abhishek Sent from my iPad with iMstakes On Aug 22, 2012, at 23:31, Lin Ma lin...@gmail.com wrote: Hello HBase masters, I am wondering whether, in the current implementation, each HBase client

Re: client cache for all region server information?

2012-08-23 Thread Lin Ma
point me to some more detailed information. regards, Lin On Thu, Aug 23, 2012 at 9:35 PM, Harsh J ha...@cloudera.com wrote: Hi Lin, On Thu, Aug 23, 2012 at 4:31 PM, Lin Ma lin...@gmail.com wrote: Thank you Abhishek, Two more comments, -- Client only caches information as needed

Re: how client location a region/tablet?

2012-08-23 Thread Lin Ma
On Thu, Aug 23, 2012 at 8:21 PM, Doug Meil doug.m...@explorysmedical.com wrote: For further information about the catalog tables and region-regionserver assignment, see this... http://hbase.apache.org/book.html#arch.catalog On 8/19/12 7:36 AM, Lin Ma lin...@gmail.com wrote: Thank you

Re: how client location a region/tablet?

2012-08-23 Thread Lin Ma
which META region server to access. Not sure if I get the points. Please feel free to correct me. regards, Lin On Thu, Aug 23, 2012 at 11:15 PM, Lin Ma lin...@gmail.com wrote: Doug, very informative document. Thanks a lot! I read through it and have some thoughts, - Supposing

Re: how client location a region/tablet?

2012-08-23 Thread Lin Ma
utilizing one room for living before having more children. :-) regards, Lin On Fri, Aug 24, 2012 at 12:46 AM, Harsh J ha...@cloudera.com wrote: Lin, On Thu, Aug 23, 2012 at 10:10 PM, Lin Ma lin...@gmail.com wrote: Thanks, Harsh! - HBase currently keeps a single META region (Doesn't split

Re: how client location a region/tablet?

2012-08-23 Thread Lin Ma
Big Table and Hbase. Thanks, Abhishek -Original Message- From: Lin Ma [mailto:lin...@gmail.com] Sent: Thursday, August 23, 2012 9:41 AM To: user@hbase.apache.org; ha...@cloudera.com Cc: doug.m...@explorysmedical.com Subject: Re: how client location a region/tablet? Thanks, Harsh

Re: Using HBase serving to replace memcached

2012-08-22 Thread Lin Ma
Thanks Zahoor, I read through the document you referred to, and I am confused about what leaf-level index, intermediate-level index and root-level index mean. I would appreciate it if you could give more details on what they are, or point me to the related documents. BTW: the document you pointed me to is

Re: Using HBase serving to replace memcached

2012-08-22 Thread Lin Ma
will be fetched. But if bloom is not enabled, we might find one block whose row range is such that 'x' falls within it, and HBase will load that block. So using blooms can avoid this IO. Hope this is clear for you now. -Anoop- From: Lin Ma [lin
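The IO saving described above can be illustrated with a toy model. Everything here is a simplification under stated assumptions: `BloomFilter` and `read_row` are hypothetical, the filter covers a single block, and the hash scheme is chosen only for brevity. It is not how HBase lays out its Bloom data.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self._size = size_bits
        self._num_hashes = num_hashes
        self._bits = 0

    def _positions(self, key):
        for i in range(self._num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self._size

    def add(self, key):
        for pos in self._positions(key):
            self._bits |= 1 << pos

    def might_contain(self, key):
        return all((self._bits >> pos) & 1 for pos in self._positions(key))


def read_row(block_keys, bloom, key, stats):
    """Look up key in one sorted block, counting simulated block loads."""
    if not (block_keys[0] <= key <= block_keys[-1]):
        return False  # key range rules the block out; nothing is loaded
    if bloom is not None and not bloom.might_contain(key):
        return False  # bloom says "definitely absent"; block load avoided
    stats["blocks_loaded"] += 1  # simulated disk read of the block
    return key in block_keys


block = [b"a", b"m", b"z"]  # 'x' falls inside the range but is not stored
bloom = BloomFilter()
for k in block:
    bloom.add(k)

plain_stats = {"blocks_loaded": 0}
bloom_stats = {"blocks_loaded": 0}
found_plain = read_row(block, None, b"x", plain_stats)
found_bloom = read_row(block, bloom, b"x", bloom_stats)
```

Without the filter, the range check alone forces a block load for 'x'. With the filter, the load is skipped unless the filter happens to return a false positive.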

Re: Using HBase serving to replace memcached

2012-08-21 Thread Lin Ma
use cases. Asif Ali On Mon, Aug 20, 2012 at 9:09 AM, Lin Ma lin...@gmail.com wrote: Thank you Drew. I like your reply, especially blocking cache nature provided by HBase. A quick question, for traditional memcached, all of the items are in memory, no disk is used, correct? regards

Re: Using HBase serving to replace memcached

2012-08-21 Thread Lin Ma
look at Guava cache for similar use cases. Asif Ali On Mon, Aug 20, 2012 at 9:09 AM, Lin Ma lin...@gmail.com wrote: Thank you Drew. I like your reply, especially blocking cache nature provided by HBase. A quick question, for traditional memcached, all of the items

Re: Using HBase serving to replace memcached

2012-08-21 Thread Lin Ma
Thank you Zahoor, Two more comments, 1. After reading the materials you sent me, I am confused about how a Bloom Filter could save I/O during a random read. Supposing I am not using a Bloom Filter, in order to find whether a row (or row-key) exists, we need to scan the index block which is at the end

Re: Using HBase serving to replace memcached

2012-08-21 Thread Lin Ma
Thanks Zahoor, If there is no bloom... you have to load every block and scan to find if the row exists.. I could be wrong. I think the HFile index block (which is located at the end of the HFile) is a binary search tree containing all row-key values of the HFile. Searching a

Re: Using HBase serving to replace memcached

2012-08-20 Thread Lin Ma
things with memcached just as you would with any other data store. If you're looking for a spiffy memcached replacement I'd recommend checking out Redis. On Sat, Aug 18, 2012 at 3:12 AM, Lin Ma lin...@gmail.com wrote: Hello guys, In your experience, is it practical to use HBase directly

Re: how client location a region/tablet?

2012-08-19 Thread Lin Ma
server mapping data. Why you say not data (do you mean real content in each region)? regards, Lin On Sun, Aug 19, 2012 at 12:40 PM, Stack st...@duboce.net wrote: On Sat, Aug 18, 2012 at 2:13 AM, Lin Ma lin...@gmail.com wrote: Hello guys, I am referencing the Big Table paper about how a client

how client location a region/tablet?

2012-08-18 Thread Lin Ma
Hello guys, I am referencing the Big Table paper about how a client locates a tablet. In section 5.1 Tablet location, it is mentioned that client will cache all tablet locations, I think it means client will cache root tablet in METADATA table, and all other tablets in METADATA table (which means

Re: column based or row based storage for HBase?

2012-08-12 Thread Lin Ma
the following link comparing traditional columnar databases against HBase/BigTable interesting: http://dbmsmusings.blogspot.com/2010/03/distinguishing-two-major-types-of_29.html -Jason On Sun, Aug 5, 2012 at 8:03 PM, Lin Ma lin...@gmail.com wrote: Thank you for the informative reply, Mohit

Re: consistency, availability and partition pattern of HBase

2012-08-09 Thread Lin Ma
:41 PM, Amandeep Khurana ama...@gmail.com wrote: HDFS also chooses to degrade availability in the face of partitions. On Thu, Aug 9, 2012 at 11:08 AM, Lin Ma lin...@gmail.com wrote: Amandeep, thanks for your comments, and I will definitely read the paper you suggested. For Hadoop itself

Re: consistency, availability and partition pattern of HBase

2012-08-08 Thread Lin Ma
...@us.ibm.com; 914-784-6752 From: Lin Ma lin...@gmail.com To: user@hbase.apache.org, Date: 08/07/2012 09:30 PM Subject: consistency, availability and partition pattern of HBase Hello guys, According to the notes by Werner, "He presented the CAP theorem, which states that of three

Re: consistency, availability and partition pattern of HBase

2012-08-08 Thread Lin Ma
think availability is sacrificed in the sense that if a region server fails, clients will find its data inaccessible until the region comes up on some other server; this is not to be confused with data loss. Sent from my iPad On Aug 7, 2012, at 11:56 PM, Lin Ma lin...@gmail.com wrote: Thank you Wei! Two

Re: consistency, availability and partition pattern of HBase

2012-08-08 Thread Lin Ma
at 10:32 PM, Lin Ma lin...@gmail.com wrote: Thank you Lars. Is the same data stored as duplicate copies across region servers? If so, if the primary server for a region dies, the client just needs to read from the secondary server for the same region. Why is there a time when data is unavailable

Re: consistency, availability and partition pattern of HBase

2012-08-08 Thread Lin Ma
to reconcile between. When you read, you always get the same version of the row you are reading. In other words, HBase is strongly consistent. Hope that clears things up a bit. On Thu, Aug 9, 2012 at 8:02 AM, Lin Ma lin...@gmail.com wrote: Thank you Lars. Is the same data stored as duplicate

consistency, availability and partition pattern of HBase

2012-08-07 Thread Lin Ma
Hello guys, According to the notes by Werner, "He presented the CAP theorem, which states that of three properties of shared-data systems (data consistency, system availability, and tolerance to network partition) only two can be achieved at any given time."

column based or row based storage for HBase?

2012-08-05 Thread Lin Ma
Hi guys, I am wondering whether HBase is using column based storage or row based storage? - I read some technical documents which mention that an advantage of HBase is that it uses column based storage to store similar data together to foster compression. So it means same columns of different rows

Re: column based or row based storage for HBase?

2012-08-05 Thread Lin Ma
to store sparse, large number of columns (with NULL for free). Any comments? regards, Lin On Mon, Aug 6, 2012 at 12:08 AM, Mohit Anchlia mohitanch...@gmail.com wrote: On Sun, Aug 5, 2012 at 6:04 AM, Lin Ma lin...@gmail.com wrote: Hi guys, I am wondering whether HBase is using column based

Re: column based or row based storage for HBase?

2012-08-05 Thread Lin Ma
://hadoop-hbase.blogspot.com/2011/12/introduction-to-hbase.html -- Lars - Original Message - From: Lin Ma lin...@gmail.com To: user@hbase.apache.org Cc: Sent: Sunday, August 5, 2012 6:04 AM Subject: column based or row based storage for HBase? Hi guys, I am wondering whether HBase