Video: The Underlying Technology of Facebook Messages

2011-01-07 Thread Nicolas Spiegelberg
For those interested, our engineering bloggers just posted the video of our tech talk about using HBase as the datastore behind Facebook messages. Thanks for being a great community! http://www.facebook.com/video/video.php?v=690851516105

Re: Compaction problems

2011-04-27 Thread Nicolas Spiegelberg
Note that we have a compaction recursive enqueue patch on our internal branches, but we wanted to give it some run time before contributing back to make sure it was safe. I'll port that to trunk.
On 4/27/11 9:55 AM, "Jean-Daniel Cryans" wrote:
>That makes sense, would you mind opening a jira?

HBase Users Group: Aug 22 @ FB

2011-08-08 Thread Nicolas Spiegelberg
http://www.meetup.com/hbaseusergroup/events/28518471/ so we can determine how much space we need. Hope to see you there! Nicolas Spiegelberg

Re: pre splitting tables

2011-10-24 Thread Nicolas Spiegelberg
Isn't a better strategy to create the HBase keys as
    Key = hash(MySQL_key) + MySQL_key
That way you'll know your key distribution and can add new machines seamlessly. I'm assuming that your rows don't overlap between any 2 machines. If so, you could append the MACHINE_ID to the key (not prepend)
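
The key scheme suggested above could be sketched as follows. This is only an illustration: the message does not say which hash to use, so MD5, the function name `hbase_row_key`, and the `|` separator before the machine id are all assumptions.

```python
import hashlib
from typing import Optional


def hbase_row_key(mysql_key: str, machine_id: Optional[str] = None) -> bytes:
    """Build Key = hash(MySQL_key) + MySQL_key, optionally appending a machine id.

    Assumptions (not from the original message): MD5 as the hash, UTF-8
    encoding, and a '|' separator before the appended machine id.
    """
    raw = mysql_key.encode("utf-8")
    # The hash goes first so keys are spread evenly over the key space.
    key = hashlib.md5(raw).digest() + raw
    if machine_id is not None:
        # Appended (not prepended), so the hash prefix still drives distribution.
        key += b"|" + machine_id.encode("utf-8")
    return key


if __name__ == "__main__":
    print(hbase_row_key("42").hex())
    print(hbase_row_key("42", machine_id="m7").hex())
```

Because the machine id sits at the end, two rows with the same MySQL key from different machines still sort next to each other under the same hash prefix.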

Re: pre splitting tables

2011-10-25 Thread Nicolas Spiegelberg
>According to my understanding, the way that HBase works is that on a >brand new system, all keys will start going to a single region i.e. a >single region server. Once that region >reaches a max region size, it will split and then move to another >region server, and so on and so forth. Basically,

Re: region size/count per regionserver

2011-11-01 Thread Nicolas Spiegelberg
Simple answer - 20 regions/server & <2000 regions/cluster is a good rule of thumb if you can't profile your workload yet. You really want to ensure that 1) you limit the regions/cluster so the master can have a reasonable startup time & can handle all the region state transit

Re: region size/count per regionserver

2011-11-02 Thread Nicolas Spiegelberg
nage more regions per regionserver? >With 20 regions per server, one would need 300G regions to just utilize >6T of drive space. > > >To utilize a regionserver/datanode with 24T drive space the region size >would be an insane 1T. > >-- Lars > > >
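
The arithmetic in the quoted question can be checked with a small sketch. It assumes 1T = 1000G and ignores HDFS replication and compression, neither of which the snippet mentions; the function name is made up for illustration.

```python
def required_region_size_gb(disk_capacity_gb: float, regions_per_server: int = 20) -> float:
    """Region size needed to fill a server's disks with a fixed number of regions.

    Assumption: raw capacity only, no replication or compression factored in.
    """
    return disk_capacity_gb / regions_per_server


if __name__ == "__main__":
    for capacity_tb in (6, 24):
        size_gb = required_region_size_gb(capacity_tb * 1000)
        print(f"{capacity_tb}T of disk at 20 regions/server needs {size_gb:.0f}G regions")
```

Under these assumptions the 24T case works out to 1200G per region, which the quoted message rounds to 1T.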

Re: Region Splits

2011-11-20 Thread Nicolas Spiegelberg
Sequential writes are also an argument for pre-splitting and using hash prefixing. In other words, presplit your table into N regions instead of the default of 1 & transform your keys into:
    new_key = md5(old_key) + old_key
Using this method your sequential writes under the old_key are now spread
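
A minimal sketch of the presplit idea, assuming N evenly spaced split points over the first two bytes of the MD5 prefix. The helper names and the two-byte boundary width are illustrative choices, not part of HBase or of the original message, and no HBase client is involved.

```python
import bisect
import hashlib
from collections import Counter
from typing import List


def salted_key(old_key: bytes) -> bytes:
    """new_key = md5(old_key) + old_key, as described in the message."""
    return hashlib.md5(old_key).digest() + old_key


def split_points(n_regions: int) -> List[bytes]:
    """Return n_regions - 1 boundaries spaced evenly over a 2-byte prefix space."""
    step = 0x10000 / n_regions
    return [int(i * step).to_bytes(2, "big") for i in range(1, n_regions)]


def region_index(row_key: bytes, splits: List[bytes]) -> int:
    """Index of the region whose [start, end) range contains row_key."""
    return bisect.bisect_right(splits, row_key)


if __name__ == "__main__":
    splits = split_points(4)
    # Sequential old keys end up scattered across all four regions.
    counts = Counter(
        region_index(salted_key(b"event-%06d" % i), splits) for i in range(1000)
    )
    print(sorted(counts.items()))
```

The split points would be handed to the table at creation time; the sketch only shows that sequential input keys no longer land in a single region.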

Re: Region Splits

2011-11-21 Thread Nicolas Spiegelberg
y understanding was flawed. > >In your example I am guessing the addition of old_key suffix is to >prevent against any possible collision. Is that correct? > >On 11/20/11 9:39 PM, Nicolas Spiegelberg wrote: >> Sequential writes are also an argument for pre-splitting and using hash &

Re: Region Splits

2011-11-22 Thread Nicolas Spiegelberg
No. The purpose of major compactions is to merge & dedupe within a region boundary. Compactions will not alter region boundaries, except in the case of splits where a compaction is necessary to filter out any Rows from the parent region that are no longer applicable to the daughter region. On 11

Re: Region Splits

2011-11-22 Thread Nicolas Spiegelberg
e increased the region size, say from current value >of 256 MB to a new value of 2GB? >Will existing regions continue to use only 256 MB space? > >Is there a way to reorganize the regions so that each regions grows to >2GB size? > >Thanks, >Srikanth > >-Original Messa

Re: Region Splits

2011-11-23 Thread Nicolas Spiegelberg
ought of something. >>>>> >>>>> In cases where the id is sequential couldn't one simply reverse the >>>>>id to >>>>> get more of a uniform distribution? >>>>> >>>>> 510911 =>119015 >>>>>
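
The reversal proposed in the quoted message is easy to try out. The sketch below is an illustration only; the optional zero-padding `width` is an added assumption to keep keys the same length, and it is not mentioned in the thread.

```python
def reversed_id_key(seq_id: int, width: int = 0) -> str:
    """Reverse the decimal digits of a sequential id to form a row key.

    `width` (an assumption, not from the thread) zero-pads the id before
    reversing so that all keys have the same length.
    """
    return str(seq_id).zfill(width)[::-1]


if __name__ == "__main__":
    for seq_id in (510911, 510912, 510913):
        print(seq_id, "=>", reversed_id_key(seq_id))
```

Consecutive ids differ in their last digit, so after reversal they differ in their first character and sort far apart.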

Re: major compaction and regionserver failure...

2011-12-11 Thread Nicolas Spiegelberg
Andy, Some detail on the current compaction algorithm. Major compactions can be triggered in 3 ways:
1) User requested: e.g. from the shell
2) Size based: compact when file <= sum(smaller_files) * 'hbase.hstore.compaction.ratio'. This is the normal minor compaction logic. It will upgrade to a
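
The size-based rule quoted above can be illustrated with a simplified sketch. This is not HBase's actual selection code: it only applies the stated inequality to a list of file sizes, and the default ratio of 1.2 is an assumption about the configured value of 'hbase.hstore.compaction.ratio'.

```python
from typing import List


def select_for_compaction(file_sizes: List[int], ratio: float = 1.2) -> List[int]:
    """Pick files using: file <= sum(smaller_files) * ratio.

    Simplified illustration: walk from the largest file down and stop at the
    first one that satisfies the inequality; that file and all smaller ones
    are selected. Returns an empty list if no file qualifies.
    """
    ordered = sorted(file_sizes, reverse=True)
    for i, size in enumerate(ordered):
        smaller = ordered[i + 1:]
        if smaller and size <= sum(smaller) * ratio:
            return ordered[i:]
    return []


if __name__ == "__main__":
    # The 1000-unit file is too large relative to the rest and is left alone.
    print(select_for_compaction([1000, 100, 50, 40, 30]))
```

In this example the largest file is skipped because it exceeds 1.2 times the sum of the others, while the four small files are selected together.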

Re: Question about HBase for OLTP

2012-01-09 Thread Nicolas Spiegelberg
1) Eventual Consistency isn't a problem here. HBase is a strict consistency system. Maybe you have us confused with other Dynamo-based Open Source projects? 2) MySQL and other traditional RDBMS systems are definitely a lot more solid, well-tested, and subtly tuned than HBase. The vast majorit

Re: the occasion of the major compact?

2012-01-26 Thread Nicolas Spiegelberg
Yong, Can you please explain why you want to disable major compactions? What are the problems that you're currently seeing or what are you worried will happen if a major compaction is allowed to occur? Right now, there is only an extremely small subset of cases where you must explicitly disable

Re: the occasion of the major compact?

2012-01-26 Thread Nicolas Spiegelberg
>delete the data. After extracting the deleted data, I can issue major >compact by myself. > >Regards > >Yong > >On Thu, Jan 26, 2012 at 8:02 PM, Nicolas Spiegelberg > wrote: >> Yong, >> >> Can you please explain why you want to disable major compactions? Wha

Re: Set like functionality

2012-02-10 Thread Nicolas Spiegelberg
A lot of your design depends on your read/write rate & the amount of duplication in your inserts. For example, if your read rate is really low and your write rate is really high with a low dedupe, you could try:
    Row = USER_ID
    Column Qualifier = PRODUCT_ID
    MAX_VERSIONS = 1
Setting the max version
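
To see why this layout behaves like a set, here is a small in-memory model. It does not use any HBase client; the class `SetLikeTable` and its methods are hypothetical names that only mimic a row keyed by USER_ID whose column qualifiers are PRODUCT_IDs with a single retained version.

```python
from collections import defaultdict
from typing import Dict, Set


class SetLikeTable:
    """In-memory stand-in for Row = USER_ID, Column Qualifier = PRODUCT_ID."""

    def __init__(self) -> None:
        # user_id -> {product_id: timestamp of the single retained version}
        self._rows: Dict[str, Dict[str, int]] = defaultdict(dict)

    def add(self, user_id: str, product_id: str, timestamp: int) -> None:
        cells = self._rows[user_id]
        # Modelling MAX_VERSIONS = 1: a repeated insert replaces the cell
        # instead of accumulating versions, so duplicates collapse.
        if timestamp >= cells.get(product_id, -1):
            cells[product_id] = timestamp

    def members(self, user_id: str) -> Set[str]:
        # Reading the whole row yields the set of distinct product ids.
        return set(self._rows.get(user_id, {}))


if __name__ == "__main__":
    table = SetLikeTable()
    table.add("user1", "p1", timestamp=1)
    table.add("user1", "p2", timestamp=2)
    table.add("user1", "p1", timestamp=3)  # duplicate insert
    print(sorted(table.members("user1")))
```

Writers never need to read before inserting in this model, which matches the high-write, low-read case the message describes.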

Re: Scans and Bloom Filter

2012-02-16 Thread Nicolas Spiegelberg
Bryan, Currently, ROW & ROWCOL Bloom Filters are only checked for explicit, single-row 'Get' scans. ROWCOL BFs are only checked when you're querying for explicit column qualifiers (vs getting the entire row). This is because multi-row scans & full-row scans are implicit queries. To clarify: W

Re: HBase and Data Locality

2012-02-21 Thread Nicolas Spiegelberg
>>It's recommended that you run major compactions yourself at down times. > >Can we change the `hbase.hregion.majorcompaction` value from 8640 to >-1 >along with the required code changes and make a note of it in the >hbase-default.xml? Also, the hbase.master.loadbalancer.class is not >specified

[proposal] HUG, March 27 @ SU

2012-03-07 Thread Nicolas Spiegelberg
Looking for +1s on a March 27th HBase Users Group. Just want to make sure there are no huge conflicts before we post the official meetup. StumbleUpon has shiny new office space, so it seems like a great spot to host this meetup. Also, if you would like to present or announce, email me (or Sta

HBase Users Group: March 27 @ StumbleUpon

2012-03-12 Thread Nicolas Spiegelberg
(or Stack) directly. Please RSVP at http://www.meetup.com/hbaseusergroup/events/56021562/ so we can determine how much space we need. Hope to see you there! Nicolas Spiegelberg