Re: Nosqls schema design

2012-11-08 Thread Ian Varley
Hi Nick, The key question to ask about this use case is the access pattern. Do you need real-time access to new information as it is created? (I.e. if someone reads an article, do your queries need to immediately reflect that?) If not, and a batch approach is fine (say, nightly processing) then

Re: Nosqls schema design

2012-11-08 Thread Ian Varley
Nick, Re: how to think about schemas coming from a SQL / Entity-Relationship background, there's a video of a talk I gave at HBaseCon this year on that subject, here: http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-hbasecon-2012.html That's not the on

Re: how to store 100billion short text messages with hbase

2012-12-05 Thread Ian Varley
Tian, The best way to think about how to structure your data in HBase is to ask the question: "How will I access it?". Perhaps you could reply with the sorts of queries you expect to be able to do over this data? For example, retrieve any single conversation between two people in < 10 ms; or sh

Re: Re: how to store 100billion short text messages with hbase

2012-12-05 Thread Ian Varley
[mailto:user-return-32247-guanhua.tian=ia.ac...@hbase.apache.org] on behalf of Ian Varley Sent: December 6, 2012 11:01 To: user@hbase.apache.org Subject: Re: how to store 100billion short text messages with hbase Tian, Th

Re: Re: Re: how to store 100billion short text messages with hbase

2012-12-06 Thread Ian Varley
cate much less message, I am really confused: why is ONE table more suitable than multiple tables? Could you give me some help? Thank you - Tian Guanhua -----Original Message----- From: user-return-32251-guanhua.tian=ia.ac...@hbase.apache.org<mailto:user-return-32251-guanhua.tian=ia

Re: PROD/DR - Replication

2012-12-07 Thread Ian Varley
Juan, No; that would mean every single write to HBase has to wait for an ACK from a remote data center, which would decrease your cluster throughput dramatically. If you need that, consider other database solutions. Ian On Dec 7, 2012, at 12:14 PM, Juan P. wrote: I was reading up on HBase Rep

Re: PROD/DR - Replication

2012-12-07 Thread Ian Varley
uch as high-speed counter > aggregation" > > http://hbase.apache.org/book/architecture.html > > > Am I missing something ? > > - Sri > > > >> >> From: Ian Varley >> To: "user@hbase.apache.org" >&g

Re: PROD/DR - Replication

2012-12-07 Thread Ian Varley
Ha - that's what I get for trying to answer list emails from my phone. :) Ian On Dec 7, 2012, at 1:58 PM, sriraam h wrote: Thanks Ian. I DID miss the point. The person who started the chain is a different person :) - Sri From: Ian Varley mailto

Re: write throughput in cassandra, understanding hbase

2013-01-22 Thread Ian Varley
One notable design difference is that in Cassandra, every piece of data is handled by a quorum of independent peers, whereas in HBase, every piece of data is handled by a single process at any given time (the RegionServer). HBase deals with data replication by delegating the actual file storage

Re: How would you model this in Hbase?

2013-02-06 Thread Ian Varley
Alex, This might be an interesting use of the time dimension in HBase. Every value in HBase is uniquely represented by a set of coordinates: - table - row key - column family - column qualifier - timestamp So, you can have two different values that have all the same coordinates, except th
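The coordinate model described above can be sketched in plain Python (this is a toy model, not the HBase client API): every value lives at a full set of coordinates, and two values can share all coordinates except the timestamp.

```python
# Toy model (NOT the HBase API): a cell store keyed by the full coordinate
# tuple (table, row key, column family, qualifier, timestamp). Names and
# values here are hypothetical.
store = {}

def put(table, row, cf, qual, value, ts):
    store[(table, row, cf, qual, ts)] = value

# Two versions of the "same" cell, distinguished only by timestamp:
put("t1", "row1", "d", "status", "draft", ts=100)
put("t1", "row1", "d", "status", "final", ts=200)

def get_latest(table, row, cf, qual):
    versions = {ts: v for (t, r, c, q, ts), v in store.items()
                if (t, r, c, q) == (table, row, cf, qual)}
    return versions[max(versions)]  # highest timestamp wins by default

print(get_latest("t1", "row1", "d", "status"))  # -> final
```

As in HBase's default read behavior, a plain get returns the version with the highest timestamp; older versions remain addressable if you ask for them explicitly.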

Re: How would you model this in Hbase?

2013-02-07 Thread Ian Varley
Overloading the time stamp aka the versions of the cell is really not a good idea. I agree in general, guys (and noted the dangers in my original post). I'd note, however, that this may be one of the rare cases where this actually *isn't* overloading the timestamp. If you look at the OP's questi

Re: How would you model this in Hbase?

2013-02-07 Thread Ian Varley
e. And when doing research, I wouldn't dare start with versioning unless it is absolutely clear that the original value is wrong, void and worthless. Cheers P.s. pardon for double posting an hour ago. On 07.02.2013 14:36, "Ian Varley" <ivar...@salesforce.com> wrote: Over

Re: HBase Client.

2013-03-20 Thread Ian Varley
Pradeep - One more to add to your list of clients is Phoenix: https://github.com/forcedotcom/phoenix It's a "SQL skin", built on top of the standard Java client with various optimizations; it exposes HBase via a standard JDBC interface, and thus might let you easily plug into other tools for t

Re: HBase Column Family TTL and cell deletions

2013-04-03 Thread Ian Varley
I don't know if it's right (haven't checked source just now) but according to this: http://hbase.apache.org/book/ttl.html Column family TTL is in seconds, not milliseconds. Could that be the problem? (If not, we should fix that page in the ref guide.) On Apr 3, 2013, at 5:19 PM, Ashish Nigam w
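The seconds-vs-milliseconds confusion above is easy to demonstrate with a minimal sketch (an illustration of the unit mismatch, not HBase's actual expiry code):

```python
# Sketch of why units matter: column-family TTL is specified in seconds.
# If a value intended as "90 seconds" is supplied as milliseconds (90000),
# cells look like they never expire.
import time

def is_expired(cell_ts, ttl_seconds, now=None):
    now = now if now is not None else time.time()
    return (now - cell_ts) > ttl_seconds

written = 1_000_000.0   # hypothetical cell write time (epoch seconds)
now = written + 120     # two minutes later

print(is_expired(written, 90, now))      # TTL of 90 seconds: expired
print(is_expired(written, 90_000, now))  # 90000 ("90s in ms"): ~25 hours, not expired
```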

Re: Problem in filters

2013-04-17 Thread Ian Varley
Omkar, Have you considered using Phoenix (https://github.com/forcedotcom/phoenix), a SQL skin over HBase to execute your SQL directly? That'll save you from learning all the nuances of HBase filters and give you as good or better performance. Once you've downloaded and installed Phoenix, here'

Re: querying hbase

2013-05-22 Thread Ian Varley
Thanks for those links JM - hadn't seen any of those before. I think it's useful to have stuff like this, for new users to explore using HBase. Re: Phoenix, I don't think it's fundamentally any more involved than any of those, it's just a library. It exposes a JDBC driver interface, so GUI tools

Re: Key Design Question for list data

2012-04-03 Thread Ian Varley
Hi Derek, If I understand you correctly, you're ultimately trying to store triples in the form "user, valueid, value", right? E.g., something like: "user123, firstname, Paul", "user234, lastname, Smith" (But the usernames are fixed width, and the valueids are fixed width). And, your access pat
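The fixed-width key idea in that thread can be sketched as follows (widths and field names here are assumptions for illustration, not Derek's actual schema):

```python
# Hypothetical fixed-width composite row key for (user, valueid) triples.
# Fixed widths let you slice fields back out of the key by position.
USER_W, VALUEID_W = 8, 10

def make_key(user, valueid):
    assert len(user) <= USER_W and len(valueid) <= VALUEID_W
    return user.ljust(USER_W) + valueid.ljust(VALUEID_W)

def parse_key(key):
    return key[:USER_W].rstrip(), key[USER_W:].rstrip()

k = make_key("user123", "firstname")
print(len(k))        # 18: every key is the same width
print(parse_key(k))  # ('user123', 'firstname')
```

Because all keys sharing a user prefix sort together, a prefix scan retrieves every valueid for one user in a single contiguous read.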

Re: Key Design Question for list data

2012-04-04 Thread Ian Varley
with our other engineers about reverting to a simpler implementation. --Derek On Tue, Apr 3, 2012 at 12:59 PM, Ian Varley <ivar...@salesforce.com> wrote: Hi Derek, If I understand you correctly, you're ultimately trying to store triples in the form "user, valueid, valu

Schema Updates: what do you do today?

2012-04-09 Thread Ian Varley
All: I'm doing a little research into various ways to apply schema modifications to an HBase cluster. Anybody care to share with the list what you currently do? E.g. - Connect via the HBase shell and manually issue commands ("create", "disable", "alter", etc.) - Write one-off scripts that do

Re: Schema Updates: what do you do today?

2012-04-09 Thread Ian Varley
butes without offlining, possibly even to add CFs. I wouldn't expect all admin actions could be accomplished without offlining. Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) - Original Message ----- From: Ian Varley m

Re: Is htable.delete(List) transactional?

2012-04-16 Thread Ian Varley
More complex answer: generally, nothing that involves more than a single row in HBase is transactional. :) It's possible that HBase might get some limited form of multi-row transactions in the future (see HBASE-5229 for more on that) but even th

Re: Performance issues of prepending a table

2012-04-18 Thread Ian Varley
I would guess that this approach would be susceptible to the same kind of "hot spotting" as inserting sequential keys; if you're prepending globally (i.e. there's one global "first" row), then all activity will be taking place on the same region server, so you wouldn't be taking advantage of the
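A common mitigation for the write hot-spotting described above is salting: prefix each key with a hash-derived bucket so writes spread across regions. A minimal sketch (bucket count and key format are illustrative assumptions):

```python
# Sketch: one global "first" row sends every write to the same region.
# Salting the key with a hash prefix spreads writes across N buckets,
# at the cost of needing N parallel scans to read the data back in order.
import hashlib
from collections import Counter

N_BUCKETS = 4

def salted_key(key):
    bucket = int(hashlib.md5(key.encode()).hexdigest(), 16) % N_BUCKETS
    return f"{bucket:02d}-{key}"

keys = [f"msg{i:05d}" for i in range(1000)]
load = Counter(salted_key(k)[:2] for k in keys)
print(sorted(load.items()))  # writes land in all buckets, not one hot region
```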

Re: More tables, or add a prefix to each row key?

2012-04-19 Thread Ian Varley
Tom, The overall tradeoff with "table vs prefix" is that the former adds some (small) amount of cluster management overhead for each new table, whereas the latter adds runtime overhead (memory, cpu, disk, etc) on every operation. In your case, since you're just talking about ~3 tables vs 1, my

Re: Overwriting qualifiers in an existing table

2012-04-24 Thread Ian Varley
Map/Reduce jobs are generally the best approach for working with every row in a table. You can read all about it here: http://hbase.apache.org/book.html#mapreduce I've never tried doing the specific scenario you're describing, but seems like it should work; you could change the example in 7.2.2

Re: hbase as a primary store, or is it more for "2nd class" data?

2012-05-14 Thread Ian Varley
Ahmed, Generally speaking, the intent of HBase IS to be a first class data store. It's a young data store (not even 1.0) so you have to take that into account; but there's been a lot of engineering put into making it fully safe, and known data safety issues are considered release blockers. (Thi

Re: getting real, does hbase need constant mothering or can a 1-man show use it?

2012-05-19 Thread Ian Varley
All things considered, I tend to see HBase as being a little more on the "industrial strength" side of things. It's designed to handle really large data volumes and run on tens or hundreds of machines, and very low-level control is given to the operator so that its usage can be tuned meticulousl

Re: key design

2012-05-21 Thread Ian Varley
Mete, Why separate tables per log type? Why not a single table with the key: That's roughly the approach used by OpenTSDB (with "metric id" instead of "log type", but same idea). OpenTSDB goes further by "bucketing" values into rows using a base timestamp in the row key and offset timestamps
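The OpenTSDB-style bucketing mentioned above can be sketched like this (the one-hour bucket and key format are assumptions for illustration; OpenTSDB's real encoding is binary):

```python
# Sketch: row key holds the series id plus a base timestamp rounded down
# to an hour bucket; each measurement becomes a column whose qualifier is
# the offset from that base. Points in the same hour share one row.
BUCKET = 3600  # seconds per row

def coords(series_id, ts):
    base = ts - (ts % BUCKET)
    return f"{series_id}:{base}", ts - base  # (row key, column offset)

row, off = coords("web.errors", 7203)
print(row, off)                               # web.errors:7200 3
print(coords("web.errors", 7250)[0] == row)   # True: same hour, same row
```

This keeps rows bounded in size while letting one short scan read a whole time range for a series.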

Re: HBase logging only prints ulimit information

2012-05-24 Thread Ian Varley
How did you end up figuring that out, Kevin? Was there a more ominous message in the logs about this? Should have logged something like: "WARNING: Server foo has been rejected; Reported time is too far out of sync with master" FWIW, HBASE-5770 (J

Re: Efficient way to read a large number of files in S3 and upload their content to HBase

2012-05-24 Thread Ian Varley
This is a question I see coming up a lot. Put differently: what characteristics make it useful to use HBase on top of HDFS, as opposed to just flat files in HDFS directly? "Quantity" isn't really an answer, b/c HDFS does fine with quantity (better, even). The basic answers are that HBase is go

Re: Of hbase key distribution and query scalability, again.

2012-05-25 Thread Ian Varley
Dmitriy, If I understand you right, what you're asking about might be called "Read Hotspotting". For an obvious example, if I distribute my data nicely over the cluster but then say: for (int x = 0; x < 100; x++) { htable.get(new Get(Bytes.toBytes("row1"))); } Then naturally I'm onl

Re: Of hbase key distribution and query scalability, again.

2012-05-25 Thread Ian Varley
n HBase like that right now otherwise i > would've seen it in the book i suppose... > > Thanks. > -d > > On Fri, May 25, 2012 at 10:42 AM, Ian Varley wrote: >> Dmitriy, >> >> If I understand you right, what you're asking about might be called

Re: Of hbase key distribution and query scalability, again.

2012-05-26 Thread Ian Varley
istributed, you would hash > the key (md5 or SHA-1 as examples). > This works well if you're doing get()s and not a lot of scan()s. > > But on reads, how do you get 'hot spotting' ? > > Should those rows be cached in memory? > > So what am I missing? Bes

Re: HBase (BigTable) many to many with students and courses

2012-05-29 Thread Ian Varley
Em, What you're describing is a classic relational database nested loop or hash join; the only difference is that relational databases have this feature built in, and can do it very efficiently because they typically run on a single machine, not a distributed cluster. By moving to HBase, you're

Re: HBase (BigTable) many to many with students and courses

2012-05-29 Thread Ian Varley
A few more responses: On May 29, 2012, at 10:54 AM, Em wrote: > In fact, everything you model with a Key-Value-storage like HBase, > Cassandra etc. can be modeled as an RDMBS-scheme. > Since a lot of people, like me, are coming from that edge, we must > re-learn several basic things. > It starts

Re: HBase (BigTable) many to many with students and courses

2012-05-29 Thread Ian Varley
On May 29, 2012, at 1:24 PM, Em wrote: >> But you're trading time & space at write time for extremely fast >> speeds at write time. > You meant "extremely fast speeds at read time", didn't you? Ha, yes, thanks. That's what I meant. > However this means that Sheldon has to do at least two requests

Re: HBase (BigTable) many to many with students and courses

2012-05-29 Thread Ian Varley
On May 29, 2012, at 3:25 PM, Em wrote: Yup, unless you denormalize the tweet bodies as well--then you just read the current user's record and you have everything you need (with the downside of massive data duplication). Well, I think this would be bad practice for editable stuff like tweets. Th

Re: Collation order of items

2012-06-08 Thread Ian Varley
Tom, another approach you could take would be to store an ASCII encoded version of the string as the row key or column qualifier, and then the full UTF-8 string elsewhere (e.g. in the cell value, or even later in the row key). That wouldn't work out the fine sorting (whether "è" sorts before or
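The two-part scheme described above can be sketched with an accent-folding sort key (a rough illustration; real collation would use a proper library, and note this sketch doesn't fold case):

```python
# Sketch: store an ASCII-folded version of the string as the sort key,
# keeping the full UTF-8 original alongside for display and fine sorting.
import unicodedata

def ascii_sort_key(s):
    # strip accents: "è" folds to "e", so it sorts near "e" (coarse only;
    # case is NOT folded here, so "Über" still sorts before lowercase)
    folded = unicodedata.normalize("NFKD", s)
    return folded.encode("ascii", "ignore")

names = ["zèbre", "apple", "Über"]
pairs = [(ascii_sort_key(n), n) for n in names]  # (row key, stored value)
print([orig for _, orig in sorted(pairs)])
```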

Re: Addition to Apache HBase Reference Guide

2012-06-21 Thread Ian Varley
Mohammad, Absolutely - that's exactly how open source projects grow! :) You can open up a JIRA for your suggested edits, and then either post your edits directly in the JIRA ticket, or, ideally, make the docs change yourself and post a patch (the reference guide is also part of the HBase source

Re: Embedded table data model

2012-07-11 Thread Ian Varley
Hi Xiaobo - For HBase, this is doable; you could have a single table in HBase where each row is a customer (with the customerid as the rowkey), and columns for each of the 300 attributes that are directly part of the customer entity. This is sparse, so you'd only take up space for the attribute

Re: Embedded table data model

2012-07-11 Thread Ian Varley
s, at the end, etc.) Ian On Jul 11, 2012, at 11:27 PM, Xiaobo Gu wrote: but there are other writers inserting new transactions into the table when customers do new transactions. On Thu, Jul 12, 2012 at 1:13 PM, Ian Varley <ivar...@salesforce.com> wrote: Hi Xiaobo - For HBase, this is

Re: Embedded table data model

2012-07-12 Thread Ian Varley
Jul 12, 2012, at 7:27 PM, "Cole" wrote: > I think this design has some question, please refer > http://hbase.apache.org/book/number.of.cfs.html > > 2012/7/12 Ian Varley > >> Yes, that's fine; you can always do a single column PUT into an existing >> row,

Re: Embedded table data model

2012-07-12 Thread Ian Varley
ach transaction will be created as a column inside the cf for transactions, and these columns are created dynamically as transactions occur? Regards, Xiaobo Gu On Fri, Jul 13, 2012 at 11:08 AM, Ian Varley <ivar...@salesforce.com> wrote: Column families are not the same thing as col

Re: Schema Design - Move second column family to new table

2012-08-20 Thread Ian Varley
Christian, Column families are really more "within" rows, not the other way around (they're really just a way to physically partition sets of columns in a table). In your example, then, it's more correct to say that table1 has millions / billions of rows, but only hundreds of them have any colu

Re: Choose the location of a record

2012-08-23 Thread Ian Varley
Blaise, Generally speaking, no. The distribution of row keys over regions is handled by HBase. This is as you would want, so that the failure of any given server is transparent to your application. There are ways to hack around this, but generally you shouldn't design in such a way as to requ

Re: md5 hash key and splits

2012-08-30 Thread Ian Varley
The Facebook devs have mentioned in public talks that they pre-split their tables and don't use automated region splitting. But as far as I remember, the reason for that isn't predictability of spreading load, so much as predictability of uptime & latency (they don't want an automated split to

Re: crafting your key - scan vs. get

2012-10-18 Thread Ian Varley
Hi Neil, Mike summed it up well, as usual. :) Your choices of where to describe this "dimension" of your data (a one-to-many between users and events) are: - one row per event - one row per user, with events as columns - one row per user, with events as versions on a single cell The first tw

Re: Regarding Indexing columns in HBASE

2013-06-04 Thread Ian Varley
Rams - you might enjoy this blog post from HBase committer Jesse Yates (from last summer): http://jyates.github.io/2012/07/09/consistent-enough-secondary-indexes.html Secondary Indexing doesn't exist in HBase core today, but there are various proposals and early implementations of it in flight.

Re: When to expand vertically vs. horizontally in Hbase

2013-07-05 Thread Ian Varley
Mike and I get into good discussions about ERD modeling and HBase a lot ... :) Mike's right that you should avoid a design that relies heavily on relationships when modeling data in HBase, because relationships are tricky (they're the first thing that gets thrown out the window in a database that

Re: When to expand vertically vs. horizontally in Hbase

2013-07-05 Thread Ian Varley
t doing it right may mean that you are not going to get the most out of your system. On Jul 5, 2013, at 1:26 PM, Ian Varley <ivar...@salesforce.com> wrote: But, something just occurred to me: just because your physical implementation (HBase) doesn't support normalized entities

Re: When to expand vertically vs. horizontally in Hbase

2013-07-05 Thread Ian Varley
on't > be a hard or strong relationship between the customer table and the order > table. > > When you go to your ERD tool, you wouldn't show a strong coupling of the > data. > > Does that make sense? > > On Jul 5, 2013, at 1:56 PM, Ian Varley wrote:

Re: org.apache.hadoop.hbase.ZooKeeperConnectionException: java.io.IOException: Too many open files

2011-08-09 Thread Ian Varley
Hi Shuja, This question is mentioned in the HBase FAQ, here: http://wiki.apache.org/hadoop/Hbase/FAQ_Operations#A3 which points to the HBase book: http://hbase.apache.org/book.html#ulimit "HBase is a database. It uses a lot of files all at the same time. The default ulimit -n -- i.e. user fil

TTL for cell values

2011-08-13 Thread Ian Varley
Hi all, Quick clarification on TTL for cells. The concept makes sense (instead of "keep 3 versions" you say "keep versions more recent than time T"). But, if there's only 1 value in the cell, and that value is older than the TTL, will it also be deleted? If so, has there ever been discussion o

Re: TTL for cell values

2011-08-13 Thread Ian Varley
ra/browse/HBASE-4071. -- Lars ________ From: Ian Varley <ivar...@salesforce.com> To: "user@hbase.apache.org" <user@hbase.apache.org> Sent: Saturday, August 13, 2011 6:51 PM Subject: TTL for cell values Hi all

Re: TTL for cell values

2011-08-14 Thread Ian Varley
Best regards, > > > - Andy > > > Problems worthy of attack prove their worth by hitting back. - Piet Hein (via > Tom White) > > >> >> From: Ian Varley >> To: "user@hbase.apache.org" >> Sent: Saturday,

Re: TTL for cell values

2011-08-14 Thread Ian Varley
"I am slightly confused now. Time to live is used in networking: after n hops, drop this packet. Also used in memcache: expire this data n seconds after insert. I do not know of any specific ttl features in rdbms, so I do not understand why someone would expect ttl to be permanently durable." E

Re: schema help

2011-08-25 Thread Ian Varley
ould hbase know to stop scanning the entire table? How would a query actually look, if my key was [fieldA time]? As a matter of fact, I can do 100% of my queries. I will leave the 5% out of my project/schema. On Thu, Aug 25, 2011 at 10:13 AM, Ian Varley <ivar...@salesforce.com>

Re: Scan not working properly on composite keys

2011-08-29 Thread Ian Varley
Stuti, The rows are physically sorted on disk (and in memory) according to the row keys you define, and that's the only way that HBase can access them (unlike a relational database, it doesn't have built in indexes that allow you to access rows by something other than their physical sort order)

Re: How to debug and run hadoop/HBase source code in eclipse

2011-09-01 Thread Ian Varley
Vamshi, As with most online help forums, people can only be of help if you pass along the specific difficulty you're having. Typically, this would take a form like, "I tried doing X, but instead of seeing Y, I saw Z instead". The more specific you can be with what you already tried and the erro

Re: HBase Vs CitrusLeaf?

2011-09-07 Thread Ian Varley
Well said, Stack. :) Maybe HBase needs more celebrity endorsements? ;) Another important point you should mention to your manager is that (as far as I can see) CitrusLeaf is a closed-source, proprietary product. While there's no harm in this, it does introduce a dependency on Citrusleaf to fix i

Re: Should I use HBASE?

2011-09-14 Thread Ian Varley
That's an important point to make, Michael. Jumping to HBase (or any NoSQL store) from an RDBMS has pros and cons; the pros are generally that you can scale linearly on cheap(er) hardware as your data and usage grows, but the cons are that many things you take for granted in an RDBMS (like trans

Re: Should I use HBASE?

2011-09-14 Thread Ian Varley
Point well taken, Mike. :) It's a bad idea to assume we know the original poster's requirements well enough to suggest a direction, based on such a brief sketch. Original poster, let me be clear: a data set of your size may (or may not) be a good fit for doing in HBase; relational databases re

Re: HBase Stack

2011-11-14 Thread Ian Varley
Em, To add to what Joey said, consider that there are very significant trade-offs you make when building something on HBase (or any of the new generation of non-relational databases). For starters, you don't get: - A declarative query language like SQL that can build optimal physical access p

Re: Multiple tables vs big fat table

2011-11-21 Thread Ian Varley
One clarification; Michael, when you say: "If I do a scan(), I'm actually going to go through all of the rows in the table." That's if you're doing a *full* table scan, which you'd have to do if you wanted selectivity based on some attribute that isn't part of the key. This is to be avoided in

Re: Multiple tables vs big fat table

2011-11-21 Thread Ian Varley
Certainly, and that's all valid. I just wanted to make it clear to Mark (and others reading) that scans aren't inherently "bad" in HBase, and they don't need to scan the entire table (and usually shouldn't). Short, local scans are very efficient, provided your row keys are sorted in a way that's

Re: HBase and Consistency in CAP

2011-12-02 Thread Ian Varley
Mohit, Yeah, those are great places to go and learn. To fill in a bit more on this topic: "partition-tolerance" usually refers to the idea that you could have a complete disconnection between N sets of machines in your data center, but still be taking writes and serving reads from all the serv

Re: HBase and Consistency in CAP

2011-12-02 Thread Ian Varley
Dec 2, 2011 at 12:15 PM, Ian Varley <ivar...@salesforce.com> wrote: Mohit, Yeah, those are great places to go and learn. To fill in a bit more on this topic: "partition-tolerance" usually refers to the idea that you could have a complete disconnection between N sets of ma

Re: HBase for ad-hoc aggregate queries

2012-01-11 Thread Ian Varley
And in case no one else says it ... I'm taking a look at moving our datastore from Oracle to HBase This is a questionable project in the general case. HBase is not a relational store and lacks indexes, transactions, isolation, easy ad-hoc querying, and nearly everything else you get from Oracle

Re: does increasing region filesize followed by major compactions supposed to reduce number of regions?

2012-01-12 Thread Ian Varley
Vinod, The answers to your questions (and so many more!) are easily found in the HBase Reference Guide: http://hbase.apache.org/book.html#schema.versions "Excess versions are removed during major compactions." - Ian "Doug Meil" Varley ;) On Jan 12, 2012, at 10:59 AM, T Vinod Gupta wrote: Th

Re: How to Rank in HBase?

2012-01-29 Thread Ian Varley
Bing, HBase uses an approach to structuring its storage known as "Log Structured Merge Trees", which you can learn more about here: http://scholar.google.com/scholar?q=log+structured+merge+tree&hl=en&as_sdt=0&as_vis=1&oi=scholart As well as in Lars George's great book, here: http://shop.oreill

Re: hbasecon date at the website

2012-02-08 Thread Ian Varley
Submission deadline is 2/20. Conference is 5/22. Ian On Feb 8, 2012, at 8:03 PM, Dani Rayan wrote: Hi, Could someone correct the date at http://www.hbasecon.com/ ? Some of us are considering to reserve flight tickets :) Stack sent a mail with Feb 20th as the date, but that site says May 22nd.

Re: Book/RefGuide updated

2012-02-10 Thread Ian Varley
Thanks for doing such a great, consistent job providing explanations and documentation about HBase, Doug! On Feb 10, 2012, at 2:09 PM, Doug Meil wrote: Hi folks- The HBase Book/RefGuide has been updated http://hbase.apache.org/book.html In particular, there is now a description of the compact

Re: multiple partial scans in the row

2012-02-14 Thread Ian Varley
James, Are your orderIds ordered? You say "a range of orderIds", which implies that (i.e. they're sequential numbers like 001, 002, etc, not hashes or random values). If so, then a single scan can hit the rows for multiple contiguous orderIds (you'd set the start and stop rows based on a prefix
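The single-scan-over-a-range idea can be sketched with a sorted key list (a toy model of HBase's sorted rows, not the client API; the orderId format is a hypothetical example):

```python
# Sketch: because rows are sorted by key, one scan with start/stop keys
# covers a contiguous range of orderIds without touching the rest of
# the table.
import bisect

rows = sorted(f"order{n:03d}" for n in range(1, 50))

def scan(start, stop):
    lo = bisect.bisect_left(rows, start)
    hi = bisect.bisect_left(rows, stop)  # stop row is exclusive, as in HBase
    return rows[lo:hi]

print(scan("order010", "order013"))  # ['order010', 'order011', 'order012']
```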

Re: Flushing to HDFS sooner

2012-02-19 Thread Ian Varley
Manuel, do you have the WAL disabled? If not, theoretically, what "should" have happened here was that the WAL would have been synced to disk when the row was written (flush or no flush), and on restart the system should have replayed that WAL to rebuild the in-memory state of the regions that w

Re: Corresponding table in Hbase

2012-02-22 Thread Ian Varley
Adarsh, HBase doesn't have the concept of a globally unique auto-incrementing "ID" column; that would require that all PUTs to any region of a table first go through some central ID authority to get a unique ID, and that sort of goes against the general HBase approach (in which operations on re

Re: Solr & HBase - Re: How is Data Indexed in HBase?

2012-02-22 Thread Ian Varley
One minor clarification: HBase is primarily built for retrieving a single row at a time based on a predetermined and known location (the key). Substitute that with: "HBase is primarily built for retrieving sets of contiguous sorted rows based on a predetermined and known location (the start key

Seeking HBase schema design examples

2012-02-22 Thread Ian Varley
All: I’m doing a study on HBase schema design, with a goal of contributing back a presentation or summary about how data modeling is practically done in HBase. I'd like to base it as much as possible on real world examples (i.e. things that are running in production today). I’ve got several exa

Re: Seeking HBase schema design examples

2012-02-23 Thread Ian Varley
n Wed, Feb 22, 2012 at 10:40 PM, Ian Varley wrote: All: I’m doing a study on HBase schema design, with a goal of contributing back a presentation or summary about how data modeling is practically done in HBase. I'd like to base it as much as possible on real world examples (i.e. things that a

Re: Scanning the last N rows

2012-03-02 Thread Ian Varley
Yes, you do have to worry about efficiency. If your rows aren't ordered in the table (by rowkey) according to the update date, the server will be having to scan the entire table. Your filter will enable it to not send all of those results to the client, but it's still having to read them from di

Re: HBase & BigTable + History: Can it run decently on a 512MB machine? What's the difference between the two?

2012-03-05 Thread Ian Varley
DS, HBase is an open source project, so you can read the source code and make that determination for yourself. It was first created based on the same ideas in the Bigtable paper (published by Google) but is only related based on the design goals and philosophy, not the actual implementation. B