Re: Schema Design Newbie Question

2013-12-23 Thread Kamal Bahadur
> ...the HFile necessary for the key range you provide (Category + Timestamp). -- Lars > Hi Dhaval, thanks for

Re: Schema Design Newbie Question

2013-12-23 Thread lars hofhansl
perform). Option #1 should be better. HBase is smart about scanning only the HFiles necessary for the key range you provide (Category + Timestamp). -- Lars
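
A minimal sketch of the key-range scan Lars describes, using the 2013-era HBase client API; the table name, category value, and timestamp window are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "events");               // hypothetical table
    // Rowkey = category + 8-byte big-endian timestamp, so one category's
    // time window forms a contiguous key range.
    byte[] start = Bytes.add(Bytes.toBytes("catA"), Bytes.toBytes(1387000000000L));
    byte[] stop  = Bytes.add(Bytes.toBytes("catA"), Bytes.toBytes(1387900000000L));
    ResultScanner scanner = table.getScanner(new Scan(start, stop));
    for (Result r : scanner) {
      // process rows in the range; HBase only reads store files
      // whose key ranges overlap [start, stop)
    }
    scanner.close();
    table.close();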

Re: Schema Design Newbie Question

2013-12-23 Thread Kamal Bahadur
Hi Dhaval, Thanks for the quick response! Why do you think having more files is not a good idea? Is it because of OS restrictions? I get around 50 million records a day and each record contains ~25 columns. Values for each column are ~30 characters. Kamal On Mon, Dec 23, 2013 at 3:35 PM, Dha

Re: Schema Design Newbie Question

2013-12-23 Thread Dhaval Shah
1,000 CFs with HBase does not sound like a good idea. Category + timestamp sounds like the better of the two options you have thought of. Can you tell us a little more about your data? Regards, Dhaval

Re: Schema design for filters

2013-06-29 Thread Kristoffer Sjögren
In terms of scalability, yes, but we use HBase for other stuff as well: time series, counters, and a few future ideas around analytics. So it's nice if we can put everything in the same deployment. We don't want users to care about the physical storage (keep them productive in Java land). The point here of b

Re: Schema design for filters

2013-06-28 Thread Michel Segel
This doesn't make sense in that the OP wants a schemaless structure, yet wants filtering on columns. The issue is that you do have a limited schema, so "schemaless" is a misnomer. In order to do filtering, you need to enforce an object type within a column, which requires a schema to be enforced. A

Re: Schema design for filters

2013-06-28 Thread Asaf Mesika
Yep. Other DBs like Mongo may have the stuff you need out of the box. Another option is to encode the whole class using Avro and write a filter on top of that. You basically use one column and store it there. Yes, you pay the penalty of loading your entire class and extracting the fields you need t
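
A rough sketch of the single-column Avro approach; the record class, rowkey, and column names are hypothetical, and the Avro specific-record API is assumed:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.EncoderFactory;
    import org.apache.avro.specific.SpecificDatumWriter;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    // MyRecord is a hypothetical Avro-generated class.
    static Put toPut(MyRecord record, byte[] rowKey) throws IOException {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
      new SpecificDatumWriter<MyRecord>(MyRecord.class).write(record, encoder);
      encoder.flush();
      Put put = new Put(rowKey);
      // One cell holds the whole serialized object.
      put.add(Bytes.toBytes("f"), Bytes.toBytes("avro"), out.toByteArray());
      return put;
    }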

Re: Schema design for filters

2013-06-28 Thread Otis Gospodnetic
Hi, I see. Btw, isn't HBase overkill for < 1M rows? Note that Lucene is schemaless, and both Solr and Elasticsearch can detect field types, so in a way they are schemaless, too. Otis -- Performance Monitoring -- http://sematext.com/spm On Fri, Jun 28, 2013 at 2:53 PM, Kristoffer Sjögren wr

Re: Schema design for filters

2013-06-28 Thread Kristoffer Sjögren
@Otis HBase is a natural fit for my use case because it's schemaless. I'm building a configuration management system and there is no need for advanced filtering/querying capabilities, just basic predicate logic and pagination that scales to < 1 million rows with reasonable performance. Thanks for th

Re: Schema design for filters

2013-06-28 Thread Otis Gospodnetic
Kristoffer, You could also consider using something other than HBase, something that supports "secondary indices", like anything that is Lucene-based - Solr and ElasticSearch, for example. We recently compared how we aggregate data in HBase (see my signature) and how we would do it if we were to u

Re: Schema design for filters

2013-06-28 Thread Michael Segel
Why is it that if all you have is a hammer, everything looks like a nail? ;-) On Jun 27, 2013, at 8:55 PM, James Taylor wrote:

Re: Schema design for filters

2013-06-28 Thread Kristoffer Sjögren
Interesting. I'm actually building something similar. A full-blown SQL implementation is a bit of overkill for my particular use case, and the query API is the final piece of the puzzle. But I'll definitely have a look for some inspiration. Thanks! On Fri, Jun 28, 2013 at 3:55 AM, James Taylor wrote:

Re: Schema design for filters

2013-06-27 Thread James Taylor
Hi Kristoffer, Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)? You could model your schema much like an O/R mapper and issue SQL queries through Phoenix for your filtering. James @JamesPlusPlus http://phoenix-hbase.blogspot.com On Jun 27, 2013, at 4:39 PM, "Kristoffer S
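
For reference, querying through Phoenix looks like plain JDBC; a sketch with a hypothetical table, columns, and predicate:

    import java.sql.*;

    Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
    PreparedStatement ps = conn.prepareStatement(
        "SELECT id, name FROM objects WHERE type = ? LIMIT 100");  // hypothetical schema
    ps.setString(1, "ConfigItem");
    ResultSet rs = ps.executeQuery();
    while (rs.next()) {
      System.out.println(rs.getString("id"));
    }
    conn.close();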

Re: Schema design for filters

2013-06-27 Thread Kristoffer Sjögren
Thanks for your help Mike. Much appreciated. I don't store rows/columns in JSON format. The schema is exactly that of a specific Java class, where the rowkey is a unique object identifier with the class type encoded into it. Columns are the field names of the class and the values are those of the ob
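
A sketch of the layout Kristoffer describes; the class name, separator, family, and field names are all hypothetical:

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    // Rowkey = class type + NUL separator + instance id (hypothetical encoding).
    byte[] row = Bytes.toBytes("com.example.ServerConfig\u0000" + instanceId);
    Put put = new Put(row);
    // One column per Java field; values serialized according to the field's type.
    put.add(Bytes.toBytes("f"), Bytes.toBytes("maxConnections"), Bytes.toBytes(100));
    put.add(Bytes.toBytes("f"), Bytes.toBytes("hostname"), Bytes.toBytes("node-1"));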

Re: Schema design for filters

2013-06-27 Thread Michael Segel
Ok... If you want to do type checking and schema enforcement, you will need to do this as a coprocessor. The quick and dirty way (not recommended) would be to hard-code the schema into the coprocessor code. A better way: at startup, load up ZK to manage the set of known table s
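
A bare-bones sketch of the coprocessor idea, using the 0.94-era RegionObserver API (later versions changed the prePut signature); the allowed-qualifier set would come from ZK as Mike suggests, and is hard-coded here only for illustration:

    import java.io.IOException;
    import java.util.*;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SchemaEnforcer extends BaseRegionObserver {
      // In a real deployment this set would be loaded from ZooKeeper at startup.
      private static final Set<String> ALLOWED =
          new HashSet<String>(Arrays.asList("name", "type", "value"));

      @Override
      public void prePut(ObserverContext<RegionCoprocessorEnvironment> ctx,
                         Put put, WALEdit edit, boolean writeToWAL) throws IOException {
        // Reject writes whose qualifiers are not part of the known schema.
        for (List<KeyValue> kvs : put.getFamilyMap().values()) {
          for (KeyValue kv : kvs) {
            String qualifier = Bytes.toString(kv.getQualifier());
            if (!ALLOWED.contains(qualifier)) {
              throw new IOException("unknown column: " + qualifier);
            }
          }
        }
      }
    }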

Re: Schema design for filters

2013-06-27 Thread Kristoffer Sjögren
I see your point. Everything is just bytes. However, the schema is known and every row is formatted according to this schema, although some columns may not exist, that is, no value exists for this property on this row. So if I'm able to apply these "typed comparators" to the right cell values it ma

Re: Schema design for filters

2013-06-27 Thread Michael Segel
You have to remember that HBase doesn't enforce any sort of typing. That's why this can be difficult. You'd have to write a coprocessor to enforce a schema on a table. Even then YMMV if you're writing JSON structures to a column because while the contents of the structures could be the same, t

Re: Schema design for filters

2013-06-27 Thread Kristoffer Sjögren
I realize standard comparators cannot solve this. However, I do know the type of each column, so writing custom list comparators for boolean, char, byte, short, int, long, float, and double seems quite straightforward. Long arrays, for example, are stored as a byte array with 8 bytes per item, so a comp
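
The core of such a comparator for serialized long arrays might look like the sketch below; a real version would extend the 0.94-era WritableByteArrayComparable and plug into SingleColumnValueFilter:

    import org.apache.hadoop.hbase.util.Bytes;

    // Does a cell holding a serialized long[] (8 bytes per element) contain the target?
    static boolean containsLong(byte[] cellValue, long target) {
      for (int off = 0; off + Bytes.SIZEOF_LONG <= cellValue.length; off += Bytes.SIZEOF_LONG) {
        if (Bytes.toLong(cellValue, off) == target) {
          return true;
        }
      }
      return false;
    }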

Re: Schema design for filters

2013-06-27 Thread Michael Segel
Not an easy task. You first need to determine how you want to store the data within a column and/or apply a type constraint to a column. Even if you use JSON records to store your data within a column, does an equality comparator exist? If not, you would have to write one. (I kinda think tha

Re: Schema Design Question

2013-05-02 Thread Cameron Gandevia
> HDFS is far better. -- Lars [quoting Michel Segel:] I would have to

Re: Schema Design Question

2013-04-30 Thread lars hofhansl
. -- Lars [quoting Michel Segel:] I would have to agree. The use case doesn't make much sense for HB

Re: Schema Design Question

2013-04-29 Thread Michel Segel
I would have to agree. The use case doesn't make much sense for HBase and sounds a bit more like a problem for Hive. The OP indicated that the data was disposable after a round of processing. IMHO Hive is a better fit. Sent from a remote device. Please excuse any typos... Mike Segel On Apr

Re: Schema Design Question

2013-04-28 Thread Asaf Mesika
I actually don't see the benefit of saving the data into HBase if all you do is read per job id and purge it. Why not accumulate into HDFS per job id and then dump the file? The way I see it, HBase is good for querying parts of your data, even if it is only 10 rows. In your case your average is 1

Re: schema design: rows vs wide columns

2013-04-28 Thread Adrien Mogenet
> I think the main problem is that all CFs have to be flushed if one gets large enough to require a flush. (Does anyone remember why exactly that is? And do we still need that now that

Re: Schema Design Question

2013-04-26 Thread Enis Söztutar
Hi, Interesting use case. I think it depends on how many jobIds you expect to have. If it is on the order of thousands, I would caution against going the one-table-per-jobId approach, since for every table there is some master overhead, as well as file structures in HDFS. If jobIds are manageabl

Re: Schema Design Question

2013-04-26 Thread Ted Yu
My understanding of your use case is that data for different jobIds would be continuously loaded into the underlying table(s). Looks like you can have one table per job. This way you drop the table after the MapReduce job is complete. In the single-table approach, you would delete many rows in the table
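
Dropping a per-job table is cheap compared with mass row deletes; a sketch with the old admin API (the per-job naming scheme is hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    String tableName = "job_" + jobId;       // hypothetical per-job naming scheme
    admin.disableTable(tableName);           // a table must be disabled before deletion
    admin.deleteTable(tableName);
    admin.close();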

Re: schema design: rows vs wide columns

2013-04-16 Thread Michael Segel
> I think the main problem is that all CFs have to be flushed if one gets large enough to require a flush. (Does anyone remember why exactly that is? And do we still need that now that

Re: schema design: rows vs wide columns

2013-04-16 Thread Ted Yu
> ...that the memstoreTS is stored in the HFiles?) So things are fine as long as all CFs have roughly the same size. But if you have one that gets a lot of data and many others that are smaller, we'd end up with a lot of unnecessary and sm

Re: schema design: rows vs wide columns

2013-04-16 Thread Jean-Marc Spaggiari
> ...any others that are smaller, we'd end up with a lot of unnecessary and small store files from the smaller CFs. Anything else known that is bad about many column families? -- Lars

Re: schema design: rows vs wide columns

2013-04-16 Thread Ted Yu
> ...up with a lot of unnecessary and small store files from the smaller CFs. Anything else known that is bad about many column families? -- Lars

Re: schema design: rows vs wide columns

2013-04-08 Thread Doug Meil
> ...known that is bad about many column families? -- Lars [quoting Andrew Purtell:] Is there a pointer to evidence/ex

Re: schema design: rows vs wide columns

2013-04-08 Thread Michael Segel
StAck, Just because FB does something doesn't mean it's necessarily a good idea for others to do the same. FB designs specifically for their needs, and their use cases may not match those of others. To your point though, I agree that Ted's number of 3 is more of a rule of thumb and not a hard

Re: schema design: rows vs wide columns

2013-04-07 Thread ramkrishna vasudevan
> ...store files from the smaller CFs. Anything else known that is bad about many column families? -- Lars

Re: schema design: rows vs wide columns

2013-04-07 Thread lars hofhansl
"user@hbase.apache.org" Sent: Sunday, April 7, 2013 3:52 PM Subject: Re: schema design: rows vs wide columns Is there a pointer to evidence/experiment backed analysis of this question? I'm sure there is some basis for this text in the book but I recommend we strike it. We could rep

Re: schema design: rows vs wide columns

2013-04-07 Thread ramkrishna vasudevan
I agree with Andrew here, and Stack's comment on FB's usage with 15 CFs is interesting. Whenever people read that line from the doc they used to ask why it is so, and I also thought the restriction of having at most 3 CFs was one factor that sometimes made schema design a bit challengin

Re: schema design: rows vs wide columns

2013-04-07 Thread Viral Bajaria
I think this whole idea of not going over a certain number of column families is a 2+ year old story. I remember hearing numbers like 5 or 6 (not 3) come up when talking at Hadoop conferences with engineers at companies that were heavy HBase users. I agree with Andrew's suggestion that we

Re: schema design: rows vs wide columns

2013-04-07 Thread Andrew Purtell
Is there a pointer to evidence/experiment-backed analysis of this question? I'm sure there is some basis for this text in the book, but I recommend we strike it. We could replace it with YCSB- or LoadTestTool-driven latency graphs for different workloads, maybe. Although that would also be a big simpl

Re: schema design: rows vs wide columns

2013-04-07 Thread Stack
On Sun, Apr 7, 2013 at 3:27 PM, Ted Yu wrote: > From http://hbase.apache.org/book.html#number.of.cfs : "HBase currently does not do well with anything above two or three column families so keep the number of column families in your schema low." We should add more to that section. FB run w

Re: schema design: rows vs wide columns

2013-04-07 Thread Ted Yu
From http://hbase.apache.org/book.html#number.of.cfs : "HBase currently does not do well with anything above two or three column families so keep the number of column families in your schema low." Cheers On Sun, Apr 7, 2013 at 3:04 PM, Stack wrote:

Re: schema design: rows vs wide columns

2013-04-07 Thread Stack
On Sun, Apr 7, 2013 at 11:58 AM, Ted wrote: > With regard to number of column families, 3 is the recommended maximum. How did you come up w/ the number '3'? Is it a 'hard' 3? Or does it depend? If the latter, on what does it depend? Thanks, St.Ack

Re: schema design: rows vs wide columns

2013-04-07 Thread Ted
If you store service Id by month, how do you deal with a time range in a query that spans partial month(s)? With regard to the number of column families, 3 is the recommended maximum. Cheers On Apr 7, 2013, at 1:03 AM, shawn du wrote: > Hello, I am new to hbase, but I have some experience o
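
One hedged answer to Ted's partial-month question (not from the thread itself): enumerate the month buckets in the range and issue one scan per bucket, clamping the timestamps at the bucket edges. Computing the buckets is the fiddly part; a sketch:

    import java.text.SimpleDateFormat;
    import java.util.ArrayList;
    import java.util.Calendar;
    import java.util.List;

    // Month buckets covering [startMillis, endMillis]; issue one scan per bucket,
    // with start/stop timestamps clamped to the query range.
    static List<String> monthBuckets(long startMillis, long endMillis) {
      SimpleDateFormat fmt = new SimpleDateFormat("yyyyMM");
      List<String> buckets = new ArrayList<String>();
      Calendar c = Calendar.getInstance();
      c.setTimeInMillis(startMillis);
      c.set(Calendar.DAY_OF_MONTH, 1);      // normalize to the start of the month
      c.set(Calendar.HOUR_OF_DAY, 0);
      c.set(Calendar.MINUTE, 0);
      c.set(Calendar.SECOND, 0);
      c.set(Calendar.MILLISECOND, 0);
      while (c.getTimeInMillis() <= endMillis) {
        buckets.add(fmt.format(c.getTime()));   // e.g. "201304"
        c.add(Calendar.MONTH, 1);
      }
      return buckets;
    }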

Re: Schema Design - Move second column family to new table

2012-08-22 Thread Christian Schäfer
stian - Original Message - From: Christian Schäfer, To: "user@hbase.apache.org", Sent: Monday, 20 August 2012, 22:54, Subject: RE: Schema Design - Move second column family to new table. Thanks Pranav for the Schema Design resource... will check this soon. & Thanks Ia

RE: Schema Design - Move second column family to new table

2012-08-20 Thread Christian Schäfer
An: "user@hbase.apache.org" CC: Christian Schäfer Gesendet: 16:37 Montag, 20.August 2012 Betreff: Re: Schema Design - Move second column family to new table Christian, Column families are really more "within" rows, not the other way around (they're really just a way to physi

Re: Schema Design - Move second column family to new table

2012-08-20 Thread Ian Varley
Christian, Column families are really more "within" rows, not the other way around (they're really just a way to physically partition sets of columns in a table). In your example, then, it's more correct to say that table1 has millions / billions of rows, but only hundreds of them have any colu
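
In API terms (2012-era client), families are declared per table and every row can use any subset of them; a sketch:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    Configuration conf = HBaseConfiguration.create();
    HTableDescriptor desc = new HTableDescriptor("table1");  // pre-0.96 constructor
    desc.addFamily(new HColumnDescriptor("cf1"));   // most rows have columns here
    desc.addFamily(new HColumnDescriptor("cf2"));   // only some rows have columns here
    new HBaseAdmin(conf).createTable(desc);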

Re: Schema Design - Move second column family to new table

2012-08-20 Thread Pranav Modi
This might be useful - http://java.dzone.com/videos/hbase-schema-design-things-you On Mon, Aug 20, 2012 at 5:17 PM, Christian Schäfer wrote: > Currently I'm about to design HBase tables. In my case there is table1 with CF1 holding millions/billions of rows and CF2 with hundreds of rows. R

Re: Schema design question - Hot Key concerns

2011-11-20 Thread Michel Segel
Hi, OK... First a caveat... I haven't seen your initial normalized schema, so take what I say with a grain of salt... The problem you are trying to solve is one which can be solved better on an RDBMS platform and does not fit well in a NoSQL space. Your scalability issue would probably be bet

Re: Schema design question - Hot Key concerns

2011-11-18 Thread Suraj Varma
> ...you're probably going to want to split your data into two different tables and then write some ACID compliance at your APP level. Just a quick thought before I pop out for lunch...

RE: Schema design question - Hot Key concerns

2011-11-18 Thread Michael Segel
write some ACID compliance at your APP level. Just a quick thought before I pop out for lunch... [quoting Sam Seigal, Fri, 18 Nov 2011:] One of the concerns I

Re: Schema design question - Hot Key concerns

2011-11-18 Thread Sam Seigal
One of the concerns I see with this schema is if one of the shows becomes hot. Since you are maintaining your bookings at the column level, a hot "row" cannot be partitioned across regions. HBase is atomic at the row level. Therefore, different clients updating the same SHOW_ID will compete with
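
A common mitigation (not prescribed in this thread) is to split a hot logical row into N salted shards and merge them at read time; a sketch with hypothetical key layout and names:

    import org.apache.hadoop.hbase.util.Bytes;

    static final int NUM_SHARDS = 16;

    // Writes for one hot SHOW_ID land on one of NUM_SHARDS row keys, so updates
    // no longer contend on a single row (reads must merge all shards).
    static byte[] shardedKey(String showId, String bookingId) {
      int shard = Math.abs(bookingId.hashCode() % NUM_SHARDS);
      return Bytes.toBytes(String.format("%02d_%s", shard, showId));
    }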

Re: Schema design question

2011-04-18 Thread Ted Dunning
I think that your mileage will definitely vary on this point. Your design may work very well. Or not. I would worry just a bit if your data points are large enough to create a really massive row (greater than about a megabyte). On Sun, Apr 17, 2011 at 11:48 PM, Yves Langisch wrote: > So I wond

Re: Schema design question

2011-04-17 Thread Yves Langisch
Yes, you're right. They have a row for each 10-minute period. Inside a row they work with offsets in seconds within this 10-minute period. This leads to a maximum of 10*60 columns per row. Normally you have fewer columns, as you don't have a data point for each second. So I wonder if the query per
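
A sketch of that OpenTSDB-style layout (family and metric names are hypothetical): the rowkey carries the 10-minute bucket and each second within it becomes a column qualifier:

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    // Row per metric per 10-minute bucket; qualifier = offset in seconds (0..599).
    long nowSec  = System.currentTimeMillis() / 1000L;
    long bucket  = nowSec - (nowSec % 600L);                 // start of the 10-minute row
    short offset = (short) (nowSec - bucket);
    Put put = new Put(Bytes.add(Bytes.toBytes("metric1"), Bytes.toBytes(bucket)));
    put.add(Bytes.toBytes("t"), Bytes.toBytes(offset), Bytes.toBytes(42.0d));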

Re: Schema design question

2011-04-16 Thread Ted Dunning
TSDB has more columns than it appears at first glance. They store all of the observations for a relatively long time interval in a single row. You may have spotted that right off (I didn't). On Sat, Apr 16, 2011 at 1:27 AM, Yves Langisch wrote: > As I'm about to plan a similar app I have studi

Re: Schema Design

2011-01-02 Thread Sean Bigdatafun
Not sure if the secondary index helps his use case or not. Does anyone have experience with that? On Sun, Jan 2, 2011 at 12:34 AM, Hari Sreekumar wrote: > Ultimately it depends on how you will be accessing your data. If you need to query on the contract time frequently, then this approach wouldn't

Re: Schema Design

2011-01-02 Thread Hari Sreekumar
Ultimately it depends on how you will be accessing your data. If you need to query on the contract time frequently, then this approach wouldn't be great. You have to identify the frequent queries and design schema according to that. What are your frequent queries like? Hari On Sun, Jan 2, 2011 at

Re: Schema Design

2011-01-01 Thread Sean Bigdatafun
I think so. Unless you have some way to index the contract time (in HBase, the only way to do so is to encode that information into your row key), you have to MapReduce to examine item by item. On Tue, Dec 28, 2010 at 4:46 PM, Valter Nogueira wrote: > And what about searching such contents? H
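
If the contract's due date were encoded at the front of the key (a hypothetical yyyyMMdd layout), "overdue" becomes a plain key-range scan instead of a full MapReduce pass:

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    // Rowkey = dueDate (yyyyMMdd) + contractId, so all contracts due before
    // "today" form one contiguous range at the start of the table.
    Scan overdue = new Scan(Bytes.toBytes("00000000"),
                            Bytes.toBytes("20110101"));   // stop row = today (exclusive)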

Re: Schema Design: Query by time; more on rows versus columns

2011-01-01 Thread Ted Yu
If there're (many) other tables besides table A, the data may not be evenly distributed across cluster. See https://issues.apache.org/jira/browse/HBASE-3373 On Sat, Jan 1, 2011 at 2:46 AM, Eric wrote: > I have little experience with HBase so far, but my feeling says it should > not matter how m

Re: Schema Design: Query by time; more on rows versus columns

2011-01-01 Thread Eric
I have little experience with HBase so far, but my feeling says it should not matter how many rows you store, and that it's better to save on CPU time and bandwidth. HBase will distribute the data evenly over your cluster and should be very good at making rows accessible quickly by key because it's

Re: Schema Design

2010-12-28 Thread Ted Yu
Consider using TableInputFormat. For serialization, one more choice is Avro. On Tue, Dec 28, 2010 at 4:46 PM, Valter Nogueira wrote: > And what about searching such contents? How to search for overdue contracts? I could read every contract through MapReduce, select overdue contracts and
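
A sketch of Ted's TableInputFormat suggestion wired up via TableMapReduceUtil; the table name, mapper class, and job name are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Job;

    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "find-overdue-contracts");
    Scan scan = new Scan();                     // full scan; narrow it with filters if possible
    TableMapReduceUtil.initTableMapperJob(
        "contracts",                            // hypothetical source table
        scan,
        OverdueMapper.class,                    // hypothetical TableMapper subclass
        NullWritable.class,                     // mapper output key class
        NullWritable.class,                     // mapper output value class
        job);
    job.waitForCompletion(true);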

Re: Schema Design

2010-12-28 Thread Valter Nogueira
And what about searching such contents? How to search for overdue contracts? I could read every contract through MapReduce, select the overdue contracts, and build a table with them - is that the right approach? Valter 2010/12/28 Sean Bigdatafun

Re: Schema Design

2010-12-28 Thread Sean Bigdatafun
I'd suggest a JSON object, XML, or any binary protocol buffer such as Google PB or Facebook Thrift. If you use any of those, you will have much better control over version upgrades. On Tue, Dec 28, 2010 at 4:16 PM, Valter Nogueira wrote: > Since contract has attributes such as NUMBER, TOTAL, ACC

Re: Schema Design

2010-12-28 Thread Valter Nogueira
Since a contract has attributes such as NUMBER, TOTAL, ACCOUNT and so on, when doing the following:

    row_key || CF: Contract
    ----------------------------------------------------
    valter  || 'C11' | info_for_11 | 'C12' | info_for_12

Re: Schema Design

2010-12-28 Thread Sean Bigdatafun
1. customer_table: row_key --> column_family : (customer --> contract). An example row:

    row_key || CF: Contract
    ----------------------------------------------------
    valter  || 'C11' | info_for_11 | 'C12' | info_for_12

Re: Schema Design

2010-12-28 Thread Ted Dunning
Another approach is to denormalize everything into the customer table. On Tue, Dec 28, 2010 at 3:26 PM, Valter Nogueira wrote: > I have a small Java system using a relational database. Basically, the app has 3 entities: CUSTOMER has many CONTRACTs and each CONTRACT has many INSTALLMENTS

RE: Schema design, one-to-many question

2010-11-30 Thread Michael Segel
[quoting Jonathan Gray, Tue, 30 Nov 2010:] I'm not sure I agree that "you can not think of relationships". There is in fact a one-to-ma

RE: Schema design, one-to-many question

2010-11-30 Thread Jonathan Gray
ater. With column-orientation, you can have the user as the row and stuff all of his relations into that same row. JG

RE: Schema design, one-to-many question

2010-11-30 Thread Michael Segel
I'm sorry if this has already been answered, but I'll share my $0.02 anyway... First, you and everyone have to stop thinking of HBase in terms of a relational model. Because HBase doesn't have the concept of joins, you can not think of relationships. If you have two tables where the primary ke

RE: Schema design, one-to-many question

2010-11-29 Thread Jonathan Gray
[quoting Bryan Keller, Monday, November 29, 2010:] I am using 0.89 currently, does it include those optimizations set for 0.90? If so, great news, the wide table approach is what I preferred

Re: Schema design, one-to-many question

2010-11-29 Thread Bryan Keller
I am using 0.89 currently; does it include those optimizations slated for 0.90? If so, great news - the wide table approach is what I preferred. On Nov 29, 2010, at 4:14 PM, Jonathan Gray wrote: > Hey Bryan, All of these approaches could work and seem sane. My preference these days would b

RE: Schema design, one-to-many question

2010-11-29 Thread Jonathan Gray
Hey Bryan, All of these approaches could work and seem sane. My preference these days would be the wide-table approach (#2, 3, 4) rather than the tall table. Previously #1 was more efficient but in 0.90 and beyond the same optimizations exist for both tall and wide tables. For #2, I would pro
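
For concreteness, the two shapes JG is comparing, sketched with hypothetical names (customerId, orderId, orderBytes); per his note, from 0.90 on the same optimizations apply to both:

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    // Wide table: one row per customer, one qualifier per child entity.
    Put wide = new Put(Bytes.toBytes(customerId));
    wide.add(Bytes.toBytes("orders"), Bytes.toBytes(orderId), orderBytes);

    // Tall table: one row per (customer, child), child id folded into the rowkey.
    Put tall = new Put(Bytes.add(Bytes.toBytes(customerId), Bytes.toBytes(orderId)));
    tall.add(Bytes.toBytes("o"), Bytes.toBytes("data"), orderBytes);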

Re: Schema design, one-to-many question

2010-11-29 Thread Chen Xinli
We have a similar use case: millions of users, each with a different number of goods, from one to tens of thousands. We use approach 2. Bryan Keller wrote: I have read comments on modeling one-to-many relationships in HBase and wanted to get some feedback. I have millions of customers, and each