Re: question on column families

2018-11-14 Thread Stack
On Tue, Nov 13, 2018 at 7:20 PM Antonio Si wrote: > Thanks Allan. > > Then, why is it a problem of having too many column families? If there are > column > families with no data, would that cause any issues? > > Thanks. > > We have this note in the refguide [1]. It i

Re: question on column families

2018-11-13 Thread Antonio Si
Thanks Allan. Then, why is it a problem of having too many column families? If there are column families with no data, would that cause any issues? Thanks. Antonio. On Tue, Nov 13, 2018 at 7:09 PM Allan Yang wrote: > No, Every column family has its own memstore. Each one is 128MB in y

Re: question on column families

2018-11-13 Thread Allan Yang
7:34 wrote: > Hi, > > I would like to confirm my understanding. > > Let's say I have 13 column families in an HBase table. 11 of those column > families have no data, while 2 column families have a large amount of data. > > My understanding is that the size of memstore,

question on column families

2018-11-13 Thread Antonio Si
Hi, I would like to confirm my understanding. Let's say I have 13 column families in an HBase table. 11 of those column families have no data, while 2 column families have a large amount of data. My understanding is that the size of memstore, which is 128M in my env, will be shared across all column

Re: On the number of column families

2018-08-27 Thread Lars Francke
Stack, sorry for the late answer. Took me a while to get to this. On Thu, Aug 2, 2018 at 6:30 PM, Stack wrote: > On Thu, Jul 12, 2018 at 4:31 AM Lars Francke > wrote: > > > > I've got a question on the number of column families. I've told everyone > > for years tha

Re: On the number of column families

2018-08-02 Thread Stack
On Thu, Jul 12, 2018 at 4:31 AM Lars Francke wrote: > > I've got a question on the number of column families. I've told everyone > for years that you shouldn't use more than maybe 3-10 column families. > > Our book still says the following: > "HBase currently does not do w

Re: On the number of column families

2018-08-01 Thread Lars Francke
se layer, which produces some space and >> query time benefits (and has some tradeoffs). So where I work the ideal is >> one CF, although because we have legacy tables it is not universally >> applied. >> >> >> On Thu, Jul 12, 2018 at 4:31 AM Lars Francke >> wrote

Re: On the number of column families

2018-07-16 Thread Lars Francke
I work the ideal is > one CF, although because we have legacy tables it is not universally > applied. > > > On Thu, Jul 12, 2018 at 4:31 AM Lars Francke > wrote: > > > I've got a question on the number of column families. I've told everyone > > for years that yo

Re: On the number of column families

2018-07-13 Thread Andrew Purtell
duces some space and query time benefits (and has some tradeoffs). So where I work the ideal is one CF, although because we have legacy tables it is not universally applied. On Thu, Jul 12, 2018 at 4:31 AM Lars Francke wrote: > I've got a question on the number of column families. I've

On the number of column families

2018-07-12 Thread Lars Francke
I've got a question on the number of column families. I've told everyone for years that you shouldn't use more than maybe 3-10 column families. Our book still says the following: "HBase currently does not do well with anything above two or three column families so keep the number of c

Re: If possible read families from tables and (more important) qualifiers?

2017-10-05 Thread Sean Busbey
wrote: > I get: > return result->Value(family, qualifier).value() > > result is optional, OK - it works. > But sometimes I must read unknown structure of table, or more often, I know > families but I don't know qualifiers. > > P.S. At long last I succeed building HBase clien

If possible read families from tables and (more important) qualifiers?

2017-10-04 Thread Andrzej
I get: return result->Value(family, qualifier).value() result is optional, OK - it works. But sometimes I must read unknown structure of table, or more often, I know families but I don't know qualifiers. P.S. At long last I succeed building HBase client out of Docker, probably I can e

Re: Multiple column families - scan performance

2017-08-22 Thread Partha
saying we still need to use addColumnFamily to limit scan to 1 c/f? Here is the code for that test, should addColumnFamily (or addColumn??) be used here, or it will read all column families? return new Scan(startInclusive, endExclusive) .setFilter(new FilterList(FilterList.Operator.MUST_PASS_ALL, new

Re: Multiple column families - scan performance

2017-08-22 Thread ramkrishna vasudevan
In HBase even if you say KeyOnlyFilter there is a column family involved. In this case if the scan does not specify addFamily() then I think all the column families will be loaded. Regards Ram On Tue, Aug 22, 2017 at 6:47 PM, Partha <parthaema...@gmail.com> wrote: > One other observati

Re: Multiple column families - scan performance

2017-08-22 Thread Partha
cribe 'TABLE1' > Table TABLE1 is ENABLED > TABLE1 > COLUMN FAMILIES DESCRIPTION > {NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => > 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', > TTL => 'FOREVER', COMPRESSION =>

Re: Multiple column families - scan performance

2017-08-21 Thread ramkrishna vasudevan
don't read other families at all. (with or without encoding). Regards Ram On Tue, Aug 22, 2017 at 10:49 AM, ramkrishna vasudevan < ramkrishna.s.vasude...@gmail.com> wrote: > Can you try one more thing - instead of addFamily try using > addColumn(byte[] fam, byte[] qual). Since

Re: Multiple column families - scan performance

2017-08-21 Thread ramkrishna vasudevan
to be sure - are you sure that the 4 CF table has only one qualifier? Regards Ram On Tue, Aug 22, 2017 at 8:17 AM, Partha <parthaema...@gmail.com> wrote: > hbase(main):001:0> describe 'TABLE1' > Table TABLE1 is ENABLED > TABLE1 > COLUMN FAMILIES DESCRIPTION > {NAME =>

Re: Multiple column families - scan performance

2017-08-21 Thread Partha
hbase(main):001:0> describe 'TABLE1' Table TABLE1 is ENABLED TABLE1 COLUMN FAMILIES DESCRIPTION {NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER', COMPRESSION =>

Re: Multiple column families - scan performance

2017-08-21 Thread Partha
> > > > -Anoop- > > > > On Sun, Aug 20, 2017 at 4:36 AM, Partha <parthaema...@gmail.com> wrote: > >> Anoop, > >> > >> Yes, each column family (in both tables) uses the same encoding > >> (fast-diff) > >> and same compression (gzip). > >> > >> I suggest you to just try the simple test as my case and see if you > notice > >> a > >> similar drop in performance (almost linear to the # of column families) > > > > > > >

Re: Multiple column families - scan performance

2017-08-21 Thread Partha
>> and same compression (gzip). >> >> I suggest you to just try the simple test as my case and see if you notice >> a >> similar drop in performance (almost linear to the # of column families) > >

Re: Multiple column families - scan performance

2017-08-21 Thread Partha
Will send across table statement and the test code. Pls let me know if you find anything from your test given the inputs so far. Note that column family has only 1 qualifier with json payload value of size 15KB. The column families use fastdiff encoding and gzip compression. Added user

Re: Multiple column families - scan performance

2017-08-17 Thread ramkrishna vasudevan
7 at 4:42 PM, Partha <parthaema...@gmail.com> wrote: > > I have 2 HBase tables - one with a single column family, and other has 4 > > column families. Both tables are keyed by same rowkey, and the column > > families all have a single column qualifier each, with a json string as

Re: Multiple column families - scan performance

2017-08-17 Thread Anoop John
Scan(); s.setStartRow s.setStopRow s.addFamily(cf) Correct? -Anoop- On Thu, Aug 17, 2017 at 4:42 PM, Partha <parthaema...@gmail.com> wrote: > I have 2 HBase tables - one with a single column family, and other has 4 > column families. Both tables are keyed by same rowkey, an

Multiple column families - scan performance

2017-08-17 Thread Partha
I have 2 HBase tables - one with a single column family, and other has 4 column families. Both tables are keyed by same rowkey, and the column families all have a single column qualifier each, with a json string as value (each json payload is about 10-20K in size). All column families use fast

Multiple column families - scan performance

2017-08-14 Thread ps0618
I have 2 HBase tables - one with a single column family, and other has 4 column families. Both tables are keyed by same rowkey, and the column families all have a single column qualifier each, with a json string as value (each json payload is about 10-20K in size). All column

Multiple column families - scan performance

2017-08-14 Thread Partha Sarathy
I have 2 HBase tables - one with a single column family, and other has 4 column families. Both tables are keyed by same rowkey, and the column families all have a single column qualifier each, with a json string as value (each json payload is about 10-20K in size). All column families use fast

Re: Column families

2017-06-23 Thread Ted Yu
t; > > On Thu, Jun 22, 2017 at 4:06 PM, Ted Yu <yuzhih...@gmail.com> wrote: > > > bq. HBase doesn't do well with more than 2-3 column families > > > > The above is out of date - we have per column family flush which would > > reduce the number of small hfiles.

Re: Column families

2017-06-23 Thread Alexander Ilyin
Brian, Ted, thank you for your answers. Ted, could you point out the HBase version where per column family flush first appeared? On Thu, Jun 22, 2017 at 4:06 PM, Ted Yu <yuzhih...@gmail.com> wrote: > bq. HBase doesn't do well with more than 2-3 column families > > The above is o

Re: Column families

2017-06-22 Thread Ted Yu
bq. HBase doesn't do well with more than 2-3 column families The above is out of date - we have per column family flush which would reduce the number of small hfiles. bq. Why can't we just create several tables instead? Currently hbase doesn't provide transaction across region boundary

Re: Column families

2017-06-22 Thread Brian Jeltema
One use-case that applies to my tables is that I have a table with a set of columns that have data that is always processed with MR jobs, but other rather large columns that are generally only accessed through a UI. By separating those into two column families, MR jobs that do a full table scan

Column families

2017-06-22 Thread Alexander Ilyin
Hi, A general question regarding column families. It is said in the doc that HBase doesn't do well with more than 2-3 column families because flushing and compactions are done on a per region basis which should be addressed in the future: http://hbase.apache.org/book.html#number.of.cfs

HBase Tables and Column Families and bulk loading

2016-02-08 Thread Cameron, David A
Hi, I'm working on a project where we have a strange use case. First off, we use bulk loading exclusively. We never use the put or bulk put interface to load data into tables. We have drivers that make me want to segregate data by tables and column families. Our data is clearly delineated

Re: HBase Tables and Column Families and bulk loading

2016-02-08 Thread Ted Yu
roject where we have a strange use case. > > First off, we use bulk loading exclusively. We never use the put or bulk > put interface to load data into tables. > > We have drivers that make me want to segregate data by tables and column > families. Our data is clearly deline

Re: scan column families with different time ranges

2015-08-03 Thread Ted Yu
is based on using essential column family (column family A in your case) to guide whether the remaining column families should be loaded. To be specific, if outside the TimeRange you specify (last day), your filter returns ReturnCode.INCLUDE_AND_SEEK_NEXT_ROW. What do you think

Re: scan column families with different time ranges

2015-08-03 Thread Dave Latham
families should be loaded. To be specific, if outside the TimeRange you specify (last day), your filter returns ReturnCode.INCLUDE_AND_SEEK_NEXT_ROW. What do you think ? Cheers On Sat, Aug 1, 2015 at 8:06 PM, Dave Latham lat...@davelink.net wrote: Thanks

Re: scan column families with different time ranges

2015-08-03 Thread Dave Latham
, Ted Yu yuzhih...@gmail.com wrote: Dave: I wonder if Filter response can be enhanced in the following manner: http://pastebin.com/sb6apTPm My approach is based on using essential column family (column family A in your case) to guide whether the remaining column families should

Re: scan column families with different time ranges

2015-08-03 Thread Dave Latham
can be enhanced in the following manner: http://pastebin.com/sb6apTPm My approach is based on using essential column family (column family A in your case) to guide whether the remaining column families should be loaded. To be specific, if outside the TimeRange you specify (last day), your

Re: scan column families with different time ranges

2015-08-02 Thread Ted Yu
Dave: I wonder if Filter response can be enhanced in the following manner: http://pastebin.com/sb6apTPm My approach is based on using essential column family (column family A in your case) to guide whether the remaining column families should be loaded. To be specific, if outside the TimeRange

Re: scan column families with different time ranges

2015-08-02 Thread Ted Yu
(column family A in your case) to guide whether the remaining column families should be loaded. To be specific, if outside the TimeRange you specify (last day), your filter returns ReturnCode.INCLUDE_AND_SEEK_NEXT_ROW. What do you think ? Cheers On Sat, Aug 1, 2015 at 8:06 PM, Dave Latham lat

scan column families with different time ranges

2015-08-01 Thread Dave Latham
I have a table with 2 column families, call them A and B, with new data regularly being added. They are very different sizes: B is 100x the size of A. Among other uses for this data, I have a MapReduce job that needs to read all of A, but only recent data from B (e.g. last day). Here are some

Re: scan column families with different time ranges

2015-08-01 Thread Ted Yu
Can you achieve your goal with two scans ? The first scan specifies TimeRange corresponding to last day. This scan returns both column families. The other scan specifies TimeRange excluding last day. This scan returns column family A. Cheers On Sat, Aug 1, 2015 at 8:35 AM, Dave Latham lat
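
The two-scan idea above can be sketched without a cluster. This is only an illustration under stated assumptions: the `TwoScanPlan` and `Cell` names are hypothetical, an in-memory list stands in for the table, and the time range is treated as half-open (minimum inclusive, maximum exclusive). It is not HBase client code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of "all of family A, but only recent data from B".
final class TwoScanPlan {
    // Minimal stand-in for a cell: column family name plus write timestamp.
    static final class Cell {
        final String family;
        final long timestamp;

        Cell(String family, long timestamp) {
            this.family = family;
            this.timestamp = timestamp;
        }
    }

    // Stand-in for one scan limited to some families and a [minTs, maxTs) range.
    static List<Cell> scan(List<Cell> table, List<String> families, long minTs, long maxTs) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : table) {
            if (families.contains(c.family) && c.timestamp >= minTs && c.timestamp < maxTs) {
                out.add(c);
            }
        }
        return out;
    }

    // Scan 1: recent range, both families. Scan 2: everything older, family A only.
    static List<Cell> allOfAPlusRecentB(List<Cell> table, long cutoff) {
        List<Cell> result = new ArrayList<>(scan(table, List.of("A", "B"), cutoff, Long.MAX_VALUE));
        result.addAll(scan(table, List.of("A"), 0L, cutoff));
        return result;
    }

    public static void main(String[] args) {
        List<Cell> table = List.of(
            new Cell("A", 10L), new Cell("B", 10L),
            new Cell("A", 100L), new Cell("B", 100L));
        for (Cell c : allOfAPlusRecentB(table, 50L)) {
            System.out.println(c.family + "@" + c.timestamp);
        }
    }
}
```

As Dave points out in his reply, running two separate scanners adds complexity to the job and gives up the consistency of reading both families in a single scan.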

Re: scan column families with different time ranges

2015-08-01 Thread Andrew Purtell
Hi Dave, Would HBase be willing to accept updating Scan to have different TimeRange's for each column families? We could try it. I'm not sure how familiar you are with the relevant code. I'm guessing some? Look at ScanQueryMatcher. This and related concerns govern how we search through store

Re: scan column families with different time ranges

2015-08-01 Thread Ted Yu
Have you considered using essential column family feature (through Filter) ? In your case A would be the essential column family. Within TimeRange for recent data, the filter would return both column families. Outside the TimeRange, only family A is returned. Cheers On Sat, Aug 1, 2015 at 7:17

Re: scan column families with different time ranges

2015-08-01 Thread Dave Latham
families the filter operates on (essential seems an odd name). If any data from those column families passes the filter, then the scan loads and includes data from the remaining families without filtering it. In my case, it's not clear from a row's family A whether or not family B for that row

Re: scan column families with different time ranges

2015-08-01 Thread Dave Latham
Thanks for brainstorming, Ted. That sounds like option 2 I listed using a separate scanner for A vs B which adds complexity to the job and gives up the atomicity/consistency guarantees as new writes hit both column families. On Sat, Aug 1, 2015 at 9:07 AM, Ted Yu yuzhih...@gmail.com wrote: Can

Re: scan column families with different time ranges

2015-08-01 Thread Andrew Purtell
smaller than in A, I do not understand where is a source of IO bottleneck? On Aug 1, 2015 9:16 AM, Andrew Purtell apurt...@apache.org wrote: Hi Dave, Would HBase be willing to accept updating Scan to have different TimeRange's for each column families? We could try it. I'm not sure how

Re: Wide Rows vs Multiple column families

2014-09-29 Thread Nishanth S
Hey Ted, I was in the process of comparing the insert throughputs we discussed, using YCSB. What I found is that when I split the data into multiple column families, the insert throughput drops to half of what I get when persisting into a single column family. Do you think

Re: Wide Rows vs Multiple column families

2014-09-29 Thread Ted Yu
Can you give a bit more detail, such as: the release of HBase you're using number of column families where slowdown is observed size of cluster release of hadoop you're using Thanks On Mon, Sep 29, 2014 at 9:43 AM, Nishanth S nishanth.2...@gmail.com wrote: Hey Ted, I was in the process

Re: Wide Rows vs Multiple column families

2014-09-29 Thread Nishanth S
HBase release: 0.96.1. Number of column families at which the issue is observed: 2. Earlier I had a single column family where all the data was persisted. In the new case I was storing all metadata in column family 1 (less than 1 KB) and a blob in the second column family (around 7 KB). We have 9 node

Re: Wide Rows vs Multiple column families

2014-09-29 Thread Ted Yu
. Cheers On Mon, Sep 29, 2014 at 10:00 AM, Nishanth S nishanth.2...@gmail.com wrote: Hbase Release: 0.96.1 Number of column families at which issue is observed is 2.Earlier I had one single column family where all the data was persisted.In the new case I was storing all meta data into column family

Wide Rows vs Multiple column families

2014-09-25 Thread Nishanth S
I am trying to answer the below questions in this scenario. 1. Would separating into multiple column families affect HBase write performance? 2. How would it affect my read performance considering both the read cases? 3. Is there any advantage that I am gaining by separating into multiple CFs? I

Re: Wide Rows vs Multiple column families

2014-09-25 Thread Ted Yu
and this huge data chunk). In general I am trying to answer the below questions in this scenario. 1. Would separating into multiple column families affect HBase write performance? 2. How would it affect my read performance considering both the read cases? 3. Is there any advantage that I am gaining

Re: Wide Rows vs Multiple column families

2014-09-25 Thread Nishanth S
separating into multiple column families affect HBase write performance? 2. How would it affect my read performance considering both the read cases? 3. Is there any advantage that I am gaining by separating into multiple CFs? I would really appreciate it if anyone could point me

Re: Wide Rows vs Multiple column families

2014-09-25 Thread Ted Yu
There should not be an impact on HBase write performance for two column families. Cheers On Thu, Sep 25, 2014 at 10:53 AM, Nishanth S nishanth.2...@gmail.com wrote: Thank you Ted. No, I do not plan to use bulk loading since the data is incremental in nature. On Thu, Sep 25, 2014 at 11:36 AM

Re: Wide Rows vs Multiple column families

2014-09-25 Thread Nishanth S
Thank you Ted. -Nishan On Thu, Sep 25, 2014 at 11:56 AM, Ted Yu yuzhih...@gmail.com wrote: There should not be impact to hbase write performance for two column families. Cheers On Thu, Sep 25, 2014 at 10:53 AM, Nishanth S nishanth.2...@gmail.com wrote: Thank you Ted.No I do not plan

Multiple column families vs Multiple tables

2014-08-19 Thread Wei Liu
We are doing schema design for our application, One thing we are not so clear about is multiple column families (more than 3, probably 4 - 5) vs multiple tables. In our use case, we will have the same number of rows in all these column families, but some column families may be modified more often

RE: Question on the number of column families

2014-08-06 Thread innowireless TaeYun Kim
Kim [mailto:taeyun@innowireless.co.kr] Sent: Wednesday, August 06, 2014 1:48 PM To: user@hbase.apache.org Subject: RE: Question on the number of column families Thank you. The 'dummy' column will always hold the value '1' (or even an empty string), that only signifies that this row exists

Re: Question on the number of column families

2014-08-06 Thread Qiang Tian
1:48 PM To: user@hbase.apache.org Subject: RE: Question on the number of column families Thank you. The 'dummy' column will always hold the value '1' (or even an empty string), that only signifies that this row exists. (And the real value is in the other 'big' column family) The value

Re: Question on the number of column families

2014-08-06 Thread Ted Yu
be used to minimize the scan cost. Thank you. -Original Message- From: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr] Sent: Wednesday, August 06, 2014 1:48 PM To: user@hbase.apache.org Subject: RE: Question on the number of column families Thank you. The 'dummy

RE: Question on the number of column families

2014-08-06 Thread innowireless TaeYun Kim
that's a subclass of the RowFilter. - In that filter class, override isFamilyEssential() method to return true only when the name of the 'dummy' column family is passed as an argument. Now, HBase calls isFamilyEssential() method of my filter object for all the column families including

RE: Question on the number of column families

2014-08-06 Thread innowireless TaeYun Kim
Hi Qiang, thank you for your help. 1. Regarding HBASE-5416, I think its purpose is simple: avoid loading column families that are irrelevant to filtering while scanning. So, it can be applied to my 'dummy CF' case. That is, a dummy CF can act like a 'relevant' CF to filtering, provided

Re: Question on the number of column families

2014-08-06 Thread Ted Yu
the column families including the 'dummy' column family, and in result only loads the 'dummy' column family and happily filters rowkey using the KeyValue objects from the 'dummy' column family HFile(s). Am I right? BTW, it would be nice to have a method like 'setEssentialColumnFamilies(byte

Re: Question on the number of column families

2014-08-06 Thread Qiang Tian
Hi TaeYun, thanks for the explanation. On Thu, Aug 7, 2014 at 12:50 PM, innowireless TaeYun Kim taeyun@innowireless.co.kr wrote: Hi Qiang, thank you for your help. 1. Regarding HBASE-5416, I think its purpose is simple: avoid loading column families that are irrelevant to filtering while

Question on the number of column families

2014-08-05 Thread innowireless TaeYun Kim
Hi, According to http://hbase.apache.org/book/number.of.cfs.html, having more than 2~3 column families is strongly discouraged. BTW, in my case, records on a table have the following characteristics: - The table is read-only. It is bulk-loaded once. When new data is ready, a new

RE: Question on the number of column families

2014-08-05 Thread innowireless TaeYun Kim
To: user@hbase.apache.org Subject: Question on the number of column families Hi, According to http://hbase.apache.org/book/number.of.cfs.html, having more than 2~3 column families are strongly discouraged. BTW, in my case, records on a table have the following characteristics

Re: Question on the number of column families

2014-08-05 Thread Alok Kumar
of column families Hi, According to http://hbase.apache.org/book/number.of.cfs.html, having more than 2~3 column families are strongly discouraged. BTW, in my case, records on a table have the following characteristics: - The table is read-only. It is bulk-loaded once. When a new data

RE: Question on the number of column families

2014-08-05 Thread innowireless TaeYun Kim
the values for the area that is displayed on the screen. -Original Message- From: Alok Kumar [mailto:alok...@gmail.com] Sent: Tuesday, August 05, 2014 8:24 PM To: user@hbase.apache.org Subject: Re: Question on the number of column families Hi, Hbase creates HFile per column-family

RE: Question on the number of column families

2014-08-05 Thread innowireless TaeYun Kim
cache, unless the columns are separated by individual column family. -Original Message- From: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr] Sent: Tuesday, August 05, 2014 8:36 PM To: user@hbase.apache.org Subject: RE: Question on the number of column families Thank you

Re: Question on the number of column families

2014-08-05 Thread Alok Kumar
cache, unless the columns are separated by individual column family. -Original Message- From: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr] Sent: Tuesday, August 05, 2014 8:36 PM To: user@hbase.apache.org Subject: RE: Question on the number of column families Thank

Re: Question on the number of column families

2014-08-05 Thread Ted Yu
As Alok mentioned previously, once columns are grouped into several column families, you would be able to leverage essential column family feature introduced by this JIRA: HBASE-5416 Improve performance of scans with some kind of filters Cheers On Tue, Aug 5, 2014 at 5:26 AM, Alok Kumar alok

Re: Question on the number of column families

2014-08-05 Thread Michael Segel
that make sense? In that example, you have 4 column families. There are other examples, but that should help you put column families in perspective. HTH -Mike On Aug 5, 2014, at 11:52 AM, Ted Yu yuzhih...@gmail.com wrote: As Alok mentioned previously, once columns are grouped into several

Re: Question on the number of column families

2014-08-05 Thread Alok Singh
One way to model the data would be to use a composite key that is made up of the RDBMS primary_key + "." + field_name. Then just have a single column that contains the value of the field. Individual field lookups will be a simple get, and to get all the fields of a record, you would do a scan with
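
The composite-key layout described above can be sketched in plain Java. This is a minimal illustration, not HBase code: a `TreeMap` stands in for HBase's sorted row keys, and the class name, the "." separator, the prefix-range scan, and the sample fields are all assumptions made for the example (the original message is cut off before it says how the scan is bounded).

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical single-column table: composite row key -> field value.
final class CompositeKeyDemo {
    // TreeMap keeps keys in sorted order, loosely mirroring sorted row keys.
    private final TreeMap<String, String> rows = new TreeMap<>();

    // Row key = primary key + "." + field name (the separator is an assumption).
    static String rowKey(String primaryKey, String field) {
        return primaryKey + "." + field;
    }

    void put(String primaryKey, String field, String value) {
        rows.put(rowKey(primaryKey, field), value);
    }

    // Single-field lookup: a plain get on the composite key.
    String get(String primaryKey, String field) {
        return rows.get(rowKey(primaryKey, field));
    }

    // All fields of one record: a range scan over keys that start with "<pk>.".
    // '/' is the character immediately after '.', so it works as an exclusive stop.
    SortedMap<String, String> scanRecord(String primaryKey) {
        return rows.subMap(primaryKey + ".", primaryKey + "/");
    }

    public static void main(String[] args) {
        CompositeKeyDemo table = new CompositeKeyDemo();
        table.put("42", "name", "Ada");
        table.put("42", "email", "ada@example.org");
        table.put("420", "name", "Bob");
        System.out.println(table.get("42", "name"));
        System.out.println(table.scanRecord("42"));
    }
}
```

One trade-off of this layout is that a record's fields become separate rows, so a record is no longer written or read as a single row.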

RE: Question on the number of column families

2014-08-05 Thread innowireless TaeYun Kim
Thank you all. Facts learned: - Having 130 column families is too much. Don't do that. - While scanning, an entire row will be read for filtering, unless the HBASE-5416 technique is applied, which makes only the relevant column family load. (But it seems that one still can't load just a column

Re: Question on the number of column families

2014-08-05 Thread Ted Yu
, you can look at the unit test (TestJoinedScanners) from HBASE-5416. You would understand this feature better. Cheers On Tue, Aug 5, 2014 at 9:21 PM, innowireless TaeYun Kim taeyun@innowireless.co.kr wrote: Thank you all. Facts learned: - Having 130 column families is too much. Don't do

RE: Question on the number of column families

2014-08-05 Thread innowireless TaeYun Kim
: Question on the number of column families bq. add a 'dummy' column family and apply HBASE-5416 technique Adding dummy column family is not the way to utilize essential column family support - what would this dummy column family hold ? bq. since I have not read the filtering section of the book I'm

Re: Best practice for writing to HFileOutputFormat(2) with multiple Column Families

2014-08-01 Thread Arun Allamsetty
...@gmail.com wrote: I need to generate from a 2TB dataset and exploded it to 4 Column Families. The result dataset is likely to be 20TB or more. I'm currently using Spark so I sorted the (rk, cf, cq) myself. It's huge and I'm considering how to optimize it. My question is: Should I sort

Re: Best practice for writing to HFileOutputFormat(2) with multiple Column Families

2014-08-01 Thread Jianshi Huang
about it because HBase sorts the row keys on its own but lexicographically. Cheers, Arun Sent from a mobile device. Please don't mind the typos. On Jul 30, 2014 9:02 PM, Jianshi Huang jianshi.hu...@gmail.com wrote: I need to generate from a 2TB dataset and exploded it to 4 Column Families

Re: Best practice for writing to HFileOutputFormat(2) with multiple Column Families

2014-08-01 Thread Nick Dimiduk
...@gmail.com wrote: I need to generate from a 2TB dataset and exploded it to 4 Column Families. The result dataset is likely to be 20TB or more. I'm currently using Spark so I sorted the (rk, cf, cq) myself. It's huge and I'm considering how to optimize it. My question is: Should I sort and write each

Best practice for writing to HFileOutputFormat(2) with multiple Column Families

2014-07-30 Thread Jianshi Huang
I need to generate from a 2TB dataset and exploded it to 4 Column Families. The result dataset is likely to be 20TB or more. I'm currently using Spark so I sorted the (rk, cf, cq) myself. It's huge and I'm considering how to optimize it. My question is: Should I sort and write each column family

Re: How many column families in one table ?

2013-08-05 Thread Pablo Medina
Lars, when you say 'when one memstore needs to be flushed all other column families are flushed', are you referring to other column families of the same table, right? 2013/8/4 Rohit Kelkar rohitkel...@gmail.com Regarding slow scan- only fetch the columns /qualifiers that you need. It may

Re: How many column families in one table ?

2013-08-05 Thread Kevin O'dell
Pablo, That is correct. On Mon, Aug 5, 2013 at 10:00 AM, Pablo Medina pablomedin...@gmail.com wrote: Lars, when you say 'when one memstore needs to be flushed all other column families are flushed', are you referring to other column families of the same table, right? 2013/8/4 Rohit
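
Why flushing all column families together matters can be seen in a toy model. The sketch below is an assumption-heavy simplification and not HBase's actual flush policy: `FlushModel`, the per-family threshold, and the byte counts are hypothetical. It only compares "flush every family when any one is full" against "flush only the full family" and counts the resulting files.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: each family has its own in-memory buffer; a flush writes one file.
final class FlushModel {
    // Returns the sizes of all files written after `writes` rounds of writes.
    // bytesPerWrite[f] is how much family f receives in each round.
    static List<Long> simulate(long[] bytesPerWrite, int writes, long threshold,
                               boolean flushAllTogether) {
        long[] memstore = new long[bytesPerWrite.length];
        List<Long> files = new ArrayList<>();
        for (int w = 0; w < writes; w++) {
            boolean anyFull = false;
            for (int f = 0; f < memstore.length; f++) {
                memstore[f] += bytesPerWrite[f];
                if (memstore[f] >= threshold) {
                    anyFull = true;
                }
            }
            if (!anyFull) {
                continue;
            }
            for (int f = 0; f < memstore.length; f++) {
                // Together: every non-empty buffer is flushed. Otherwise: only full ones.
                boolean flushThis = flushAllTogether ? memstore[f] > 0 : memstore[f] >= threshold;
                if (flushThis) {
                    files.add(memstore[f]);
                    memstore[f] = 0;
                }
            }
        }
        return files;
    }

    static long countSmallerThan(List<Long> files, long size) {
        long n = 0;
        for (long s : files) {
            if (s < size) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        long[] skewed = {100, 1}; // one busy family, one nearly idle family
        List<Long> together = simulate(skewed, 1000, 1000, true);
        List<Long> separate = simulate(skewed, 1000, 1000, false);
        System.out.println("together: " + together.size() + " files, "
            + countSmallerThan(together, 1000) + " below threshold");
        System.out.println("separate: " + separate.size() + " files, "
            + countSmallerThan(separate, 1000) + " below threshold");
    }
}
```

With skewed write rates, the "together" mode keeps writing tiny files for the quiet family, which is the small-file effect the thread attributes to having many column families.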

Re: How many column families in one table ?

2013-08-04 Thread Vimal Jain
Hi, I have tested read performance after reducing number of column families from 14 to 3 and yes there is improvement. Meanwhile i was going through the paper published by google on BigTable. It says It is our intent that the number of distinct column families in a table be small (in the hundreds

Re: How many column families in one table ?

2013-08-04 Thread Inder Pall
. On Aug 4, 2013 2:29 AM, Vimal Jain vkj...@gmail.com wrote: Hi, I have tested read performance after reducing number of column families from 14 to 3 and yes there is improvement. Meanwhile i was going through the paper published by google on BigTable. It says It is our intent

Re: How many column families in one table ?

2013-08-04 Thread Kevin O'dell
read performance after reducing number of column families from 14 to 3 and yes there is improvement. Meanwhile i was going through the paper published by google on BigTable. It says It is our intent that the number of distinct column families in a table be small (in the hundreds

Re: How many column families in one table ?

2013-08-04 Thread Inder Pall
Vimal, It really depends on your usage pattern but HBase != Bigtable. On Aug 4, 2013 2:29 AM, Vimal Jain vkj...@gmail.com wrote: Hi, I have tested read performance after reducing number of column families from 14 to 3 and yes there is improvement. Meanwhile i

Re: How many column families in one table ?

2013-08-04 Thread lars hofhansl
columns such that a scan is often limited to a single Column Family, you'll get huge benefit by using more Column Families. The main consideration for many Column Families is that each has its own store files, and hence scanning involves more seeking for each Column Family included in a scan

Re: How many column families in one table ?

2013-08-04 Thread Rohit Kelkar
Family, you'll get huge benefit by using more Column Families. The main consideration for many Column Families is that each has its own store files, and hence scanning involves more seeking for each Column Family included in a scan. They are also flushed together; when one memstore (which

Re: How many column families in one table ?

2013-07-01 Thread Vimal Jain
Thanks Dhaval/Michael/Ted/Otis for your replies. Actually, I asked this question because I am seeing some performance degradation in my production HBase setup. I have configured HBase in pseudo-distributed mode on top of HDFS. I have created 17 column families :( . I am actually using 14 out

Re: How many column families in one table ?

2013-07-01 Thread Viral Bajaria
When you did the scan, did you check what the bottleneck was? Was it I/O? Did you see any GC locks? How much RAM are you giving to your RS? -Viral On Mon, Jul 1, 2013 at 1:44 AM, Vimal Jain vkj...@gmail.com wrote: To completely scan the table for all 140 columns, it takes around 30-40

Re: How many column families in one table ?

2013-07-01 Thread Vimal Jain
I scanned it during normal traffic hours. There was no I/O load on the server. I don't see any GC locks either. Also, I have given 1.5G to the RS, and 512M each to the Master and ZooKeeper. One correction to the post above: the actual time to scan the whole table is even more; it takes 10 mins to scan 0.1 million rows

Re: How many column families in one table ?

2013-07-01 Thread Vimal Jain
Can someone please reply? Also, what is the typical read/write speed of HBase, and how much deviation would there be in my scenario mentioned above (14 CFs, 140 columns in total)? I am asking this because I am not simply printing out the scanned values; instead I am applying some logic on the data

Re: How many column families in one table ?

2013-07-01 Thread lars hofhansl
Subject: Re: How many column families in one table ? Can someone please reply? Also, what is the typical read/write speed of HBase, and how much deviation would there be in my scenario mentioned above (14 CFs, 140 columns in total)? I am asking this because I am not simply printing out the scanned

Re: How many column families in one table ?

2013-07-01 Thread Vimal Jain
? Otherwise each call to next() is an RPC round trip and you are basically measuring your network's RTT. -- Lars From: Vimal Jain vkj...@gmail.com To: user@hbase.apache.org Sent: Monday, July 1, 2013 4:11 AM Subject: Re: How many column families in one table ? Can
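
The round-trip remark above can be made concrete with a back-of-the-envelope sketch. It assumes the client fetches a configurable number of rows per RPC (in the HBase Java client this is what `Scan.setCaching` controls) and that network latency dominates; the 1 ms RTT is an assumed figure, not one reported in the thread. The row count matches the 0.1 million rows mentioned earlier.

```python
import math

def scan_round_trips(rows, caching):
    """Number of RPCs needed to fetch `rows` rows at `caching` rows per RPC."""
    return math.ceil(rows / caching)

def network_time_seconds(rows, caching, rtt_ms):
    """Time spent purely on round trips, ignoring server-side work."""
    return scan_round_trips(rows, caching) * rtt_ms / 1000.0

rows = 100_000   # 0.1 million rows, as in the thread
rtt_ms = 1.0     # assumed round-trip time, for illustration only

for caching in (1, 100, 1000):
    trips = scan_round_trips(rows, caching)
    seconds = network_time_seconds(rows, caching, rtt_ms)
    print(f"caching={caching}: {trips} round trips, {seconds} s on the wire")
```

With one row per call, the wire time grows linearly with the row count; raising the rows fetched per RPC divides the number of round trips accordingly, at the cost of more client memory per batch.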

Re: How many column families in one table ?

2013-07-01 Thread Ted Yu
this question because I am seeing some performance degradation in my production HBase setup. I have configured HBase in pseudo-distributed mode on top of HDFS. I have created 17 column families :( . I am actually using 14 out of these 17 column families. Each column family has on average around 8

Re: How many column families in one table ?

2013-07-01 Thread Vimal Jain
Actually, I asked this question because I am seeing some performance degradation in my production HBase setup. I have configured HBase in pseudo-distributed mode on top of HDFS. I have created 17 column families :( . I am actually using 14 out of these 17 column families. Each column family

Re: How many column families in one table ?

2013-07-01 Thread lars hofhansl
, 2013 4:44 AM Subject: Re: How many column families in one table ? Hi, We had some hardware constraints, along with the fact that our total data size was in GBs. That's why, to start with HBase, we first began with pseudo-distributed mode and thought that if required we would upgrade to fully

Re: How many column families in one table ?

2013-07-01 Thread Vimal Jain
column families in one table ? Hi, We had some hardware constraints, along with the fact that our total data size was in GBs. That's why, to start with HBase, we first began with pseudo-distributed mode and thought that if required we would upgrade to fully distributed mode. On Mon, Jul 1

Re: How many column families in one table ?

2013-07-01 Thread Viral Bajaria
On Mon, Jul 1, 2013 at 10:06 AM, Vimal Jain vkj...@gmail.com wrote: Sorry for the typo; please ignore the previous mail. Here is the corrected one. 1) I have around 140 columns for each row. Of those 140, around 100 columns hold Java primitive data types; the remaining 40 columns contain

Re: How many column families in one table ?

2013-06-28 Thread Ted Yu
Vimal: Please also refer to: http://search-hadoop.com/m/qOx8l15Z1q42/column+families+fbsubj=Re+HBase+Column+Family+Limit+Reasoning On Fri, Jun 28, 2013 at 1:37 PM, Michel Segel michael_se...@hotmail.com wrote: Short answer... As few as possible. 14 CFs doesn't make too much sense. Sent from

Re: How many column families in one table ?

2013-06-28 Thread Vimal Jain
Hi All, Thanks for your replies. Ted, thanks for the link, but it's not working. :( On Fri, Jun 28, 2013 at 5:57 PM, Ted Yu yuzhih...@gmail.com wrote: Vimal: Please also refer to: http://search-hadoop.com/m/qOx8l15Z1q42/column+families+fbsubj=Re+HBase+Column+Family+Limit+Reasoning
