Re: Two questions about the maximum number of versions of a column family

2016-02-22 Thread Anil Gupta
If it's possible to make the timestamp a suffix of your row key (assuming the row key is composite), then you would not run into read/write hotspots. Have a look at the OpenTSDB data model; it scales really well. > On Feb 21, 2016, at 10:28 AM, Stephen Durfey
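
To make the suggestion concrete, here is a minimal sketch of a composite row key with the timestamp as a suffix. The layout (a 4-byte entity ID followed by an 8-byte epoch-second timestamp, both big-endian) and the class name `CompositeRowKey` are hypothetical. It is only loosely inspired by OpenTSDB, which puts a metric identifier ahead of the time component, and it is not OpenTSDB's actual key format. It relies on HBase ordering row keys by unsigned byte comparison, so big-endian non-negative numbers sort in numeric order.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical composite row key: 4-byte entity ID, then 8-byte timestamp suffix. */
final class CompositeRowKey {
    private CompositeRowKey() {}

    /** Both fields big-endian; assumes non-negative values so byte order matches numeric order. */
    static byte[] of(int entityId, long epochSeconds) {
        return ByteBuffer.allocate(Integer.BYTES + Long.BYTES)
                .putInt(entityId)
                .putLong(epochSeconds)
                .array();
    }

    static int entityId(byte[] key) {
        return ByteBuffer.wrap(key).getInt(0);
    }

    static long timestamp(byte[] key) {
        return ByteBuffer.wrap(key).getLong(Integer.BYTES);
    }

    /** Unsigned lexicographic comparison, the order HBase uses for row keys. */
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        byte[] first = of(7, 1_456_000_000L);
        byte[] later = of(7, 1_456_000_060L);
        byte[] other = of(8, 1_455_000_000L);
        System.out.println(compare(first, later) < 0); // true: same entity, older row first
        System.out.println(compare(later, other) < 0); // true: entity ID decides before time
    }
}
```

With this layout, the rows for one entity stay contiguous and time-ordered, so a time-range read for that entity is a single contiguous scan, while writes for different entities land in different parts of the key space.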

Re: Two questions about the maximum number of versions of a column family

2016-02-21 Thread Stephen Durfey
I personally don't work with time-series data, so I'm not going to say which approach is better. From a scanning viewpoint, I would think putting the timestamp in the row key is easier, but that will introduce scanning performance bottlenecks due to the row keys being stored
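
The concern with a timestamp at the front of the key comes from HBase storing rows sorted by key, with each region serving one contiguous key range. The toy model below (plain Java, not the HBase client API) routes row keys to regions by start key; the five region boundaries, the `|` separator and both key formats are assumptions chosen for illustration. Writes made at the same moment with a timestamp-first key all hit one region, while an entity-first key spreads them out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of region routing: region i serves keys in [startKey_i, startKey_{i+1}). */
final class RegionRouting {
    private final TreeMap<String, Integer> regionByStartKey = new TreeMap<>();

    /** The first start key must be "" so that every row key falls in some region. */
    RegionRouting(List<String> startKeys) {
        for (int i = 0; i < startKeys.size(); i++) {
            regionByStartKey.put(startKeys.get(i), i);
        }
    }

    /** Index of the region whose start key is the greatest key <= rowKey (ASCII keys, so String order matches byte order). */
    int regionFor(String rowKey) {
        return regionByStartKey.floorEntry(rowKey).getValue();
    }

    /** Distinct regions that a batch of writes would touch. */
    Set<Integer> regionsTouched(List<String> rowKeys) {
        Set<Integer> touched = new TreeSet<>();
        for (String key : rowKeys) {
            touched.add(regionFor(key));
        }
        return touched;
    }

    public static void main(String[] args) {
        // Hypothetical table pre-split into five regions.
        RegionRouting table = new RegionRouting(List.of("", "2", "4", "6", "8"));
        long now = 1_456_000_000L; // one epoch-second timestamp shared by all writes
        List<String> timeFirst = new ArrayList<>();
        List<String> entityFirst = new ArrayList<>();
        for (int entity = 0; entity < 10; entity++) {
            timeFirst.add(now + "|" + entity);
            entityFirst.add(entity + "|" + now);
        }
        System.out.println(table.regionsTouched(timeFirst));   // [0]
        System.out.println(table.regionsTouched(entityFirst)); // [0, 1, 2, 3, 4]
    }
}
```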

Re: Two questions about the maximum number of versions of a column family

2016-02-21 Thread Daniel
Thanks for sharing, Stephen and Ted. The reference guide recommends "rows" over "versions" for time-series data. Are there advantages to using "reversed timestamps" in row keys over the built-in "versions" with regard to scanning performance?
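
For reference, the reverse-timestamp pattern described in the HBase reference guide stores `Long.MAX_VALUE - timestamp` in the row key so that the newest row for a given prefix sorts first. Below is a minimal sketch, assuming a hypothetical key made of a 4-byte entity ID followed by the reversed 8-byte timestamp; the class name `ReversedTimestamp` is also hypothetical. It illustrates the encoding only, not its performance.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Reverse-timestamp encoding: newer timestamps produce smaller (earlier-sorting) keys. */
final class ReversedTimestamp {
    private ReversedTimestamp() {}

    static long reverse(long timestampMillis) {
        if (timestampMillis < 0) {
            throw new IllegalArgumentException("timestamp must be non-negative");
        }
        return Long.MAX_VALUE - timestampMillis; // stays non-negative, so byte order is numeric order
    }

    /** Hypothetical key: 4-byte entity ID, then the reversed timestamp, both big-endian. */
    static byte[] rowKey(int entityId, long timestampMillis) {
        return ByteBuffer.allocate(Integer.BYTES + Long.BYTES)
                .putInt(entityId)
                .putLong(reverse(timestampMillis))
                .array();
    }

    /** Recovers the original timestamp from a key built by rowKey. */
    static long timestampOf(byte[] key) {
        return Long.MAX_VALUE - ByteBuffer.wrap(key).getLong(Integer.BYTES);
    }

    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b); // unsigned byte order, as HBase sorts row keys
    }

    public static void main(String[] args) {
        byte[] older = rowKey(1, 1_000L);
        byte[] newer = rowKey(1, 3_000L);
        System.out.println(compare(newer, older) < 0); // true: newest row sorts first
        System.out.println(timestampOf(newer));        // 3000
    }
}
```

A scan that starts at an entity's 4-byte prefix therefore returns that entity's most recent row first.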

Re: Two questions about the maximum number of versions of a column family

2016-02-21 Thread Ted Yu
Thanks for sharing, Stephen.
> scan performance on the region servers needing to scan over all that data you may not need
When the number of versions is large, try to use Filters (where appropriate) that implement: public Cell getNextCellHint(Cell currentKV) See MultiRowRangeFilter for
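
The benefit of a cell hint is that the scanner can seek past cells the filter does not want instead of reading them one by one. The toy model below is plain Java, not the HBase `Filter` API: it compares a scan that reads every key against one that jumps to the start of each wanted row range, in the spirit of MultiRowRangeFilter. The store contents, key format, ranges and the class name `SeekHintDemo` are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model (not the HBase API) of scanning sorted keys with and without seek hints. */
final class SeekHintDemo {
    /** Half-open key range [start, stop). */
    record Range(String start, String stop) {
        boolean contains(String k) {
            return k.compareTo(start) >= 0 && k.compareTo(stop) < 0;
        }
    }

    /** Matching keys plus how many cells the scan had to read. */
    record ScanResult(List<String> matches, int cellsExamined) {}

    /** Reads every cell and keeps those inside any range. */
    static ScanResult fullScan(NavigableMap<String, String> store, List<Range> ranges) {
        List<String> out = new ArrayList<>();
        int examined = 0;
        for (String k : store.keySet()) {
            examined++;
            for (Range r : ranges) {
                if (r.contains(k)) {
                    out.add(k);
                    break;
                }
            }
        }
        return new ScanResult(out, examined);
    }

    /** Ranges must be sorted and non-overlapping; seeks straight to each range start. */
    static ScanResult hintedScan(NavigableMap<String, String> store, List<Range> ranges) {
        List<String> out = new ArrayList<>();
        int examined = 0;
        for (Range r : ranges) {
            String k = store.ceilingKey(r.start()); // the "seek" to the hinted position
            while (k != null && r.contains(k)) {
                examined++;
                out.add(k);
                k = store.higherKey(k);
            }
            if (k != null) {
                examined++; // the first key past the range is read before seeking to the next range
            }
        }
        return new ScanResult(out, examined);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> store = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            store.put(String.format("row%03d", i), "v" + i);
        }
        List<Range> ranges = List.of(new Range("row100", "row105"), new Range("row500", "row503"));
        System.out.println(fullScan(store, ranges).cellsExamined());   // 1000
        System.out.println(hintedScan(store, ranges).cellsExamined()); // 10
    }
}
```

Both scans return the same eight keys. The model ignores the cost of each seek, which is not free on a real region server, so treat it as an illustration of why hints help rather than a benchmark.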

Re: Two questions about the maximum number of versions of a column family

2016-02-21 Thread Stephen Durfey
Someone please correct me if I am wrong. I've looked into this recently because of performance issues with my tables in a production environment. Like the book says, I don't recommend keeping this many versions around unless you really need them. Telling HBase to keep around a very large

Two questions about the maximum number of versions of a column family

2016-02-21 Thread Daniel
Hi, I have two questions about the maximum number of versions of a column family: (1) Is it OK to set a very large (>100,000) maximum number of versions for a column family? The reference guide says "It is not recommended setting the number of max versions to an exceedingly high level (e.g.,
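
To illustrate what the per-family maximum controls, here is a toy model of one column that keeps at most `maxVersions` versions, newest first. `VersionedColumn` is hypothetical and simplified: it prunes on every write, whereas HBase hides surplus versions at read time and physically removes them later, during compactions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of one column whose family keeps at most maxVersions versions. */
final class VersionedColumn {
    private final int maxVersions;
    // Timestamps in descending order, so the first entry is the newest version.
    private final NavigableMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());

    VersionedColumn(int maxVersions) {
        if (maxVersions < 1) {
            throw new IllegalArgumentException("maxVersions must be >= 1");
        }
        this.maxVersions = maxVersions;
    }

    void put(long timestamp, String value) {
        versions.put(timestamp, value); // a put at an existing timestamp replaces that version
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // drop the oldest version
        }
    }

    /** Newest value, or null if the column is empty. */
    String latest() {
        Map.Entry<Long, String> e = versions.firstEntry();
        return e == null ? null : e.getValue();
    }

    /** Up to n newest values, newest first. */
    List<String> newest(int n) {
        List<String> out = new ArrayList<>();
        for (String v : versions.values()) {
            if (out.size() == n) {
                break;
            }
            out.add(v);
        }
        return out;
    }

    int retainedVersions() {
        return versions.size();
    }

    public static void main(String[] args) {
        VersionedColumn col = new VersionedColumn(3);
        for (long ts = 1; ts <= 5; ts++) {
            col.put(ts, "v" + ts);
        }
        System.out.println(col.retainedVersions()); // 3
        System.out.println(col.newest(10));         // [v5, v4, v3]
    }
}
```

Because every version of a cell lives in the same row, and HBase never splits a row across regions, a very high limit lets a single row grow very large.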