Really informative thread, thank you! We had a secondary index trauma a while ago, and since then we have known that secondary indexes are not a good idea in most cases, but now it's even clearer why.
On Thu, May 29, 2014 at 5:31 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Thu, May 29, 2014 at 1:08 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> Hello Robert
>>
>> There are some maths involved when considering the performance of
>> secondary index in C*
>>
>
> Yes, these are the maths which are behind my FIXMEs in the original post.
> I merely have not had time to explicitly describe them in the context of
> that draft post.
>
> Thank you for doing so! When I reference them in my eventual post, I will
> be sure to credit you.
>
>
>> Because of its distributed nature, finding a *good* use-case for 2nd
>> index is quite tricky, partly because it depends on the query pattern but
>> also on the cluster size and data distribution.
>>
>
> Yep, and if you're doing this tricky thing, you probably want less opacity
> and more explicit understanding of what is happening under the hood and you
> want to be sure you won't run into a bug in the implementation, hence
> manual "secondary index" CFs.
>
>
>> Apart from the performance aspect, secondary index column families use
>> SizeTiered compaction so for an use case with a lot of update you'll have
>> plenty of tombstones... I'm not sure how end user can switch to Leveled
>> Compaction for 2nd index...
>>
>
> Per Aleksey, secondary index column families actually use the compaction
> strategy of the column family they index. I agree that this seems weird,
> and is likely just another implementation detail you relinquish control of
> for the convenience of 2i.
>
> =Rob
>

--
*Paulo Motta*

Chaordic | *Platform*
*www.chaordic.com.br <http://www.chaordic.com.br/>*
+55 48 3232.3200