I'm not a Cassandra dev, so take what I say with a lot of salt, but
AFAICT, there is a certain amount of overhead in maintaining a CF, so
when you have large numbers of CFs, this adds up. From a layperson's
perspective, this observation sounds reasonable, since zero-cost CFs
would be tantamount to being able to implement secondary indexes by
just adding CFs. So instead of paying for the overhead (or
ineffectiveness) of high-cardinality secondary indexes, whichever way
you want to look at it, you are expecting a free lunch by just
scaling out in terms of new CFs. I would imagine that under the
covers, the layout of Cassandra has a sweet spot of a smallish number
of CFs (i.e. 10s), but these can practically have as many rows as you
like.
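
To make the trade-off concrete, here is a minimal sketch of the
alternative to adding physical CFs: folding many logical ("virtual")
CFs into one shared CF by prefixing each row key with the logical CF
name, which is the general idea behind the virtual-CF approach quoted
below. Everything in it is a hypothetical illustration: the
VirtualCfStore class, the ':' separator, and the in-memory TreeMap that
stands in for the shared CF are my assumptions, not the API or storage
layout of Cassandra, Hector, or PlayOrm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration only: a sorted in-memory map stands in for one
// physical CF. This is not how Cassandra or PlayOrm actually store rows.
final class VirtualCfStore {
    // Separator between the virtual CF name and the row key.
    // Assumption: virtual CF names never contain this character.
    private static final char SEP = ':';

    // The single "physical" CF shared by every virtual CF.
    private final TreeMap<String, String> sharedCf = new TreeMap<>();

    // Build the physical row key: "<virtualCf>:<rowKey>".
    private static String physicalKey(String virtualCf, String rowKey) {
        if (virtualCf.indexOf(SEP) >= 0) {
            throw new IllegalArgumentException("virtual CF name must not contain '" + SEP + "'");
        }
        return virtualCf + SEP + rowKey;
    }

    void put(String virtualCf, String rowKey, String value) {
        sharedCf.put(physicalKey(virtualCf, rowKey), value);
    }

    // Returns null when the row does not exist in that virtual CF.
    String get(String virtualCf, String rowKey) {
        return sharedCf.get(physicalKey(virtualCf, rowKey));
    }

    // List the row keys of one virtual CF with a prefix range scan.
    // This works here only because TreeMap keeps keys sorted; a cluster
    // that places rows by hashed key would not give you this ordering.
    List<String> rowKeys(String virtualCf) {
        String from = virtualCf + SEP;             // inclusive lower bound
        String to = virtualCf + (char) (SEP + 1);  // exclusive upper bound
        SortedMap<String, String> range = sharedCf.subMap(from, to);
        List<String> keys = new ArrayList<>();
        for (String physical : range.keySet()) {
            keys.add(physical.substring(from.length()));
        }
        return keys;
    }
}
```

The point of the sketch is that the number of virtual CFs costs nothing
per CF on the storage side, because only one physical CF exists; the
price is longer row keys and having to filter by prefix whenever you
want "all rows of one table".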

On Mon, Oct 8, 2012 at 11:02 AM, Vanger <disc...@gmail.com> wrote:
> So what should the solution be for a Cassandra architecture when we need to
> run Hadoop M/R jobs and not be restricted by the number of CFs?
> What we have now is a fair number of CFs (> 2K), and this number is slowly
> growing, so we are already planning to merge partitioned CFs. But our next
> goal is to run Hadoop tasks on those CFs. All we have is plain Hector and a
> custom ORM on top of it. As far as I understand, VirtualKeyspace doesn't help
> in our case.
> Also, I don't understand why support for many CFs (or built-in
> partitioning) isn't implemented on the Cassandra side. Can anybody explain
> why this can or cannot be done in Cassandra?
>
> Just in case:
> We're using Cassandra 1.0.11 on 30 nodes (planning to upgrade to 1.1.* soon).
>
> --
> W/ best regards,
> Sergey.
>
> On 04.10.2012 0:10, Hiller, Dean wrote:
>
> Okay, so it only took me two solid days not a week.  PlayOrm in master
> branch now supports virtual CF's or virtual tables in ONE CF, so you can
> have 1000's or millions of virtual CF's in one CF now.  It works with all
> the Scalable-SQL, works with the joins, and works with the PlayOrm command
> line tool.
>
> There are two ways to do it. If you are using the ORM half, you just annotate
>
> @NoSqlEntity("MyVirtualCfName")
> @NoSqlVirtualCf(storedInCf="sharedCf")
>
> So it's stored in sharedCf with the table name of MyVirtualCfName (in the
> command line tool, use MyVirtualCfName to query the table).
>
> Then, if you don't know your metadata ahead of time, you need to create
> DboTableMeta and DboColumnMeta objects and save them for every table you
> create, and you can use TypedRow to read and persist (which is what we have
> a project doing).
>
> If you try it out, let me know.  We usually get bug fixes in pretty fast if
> you run into anything.  (More and more questions are forming on Stack
> Overflow as well ;) ).
>
> Later,
> Dean
>
>
>
>
