Well, I think I know the direction we can follow so that we can
1. Have virtual CFs
2. Be able to map/reduce ONE virtual CF
Well, not map/reduce exactly, but really, really close. We use PlayOrm with its partitioning, so I am now thinking that we will have a compute grid where each node does a findAll query into the partitions it is responsible for. In this way, I think we can have 1000s of virtual CFs inside ONE CF, and PlayOrm then does its query and retrieves only the rows for that partition of one virtual CF.

Anyone know of a compute grid we can dish out work to? That would be my only missing piece (well, that and the PlayOrm virtual CF feature, but I can probably add that within a week, though I am on vacation this Thursday to Monday).

Later,
Dean

On 10/2/12 6:35 AM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:

>So basically, with moving towards the 1000s of CFs all being put in one
>CF, our performance is going to tank on map/reduce, correct? I mean, from
>what I remember we could do map/reduce on a single CF, but by stuffing
>1000s of virtual CFs into one CF, our map/reduce will have to read in
>the rows of all 999 virtual CFs that we don't want just to map/reduce
>the ONE CF.
>
>Map/reduce is VERY VERY SLOW when reading in 1000 times more rows :( :(.
>
>Is this correct? This really sounds like highly undesirable behavior.
>There needs to be a way for people with 1000s of CFs to also run
>map/reduce on any one CF. Doing map/reduce on 1000 times the number of
>rows will be 1000 times slower... and of course, we will most likely get up
>to 20,000 tables from my most recent projections... in our last test load, we
>ended up with 8k+ CFs. Since I kept two other keyspaces, Cassandra
>started getting really REALLY slow when we got up to 15k+ CFs in the
>system, though I didn't look into why.
>
>I don't mind having 1000s of virtual CFs in ONE CF, BUT I need to
>map/reduce "just" the virtual CF!!!!! Ugh.
>
>Thanks,
>Dean
>
>On 10/1/12 3:38 PM, "Ben Hood" <0x6e6...@gmail.com> wrote:
>
>>On Mon, Oct 1, 2012 at 9:38 PM, Brian O'Neill <b...@alumni.brown.edu>
>>wrote:
>>> It's just a convenient way of prefixing:
>>>
>>>http://hector-client.github.com/hector/build/html/content/virtual_keyspaces.html
>>
>>So given that it is possible to use a CF per tenant, should we assume
>>that at sufficient scale there is less overhead to prefix
>>keys than there is to manage multiple CFs?
>>
>>Ben
>
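
For illustration, the approach discussed in this thread (many virtual CFs prefixed into ONE physical CF, with each grid node querying only the partitions it owns) could be sketched roughly as below. This is an in-memory stand-in under stated assumptions, not PlayOrm, Hector, or Cassandra code: `OneBigCF`, `physical_key`, `assign_partitions`, `run_job`, the `:` separator, and the round-robin assignment are all hypothetical names and choices made up for the example.

```python
from collections import defaultdict

SEP = ":"  # hypothetical separator; real keys would need escaping or fixed-width prefixes


def physical_key(virtual_cf, partition, row_key):
    """Build the prefixed key under which a row lives in the single physical CF."""
    return SEP.join((virtual_cf, partition, row_key))


class OneBigCF:
    """In-memory stand-in for the ONE physical column family (assumption, not a real client)."""

    def __init__(self):
        # Grouping by (virtual_cf, partition) stands in for a prefix/index lookup.
        self._index = defaultdict(dict)
        self.rows_read = 0  # counts rows actually touched by queries

    def put(self, virtual_cf, partition, row_key, row):
        key = physical_key(virtual_cf, partition, row_key)
        self._index[(virtual_cf, partition)][key] = row

    def find_all(self, virtual_cf, partition):
        """Return only the rows of one partition of one virtual CF."""
        rows = list(self._index.get((virtual_cf, partition), {}).values())
        self.rows_read += len(rows)
        return rows


def assign_partitions(partitions, node_count):
    """Round-robin the partitions across grid nodes (one simple, assumed policy)."""
    assignment = defaultdict(list)
    for i, partition in enumerate(sorted(partitions)):
        assignment[i % node_count].append(partition)
    return dict(assignment)


def run_job(store, virtual_cf, partitions, node_count, map_fn, reduce_fn):
    """Each simulated node maps over its own partitions; partial results are then reduced."""
    partials = []
    for _node, owned in assign_partitions(partitions, node_count).items():
        mapped = [map_fn(row) for p in owned for row in store.find_all(virtual_cf, p)]
        partials.append(reduce_fn(mapped))
    return reduce_fn(partials)


# Two virtual CFs, two partitions each, three rows per partition: 12 rows in total.
store = OneBigCF()
for cf in ("tenantA", "tenantB"):
    for part in ("2012-09", "2012-10"):
        for i in range(3):
            store.put(cf, part, "row%d" % i, {"value": i})

total = run_job(store, "tenantA", ["2012-09", "2012-10"], 2, lambda r: r["value"], sum)
```

In this toy run the job over `tenantA` touches 6 of the 12 stored rows, which is the point of the design: the rows of the other virtual CFs are never read. Whether a real deployment gets the same benefit depends on how the partition lookup is actually implemented.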