A few lines of Java in a partitioning or rack-aware strategy might be able to achieve this.
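To make the idea concrete, here is a rough sketch of the placement rule such a strategy would need: data whose home is one data center gets replicas there and in the hub (New York), and nowhere else. It is plain Java and does not use Cassandra's actual strategy classes; `HubReplicaSketch`, `Node`, `replicasFor` and the data center names are all made up for illustration. Depending on the version in use, the built-in NetworkTopologyStrategy, which takes a replica count per data center for each keyspace, may already cover this without custom code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Cassandra's code base.
final class HubReplicaSketch {

    // A node in the ring and the data center it sits in.
    static final class Node {
        final String name;
        final String dataCenter;

        Node(String name, String dataCenter) {
            this.name = name;
            this.dataCenter = dataCenter;
        }
    }

    // Picks up to perDc nodes in the data's home data center and up to perDc
    // in the hub. Nodes in any other data center are never chosen.
    // Simplification: the walk starts at the head of the list, whereas a real
    // strategy would start at the row's token position in the ring.
    static List<String> replicasFor(String homeDc, String hubDc,
                                    List<Node> ring, int perDc) {
        Map<String, Integer> remaining = new HashMap<>();
        remaining.put(homeDc, perDc);
        remaining.put(hubDc, perDc); // same key when home is the hub itself

        List<String> replicas = new ArrayList<>();
        for (Node node : ring) {
            Integer left = remaining.get(node.dataCenter);
            if (left != null && left > 0) {
                replicas.add(node.name);
                remaining.put(node.dataCenter, left - 1);
            }
        }
        return replicas;
    }
}
```

With a ring of ny1, tk1, ld1, ny2, tk2 and one replica per data center, `replicasFor("Tokyo", "NewYork", ring, 1)` selects ny1 and tk1 and never touches the London node. Keeping one keyspace per region would be one way to tell the strategy which home data center applies.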
--Joe

--
Typed with big fingers on a small keyboard.

On Apr 8, 2011, at 13:17, Patrick Julien <pjul...@gmail.com> wrote:

> We have a pilot project running where all our historical data
> worldwide would be stored using cassandra. So far, we have been
> successful at getting the write and read throughput we need, in fact,
> coming in over 27% over our needed capacity and well beyond what we
> were able to achieve with mysql, very impressive.
>
> However, one thing that escapes me is how we should organize different
> data center access.
>
> The scenario is the following:
>
> - We have data centers in North America, London, Tokyo and so on.
> - The relative cost of data centers is very different, e.g., TCO for
> one server in Tokyo is about the same than 5 such computers in New
> York.
> - We want to have access to all the data from North America, hence we
> would run Hadoop/Pig queries from the New York/North America data
> center only.
>
> The problem is this: we would like the historical data from Tokyo to
> stay in Tokyo and only be replicated to New York. The one in London
> to be in London and only be replicated to New York and so on for all
> data centers.
>
> Is this currently possible with Cassandra? I believe we would need to
> run multiple clusters and migrate data manually from data centers to
> North America to achieve this. Also, any suggestions would also be
> welcomed.