Just noticed you'd sent this to the dev list. This is a question for the user list only; please don't send questions of this type to the developer list.
On Thu, Jan 8, 2015 at 8:33 AM, Ryan Svihla <r...@foundev.pro> wrote:

> The nature of replication factor is such that writes will go wherever
> there is a replica. If you want responses to be faster, and you don't
> want the REST data center involved in the Spark job's response path, I
> suggest using a CQL driver with the LOCAL_ONE or LOCAL_QUORUM consistency
> level (look at the Spark Cassandra Connector here:
> https://github.com/datastax/spark-cassandra-connector ). Write traffic
> will still be replicated to the REST service data center, because you do
> want those results available there, but you will not be waiting on the
> remote data center to respond "successful".
>
> Final point: bulk loading sends one copy per replica across the wire. So
> let's say you have RF=3 in each data center; that means bulk loading will
> send out 6 copies from the client at once. Normal mutations via Thrift or
> CQL writes go out as a single copy between data centers, and that node
> then forwards it on to the other replicas. This means traffic between
> data centers in this case would be 3x more with the bulk loader than with
> a traditional CQL or Thrift based client.
>
> On Wed, Jan 7, 2015 at 6:32 PM, Benyi Wang <bewang.t...@gmail.com> wrote:
>
>> I set up two virtual data centers, one for analytics and one for the
>> REST service. The analytics data center sits on top of a Hadoop
>> cluster. I want to bulk load my ETL results into the analytics data
>> center so that the REST service won't bear the heavy load. I'm using
>> CQLTableInputFormat in my Spark application, and I gave the nodes in
>> the analytics data center as the initial addresses.
>>
>> However, I found my jobs were connecting to the REST service data
>> center.
>>
>> How can I specify the data center?
>
> --
> Thanks,
> Ryan Svihla

--
Thanks,
Ryan Svihla
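The data-center pinning Ryan suggests is what the DataStax drivers' DC-aware load balancing policies (e.g. DCAwareRoundRobinPolicy) implement: only nodes in the named local data center are used as coordinators. Here is a toy sketch of that idea only, not the real driver API (real drivers learn the topology from the cluster; the node-to-DC mapping below is hypothetical):

```python
import itertools

# Hypothetical topology: two virtual data centers, as in the question.
NODE_DCS = {
    "10.0.1.1": "analytics",
    "10.0.1.2": "analytics",
    "10.0.2.1": "rest",
    "10.0.2.2": "rest",
}

def local_dc_round_robin(nodes, local_dc):
    """Yield coordinator nodes round-robin, restricted to the local DC --
    the idea behind a DC-aware load balancing policy."""
    local = [n for n in nodes if NODE_DCS.get(n) == local_dc]
    return itertools.cycle(local)

# A Spark job pinned to "analytics" never coordinates through "rest" nodes.
plan = local_dc_round_robin(sorted(NODE_DCS), "analytics")
print(next(plan))  # 10.0.1.1
print(next(plan))  # 10.0.1.2
print(next(plan))  # 10.0.1.1 again (round-robin within the local DC)
```

Combined with LOCAL_ONE or LOCAL_QUORUM, this keeps the request path entirely inside the analytics data center while replication still carries the data to the REST data center asynchronously.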
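The bulk-load traffic arithmetic in Ryan's final point can be sketched in a few lines (a minimal illustration assuming the RF=3, two-data-center setup from the example; the function names are mine):

```python
def bulk_load_copies(rf_per_dc, num_dcs):
    """Bulk loading streams one copy per replica directly from the client,
    so the client sends rf_per_dc * num_dcs copies over the wire."""
    return rf_per_dc * num_dcs

def normal_write_cross_dc_copies(num_dcs):
    """A normal CQL/Thrift mutation sends one copy to each remote data
    center; a node there forwards it to the other local replicas."""
    return num_dcs - 1

rf, dcs = 3, 2
print(bulk_load_copies(rf, dcs))           # 6 copies from the client at once
print(normal_write_cross_dc_copies(dcs))   # 1 copy crosses between DCs
```

With RF=3, the bulk loader pushes 3 copies into the remote data center versus the single forwarded copy of a normal write, which is the 3x difference in cross-DC traffic described above.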