I am running Spark on DSE Cassandra with multiple analytics data centers. My understanding is that with this setup you should have a separate CFS file system for each data center. I was able to create an additional CFS file system as described here:

http://docs.datastax.com/en/latest-dse/datastax_enterprise/ana/anaCFS.html

I verified that the additional CFS file system was created properly.
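(For reference, a minimal way to check this from a node in the second data center is sketched below; the host name is a placeholder, and I am assuming the additional CFS is backed by a keyspace of the same name.)

    # DESCRIBE prints the CREATE KEYSPACE statement, including the per-data-center replication
    $ cqlsh <node_in_second_analytics_datacenter>
    cqlsh> DESCRIBE KEYSPACE <additional_cfs_name>;

The replication settings in that output list a non-zero factor for both analytics data centers.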
I am now following the instructions here to configure Spark on the second data center to use its own CFS:

http://docs.datastax.com/en/latest-dse/datastax_enterprise/spark/sparkConfHistoryServer.html

However, running:

    dse hadoop fs -mkdir <additional_cfs_name>:/spark/events

fails with:

    WARN You are going to access CFS keyspace: cfs in data center: <second_analytics_datacenter>. It will not work because the replication factor for this keyspace in this data center is 0.
    ....
    Bad connection to FS. command aborted. exception: UnavailableException()

That is, it appears that the <additional_cfs_name>: prefix in the hadoop command is being ignored and the command is trying to connect to cfs: rather than <additional_cfs_name>:.
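For what it is worth, the replication that the warning complains about can also be confirmed from a node in the second data center (a sketch only; the host is a placeholder, and system.schema_keyspaces is the schema table exposed by the Cassandra 2.x versions underlying current DSE, so adjust if your version differs):

    # strategy_options lists the per-data-center replication factors for the default cfs keyspace
    $ cqlsh <node_in_second_analytics_datacenter>
    cqlsh> SELECT keyspace_name, strategy_options FROM system.schema_keyspaces WHERE keyspace_name = 'cfs';

As the warning says, cfs has no entry (effectively a factor of 0) for <second_analytics_datacenter>, which is why it looks like the URI prefix is simply being ignored rather than there being a problem with the new keyspace itself.

Has anybody else run into this?

Simone Franzini, PhD
http://www.linkedin.com/in/simonefranzini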