Dennis,

You'll have a better chance of getting an answer on the spark-cassandra-connector mailing list https://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user or on IRC #spark-cassandra-connector

On Wed, Jan 13, 2016 at 4:17 AM, Dennis Birkholz <birkh...@pubgrade.com> wrote:

> Hi all,
>
> we use Cassandra to log event data and process it every 15 minutes with Spark.
> We are using the Cassandra Java Connector for Spark.
>
> Randomly, our Spark runs produce too few output records because no data is
> returned from Cassandra for a window of several minutes of input data. When
> querying the data (with cqlsh), after multiple tries, the data eventually
> becomes available.
>
> To solve the problem, we tried to use consistency=ALL when reading the
> data in Spark. We use the
> CassandraJavaUtil.javaFunctions().cassandraTable() method and have set
> "spark.cassandra.input.consistency.level"="ALL" on the config when creating
> the Spark context. The problem persists, but according to
> http://stackoverflow.com/a/25043599 using a consistency level of ONE on
> the write side (which we use) and ALL on the read side should be sufficient
> for data consistency.
>
> I would really appreciate it if someone could give me a hint on how to fix
> this problem, thanks!
>
> Greets,
> Dennis
>
> P.S.:
> some information about our setup:
> Cassandra 2.1.12 in a two-node configuration with replication factor=2
> Spark 1.5.1
> Cassandra Java Driver 2.2.0-rc3
> Spark Cassandra Java Connector 2.10-1.5.0-M2
>

--
Bests, Alex Popescu | @al3xandru
Sen. Product Manager @ DataStax
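
The consistency argument in the quoted message (write at ONE, read at ALL, replication factor 2) rests on the usual replica-overlap rule: a read is expected to see the latest acknowledged write when the number of replicas written plus the number of replicas read exceeds the replication factor. The following is only a minimal sketch of that arithmetic in Python; the function names are hypothetical and are not part of Cassandra, the Java driver, or the Spark connector, and only a handful of consistency levels are modeled.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a given consistency level.

    Hypothetical helper for illustration; covers only a few levels.
    """
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        # Majority of the replicas: floor(RF / 2) + 1
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def read_overlaps_write(write_level: str, read_level: str,
                        replication_factor: int) -> bool:
    """True when every read set must share a replica with every write set."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return w + r > replication_factor


if __name__ == "__main__":
    # Setup from the quoted message: write ONE, read ALL, RF = 2.
    print(read_overlaps_write("ONE", "ALL", 2))
    # Reading at ONE as well would not guarantee overlap.
    print(read_overlaps_write("ONE", "ONE", 2))
```

For the setup described above the rule holds (1 + 2 > 2), which matches the Stack Overflow answer cited in the message. The rule only speaks to replica overlap for acknowledged writes, so it does not by itself explain why data would still appear late.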