In summary, it appears that the Direct API was intended specifically to enable exactly-once semantics. This can be achieved either with idempotent transformations or with transactional processing, using the database to guarantee an "onto" mapping of results to inputs. For the latter, you need to store your offsets in the database of record.
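To make the transactional option concrete, here is a minimal sketch of the "store offsets in the database of record" pattern. It uses Python and sqlite3 purely for illustration (not the Spark or Kafka APIs), and all table and function names are hypothetical. The key idea is that results and the consumed offset commit in a single transaction, so a replayed batch either skips or re-applies atomically:

```python
import sqlite3

# Illustrative schema: results table plus an offsets table in the same
# database of record. Names are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (key TEXT PRIMARY KEY, value INTEGER)")
conn.execute("""CREATE TABLE offsets (
    topic TEXT, partition INTEGER, offset INTEGER,
    PRIMARY KEY (topic, partition))""")

def process_batch(conn, records, topic, partition, until_offset):
    """Apply a batch and record its ending offset in one transaction."""
    last = conn.execute(
        "SELECT offset FROM offsets WHERE topic=? AND partition=?",
        (topic, partition)).fetchone()
    if last is not None and last[0] >= until_offset:
        return  # batch already committed; replay becomes a no-op
    with conn:  # one transaction: results and offset succeed or fail together
        for key, value in records:
            conn.execute("INSERT OR REPLACE INTO results VALUES (?, ?)",
                         (key, value))
        conn.execute("INSERT OR REPLACE INTO offsets VALUES (?, ?, ?)",
                     (topic, partition, until_offset))

batch = [("a", 1), ("b", 2)]
process_batch(conn, batch, "events", 0, 100)
process_batch(conn, batch, "events", 0, 100)  # simulated replay is skipped
```

In a real Spark job the same idea applies inside `foreachRDD`/`foreachPartition`: read the offset ranges from the batch, write results and offsets in the same database transaction, and on restart seed the stream from the offsets stored in that table rather than from a checkpoint.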
If you as a developer do not strictly need exactly-once semantics, you can probably get by fine with the receiver API. The hope is that one day the Direct API could be augmented with Spark-managed offset storage (in ZooKeeper, Kafka, or something else outside the Spark checkpoint), since that would let developers easily take advantage of the Direct API's performance benefits and simpler parallelism model. I think it would be worth adding, even if it came with some "buyer beware" caveats.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Maintaining-Kafka-Direct-API-Offsets-tp24246p24273.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.