In summary, it appears that the Direct API was intended specifically to
enable exactly-once semantics. This can be achieved either with idempotent
transformations or with transactional processing, using the database to
guarantee a one-to-one mapping from inputs to committed results. For the
latter, you need to store your offsets in the database of record, in the
same transaction as the results.
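The transactional variant can be sketched without any Spark specifics. The
following is a minimal illustration using Python's sqlite3 as a stand-in for
the database of record (the schema, function name, and batch shape are all
hypothetical, not anything Spark provides): results and the new offset
commit in one transaction, so a replayed batch is detected and skipped
rather than double-counted.

```python
import sqlite3

# In-memory stand-in for the "database of record" (hypothetical schema).
# isolation_level=None lets us manage BEGIN/COMMIT/ROLLBACK explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE results (k TEXT PRIMARY KEY, v INTEGER);
    CREATE TABLE offsets (topic TEXT, partition INTEGER, next_offset INTEGER,
                          PRIMARY KEY (topic, partition));
""")

def process_batch(topic, partition, until_offset, messages):
    """Write a batch's results and its ending offset in ONE transaction.

    If the job restarts and replays a batch, the stored offset tells us
    the batch already committed, and we skip it entirely.
    """
    cur = conn.cursor()
    cur.execute("BEGIN")
    row = cur.execute(
        "SELECT next_offset FROM offsets WHERE topic=? AND partition=?",
        (topic, partition)).fetchone()
    committed = row[0] if row else 0
    if committed >= until_offset:        # replay detected: already committed
        cur.execute("ROLLBACK")
        return
    for k, v in messages:
        # A non-idempotent update (an increment) -- exactly the case where
        # the offset check above is what prevents double-counting.
        cur.execute(
            "INSERT INTO results (k, v) VALUES (?, ?) "
            "ON CONFLICT(k) DO UPDATE SET v = v + excluded.v", (k, v))
    cur.execute("INSERT OR REPLACE INTO offsets VALUES (?, ?, ?)",
                (topic, partition, until_offset))
    cur.execute("COMMIT")                # results + offset land atomically
```

Because the offset and the results share a transaction, there is no window
where one is durable and the other is not, which is the whole point of
keeping offsets in the database of record.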

If you as a developer do not strictly need exactly-once semantics, then you
can probably get by fine with the receiver-based API.
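The receiver-based path gives you at-least-once delivery after a failure,
so duplicates are possible; they are harmless if your sink writes are
idempotent. A minimal sketch of what "idempotent" means here, using a
hypothetical event table keyed by a unique message id (again sqlite3 as a
stand-in sink):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id TEXT PRIMARY KEY, payload TEXT)")

def write_event(event_id, payload):
    """Idempotent write: keyed on a unique message id, so redelivery of
    the same message leaves the table unchanged."""
    with conn:  # each call commits (or rolls back) its own transaction
        conn.execute(
            "INSERT OR IGNORE INTO events (event_id, payload) VALUES (?, ?)",
            (event_id, payload))
```

With writes shaped like this, duplicate delivery degrades to a no-op, and
at-least-once is effectively as good as exactly-once for the stored state.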

The hope is that one day the Direct API could be augmented with
Spark-abstracted offset storage (in ZooKeeper, Kafka, or something else
outside the Spark checkpoint), since this would let developers easily take
advantage of the Direct API's performance benefits and simpler parallelism.
I think it would be worth adding, even if it came with some "buyer beware"
caveats.
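To be clear, no such abstraction exists in Spark today; this is purely a
wish. As a hypothetical sketch of the shape it might take, here is a tiny
pluggable offset-store interface with a toy file-backed implementation
standing in for a ZooKeeper or Kafka backend (all names invented):

```python
import json
import os

class OffsetStore:
    """Hypothetical interface for framework-managed offset storage.
    Real backends might be ZooKeeper or Kafka's own offsets mechanism."""
    def load(self, group):
        raise NotImplementedError
    def save(self, group, offsets):
        raise NotImplementedError

class FileOffsetStore(OffsetStore):
    """Toy file-backed backend, for illustration only."""
    def __init__(self, path):
        self.path = path

    def load(self, group):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f).get(group, {})

    def save(self, group, offsets):
        data = {}
        if os.path.exists(self.path):
            with open(self.path) as f:
                data = json.load(f)
        data[group] = offsets
        # write-then-rename, so a crash never leaves a torn offsets file
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(data, f)
        os.replace(tmp, self.path)
```

The "buyer beware" part is that storage outside the results database cannot
be transactional with the results, so this would buy you at-least-once
with easy restarts, not exactly-once.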



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Maintaining-Kafka-Direct-API-Offsets-tp24246p24273.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
