Thanks for the feedback.

On Wed, Jun 8, 2016 at 12:35 PM, Jesse Anderson <[email protected]>
wrote:

> The second thing is about the JavaDoc on the read method. I think the
> JavaDocs should talk about the differences between this and a default
> KafkaConsumer. Since there isn't a group.id required to be set, I think
> the JavaDoc should mention the implications of this. It would say something
> about always starting from the latest data. Also, it should mention that
> offsets won't be kept and restarting the processing will not start at the
> last offset; it will start at the latest data again.
>

JavaDoc currently states this
<https://github.com/apache/incubator-beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L156>
about starting from the latest offset:

  When the pipeline starts for the first time without any checkpoint, the
  source starts consuming from the <em>latest</em> offsets. You can override
  this behavior to consume from the beginning by setting appropriate
  properties in {@link ConsumerConfig}, through
  {@link Read#updateConsumerProperties(Map)}

It says that by default it reads from the latest position. I left out the
very Kafka-specific details (e.g. the actual Kafka configuration you need to
set to start from the beginning, or to enable Kafka auto-commit, which would
provide an (external) soft checkpoint of the read position). KafkaIO gives
full control over the configuration you can pass to KafkaConsumer (see the
"Advanced Kafka Configuration
<https://github.com/apache/incubator-beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L198>"
section in the JavaDoc).
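
As a rough illustration of the kind of overrides meant above, here is a
minimal sketch that only builds the property maps. The keys are the standard
Kafka consumer settings "auto.offset.reset", "enable.auto.commit" and
"group.id"; the class name, the method names and the group id value are made
up for the example. The resulting map is what would be handed to
Read#updateConsumerProperties(Map); that call is left out so the snippet
compiles without Beam or Kafka on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of KafkaIO: it only assembles consumer overrides.
class ConsumerOverrides {

    // Overrides asking the consumer to start from the earliest available
    // offset when there is no previously committed position.
    static Map<String, Object> fromBeginning() {
        Map<String, Object> props = new HashMap<>();
        props.put("auto.offset.reset", "earliest");
        return props;
    }

    // Overrides enabling Kafka's own auto-commit, which needs a group.id.
    // This is the "external soft checkpoint" idea: Kafka remembers a position
    // for the group independently of any pipeline checkpoint.
    static Map<String, Object> withAutoCommit(String groupId) {
        Map<String, Object> props = new HashMap<>();
        props.put("group.id", groupId);
        props.put("enable.auto.commit", true);
        return props;
    }

    public static void main(String[] args) {
        // "my-beam-job" is a placeholder group id.
        System.out.println(fromBeginning());
        System.out.println(withAutoCommit("my-beam-job"));
    }
}
```

Whether auto-committed offsets line up with what the pipeline has actually
processed is a separate question, which is why this is only a soft
checkpoint.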

I didn't explicitly mention the implications of 'starting a job' vs 'updating
a job', which is a Beam/Dataflow feature common to all sources. An update
starts from the checkpoint (which ensures KafkaIO resumes reading from where
it left off), but a fresh start does not have any checkpointed state, so
KafkaIO reads from the latest offset (by default). It would be better to call
this out explicitly, as many new users may not be aware of it.
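
The start vs update distinction can be pictured with a toy model. This is
not KafkaIO's implementation, just a sketch of the rule described above; the
class name, the method name and the use of Optional for "no checkpoint" are
assumptions made for the example.

```java
import java.util.Optional;

// Hypothetical model of where reading begins; not taken from KafkaIO itself.
class StartPosition {

    // If a checkpointed position exists (job update), resume from it.
    // Otherwise (fresh start), fall back to the latest offset, which is
    // the default behavior described in the JavaDoc.
    static long initialOffset(Optional<Long> checkpointedOffset, long latestOffset) {
        return checkpointedOffset.orElse(latestOffset);
    }

    public static void main(String[] args) {
        long latest = 1000L; // placeholder for the partition's current end offset
        System.out.println("update: " + initialOffset(Optional.of(640L), latest));
        System.out.println("fresh start: " + initialOffset(Optional.empty(), latest));
    }
}
```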

Please suggest any other improvements.
