We implemented a Kafka connector for Google Dataflow (streaming): <https://github.com/GoogleCloudPlatform/DataflowJavaSDK/pull/121>. We manually assign partitions to each split. The Dataflow SDK lets sources report their backlog, but I didn't see any way to find out the latest offset using the 0.9 consumer. One option is to create a new consumer and seek to the latest offset. The latency might be quite high, though, since the new client has to establish connections, etc.
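To make the question concrete, here is a rough sketch of the two pieces involved: spreading partitions over splits and turning offsets into a backlog figure. It is not the code from the pull request. The class `BacklogSketch`, both method names, the round-robin policy, and the `BACKLOG_UNKNOWN` sentinel are made up for illustration. The sketch also assumes the latest offsets are obtained somehow (for example by the second consumer mentioned above) and are simply passed in as a map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BacklogSketch {

    // Hypothetical sentinel for "backlog cannot be computed right now".
    static final long BACKLOG_UNKNOWN = -1L;

    // Assumed policy: spread partition ids 0..numPartitions-1 over the splits round-robin.
    static List<List<Integer>> assignPartitions(int numPartitions, int numSplits) {
        List<List<Integer>> splits = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            splits.add(new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            splits.get(p % numSplits).add(p);
        }
        return splits;
    }

    // Backlog in messages for one split: sum of (latest offset - next offset to read).
    // latestOffsets is assumed to come from some external lookup; if an offset is
    // missing for any partition of the split, the backlog is reported as unknown.
    static long backlogMessages(List<Integer> partitions,
                                Map<Integer, Long> latestOffsets,
                                Map<Integer, Long> nextOffsets) {
        long total = 0;
        for (int p : partitions) {
            Long latest = latestOffsets.get(p);
            Long next = nextOffsets.get(p);
            if (latest == null || next == null) {
                return BACKLOG_UNKNOWN;
            }
            // Guard against a stale "latest" value that lags behind the reader.
            total += Math.max(0L, latest - next);
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<Integer>> splits = assignPartitions(5, 2);

        Map<Integer, Long> latest = new HashMap<>();
        Map<Integer, Long> next = new HashMap<>();
        long[] latestValues = {100, 50, 10, 7, 0};
        long[] nextValues = {90, 50, 4, 7, 0};
        for (int p = 0; p < 5; p++) {
            latest.put(p, latestValues[p]);
            next.put(p, nextValues[p]);
        }

        for (int i = 0; i < splits.size(); i++) {
            System.out.println("split " + i + " partitions=" + splits.get(i)
                    + " backlog=" + backlogMessages(splits.get(i), latest, next));
        }
    }
}
```

The arithmetic is the easy part. What I am missing is a cheap way to keep the `latestOffsets` map fresh with the 0.9 consumer.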
Any comments on the pull request are also welcome. Btw, do you know of any implementation of the 0.9 consumer interface on top of the 0.8 consumer? A partial implementation that uses SimpleConsumer to support manual assignment would be fine too. This might be a decent way to support 0.8 servers.

Thanks,
Raghu.

[ This is my second attempt at sending this message to the mailing list. Apologies if you see it multiple times. ]