Hi,

I'm looking into Kafka's transactions API as proposed in KIP-98. I've read
the KIP-98 document and looked into the code on the master branch. I would
like to use it to implement a two-phase commit mechanism on top of Kafka's
transactions, one that would let me tie multiple systems (some of which
might not be Kafka) together in one transaction.

Maybe I'm missing something, but I don't see a way to implement this using
the proposed transactions API. Even if I have just two processes writing to
Kafka topics, I don't see how I can guarantee that if one's transaction is
committed, the other's will also eventually be committed. If the first
KafkaProducer commits successfully, but the second one fails before
committing its data, then after a restart the second one's
"initTransactions" call will (according to my understanding of the API)
abort its previously uncompleted transaction.

Usually transactional systems expose an API like this one:
<http://hlinnaka.iki.fi/2013/04/11/how-to-write-a-java-transaction-manager-that-works-with-postgresql/>.
Namely, there is a known identifier for a transaction, and you can
pre-commit it (the void prepare(...) method in the aforementioned example)
and then commit it, or you can abort it. Usually pre-commit involves
flushing data to some temporary files, and commit moves those files to
their final directory. In case of a machine/process failure before
"pre-commit", we can simply roll back all transactions in all of the
processes. However, once every process has acknowledged that it completed
"pre-commit", each process should call "commit". If some process fails at
that stage, then after restarting it I would expect to be able to restore
its "pre-committed" transaction (having remembered the transaction's id)
and re-attempt the commit, which should be guaranteed to eventually
succeed.
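
For reference, the contract described in the linked article is essentially
javax.transaction.xa.XAResource. Here is a simplified view of the calls I
have in mind (the method names and signatures come from that standard
interface; "TwoPhaseResource" is just my shorthand for this reduced view):

import javax.transaction.xa.XAException;
import javax.transaction.xa.Xid;

// Simplified view of the javax.transaction.xa.XAResource contract,
// reduced to the calls relevant here.
interface TwoPhaseResource {
    // Phase 1: durably "pre-commit" the work of a known transaction.
    // Once this returns successfully, commit(xid, ...) must be
    // guaranteed to eventually succeed.
    int prepare(Xid xid) throws XAException;

    // Phase 2: finalize the prepared work; must be safe to retry
    // after a crash.
    void commit(Xid xid, boolean onePhase) throws XAException;

    // Undo work that was never prepared (or whose prepare failed).
    void rollback(Xid xid) throws XAException;

    // After a restart, list prepared-but-unfinished transactions so
    // the coordinator can re-drive commit or rollback.
    Xid[] recover(int flag) throws XAException;
}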

In other words, it seems to me that the features missing from this API
are:
1. the possibility to resume transactions after a machine/process crash -
at the very least, I would expect to be able to commit the
"flushed"/"pre-committed" data of such transactions (see the sketch below);
2. making sure that committing an already committed transaction doesn't
break anything.
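
To show what I mean, here is a purely hypothetical sketch. Nothing below
exists in KIP-98 or on master; the interface and method names are my
invention, only illustrating the two missing features:

// Purely hypothetical - this illustrates the two missing features only.
interface ResumableTransactionalProducer<K, V> {
    // Missing feature 1: after a crash, re-attach to a previously
    // "flushed"/pre-committed transaction by its known id, instead of
    // having initTransactions() abort it.
    void resumeTransaction(String transactionalId);

    // Missing feature 2: committing must be idempotent, so a recovering
    // process can always safely re-drive commit; committing an already
    // committed transaction would be a no-op.
    void commitTransaction();
}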

Or maybe there is some other way to integrate Kafka into such a two-phase
commit system that I'm missing?

Thanks, Piotrek
