[jira] [Created] (KAFKA-14402) Transactions Server Side Defense

Justine Olshan (Jira) Fri, 18 Nov 2022 13:14:06 -0800

Justine Olshan created KAFKA-14402:
--------------------------------------

             Summary: Transactions Server Side Defense
                 Key: KAFKA-14402
                 URL: https://issues.apache.org/jira/browse/KAFKA-14402
             Project: Kafka
          Issue Type: Task
            Reporter: Justine Olshan
            Assignee: Justine Olshan

We have seen hanging transactions in Kafka in which the last stable offset (LSO)
does not advance, the log cannot be cleaned (if the topic is compacted), and
read_committed consumers get stuck.

This can happen when a message is delayed by networking issues or a network
partition, the transaction aborts, and the delayed message then finally
arrives. The delayed message can also violate EOS if it arrives after the next
addPartitionsToTxn request. In effect, a message from a previous (aborted)
transaction may become part of the next transaction.
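
To make the hanging case concrete, here is a minimal sketch of a single partition log. It is a toy model, not Kafka's implementation: the class PartitionLog and its methods are hypothetical, and it assumes every record is fully replicated, so the LSO is the first offset of the earliest open transaction, or the end of the log if none is open.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy model of one partition's log (hypothetical, not Kafka's code)."""
    next_offset: int = 0
    # producer_id -> first offset of that producer's open transaction
    open_txns: dict = field(default_factory=dict)

    def append_data(self, producer_id: int) -> int:
        offset = self.next_offset
        self.next_offset += 1
        # The first transactional record from a producer opens a transaction.
        self.open_txns.setdefault(producer_id, offset)
        return offset

    def append_marker(self, producer_id: int) -> int:
        # A commit/abort marker closes the producer's open transaction.
        offset = self.next_offset
        self.next_offset += 1
        self.open_txns.pop(producer_id, None)
        return offset

    def last_stable_offset(self) -> int:
        # Assumption: everything is replicated, so the log end stands in
        # for the high watermark.
        return min(self.open_txns.values(), default=self.next_offset)


log = PartitionLog()
log.append_data(producer_id=7)       # offset 0: transaction opens
log.append_marker(producer_id=7)     # offset 1: transaction is aborted
lso_after_abort = log.last_stable_offset()

log.append_data(producer_id=7)       # offset 2: the delayed record arrives
log.append_data(producer_id=99)      # offset 3: another producer writes
log.append_marker(producer_id=99)    # offset 4: and commits
lso_after_late_write = log.last_stable_offset()
```

In this model the LSO stays at offset 2 however much other producers write afterwards, because nothing will ever send a marker for the transaction that the delayed record reopened.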

Hanging transactions can also occur when a buggy client writes to a partition
before adding the partition to the transaction. In both of these cases, we
want the server to be able to prevent these incorrect records from being
written and either causing hanging transactions or violating exactly-once
semantics (EOS) by including records in the wrong transaction.

The best way to avoid this issue is to:
1. *Uniquely identify transactions by bumping the producer epoch after every
commit/abort marker. That way, each transaction can be identified by (producer
id, epoch).*

2. *Remove the addPartitionsToTxn call and implicitly just add partitions to
the transaction on the first produce request during a transaction.*

We avoid the late arrival case because the transaction is uniquely identified
and fenced, and we avoid the buggy client case because the client no longer
needs to explicitly add partitions to begin the transaction.
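
As a rough illustration of point 1 only, the sketch below shows how a per-transaction epoch lets a partition reject a late record. The names FencingPartition, StaleEpochError and end_transaction are hypothetical, and bumping the epoch directly on the partition is a simplifying assumption; see KIP-890 for the actual protocol.

```python
class StaleEpochError(Exception):
    """Hypothetical error for a record whose epoch is older than the current one."""


class FencingPartition:
    """Toy partition that tracks the current epoch per producer id."""

    def __init__(self):
        self.current_epoch = {}  # producer_id -> epoch of the current transaction
        self.records = []

    def append(self, producer_id, epoch, value):
        # A record from an earlier (producer id, epoch) pair is fenced.
        if epoch < self.current_epoch.get(producer_id, 0):
            raise StaleEpochError(f"epoch {epoch} is stale for producer {producer_id}")
        self.current_epoch[producer_id] = epoch
        self.records.append((producer_id, epoch, value))

    def end_transaction(self, producer_id):
        # Assumption: writing the commit/abort marker bumps the epoch by one.
        new_epoch = self.current_epoch.get(producer_id, 0) + 1
        self.current_epoch[producer_id] = new_epoch
        return new_epoch


partition = FencingPartition()
partition.append(7, 0, "txn-1 record")
next_epoch = partition.end_transaction(7)

try:
    partition.append(7, 0, "delayed txn-1 record")
    late_write_accepted = True
except StaleEpochError:
    late_write_accepted = False

partition.append(7, next_epoch, "txn-2 record")
```

The delayed record carries epoch 0, so it can neither reopen the finished transaction nor be mistaken for part of the next one, which runs under epoch 1.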

Of course, 1 and 2 require client-side changes, so for older clients, those
approaches won’t apply.

3. *To cover older clients, we will ensure a transaction is ongoing before we
write to a transaction. We can do this by querying the transaction coordinator
and caching the result.*
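
A minimal sketch of point 3, under stated assumptions: the partition leader asks the coordinator whether a transaction is ongoing for the partition, caches a positive answer, and drops the cached entry when the marker is written. All class and method names here are hypothetical, and the coordinator lookup is a plain method call rather than a network request.

```python
class TransactionNotOngoingError(Exception):
    """Hypothetical error for a write outside an ongoing transaction."""


class Coordinator:
    """Toy transaction coordinator that counts verification lookups."""

    def __init__(self):
        self.ongoing = {}  # (producer_id, epoch) -> set of partitions
        self.lookups = 0

    def add_partition(self, producer_id, epoch, partition):
        self.ongoing.setdefault((producer_id, epoch), set()).add(partition)

    def end_transaction(self, producer_id, epoch):
        self.ongoing.pop((producer_id, epoch), None)

    def is_ongoing(self, producer_id, epoch, partition):
        self.lookups += 1
        return partition in self.ongoing.get((producer_id, epoch), set())


class PartitionLeader:
    def __init__(self, partition, coordinator):
        self.partition = partition
        self.coordinator = coordinator
        self.verified = set()  # cached (producer_id, epoch) pairs
        self.records = []

    def append(self, producer_id, epoch, value):
        key = (producer_id, epoch)
        if key not in self.verified:
            if not self.coordinator.is_ongoing(producer_id, epoch, self.partition):
                raise TransactionNotOngoingError(f"no ongoing transaction for {key}")
            self.verified.add(key)
        self.records.append((producer_id, epoch, value))

    def write_marker(self, producer_id, epoch):
        # Older clients reuse the epoch, so the cache entry must not outlive
        # the transaction it was verified for.
        self.verified.discard((producer_id, epoch))


coordinator = Coordinator()
leader = PartitionLeader("topic-0", coordinator)

coordinator.add_partition(7, 0, "topic-0")
leader.append(7, 0, "a")
leader.append(7, 0, "b")              # served from the cache
lookups_during_txn = coordinator.lookups

coordinator.end_transaction(7, 0)
leader.write_marker(7, 0)

try:
    leader.append(7, 0, "delayed record")
    late_write_accepted = True
except TransactionNotOngoingError:
    late_write_accepted = False
```

Caching keeps the check to one lookup per transaction per partition in this model, while invalidating on the marker ensures a delayed record is verified again and rejected.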

See KIP-890 for more information:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-890%3A+Transactions+Server-Side+Defense

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
