Hi Artem,

I debugged Flink 1.17.1 (running CsvFilesystemBatchITCase) and I see
the same behaviour. It's the same on master too. Jackson flushes [1] the
underlying stream after every `writeValue` call. I experimented with
suppressing the flush by disabling Jackson's FLUSH_PASSED_TO_STREAM [2]
feature, but this broke the integration tests. This is because Jackson wraps
the stream in its own Writer that buffers data. We depend on the flush to
flush the Jackson writer and eventually write the bytes to the stream.

One workaround I found [3] is to wrap the stream in an implementation that
ignores flush calls, and pass that to Jackson. Jackson will then flush its
writer buffers and write the bytes to the underlying stream, then try to
flush the underlying stream, but that will be a no-op. The CsvBulkWriter
continues to flush/sync the underlying stream itself. Unfortunately this
required code changes in Flink CSV, so it might not be helpful for you.
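
To illustrate the idea, here is a minimal sketch using only the JDK. It is
not the code from the commit in [3]: the class name FlushIgnoringOutputStream
is hypothetical, and a plain OutputStreamWriter stands in for Jackson's
internal buffering writer.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

/** Hypothetical wrapper: forwards writes to the delegate but swallows flush(). */
final class FlushIgnoringOutputStream extends FilterOutputStream {

    FlushIgnoringOutputStream(OutputStream delegate) {
        super(delegate);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Delegate in bulk; FilterOutputStream's default writes byte by byte.
        out.write(b, off, len);
    }

    @Override
    public void flush() {
        // Intentionally a no-op: the owner of the real stream decides when to flush.
    }

    public static void main(String[] args) throws IOException {
        // A sink that counts how often flush() reaches it.
        final int[] flushCalls = {0};
        ByteArrayOutputStream sink = new ByteArrayOutputStream() {
            @Override
            public void flush() {
                flushCalls[0]++;
            }
        };

        // Stand-in for Jackson's buffering writer (an assumption for this sketch).
        Writer writer = new OutputStreamWriter(
                new FlushIgnoringOutputStream(sink), StandardCharsets.UTF_8);
        writer.write("a,b\n");
        writer.flush(); // drains the writer's buffer into the sink

        System.out.println("bytes in sink: " + sink.size());
        System.out.println("flush calls seen by sink: " + flushCalls[0]);
    }
}
```

The writer's own flush still pushes its buffered bytes into the wrapped
stream, but the flush never reaches the real stream, so whoever owns that
stream has to flush or sync it explicitly.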

[1] https://github.com/FasterXML/jackson-dataformats-text/blob/8700b5489090f81b4b8d2636f9298ac47dbf14a3/csv/src/main/java/com/fasterxml/jackson/dataformat/csv/CsvGenerator.java#L504
[2] https://fasterxml.github.io/jackson-core/javadoc/2.13/com/fasterxml/jackson/core/JsonGenerator.Feature.html#FLUSH_PASSED_TO_STREAM
[3] https://github.com/robobario/flink/commit/ae3fdb1ca9de748df791af232bba57d6d7289a79

Rob Young
