Hi Flavio,
others might have better ideas to solve this, but I'll give it a try: have
you considered extending FileOutputFormat to achieve what you need? That
approach (discussed in [1]) sounds like something you could do.
Another pointer I can give is the DefaultRollingPolicy [2]; it looks
like it partially does what you're looking for. I'm adding Kostas to this
conversation since he worked on the RollingPolicy. Maybe he has some more
insights.
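To illustrate the idea, here is a rough sketch (plain JDK only, no Flink classes) of the splitting-and-compressing behavior a custom rolling policy would encode: start a new gzipped part file whenever the current part exceeds a line-count or byte-size threshold. The class name and thresholds are made up for the example; a real implementation would plug this logic into Flink's RollingPolicy interface instead:

```java
import java.io.*;
import java.nio.file.*;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch (not Flink API): rolls to a new gzip part file
// when either a max line count or a max byte size would be exceeded.
public class RollingGzipWriter implements Closeable {
    private final Path dir;
    private final long maxLines;
    private final long maxBytes;
    private int partIndex = 0;
    private long lineCount = 0;
    private long byteCount = 0;
    private Writer current;

    public RollingGzipWriter(Path dir, long maxLines, long maxBytes) throws IOException {
        this.dir = dir;
        this.maxLines = maxLines;
        this.maxBytes = maxBytes;
        roll();
    }

    // Close the current part (if any) and open the next one.
    private void roll() throws IOException {
        if (current != null) current.close();
        Path part = dir.resolve("part-" + partIndex++ + ".gz");
        current = new OutputStreamWriter(
            new GZIPOutputStream(Files.newOutputStream(part)), "UTF-8");
        lineCount = 0;
        byteCount = 0;
    }

    public void write(String line) throws IOException {
        byte[] bytes = (line + "\n").getBytes("UTF-8");
        // Roll before writing if either threshold would be crossed.
        if (lineCount >= maxLines || byteCount + bytes.length > maxBytes) roll();
        current.write(line);
        current.write('\n');
        lineCount++;
        byteCount += bytes.length;
    }

    public int partCount() { return partIndex; }

    @Override
    public void close() throws IOException { current.close(); }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rolling");
        try (RollingGzipWriter w = new RollingGzipWriter(dir, 100, 1_000_000)) {
            for (int i = 0; i < 250; i++) w.write("record-" + i);
            System.out.println(w.partCount()); // 250 lines at 100/part -> 3 parts
        }
    }
}
```

Note this counts uncompressed bytes; if the MB threshold should apply to the compressed size, you'd need to track the underlying stream's position instead.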

I hope that helps.

Best,
Matthias

[1]
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/output-writer-td2296.html
[2]
https://github.com/apache/flink/blob/5ff96966b59e0d9a7b55ebae9e252b1c9aafd4ea/flink-connectors/flink-file-sink-common/src/main/java/org/apache/flink/streaming/api/functions/sink/filesystem/rollingpolicies/DefaultRollingPolicy.java#L40

On Fri, Nov 27, 2020 at 11:07 AM Flavio Pompermaier <pomperma...@okkam.it>
wrote:

> Hello guys,
> I have to write my batch data (Dataset<Row>) to a file format. Actually
> what I need to do is:
>
>    1. split the data if it exceeds some size threshold (by line count or
>    max MB)
>    2. compress the output data (possibly without converting to the hadoop
>    format)
>
> Are there any suggestions / recommendations about that?
>
> Best,
> Flavio
>
