Hi,
I have some questions about the new StreamingFileSink in 1.6.
My use case is pretty simple.
I have a Cassandra table with 150 million rows.
They are partitioned into buckets of 100,000 rows.
My job is to export each "bucket" to a file (1 bucket = 1 file), so the job
is designed like this:
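As a point of reference, the bucketing arithmetic implied by those numbers (150 million rows in buckets of 100,000, i.e. 1,500 output files) can be sketched in plain Java. The `bucketFor`/`fileFor` names and the file-naming scheme are illustrative assumptions, not taken from the thread:

```java
public class BucketExport {
    static final long ROWS_PER_BUCKET = 100_000L;

    // Illustrative: which bucket (and hence which output file) a row falls into.
    static long bucketFor(long rowId) {
        return rowId / ROWS_PER_BUCKET;
    }

    // Illustrative file-name scheme: one file per bucket.
    static String fileFor(long rowId) {
        return "bucket-" + bucketFor(rowId);
    }

    public static void main(String[] args) {
        // 150 million rows => 1,500 buckets of 100,000 rows each.
        System.out.println(bucketFor(99_999));      // still in the first bucket
        System.out.println(bucketFor(100_000));     // first row of the second bucket
        System.out.println(fileFor(149_999_999));   // last row, last file
    }
}
```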
Hi Benoit,
Thanks for using the StreamingFileSink. My answers/explanations are inlined.
In most of your observations, you are correct.
> On Aug 21, 2018, at 11:45 PM, Benoit MERIAUX wrote:
>
> Hi,
>
> I have some questions about the new StreamingFileSink in 1.6.
>
> My use case is pretty simple.
Hi Kostas,
Sorry for jumping in on this discussion :)
What you suggest for finite sources and waiting for checkpoints is pretty
ugly in many cases, especially if you read from a finite source (a file,
for instance) and want to end the job as soon as possible.
Would it make sense to not discard a
Thanks for the detailed answer.
So the actual behavior is correct, and it is due to the legacy API, which
does not distinguish between success and failure when closing the sink.
So the workaround is to use a short bucket interval to commit the last
received data and wait for the next checkpoint (how do I do i
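For reference, the checkpoint side of that workaround can be sketched as a small configuration fragment, since the StreamingFileSink finalizes in-progress part files on checkpoints. The 10-second interval is an illustrative value, not something stated in the thread:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// A short checkpoint interval means recently received data is committed
// (part files finalized) soon after it arrives.
env.enableCheckpointing(10_000L); // every 10 seconds (illustrative)
```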