Re: Problem reading a CSV file with pyflink datastream in k8s with Flink operator

2024-06-17 Thread Robert Young
Hi Gwenael, >From the logs I thought it was a JVM module opens/exports issue, but I found it had a similar issue using a java8 base image too. I think the issue is it's not permitted for PythonCsvUtils to call the package-private constructor of CsvReaderFormat across class loaders. One

Re: CSV format and hdfs

2024-04-25 Thread Robert Young
Hi Artem, I had a debug of Flink 1.17.1 (running CsvFilesystemBatchITCase) and I see the same behaviour. It's the same on master too. Jackson flushes [1] the underlying stream after every `writeValue` call. I experimented with disabling the flush by disabling Jackson's FLUSH_PASSED_TO_STREAM [2]

Re: Understanding checkpoint/savepoint storage requirements

2024-04-02 Thread Robert Young
eifan Wang wrote: > >> Hi Robert : >> >> Your understanding are right ! >> Add some more information : JobManager not only responsible for cleaning >> old checkpoints, but also needs to write metadata file to checkpoint >> storage after all taskmanagers have tak

Understanding checkpoint/savepoint storage requirements

2024-03-27 Thread Robert Young
Hi all, I have some questions about checkpoint and savepoint storage. >From what I understand a distributed, production-quality job with a lot of state should use durable shared storage for checkpoints and savepoints. All job managers and task managers should access the same volume. So typically