Hi team, I have some questions about the file name format the sink uses when processing files:
In-progress / Pending: part-<uid>-<partFileIndex>.inprogress.uid
Finished: part-<uid>-<partFileIndex>
Could you explain more about partFileIndex? The file names look quite strange to me. The job produces two files, and I wonder whether this is related to the parallelism, which we have set to 2:
part-6a13c70e-638d-4b10-820c-d7577e949e89-0-191
part-6a13c70e-638d-4b10-820c-d7577e949e89-1-190
If so, what happens if our data is large but the commit interval (a checkpoint interval of around 1000 ms) is small? Does the sink write into additional files such as part-6a13c70e-638d-4b10-820c-d7577e949e89-0-192 by incrementing the final number, or do those files have a different format?
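To make my question concrete, here is a minimal Python sketch of how I currently read these names. It assumes (my guess from the two file names above, not something I found in the docs) that partFileIndex is really two fields: the subtask index followed by a per-subtask counter, and that rolling to a new file only increments the counter. parse_part_file and next_part_file are hypothetical helpers of my own, not part of the Flink API.

```python
import re
from typing import NamedTuple

# Assumed layout of a *finished* part file: part-<uuid>-<subtaskIndex>-<counter>.
# In-progress files (with the ".inprogress.<uid>" suffix) are rejected by the "$" anchor.
PART_FILE = re.compile(
    r"^part-(?P<uid>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
    r"-(?P<subtask>\d+)-(?P<counter>\d+)$"
)


class PartFile(NamedTuple):
    uid: str      # UUID shared by all files written by this sink
    subtask: int  # assumed: index of the parallel subtask that wrote the file
    counter: int  # assumed: per-subtask counter, incremented on each roll


def parse_part_file(name: str) -> PartFile:
    """Split a finished part file name into its assumed components."""
    m = PART_FILE.match(name)
    if m is None:
        raise ValueError(f"not a finished part file: {name}")
    return PartFile(m["uid"], int(m["subtask"]), int(m["counter"]))


def next_part_file(p: PartFile) -> str:
    """Name the next file under my assumption: same uid and subtask, counter + 1."""
    return f"part-{p.uid}-{p.subtask}-{p.counter + 1}"
```

For example, parsing part-6a13c70e-638d-4b10-820c-d7577e949e89-0-191 gives subtask 0 and counter 191, and next_part_file would then predict ...-0-192, which is exactly the behaviour I would like to confirm.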
Another question: since we are using the Table API, is there any option to limit the file size, or the time after which a file should be closed? In the docs I see the options ‘sink.rolling-policy.file-size’ and ‘sink.partition-commit.delay’. Are these relevant to what we want?
One more question: although we set the parallelism to 3, in the end it is still 1. Could you explain this case a bit?
Thanks, team.
Best,
Quynh