Hi team, I have some questions about the file name format I see when I process the files.

In-progress / Pending: part-<uid>-<partFileIndex>.inprogress.uid
Finished: part-<uid>-<partFileIndex>

Can you explain more about the partFileIndex? The format of the file names looks quite odd to me. The job produces two files (I wonder whether this is related to the parallelism, which we have set to 2):  
part-6a13c70e-638d-4b10-820c-d7577e949e89-0-191  
part-6a13c70e-638d-4b10-820c-d7577e949e89-1-190
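
To make the observed pattern concrete, here is a minimal Python sketch that splits such a name into its parts. It assumes the uid is a standard hyphenated UUID followed by two integers, as in the two names above. The function name and the field names `first_index` and `second_index` are hypothetical placeholders, since what each number means is exactly the open question.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: "part-" + UUID + "-" + integer + "-" + integer,
# based only on the two file names observed above.
PART_FILE_RE = re.compile(
    r"^part-(?P<uid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
    r"-(?P<first>\d+)-(?P<second>\d+)$"
)


class PartFileName(NamedTuple):
    uid: str
    first_index: int   # placeholder name: meaning not confirmed
    second_index: int  # placeholder name: meaning not confirmed


def parse_part_file_name(name: str) -> Optional[PartFileName]:
    """Split a finished part file name; return None if it does not match."""
    m = PART_FILE_RE.match(name)
    if m is None:
        return None
    return PartFileName(m["uid"], int(m["first"]), int(m["second"]))
```

Applied to the two names above, the first integer differs (0 and 1) while the uid is identical, which is why I suspect it is tied to the parallelism.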

If so, what happens if our data is huge but the commit interval is small (checkpoints roughly every 1000 ms)? Does it write to another file, such as part-6a13c70e-638d-4b10-820c-d7577e949e89-0-192, by increasing the final number? Or do the new files have a different format?

Another question: since we are using the Table API, is there any option to limit the file size, or the time after which a file should be closed? I see in the docs that there are options called ‘sink.rolling-policy.file-size’ and ‘sink.partition-commit.delay’. Are these relevant to what we want?

One more question: although we have set the parallelism to 3, in the end it is still 1. Can you explain this case for me?



Thanks team.

Best,
Quynh.

 

 
