Hi all, I have a Beam pipeline running on Cloud Dataflow that produces Avro files on GCS. The window duration is 1 minute, and the job currently runs with 64 cores (16 * n1-standard-4). The pipeline produces around 2 GB of data per minute.
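In case it helps, by "number of files" I mean the shard count on the sink, roughly like the sketch below (this is just an illustration assuming a Java pipeline with AvroIO; the record type, schema, and output prefix are placeholders, not my actual code):

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.io.AvroIO;
import org.apache.beam.sdk.values.PCollection;

public class AvroSinkSketch {
  // Writes a windowed PCollection of GenericRecords to GCS,
  // producing `numShards` Avro files per window/pane.
  static void writeWindowedAvro(
      PCollection<GenericRecord> records, Schema schema, int numShards) {
    records.apply(
        "WriteAvroToGcs",
        AvroIO.writeGenericRecords(schema)
            .to("gs://my-bucket/avro/output")   // placeholder output prefix
            .withSuffix(".avro")
            .withWindowedWrites()               // needed for windowed, unbounded input
            .withNumShards(numShards));         // e.g. 64 to match the current core count
  }
}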
Is there any recommendation on the number of Avro files (shards) to specify per window? Currently I'm using 64, to match the number of cores. Would a much higher number help increase write throughput? I saw that BigQueryIO with FILE_LOADS uses a default of 1,000 files. I tried a few different values, but couldn't infer a pattern for when it's more performant. Any suggestion is hugely appreciated.

Best,
Ziyad