I’m not an expert with MergeRecord, but looking at your screenshots, I’d guess that your setup is taking that long to reach one of the defined “maximum” settings, e.g. 2GB, 5,000,000 records, or 3600 seconds (1 hour).
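
To put rough numbers on that, here is a minimal Python sketch of the arithmetic. It is not a model of how MergeRecord bins internally; it only assumes FlowFiles of a fixed size arrive at a steady interval and works out which of the three limits would be reached first. The function name, parameters, and the sample arrival figures are all hypothetical.

```python
import math


def seconds_until_merge(records_per_file, bytes_per_file, interval_s,
                        max_records, max_bytes, max_age_s):
    """Estimate which limit a steady stream of FlowFiles reaches first.

    Simplifying assumption: every FlowFile has the same size and one
    arrives every `interval_s` seconds.
    """
    # Number of FlowFiles needed to hit each size-based limit.
    files_for_records = math.ceil(max_records / records_per_file)
    files_for_bytes = math.ceil(max_bytes / bytes_per_file)

    # Approximate time (in seconds) at which each limit would be reached.
    candidates = {
        "records": files_for_records * interval_s,
        "size": files_for_bytes * interval_s,
        "age": max_age_s,
    }
    limit = min(candidates, key=candidates.get)
    return limit, candidates[limit]


# Hypothetical trickle: 100 records / 50 KB per FlowFile, one every 10 s,
# against limits of 5,000,000 records, 2 GB and 3600 s.
print(seconds_until_merge(100, 50 * 1024, 10,
                          5_000_000, 2 * 1024**3, 3600))
```

With a small trickle like this, neither the record count nor the size limit is anywhere near reachable, so the bin would simply sit until the age limit expires.
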
How large (number of records and content size in bytes) are the typical FlowFiles you’re sending to MergeRecord, and how often do they arrive? For example, if you’re getting results from Elasticsearch in chunks of 10,000 records per response and that’s 10MB in size every 1 second, it’s going to take a long time to meet any of the defined maximums.

How have you configured the Elasticsearch processor and what are you trying to combine together in your flow? Is it that you’re outputting a single FlowFile per Response (the default setting for “Search Results Split”), then trying to merge together all responses from a single query into one FlowFile?

If so, I’d suggest changing the Elasticsearch processor’s “Search Results Split” to “Per Query” instead, and increasing the “Size” setting (leaving this blank will use the Elasticsearch default page size, which is often set as “10”). You might then be able to avoid the need for MergeRecord at all, and the conversion of JSON to Parquet could be done with a ConvertRecord processor instead, for example.

Cheers,

---
Chris Sampson
IT Consultant
[email protected]

> On 12 Mar 2024, at 08:38, edi mari <[email protected]> wrote:
>
> Hi,
> My task is to query Elastic and save the results in a Parquet file.
> I'm querying Elastic using the PaginatedJsonQueryElasticsearch processor.
> The files coming from Elastic are in JSON format, and I'm using the
> MergeRecord processor to convert the JSON to Parquet format and merge the
> result into one file.
> The Record Reader uses the JsonTreeReader and Record Writer uses the
> ParquetRecordSetWriter controller, which uses AVRO schema (Schema Text) to
> help with the conversion task.
>
> The process works fine, the only problem is that it takes too much time.
> Converting 5 MB takes 30 minutes.
>
> Do you have any idea how to enhance the process?
>
> <image.png>
>
> <image.png>
> <image.png>
