I’m not an expert with MergeRecord, but looking at your screenshots, I’d guess 
that your setup is taking that long to reach one of the defined “maximum” 
settings, e.g. 2GB, 5,000,000 records, or 3600 seconds (1 hour).

How large (number of records and content size in bytes) are the typical
FlowFiles you’re sending to MergeRecord, and how often do they arrive? For
example, if you’re getting results from Elasticsearch in chunks of 10,000
records per response and that’s 10MB in size every 1 second, it’s going to take
a long time to meet any of the defined maximums.
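
To put rough numbers on that, here is a quick Python sketch of the arithmetic. It is only a back-of-the-envelope estimate, not how MergeRecord is actually implemented: the three maximums are the ones I guessed from your screenshots, "GB" and "MB" are treated as decimal units, the arrival rate is assumed to be perfectly steady, and the function name is made up for illustration.

```python
import math

# Maximums guessed from the screenshots (assumptions, adjust to your settings).
MAX_BIN_BYTES = 2 * 1000**3       # 2 GB, decimal units assumed
MAX_BIN_RECORDS = 5_000_000
MAX_BIN_AGE_SECONDS = 3600        # 1 hour


def seconds_until_bin_closes(records_per_flowfile, bytes_per_flowfile, interval_seconds):
    """Return (seconds, reason) for whichever maximum is reached first,
    assuming one FlowFile of fixed size arrives every interval_seconds."""
    by_size = math.ceil(MAX_BIN_BYTES / bytes_per_flowfile) * interval_seconds
    by_records = math.ceil(MAX_BIN_RECORDS / records_per_flowfile) * interval_seconds
    candidates = [
        (by_size, "size"),
        (by_records, "records"),
        (MAX_BIN_AGE_SECONDS, "age"),
    ]
    # Tuples compare on the first element, so min() picks the earliest trigger.
    return min(candidates)


# The example above: 10,000 records and 10MB arriving every second.
seconds, reason = seconds_until_bin_closes(10_000, 10 * 1000**2, 1)
print(f"bin closes after about {seconds} seconds (limit reached: {reason})")
```

With smaller or less frequent FlowFiles, the size and record limits move further out and the age limit ends up being the one that releases the bin.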

How have you configured the Elasticsearch processor, and what are you trying to
combine in your flow? Are you outputting a single FlowFile per response (the
default setting for “Search Results Split”) and then trying to merge all
responses from a single query into one FlowFile? If so, I’d suggest changing
the Elasticsearch processor’s “Search Results Split” to “Per Query” and
increasing the “Size” setting (leaving this blank uses the Elasticsearch
default page size, which is often “10”). You might then be able to avoid
MergeRecord altogether, and the JSON-to-Parquet conversion could be done with
a ConvertRecord processor instead, for example.


Cheers,

---
Chris Sampson
IT Consultant
[email protected]


> On 12 Mar 2024, at 08:38, edi mari <[email protected]> wrote:
> 
> Hi, 
> My task is to query Elastic and save the results in a Parquet file.
> I'm querying Elastic using the PaginatedJsonQueryElasticsearch processor. 
> The files coming from Elastic are in JSON format, and I'm using the 
> MergeRecord processor to convert the JSON to Parquet format and merge the 
> result into one file . 
> The Record Reader uses the JsonTreeReader and Record Writer uses the 
> ParquetRecordSetWriter controller, which uses AVRO schema (Schema Text) to 
> help with the conversion task. 
> 
> The process works fine, the only problem is that it takes too much time. 
> Converting 5 MB takes 30 minutes. 
> 
> Do you have any idea how to enhance the process?  
> 
> <image.png>
> 
> <image.png>
> <image.png>
