Adding the readLock option makes a .camelLock file appear, so I assume it processes the file asynchronously.
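A read lock that uses a marker file (for example readLock=markerFile) creates `<filename>.camelLock` next to the file while it is being consumed and deletes it afterwards. The marker therefore shows that the file is claimed and in flight. To illustrate only that mechanism, here is a JDK-only sketch; `MarkerFileLock`, `tryLock` and `unlock` are hypothetical names, and this is not Camel's implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of a marker-file read lock. It mimics the idea of the
// <filename>.camelLock marker; it is not Camel's implementation.
class MarkerFileLock {

    // Claim a file by atomically creating "<name>.camelLock" beside it.
    // Returns the marker path, or null if someone else already holds the claim.
    static Path tryLock(Path file) {
        Path marker = file.resolveSibling(file.getFileName() + ".camelLock");
        try {
            return Files.createFile(marker); // atomic: fails if the marker exists
        } catch (FileAlreadyExistsException e) {
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Drop the claim once the file has been fully processed.
    static void unlock(Path marker) {
        try {
            Files.deleteIfExists(marker);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The marker says nothing about how the route processes the file once it is claimed; it only keeps other consumers away until `unlock` runs.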
Any takers on this?
/M
Hi
Has there been a strategic change in version 3 to the way the File component processes multiple files in one directory?
It seems to process them in parallel, which in our situation creates a memory issue.
Code:
from(file("{{esma.full.path}}")
        .delete(true)
        .sortBy("${file:name}"))
    .description("Full Import", "Imports FUL files and persists in database", "en")
    .autoStartup("{{esma.full.startup}}")
    .streamCaching()
    .log("Processing file ${file:name}")
    .unmarshal()
        .zipFile()
    .split()
        .tokenizeXML("RefData")
        .streaming()
        .parallelProcessing(true)
        .bean(XmlToSqlBean.class)
        .choice()
            .when(body().isNotNull())
                .to(jdbc("default"))
                .to(log("Full Import").level(LoggingLevel.INFO.toString())
                        .groupInterval(60_000L)
                        .groupActiveOnly(true))
            .when(simple("${header.CamelSplitComplete} == true"))
                .log("Number of records split: ${header.CamelSplitSize}")
                .log("Importing complete: ${header.CamelFileName}")
        .endChoice()
    .end();
This route processes several zip files, unmarshals them and adds records to a database.
The logs seem to show this scenario:
It logs every file in the directory as if it were processing them in parallel, and only then does the ThroughputLogger start printing every minute. This logger uses a single thread.
2021-10-25 08:31:58.106 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_C_20211023_01of01.zip
2021-10-25 08:32:00.273 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_D_20211023_01of03.zip
2021-10-25 08:32:10.922 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_D_20211023_02of03.zip
2021-10-25 08:32:19.126 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_D_20211023_03of03.zip
2021-10-25 08:32:19.762 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_E_20211023_01of02.zip
2021-10-25 08:32:25.621 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_F_20211023_01of01.zip
2021-10-25 08:32:26.911 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_H_20211023_01of01.zip
2021-10-25 08:32:31.961 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_J_20211023_01of01.zip
2021-10-25 08:32:36.249 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_O_20211023_01of02.zip
2021-10-25 08:32:41.654 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_O_20211023_02of02.zip
2021-10-25 08:32:44.830 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_R_20211023_01of06.zip
2021-10-25 08:32:49.406 [Camel (FIRDSDatabase) thread #3 - ThroughputLogger] INFO Full Import.log - Received: 35392 new messages, with total 35392 so far. Last group took: 48977 millis which is: 722.625 messages per second. average: 722.625
2021-10-25 08:32:49.724 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_R_20211023_02of06.zip
2021-10-25 08:32:54.880 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_R_20211023_03of06.zip
2021-10-25 08:33:00.867 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_R_20211023_04of06.zip
2021-10-25 08:33:06.265 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_R_20211023_05of06.zip
2021-10-25 08:33:11.222 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_R_20211023_06of06.zip
2021-10-25 08:33:14.923 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_S_20211023_01of02.zip
2021-10-25 08:33:20.119 [Camel (FIRDSDatabase) thread #5 - file://FIRDS/input/full] INFO Full Import.log - Processing file FULINS_S_20211023_02of02.zip
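One possible reading of this log pattern (an assumption, not a diagnosis of Camel's internals): if the consumer thread hands each file over to the split's worker threads and carries on without waiting for the split to finish, every "Processing file" line appears within seconds while the records are still being worked on. The JDK-only model below reproduces that shape with plain threads; `HandOffModel` and its event strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model, not Camel code: one consumer thread submits each file's
// work to a pool and continues immediately instead of waiting for it.
class HandOffModel {

    static List<String> run(List<String> files, int workers) {
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch release = new CountDownLatch(1); // keeps the "split" busy
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (String file : files) {
            events.add("Processing file " + file); // consumer-thread log line
            pool.submit(() -> {
                release.await();                   // stands in for a long split
                events.add("Finished " + file);
                return null;
            });
        }
        events.add("Consumer idle");               // every file already picked up
        release.countDown();                       // now let the workers complete
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        synchronized (events) {
            return new ArrayList<>(events);
        }
    }
}
```

With three files, the first four events are always the three "Processing file" lines followed by "Consumer idle", even though no file has finished yet; all files, and their unzipped content, are in flight at the same time.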
We use parallel processing for the content of each zip file (the XML), but not for the files themselves.
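The intended behaviour, files one at a time with each file's records fanned out over a bounded pool, can be modelled in plain Java as a sketch. Here `invokeAll` stands in for a parallel split that the caller waits on before taking the next file; the class and method names are hypothetical and this is not Camel code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical model, not Camel code: files are handled one after another,
// while the records of the current file are spread over a bounded pool.
class SequentialFilesParallelRecords {

    // recordsPerFile[i] = number of records in file i.
    // Returns the file index of every record in the order it was processed.
    static List<Integer> process(int[] recordsPerFile, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Integer> order = Collections.synchronizedList(new ArrayList<>());
        try {
            for (int i = 0; i < recordsPerFile.length; i++) {
                final int fileIndex = i;
                List<Callable<Void>> tasks = new ArrayList<>();
                for (int r = 0; r < recordsPerFile[i]; r++) {
                    tasks.add(() -> {
                        order.add(fileIndex); // placeholder for e.g. one JDBC insert
                        return null;
                    });
                }
                // Blocks until every record of this file is done, so the next
                // file is not started (or unzipped) while this one is in flight.
                pool.invokeAll(tasks);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        synchronized (order) {
            return new ArrayList<>(order);
        }
    }
}
```

Because of the wait per file, the returned file indices never decrease, and the pool size caps how many records (and database connections) are busy at once.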
If I don't enable streamCaching(), the server runs into OutOfMemoryError and similar problems.
This runs on Spring Boot 2.5.6 and Camel 3.11.3.
Maybe I have done it the wrong way, but file processing is a bread-and-butter EIP, so it shouldn't be a concern. Still…
The files are around 15 MB zipped, and each unzips to a single XML file of about 0.5 GB. Each XML file contains around 500K records to split on. This is a critical memory issue, I know, but it wouldn't be if the files were processed sequentially.
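With roughly 500K records in a 0.5 GB document, records have to be pulled from the stream rather than the whole document being loaded. As an illustration of streaming pull-parsing, using the JDK's StAX API as a stand-in (this is not Camel's `tokenizeXML`), the sketch below counts `RefData` elements one parser event at a time without building the document in memory; `StreamingRecordCounter` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustration with the JDK's StAX pull parser (not Camel's tokenizeXML):
// the document is read event by event instead of being loaded as a whole.
class StreamingRecordCounter {

    static long countRecords(InputStream in, String elementName) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            long count = 0;
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && elementName.equals(reader.getLocalName())) {
                        count++; // a real importer would read this record's fields here
                    }
                }
            } finally {
                reader.close();
            }
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
    }

    // Convenience overload for small in-memory documents.
    static long countRecords(String xml, String elementName) {
        return countRecords(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), elementName);
    }
}
```

`getLocalName()` ignores namespace prefixes, which matters if the real files use a prefixed `RefData` element.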
Looking at the database connections (a HikariCP pool), I see 12 active connections, which I assume corresponds to the number of threads in the split. It processes around 800 records per second.
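For a sense of scale, the sketch below just divides the approximate figures quoted in this post: about 500K records per file, about 800 records/s overall, 12 connections, and the 18 files visible in the log excerpt. Treating the 12 connections as 12 equally loaded threads is an assumption, and `ImportEstimate` is a hypothetical helper.

```java
// Back-of-the-envelope figures from the approximate numbers quoted in this post.
// The class and its methods are hypothetical helpers.
class ImportEstimate {

    // Seconds needed to import `records` at a sustained `recordsPerSecond`.
    static double seconds(long records, double recordsPerSecond) {
        return records / recordsPerSecond;
    }

    // Average rate each worker contributes when `threads` share the load evenly.
    static double perThreadRate(double recordsPerSecond, int threads) {
        return recordsPerSecond / threads;
    }
}
```

That gives about 625 s (just over 10 minutes) per file, about 3.1 hours for the 18 files in the log, and roughly 67 records/s per connection.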
Please advise
/M