On Sun, Nov 22, 2015 at 3:29 AM, David Hoffer <dhoff...@gmail.com> wrote:
> I'm not sure how to block the polling.
>
> Here is what seems like an ideal approach... the SFTP polling always runs
> on schedule and downloads files with a single thread to a folder. This
> won't use much memory as it's just copying one file at a time to the
> folder. Then I'd have X threads take those files and start the
> decrypting/processing. Since this part uses a lot of memory, it seems I'd
> want to limit the number of threads that can do this task so the max
> memory is contained.
>

If they pick up from a local or shared file system using regular java.io
file access, then you can have a route that consumes from a directory, and
then use the threads DSL for parallel processing. You can configure the
from-to number of threads, and you can also set the queue size to 0 so no
tasks wait in memory. Those tasks would be small anyway, as each is just a
handle to the file (the file is not read into memory and stays on disk).
However, if you want a dynamic pool that grows/shrinks depending on free
memory, then it gets more tricky.
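A minimal sketch of that in the Java DSL (the folder name, the pool sizes
and the decryptService bean are made-up placeholders, not from your app):

    import org.apache.camel.builder.RouteBuilder;

    public class DecryptRoute extends RouteBuilder {
        @Override
        public void configure() throws Exception {
            // Consume files the SFTP poller already downloaded locally
            from("file:data/inbox?delete=true")
                // Hand each exchange to a worker pool of 5-10 threads.
                // maxQueueSize(0) means no in-memory backlog of waiting
                // tasks, and callerRunsWhenRejected(true) (the default)
                // makes the file consumer run the task itself when all
                // workers are busy, which throttles the pickup rate.
                .threads(5, 10).maxQueueSize(0).callerRunsWhenRejected(true)
                .to("bean:decryptService"); // hypothetical processing bean
        }
    }

Because only the file handle is in memory, the bounded pool is what caps
memory use during the decrypt step.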
> However I don't know how to do this as I'm new to Camel.
>
> Yes I'd really like to use streaming instead of byte[] at every step of
> the processing but no idea if that's possible in my use case. Sounds like
> it worked in yours.
>

Camel supports working with files as java.io.File / FileInputStream /
InputStream. Only if you attempt to read the file as String / byte[] etc.
is the file read into memory. There is also stream caching
http://camel.apache.org/stream-caching.html that can offload the data to a
file, but since the original data is already on file it makes less sense to
use here. However, stream caching ensures the stream is re-readable.

> -Dave
>
> On Sat, Nov 21, 2015 at 10:22 AM, mailingl...@j-b-s.de
> <mailingl...@j-b-s.de> wrote:
>
>> I guess you need to block the polling while you process files in
>> parallel. A seda queue with a capacity limit will at least block the
>> consumer. As I do not know what exactly you are doing with the files, or
>> whether each file needs the same amount of memory, it's hard to tell
>> what memory settings to use. Always providing more memory is not a
>> solution from my point of view, because you just hit the same limit
>> later.
>>
>> Limiting messages and using streaming / splitting will keep memory usage
>> low (at least it works that way in our environment, where we reduced
>> memory usage from 1G to 128M per VM). But whether that is something for
>> you... I don't know.
>>
>> Jens
>>
>> Sent from my iPhone
>>
>> > On 21.11.2015, at 16:40, David Hoffer <dhoff...@gmail.com> wrote:
>> >
>> > Yes, when the sftp read thread stopped it was still processing files
>> > it had previously downloaded. Since we can get so many files on each
>> > poll (~1000), and we have to do a lot of decrypting of these files in
>> > subsequent routes, it's possible that the processing of the 1000 files
>> > is not done before the next poll, where we get another 1000 files.
>> > Eventually the SFTP endpoint will have fewer/no files and the rest of
>> > the routes can catch up. All the rest of the routes are file based
>> > (except the very last) so there is no harm if intermediate folders get
>> > backed up with files.
>> >
>> > We only have one SFTP connection for reading in this case.
>> >
>> > Do you think the seda approach is right for this case? I can look into
>> > it. Note my previous post that in my dev environment the reason it
>> > stopped was an out-of-memory error... I doubt that is the case in
>> > production, as the rest of the routes do not stop.
>> >
>> > -Dave
>> >
>> > On Sat, Nov 21, 2015 at 1:36 AM, mailingl...@j-b-s.de
>> > <mailingl...@j-b-s.de> wrote:
>> >
>> >> Hi!
>> >>
>> >> When your sftp read thread stops, are the files still in process? In
>> >> our env we had something similar in conjunction with splitting large
>> >> files, because the initial message is pending until all processing is
>> >> completed. We solved it using a seda queue (limited in size) between
>> >> our sftp consumer and processing route, and "parallel" execution:
>> >>
>> >> one sftp consumer -> seda (size limit) -> processing route (with dsl
>> >> parallel)
>> >>
>> >> and this works without any problems.
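For reference, Jens's recipe might look like this in the Java DSL (host,
credentials, queue size and the decryptService bean are placeholders; this
uses the seda component's size, blockWhenFull and concurrentConsumers
options):

    import org.apache.camel.builder.RouteBuilder;

    public class SftpViaSedaRoutes extends RouteBuilder {
        @Override
        public void configure() throws Exception {
            // Single-threaded SFTP poller. blockWhenFull=true makes the
            // producer wait instead of throwing when the queue is full,
            // which in effect pauses the polling while workers catch up.
            from("sftp://user@host/inbox?password=secret&delete=true")
                .to("seda:work?size=50&blockWhenFull=true");

            // Five workers drain the queue in parallel, so at most 50
            // queued plus 5 in-flight messages are held at any one time.
            from("seda:work?size=50&concurrentConsumers=5")
                .to("bean:decryptService"); // hypothetical processing bean
        }
    }

Note the size option should be the same on both seda endpoints, as they
refer to the same in-memory queue.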
>> >>
>> >> Maybe you have too many sftp connections? Maybe it's entirely
>> >> independent of Camel and you reached a file handle limit?
>> >>
>> >> Jens
>> >>
>> >> Sent from my iPhone
>> >>
>> >>> On 20.11.2015, at 23:09, David Hoffer <dhoff...@gmail.com> wrote:
>> >>>
>> >>> This part I'm not clear on and it raises more questions.
>> >>>
>> >>> When using the JDK one generally uses the Executors factory methods
>> >>> to create either a Fixed, Single or Cached thread pool. These use a
>> >>> SynchronousQueue for Cached pools and a LinkedBlockingQueue for
>> >>> Fixed or Single pools. In the case of SynchronousQueue there is no
>> >>> size... it simply hands the new request off to a thread in the pool
>> >>> or creates a new one. And in the case of LinkedBlockingQueue it uses
>> >>> an unbounded queue size. Now it is possible to create a hybrid, e.g.
>> >>> a LinkedBlockingQueue with a max size, but that's not part of the
>> >>> factory methods, nor common. Another option is ArrayBlockingQueue,
>> >>> which does use a max size, but none of the factory methods use this
>> >>> type.
>> >>>
>> >>> So what type of thread pool does Camel create for the default thread
>> >>> pool? Since it's not fixed size, I assumed it would use a
>> >>> SynchronousQueue and not have a separate worker queue. However, if
>> >>> Camel is creating a hybrid using a LinkedBlockingQueue or an
>> >>> ArrayBlockingQueue, is there a way I can change that to a
>> >>> SynchronousQueue, so no queue? Or is there a compelling reason to
>> >>> use a LinkedBlockingQueue in a cached pool?
>> >>>
>> >>> Now this gets to the problem I am trying to solve. We have a Camel
>> >>> app that deals with files, lots of them... e.g. all the routes deal
>> >>> with files. It starts with an sftp URL that gets files off a remote
>> >>> server and then does a lot of subsequent file processing. The
>> >>> problem is that if the SFTP server has 55 files (for example) and I
>> >>> start the Camel app, it processes them fine until about 14 or 15
>> >>> files are left and then it just stops. The thread that does the
>> >>> polling of the server stops (at least it appears to have stopped)
>> >>> and the processing of the 55 files stops, e.g. it does not continue
>> >>> to process all of the original 55 files; it stops with 14-15 left to
>> >>> process (and it never picks them up again on the next poll). And I
>> >>> have a breakpoint on my custom SftpChangedExclusiveReadLockStrategy
>> >>> and it is never called again.
>> >>>
>> >>> Now, getting back to the default thread pool and changing it: I
>> >>> would like to change it so it uses more threads and no worker queue
>> >>> (like a standard Executors cached thread pool), but I'm not certain
>> >>> that would even help, as in the debugger & thread dumps I see that
>> >>> the SFTP endpoint uses a Scheduled Thread Pool instead, which makes
>> >>> sense since it's a polling (every 60 seconds in my case) operation.
>> >>> So is there another default pool that I can configure for Camel's
>> >>> scheduled threads?
>> >>>
>> >>> All that being said, why would the SFTP endpoint just quit? I don't
>> >>> see any blocked threads and no deadlock. I'm new to Camel and just
>> >>> don't know where to look for possible causes of this.
>> >>>
>> >>> Thanks,
>> >>> -Dave
>> >>>
>> >>>
>> >>>> On Thu, Nov 19, 2015 at 11:40 PM, Claus Ibsen
>> >>>> <claus.ib...@gmail.com> wrote:
>> >>>>
>> >>>> Yes, it's part of the JDK, as it specifies the size of the worker
>> >>>> queue of the thread pool (ThreadPoolExecutor).
>> >>>>
>> >>>> For more docs see
>> >>>> http://camel.apache.org/threading-model.html
>> >>>>
>> >>>> Or the Camel in Action books
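To elaborate on the above: you can override Camel's default worker pool
profile via the ExecutorServiceManager. A sketch against the Camel 2.x API
(the sizes are just examples); a maxQueueSize of 0 should give you the
direct SynchronousQueue hand-off you asked about, while a positive value
gives a bounded LinkedBlockingQueue:

    import org.apache.camel.CamelContext;
    import org.apache.camel.builder.ThreadPoolProfileBuilder;
    import org.apache.camel.spi.ThreadPoolProfile;

    public class ThreadPoolConfig {
        public static void configure(CamelContext context) {
            // Larger pool, no worker queue: tasks are handed straight
            // to a thread instead of waiting in an in-memory queue
            ThreadPoolProfile profile = new ThreadPoolProfileBuilder("default")
                .poolSize(20)
                .maxPoolSize(100)
                .maxQueueSize(0)
                .build();
            context.getExecutorServiceManager()
                   .setDefaultThreadPoolProfile(profile);
        }
    }

Keep in mind the consumer's polling itself runs on a scheduled thread
pool, which is separate from this worker profile.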
>> >>>>
>> >>>>> On Fri, Nov 20, 2015 at 12:22 AM, David Hoffer
>> >>>>> <dhoff...@gmail.com> wrote:
>> >>>>>
>> >>>>> I'm trying to understand the default Camel Thread Pool and how the
>> >>>>> maxQueueSize is used, or more precisely, what it's for.
>> >>>>>
>> >>>>> I can't find any documentation on what this really is or how it's
>> >>>>> used. I understand all the other parameters, as they match what
>> >>>>> I'd expect from the JDK... poolSize is the minimum number of
>> >>>>> threads to keep in the pool for new tasks, and maxPoolSize is the
>> >>>>> maximum number of the same.
>> >>>>>
>> >>>>> So how does maxQueueSize fit into this? This isn't part of the JDK
>> >>>>> thread pool factory methods, so I don't know how Camel uses it.
>> >>>>>
>> >>>>> The context of my question is that we have a from-sftp route that
>> >>>>> seems to be getting thread starved. E.g. the thread that polls the
>> >>>>> sftp connection is slowing/stopping at times when it is busy
>> >>>>> processing other files that were previously downloaded.
>> >>>>>
>> >>>>> We are using the default Camel thread pool, which I see has only a
>> >>>>> max of 20 threads yet a maxQueueSize of 1000. That doesn't make
>> >>>>> any sense to me yet. I would think one would want a much larger
>> >>>>> pool of threads (as we are processing lots of files) but no queue
>> >>>>> at all... but I'm not sure of that, as I don't understand how the
>> >>>>> queue is used.
>> >>>>>
>> >>>>> -Dave
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Claus Ibsen
>> >>>> -----------------
>> >>>> http://davsclaus.com @davsclaus
>> >>>> Camel in Action 2: https://www.manning.com/ibsen2

--
Claus Ibsen
-----------------
http://davsclaus.com @davsclaus
Camel in Action 2: https://www.manning.com/ibsen2