On Sun, Nov 22, 2015 at 3:29 AM, David Hoffer <dhoff...@gmail.com> wrote:
> I'm not sure how to block the polling.
>
> Here is what seems like an ideal approach... the SFTP polling always runs
> on schedule and downloads files with a single thread to a folder. This
> won't use much memory as it's just copying one file at a time to the
> folder. Then I'd have X threads take those files and start the
> decrypting/processing. Since this part uses a lot of memory, it seems I'd
> want to limit the number of threads that can do this task so the max
> memory is contained.
>

If they pick up from a local or shared file system using regular java.io
file access, then you can have a route that consumes from a directory, and
then use the threads DSL for parallel processing. You can configure the
from-to number of threads, and you can also set the queue size to 0 so no
tasks wait in memory. Those tasks would be small anyway, as each is just a
handle to the file (the file is not read into memory and stays on disk).
However, if you want a dynamic pool that grows/shrinks depending on free
memory, then it gets more tricky.
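A minimal sketch of that in the Java DSL (the folder name, the pool sizes
and the decryptService bean are made-up placeholders, not from your app):

    import org.apache.camel.builder.RouteBuilder;

    public class DecryptRoute extends RouteBuilder {
        @Override
        public void configure() throws Exception {
            // Consume files the SFTP poller already downloaded locally
            from("file:data/inbox?delete=true")
                // Hand each exchange to a worker pool of 5-10 threads.
                // maxQueueSize(0) means no in-memory backlog of waiting
                // tasks, and callerRunsWhenRejected(true) (the default)
                // makes the file consumer run the task itself when all
                // workers are busy, which throttles the pickup rate.
                .threads(5, 10).maxQueueSize(0).callerRunsWhenRejected(true)
                .to("bean:decryptService"); // hypothetical processing bean
        }
    }

Because only the file handle is in memory, the bounded pool is what caps
memory use during the decrypt step.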
> However I don't know how to do this as I'm new to Camel.
>
> Yes I'd really like to use streaming instead of byte[] at every step of
> the processing but no idea if that's possible in my use case. Sounds like
> it worked in yours.
>

Camel supports working with files as java.io.File / FileInputStream /
InputStream. Only if you attempt to read the file as String / byte[] etc.
is the file read into memory. There is also stream caching
http://camel.apache.org/stream-caching.html that can offload the data to a
file, but since the original data is already on file it makes less sense to
use here. However, stream caching ensures the stream is re-readable.

> -Dave
>
> On Sat, Nov 21, 2015 at 10:22 AM, mailingl...@j-b-s.de
> <mailingl...@j-b-s.de> wrote:
>
>> I guess you need to block the polling while you process files in
>> parallel. A seda queue with a capacity limit will at least block the
>> consumer. As I do not know what exactly you are doing with the files, or
>> whether each file needs the same amount of memory, it's hard to tell
>> what memory settings to use. Always providing more memory is not a
>> solution from my point of view, because you just hit the same limit
>> later.
>>
>> Limiting messages and using streaming / splitting will keep memory usage
>> low (at least it works that way in our environment, where we reduced
>> memory usage from 1G to 128M per VM). But whether that is something for
>> you... I don't know.
>>
>> Jens
>>
>> Sent from my iPhone
>>
>> > On 21.11.2015, at 16:40, David Hoffer <dhoff...@gmail.com> wrote:
>> >
>> > Yes, when the sftp read thread stopped it was still processing files
>> > it had previously downloaded. Since we can get so many files on each
>> > poll (~1000), and we have to do a lot of decrypting of these files in
>> > subsequent routes, it's possible that the processing of the 1000 files
>> > is not done before the next poll, where we get another 1000 files.
>> > Eventually the SFTP endpoint will have fewer/no files and the rest of
>> > the routes can catch up. All the rest of the routes are file based
>> > (except the very last) so there is no harm if intermediate folders get
>> > backed up with files.
>> >
>> > We only have one SFTP connection for reading in this case.
>> >
>> > Do you think the seda approach is right for this case? I can look into
>> > it. Note my previous post that in my dev environment the reason it
>> > stopped was an out-of-memory error... I doubt that is the case in
>> > production, as the rest of the routes do not stop.
>> >
>> > -Dave
>> >
>> > On Sat, Nov 21, 2015 at 1:36 AM, mailingl...@j-b-s.de
>> > <mailingl...@j-b-s.de> wrote:
>> >
>> >> Hi!
>> >>
>> >> When your sftp read thread stops, are the files still in process? In
>> >> our env we had something similar in conjunction with splitting large
>> >> files, because the initial message is pending until all processing is
>> >> completed. We solved it using a seda queue (limited in size) between
>> >> our sftp consumer and processing route, and "parallel" execution:
>> >>
>> >> one sftp consumer -> seda (size limit) -> processing route (with dsl
>> >> parallel)
>> >>
>> >> and this works without any problems.
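For reference, Jens's recipe might look like this in the Java DSL (host,
credentials, queue size and the decryptService bean are placeholders; this
uses the seda component's size, blockWhenFull and concurrentConsumers
options):

    import org.apache.camel.builder.RouteBuilder;

    public class SftpViaSedaRoutes extends RouteBuilder {
        @Override
        public void configure() throws Exception {
            // Single-threaded SFTP poller. blockWhenFull=true makes the
            // producer wait instead of throwing when the queue is full,
            // which in effect pauses the polling while workers catch up.
            from("sftp://user@host/inbox?password=secret&delete=true")
                .to("seda:work?size=50&blockWhenFull=true");

            // Five workers drain the queue in parallel, so at most 50
            // queued plus 5 in-flight messages are held at any one time.
            from("seda:work?size=50&concurrentConsumers=5")
                .to("bean:decryptService"); // hypothetical processing bean
        }
    }

Note the size option should be the same on both seda endpoints, as they
refer to the same in-memory queue.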
>> >>
>> >> Maybe you have too many sftp connections? Maybe it's entirely
>> >> independent of Camel and you reached a file handle limit?
>> >>
>> >> Jens
>> >>
>> >> Sent from my iPhone
>> >>
>> >>> On 20.11.2015, at 23:09, David Hoffer <dhoff...@gmail.com> wrote:
>> >>>
>> >>> This part I'm not clear on and it raises more questions.
>> >>>
>> >>> When using the JDK one generally uses the Executors factory methods
>> >>> to create either a Fixed, Single or Cached thread pool. These use a
>> >>> SynchronousQueue for Cached pools and a LinkedBlockingQueue for
>> >>> Fixed or Single pools. In the case of SynchronousQueue there is no
>> >>> size... it simply hands the new request off to a thread in the pool
>> >>> or creates a new one. And in the case of LinkedBlockingQueue it uses
>> >>> an unbounded queue size. Now it is possible to create a hybrid, e.g.
>> >>> a LinkedBlockingQueue with a max size, but that's not part of the
>> >>> factory methods, nor common. Another option is ArrayBlockingQueue,
>> >>> which does use a max size, but none of the factory methods use this
>> >>> type.
>> >>>
>> >>> So what type of thread pool does Camel create for the default thread
>> >>> pool? Since it's not fixed size, I assumed it would use a
>> >>> SynchronousQueue and not have a separate worker queue. However, if
>> >>> Camel is creating a hybrid using a LinkedBlockingQueue or an
>> >>> ArrayBlockingQueue, is there a way I can change that to a
>> >>> SynchronousQueue, so no queue? Or is there a compelling reason to
>> >>> use a LinkedBlockingQueue in a cached pool?
>> >>>
>> >>> Now this gets to the problem I am trying to solve. We have a Camel
>> >>> app that deals with files, lots of them... e.g. all the routes deal
>> >>> with files. It starts with an sftp URL that gets files off a remote
>> >>> server and then does a lot of subsequent file processing. The
>> >>> problem is that if the SFTP server has 55 files (for example) and I
>> >>> start the Camel app, it processes them fine until about 14 or 15
>> >>> files are left and then it just stops. The thread that does the
>> >>> polling of the server stops (at least it appears to have stopped)
>> >>> and the processing of the 55 files stops, e.g. it does not continue
>> >>> to process all of the original 55 files; it stops with 14-15 left to
>> >>> process (and it never picks them up again on the next poll). And I
>> >>> have a breakpoint on my custom SftpChangedExclusiveReadLockStrategy
>> >>> and it is never called again.
>> >>>
>> >>> Now, getting back to the default thread pool and changing it: I
>> >>> would like to change it so it uses more threads and no worker queue
>> >>> (like a standard Executors cached thread pool), but I'm not certain
>> >>> that would even help, as in the debugger & thread dumps I see that
>> >>> the SFTP endpoint uses a Scheduled Thread Pool instead, which makes
>> >>> sense since it's a polling (every 60 seconds in my case) operation.
>> >>> So is there another default pool that I can configure for Camel's
>> >>> scheduled threads?
>> >>>
>> >>> All that being said, why would the SFTP endpoint just quit? I don't
>> >>> see any blocked threads and no deadlock. I'm new to Camel and just
>> >>> don't know where to look for possible causes of this.
>> >>>
>> >>> Thanks,
>> >>> -Dave
>> >>>
>> >>>
>> >>>> On Thu, Nov 19, 2015 at 11:40 PM, Claus Ibsen
>> >>>> <claus.ib...@gmail.com> wrote:
>> >>>>
>> >>>> Yes, it's part of the JDK, as it specifies the size of the worker
>> >>>> queue of the thread pool (ThreadPoolExecutor).
>> >>>>
>> >>>> For more docs see
>> >>>> http://camel.apache.org/threading-model.html
>> >>>>
>> >>>> Or the Camel in Action books
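To elaborate on the above: you can override Camel's default worker pool
profile via the ExecutorServiceManager. A sketch against the Camel 2.x API
(the sizes are just examples); a maxQueueSize of 0 should give you the
direct SynchronousQueue hand-off you asked about, while a positive value
gives a bounded LinkedBlockingQueue:

    import org.apache.camel.CamelContext;
    import org.apache.camel.builder.ThreadPoolProfileBuilder;
    import org.apache.camel.spi.ThreadPoolProfile;

    public class ThreadPoolConfig {
        public static void configure(CamelContext context) {
            // Larger pool, no worker queue: tasks are handed straight
            // to a thread instead of waiting in an in-memory queue
            ThreadPoolProfile profile = new ThreadPoolProfileBuilder("default")
                .poolSize(20)
                .maxPoolSize(100)
                .maxQueueSize(0)
                .build();
            context.getExecutorServiceManager()
                   .setDefaultThreadPoolProfile(profile);
        }
    }

Keep in mind the consumer's polling itself runs on a scheduled thread
pool, which is separate from this worker profile.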
>> >>>>
>> >>>>> On Fri, Nov 20, 2015 at 12:22 AM, David Hoffer
>> >>>>> <dhoff...@gmail.com> wrote:
>> >>>>>
>> >>>>> I'm trying to understand the default Camel Thread Pool and how the
>> >>>>> maxQueueSize is used, or more precisely, what it's for.
>> >>>>>
>> >>>>> I can't find any documentation on what this really is or how it's
>> >>>>> used. I understand all the other parameters, as they match what
>> >>>>> I'd expect from the JDK... poolSize is the minimum number of
>> >>>>> threads to keep in the pool for new tasks, and maxPoolSize is the
>> >>>>> maximum number of the same.
>> >>>>>
>> >>>>> So how does maxQueueSize fit into this? This isn't part of the JDK
>> >>>>> thread pool factory methods, so I don't know how Camel uses it.
>> >>>>>
>> >>>>> The context of my question is that we have a from-sftp route that
>> >>>>> seems to be getting thread starved. E.g. the thread that polls the
>> >>>>> sftp connection is slowing/stopping at times when it is busy
>> >>>>> processing other files that were previously downloaded.
>> >>>>>
>> >>>>> We are using the default Camel thread pool, which I see has only a
>> >>>>> max of 20 threads yet a maxQueueSize of 1000. That doesn't make
>> >>>>> any sense to me yet. I would think one would want a much larger
>> >>>>> pool of threads (as we are processing lots of files) but no queue
>> >>>>> at all... but I'm not sure of that, as I don't understand how the
>> >>>>> queue is used.
>> >>>>>
>> >>>>> -Dave
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Claus Ibsen
>> >>>> -----------------
>> >>>> http://davsclaus.com @davsclaus
>> >>>> Camel in Action 2: https://www.manning.com/ibsen2

--
Claus Ibsen
-----------------
http://davsclaus.com @davsclaus
Camel in Action 2: https://www.manning.com/ibsen2