I found the cause of the SFTP endpoint failures, or rather I found a
solution.  We are using Camel 2.8.2, which uses JSCH 0.1.41, and that was
the problem: updating JSCH to 0.1.51 resolved it.

To be specific, the problem we were able to confirm in production was that
the 2.8.2/0.1.41 combination worked fine as long as the number of files
on the SFTP server was small.  As soon as that number rose, the first poll
would find all the files on the server and pass each of them to the
exclusiveReadLockStrategy bean properly; however, on the second poll, after
the exclusiveReadLockStrategy returns true to obtain an exclusive
lock...Camel processing stops.  The file is not downloaded and the next
poll does not happen.

I'd still like to know why that version combination causes this error and
why simply upgrading JSCH to 0.1.51 fixes it.

-Dave

On Sun, Nov 22, 2015 at 1:33 AM, Claus Ibsen <claus.ib...@gmail.com> wrote:

> On Sun, Nov 22, 2015 at 3:29 AM, David Hoffer <dhoff...@gmail.com> wrote:
> > I'm not sure how to block the polling.
> >
> > Here is what seems like an ideal approach...the SFTP polling always runs
> on
> > schedule and downloads files with single thread to a folder.  This won't
> > use much memory as its just copying one file at a time to the folder.
> Then
> > I'd have X threads take those files and start the decrypting/processing.
> > Since this part uses a lot of memory it seems I'd want to limit the
> number
> > of threads that can do this task so the max memory is contained.
> >
>
> If they pick up from a local or shared file system using regular
> java.io.File, then you can have a route that consumes from a directory,
> and then use threads for parallel processing.
>
> You can configure the from-to number of threads, and you can also set
> the queue size to 0 so there are no in-memory tasks waiting.
> Those tasks would be small anyway, as each is just a handle to the file
> (the contents are not read into memory; the file is still on disk).
>
> However, if you want a dynamic pool that grows/shrinks depending on free
> memory, then this gets more tricky.
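To make the queue-size-0 idea concrete, here is a plain-JDK sketch of the same pattern (illustrative only, not Camel's internals; in Camel itself this would be configured via the threads DSL or a thread pool profile, and the class and file names below are invented):

```java
import java.io.File;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedFileWorkers {
    public static void main(String[] args) throws Exception {
        // Queue size 0 means a SynchronousQueue: each task is handed straight
        // to a worker thread, so nothing piles up in memory.  CallerRunsPolicy
        // throttles the submitter when all workers are busy instead of
        // rejecting the task.
        ThreadPoolExecutor workers = new ThreadPoolExecutor(
                2, 4, 60, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>(),
                new ThreadPoolExecutor.CallerRunsPolicy());

        final AtomicInteger processed = new AtomicInteger();
        // Each task carries only a File handle; the contents stay on disk
        // until a worker actually opens the file.
        for (final File f : new File[]{new File("a.gpg"), new File("b.gpg")}) {
            workers.execute(new Runnable() {
                public void run() {
                    // decrypt/process f here
                    processed.incrementAndGet();
                }
            });
        }
        workers.shutdown();
        workers.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(processed.get()); // prints 2
    }
}
```

A side effect of this setup: when all workers are busy, the submitting (polling) thread runs the task itself, which naturally pauses polling until capacity frees up.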
>
>
>
>
>
> > However I don't know how to do this as I'm new to Camel.
> >
> > Yes, I'd really like to use streaming instead of byte[] at every step
> > of the processing, but I have no idea whether that's possible in my use
> > case.  Sounds like it worked in yours.
> >
>
> Camel supports working with files as java.io.File / FileInputStream /
> InputStream.
>
> Only if you attempt to read the file as a String / byte[] etc. is the
> file read into memory.
>
> There is also stream caching
> http://camel.apache.org/stream-caching.html
>
> that can offload the data to a file; but since the original data is
> already in a file, it makes less sense to use here.
> However, stream caching does ensure the stream is re-readable.
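The distinction Claus draws between working with File/InputStream and converting to String/byte[] can be illustrated with plain java.io/java.nio (a generic sketch, nothing Camel-specific):

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamVsBytes {
    public static void main(String[] args) throws Exception {
        Path src = Files.createTempFile("demo", ".txt");
        Files.write(src, "payload".getBytes("UTF-8"));

        // Streaming copy: a fixed-size buffer, so memory use is independent
        // of file size -- this is what staying on File/InputStream buys you.
        Path dst = Files.createTempFile("copy", ".txt");
        try (InputStream in = Files.newInputStream(src);
             OutputStream out = Files.newOutputStream(dst)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }

        // Whole-file read: the entire payload is materialized in memory,
        // which is what converting the body to String/byte[] does in a route.
        byte[] all = Files.readAllBytes(src);
        System.out.println(all.length); // prints 7
    }
}
```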
>
>
>
>
>
> > -Dave
> >
> >> On Sat, Nov 21, 2015 at 10:22 AM, mailingl...@j-b-s.de
> >> <mailingl...@j-b-s.de> wrote:
> >
> >> I guess you need to block the polling while you process files in
> >> parallel.  A seda queue with a capacity limit will at least block the
> >> consumer.  As I do not know exactly what you are doing with the files,
> >> or whether each file always requires the same amount of memory, it's
> >> hard to tell what memory settings to use.  Simply providing more memory
> >> is not a solution from my point of view, because you just hit the same
> >> limit later.
> >>
> >> Limiting messages and using streaming/splitting will keep memory usage
> >> low (at least in our environment it works that way; we reduced memory
> >> usage from 1G to 128M per VM).  But whether this is something for
> >> you... I don't know.
> >>
> >>
> >> Jens
> >>
> >> Sent from my iPhone
> >>
> >> > On 21.11.2015 at 16:40, David Hoffer <dhoff...@gmail.com> wrote:
> >> >
> >> > Yes, when the sftp read thread stops it is still processing files it
> >> > had previously downloaded.  Since we can get so many files on each
> >> > poll (~1000), and we have to do a lot of decrypting of these files in
> >> > subsequent routes, it's possible that the processing of the 1000
> >> > files is not done before the next poll brings in another 1000 files.
> >> > Eventually the SFTP endpoint will have fewer (or no) files and the
> >> > rest of the routes can catch up.  All the rest of the routes are file
> >> > based (except the very last), so there is no harm if intermediate
> >> > folders get backed up with files.
> >> >
> >> > We only have one SFTP connection for reading in this case.
> >> >
> >> > Do you think the seda approach is right for this case?  I can look
> >> > into it.  Note from my previous post that in my dev environment the
> >> > reason it stopped was an out-of-memory error...I doubt that is the
> >> > case in production, as the rest of the routes do not stop.
> >> >
> >> > -Dave
> >> >
> >> > On Sat, Nov 21, 2015 at 1:36 AM, mailingl...@j-b-s.de
> >> > <mailingl...@j-b-s.de> wrote:
> >> >
> >> >> Hi!
> >> >>
> >> >> When your sftp read thread stops, are the files still in process?
> >> >> In our env we had something similar in conjunction with splitting
> >> >> large files, because the initial message is pending until all
> >> >> processing is completed.  We solved it using a seda queue (limited in
> >> >> size) between our sftp consumer and processing route, plus "parallel"
> >> >> execution.
> >> >>
> >> >> one sftp consumer -> seda  (size limit) -> processing route (with dsl
> >> >> parallel)
> >> >>
> >> >> and this works without any problems.
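Jens's pipeline can be mimicked in plain-JDK terms to show why the size-limited seda stage applies back-pressure (an illustrative sketch; Camel's seda component is the real mechanism, and all names below are invented):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BoundedHandoff {
    public static void main(String[] args) throws Exception {
        // The bounded queue plays the role of the seda stage: when it is
        // full, the producer (the sftp consumer side) blocks in put().
        final BlockingQueue<String> seda = new ArrayBlockingQueue<String>(100);

        ExecutorService consumers = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            consumers.execute(new Runnable() {
                public void run() {
                    try {
                        while (true) {
                            String file = seda.take();
                            if (file.equals("POISON")) {
                                return; // shutdown marker
                            }
                            // decrypt/process the file here
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }
        for (int i = 0; i < 10; i++) {
            seda.put("file-" + i); // would block if 100 files were pending
        }
        for (int i = 0; i < 4; i++) {
            seda.put("POISON"); // one marker per consumer thread
        }
        consumers.shutdown();
        System.out.println(consumers.awaitTermination(5, TimeUnit.SECONDS)); // prints true
    }
}
```

The key point is the blocking put(): the single consumer can never outrun the parallel processing stage by more than the queue capacity, which bounds memory use.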
> >> >>
> >> >> Maybe you have too many sftp connections?  Or maybe it's entirely
> >> >> independent of Camel and you reached a file handle limit?
> >> >>
> >> >> Jens
> >> >>
> >> >>
> >> >> Sent from my iPhone
> >> >>
> >> >>> On 20.11.2015 at 23:09, David Hoffer <dhoff...@gmail.com> wrote:
> >> >>>
> >> >>> This part I'm not clear on and it raises more questions.
> >> >>>
> >> >>> When using the JDK one generally uses the Executors factory methods
> >> >>> to create either a fixed, single or cached thread pool.  These use
> >> >>> a SynchronousQueue for cached pools and a LinkedBlockingQueue for
> >> >>> fixed or single pools.  In the case of SynchronousQueue there is no
> >> >>> size...it simply hands each new request off either to an idle
> >> >>> thread in the pool or to a newly created one.  In the case of
> >> >>> LinkedBlockingQueue the queue size is unbounded.  Now, it is
> >> >>> possible to create a hybrid, e.g. a LinkedBlockingQueue with a max
> >> >>> size, but that is not part of the factory methods and not common.
> >> >>> Another option is ArrayBlockingQueue, which does use a max size,
> >> >>> but none of the factory methods use that type.
> >> >>>
> >> >>> So what type of thread pool does Camel create for the default
> >> >>> thread pool?  Since it's not fixed size I assumed it would use a
> >> >>> SynchronousQueue and not have a separate worker queue.  However, if
> >> >>> Camel is creating a hybrid using a LinkedBlockingQueue or
> >> >>> ArrayBlockingQueue, is there a way I can change that to a
> >> >>> SynchronousQueue so there is no queue?  Or is there a compelling
> >> >>> reason to use a LinkedBlockingQueue in a cached pool?
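The queue types behind the JDK factory methods can be checked directly, and the "hybrid" described above is just a ThreadPoolExecutor built by hand (a sketch; the 20-thread/1000-queue numbers below are taken from this thread, not verified Camel defaults):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolQueues {
    public static void main(String[] args) {
        // Cached pools hand tasks off through a SynchronousQueue (no storage).
        ThreadPoolExecutor cached = (ThreadPoolExecutor) Executors.newCachedThreadPool();
        System.out.println(cached.getQueue() instanceof SynchronousQueue);   // true

        // Fixed pools queue waiting tasks in an unbounded LinkedBlockingQueue.
        ThreadPoolExecutor fixed = (ThreadPoolExecutor) Executors.newFixedThreadPool(2);
        System.out.println(fixed.getQueue() instanceof LinkedBlockingQueue); // true

        // The "hybrid": a bounded LinkedBlockingQueue, built by hand.
        ThreadPoolExecutor hybrid = new ThreadPoolExecutor(
                10, 20, 60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>(1000));
        System.out.println(hybrid.getQueue().remainingCapacity());           // 1000

        cached.shutdown();
        fixed.shutdown();
        hybrid.shutdown();
    }
}
```

One JDK quirk worth noting for the hybrid case: with a bounded queue, threads beyond corePoolSize are only created once the queue is full, so a large maxQueueSize can make a pool look thread-starved even though maxPoolSize is higher.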
> >> >>>
> >> >>> Now this gets to the problem I am trying to solve.  We have a
> >> >>> Camel app that deals with files, lots of them...e.g. all the routes
> >> >>> deal with files.  It starts with an sftp URL that gets files off a
> >> >>> remote server and then does a lot of subsequent file processing.
> >> >>> The problem is that if the SFTP server has 55 files (for example)
> >> >>> and I start the Camel app, it processes them fine until about 14 or
> >> >>> 15 files are left and then it just stops.  The thread that does the
> >> >>> polling of the server stops (at least it appears to have stopped)
> >> >>> and the processing of the 55 files stops, i.e. it does not continue
> >> >>> to process all of the original 55 files; it stops with 14-15 left
> >> >>> to process (and it never picks them up again on the next poll).
> >> >>> And I have a breakpoint on my custom
> >> >>> SftpChangedExclusiveReadLockStrategy and it is never called again.
> >> >>>
> >> >>> Now, getting back to the default thread pool: I would like to
> >> >>> change it so it uses more threads and no worker queue (like a
> >> >>> standard Executors cached thread pool), but I'm not certain that
> >> >>> would even help, as in the debugger and thread dumps it looks like
> >> >>> the SFTP endpoint uses a scheduled thread pool instead, which makes
> >> >>> sense since it is a polling (every 60 seconds in my case)
> >> >>> operation.  So is there another default pool that I can configure
> >> >>> for Camel's scheduled threads?
> >> >>>
> >> >>> All that being said, why would the SFTP endpoint just quit?  I
> >> >>> don't see any blocked threads and no deadlock.  I'm new to Camel
> >> >>> and just don't know where to look for possible causes of this.
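One JDK behavior worth checking when a scheduled poller "just quits": if a periodic task submitted to a ScheduledExecutorService throws an uncaught exception, all of its future runs are silently cancelled. Whether this is what happens inside the Camel consumer here is only a guess, but it matches the symptom. A minimal demonstration (plain JDK, not Camel internals):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SilentStop {
    public static void main(String[] args) throws Exception {
        ScheduledExecutorService poller = Executors.newScheduledThreadPool(1);
        final AtomicInteger runs = new AtomicInteger();
        poller.scheduleWithFixedDelay(new Runnable() {
            public void run() {
                if (runs.incrementAndGet() == 2) {
                    // an uncaught exception cancels all future runs -- silently
                    throw new RuntimeException("boom");
                }
            }
        }, 0, 50, TimeUnit.MILLISECONDS);
        Thread.sleep(500);
        poller.shutdown();
        System.out.println(runs.get()); // prints 2: the poll never ran again
    }
}
```

If this is the cause, a thread dump would show the scheduler thread alive but idle, which is consistent with seeing no blocked threads and no deadlock.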
> >> >>>
> >> >>> Thanks,
> >> >>> -Dave
> >> >>>
> >> >>>
> >> >>>> On Thu, Nov 19, 2015 at 11:40 PM, Claus Ibsen
> >> >>>> <claus.ib...@gmail.com> wrote:
> >> >>>>
> >> >>>> Yes, it's part of the JDK: it specifies the size of the worker
> >> >>>> queue of the thread pool (ThreadPoolExecutor).
> >> >>>> the thread pool (ThreadPoolExecutor)
> >> >>>>
> >> >>>> For more docs see
> >> >>>> http://camel.apache.org/threading-model.html
> >> >>>>
> >> >>>> Or the Camel in Action books
> >> >>>>
> >> >>>>
> >> >>>>> On Fri, Nov 20, 2015 at 12:22 AM, David Hoffer
> >> >>>>> <dhoff...@gmail.com> wrote:
> >> >>>>> I'm trying to understand the default Camel thread pool and how
> >> >>>>> the maxQueueSize is used, or more precisely, what it is for.
> >> >>>>>
> >> >>>>> I can't find any documentation on what this really is or how it's
> >> >>>>> used.  I understand all the other parameters, as they match what
> >> >>>>> I'd expect from the JDK...poolSize is the minimum number of
> >> >>>>> threads to keep in the pool for new tasks and maxPoolSize is the
> >> >>>>> maximum number of the same.
> >> >>>>>
> >> >>>>> So how does maxQueueSize fit into this?  It isn't part of the JDK
> >> >>>>> thread pool factory methods, so I don't know how Camel uses it.
> >> >>>>>
> >> >>>>> The context of my question is that we have a from-sftp route that
> >> >>>>> seems to be getting thread starved.  E.g. the thread that polls
> >> >>>>> the sftp connection slows or stops at times when it is busy
> >> >>>>> processing other files that were previously downloaded.
> >> >>>>>
> >> >>>>> We are using the default Camel thread pool, which I see has a max
> >> >>>>> of only 20 threads yet a maxQueueSize of 1000.  That doesn't make
> >> >>>>> any sense to me yet.  I would think one would want a much larger
> >> >>>>> pool of threads (as we are processing lots of files) but no queue
> >> >>>>> at all...but I'm not sure on that, as I don't understand how the
> >> >>>>> queue is used.
> >> >>>>>
> >> >>>>> -Dave
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> --
> >> >>>> Claus Ibsen
> >> >>>> -----------------
> >> >>>> http://davsclaus.com @davsclaus
> >> >>>> Camel in Action 2: https://www.manning.com/ibsen2
> >> >>
> >>
>
>
>
> --
> Claus Ibsen
> -----------------
> http://davsclaus.com @davsclaus
> Camel in Action 2: https://www.manning.com/ibsen2
>
