Re: stream one large file, only once

2016-11-14 Thread Joe Witt
OOM errors often show you the symptom more readily than the cause. If you have SplitText after GetFile, then what Andrew mentioned is almost certainly the cause. If RouteText will meet the need, I think you'll find it yields far better behavior. The way I'd do what it sounds like you're doing is the flow described below: GetFile (or ListFile + FetchFile), then RouteText, then PublishKafka.
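
To make the per-line routing idea concrete, here is a minimal Java sketch under assumptions of my own (the topic names and regexes are hypothetical, and this is not NiFi API code): one rule per destination, evaluated against each line in turn, which is roughly the shape a RouteText configuration with one dynamic property per route expresses.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical per-line routing table, analogous in spirit to one
// route per rule; names and patterns are purely illustrative.
public class LineRouter {

    private final Map<String, Pattern> routes = new LinkedHashMap<>();

    public LineRouter() {
        routes.put("errors-topic", Pattern.compile(".*\\bERROR\\b.*"));
        routes.put("audit-topic",  Pattern.compile(".*\\bAUDIT\\b.*"));
    }

    /** Returns the first matching route, or a default topic for unmatched lines. */
    public String routeFor(String line) {
        for (Map.Entry<String, Pattern> route : routes.entrySet()) {
            if (route.getValue().matcher(line).matches()) {
                return route.getKey();
            }
        }
        return "unmatched-topic";
    }
}
```

The important property is that only the current line is ever examined; nothing in the decision requires holding the rest of the file.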

Re: stream one large file, only once

2016-11-14 Thread Raf Huys
Thanks for making this clear! I was thrown off because I do see a `java.lang.OutOfMemoryError` on the GetFile processor itself (and a matching `bytes read` spike corresponding to the file size).

Re: stream one large file, only once

2016-11-14 Thread Joe Witt
The pattern you want for this is:
1) GetFile or (ListFile + FetchFile)
2) RouteText
3) PublishKafka
As Andrew points out, GetFile and FetchFile do *not* read the file contents into memory. The whole point of NiFi's design in general is to take advantage of the content repository rather than forcing content into memory.
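
Outside NiFi, the same end-to-end pattern (stream the file once, pick a topic per line, publish) can be sketched in plain Java with a `BufferedReader` and a Kafka producer. This only illustrates the streaming behavior the flow above gives you, not how the processors are implemented; the broker address, file path, topic names, and routing rule are all assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class StreamFileToKafka {

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             BufferedReader reader = Files.newBufferedReader(Paths.get("/data/big.log"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Only the current line is in memory; the file is read once, front to back.
                String topic = line.contains("ERROR") ? "errors-topic" : "other-topic"; // illustrative rule
                producer.send(new ProducerRecord<>(topic, line));
            }
        }
    }
}
```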

Re: stream one large file, only once

2016-11-14 Thread Andrew Grande
Neither GetFile nor FetchFile reads the file into memory; they only deal with the file handle and pass the contents via a handle to the content repository (NiFi streams data in and reads it back as a stream). What you will face, however, is an issue with SplitText when you try to split the file in one transaction.
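
As a rough analogy for the memory difference described here (plain Java, outside NiFi, with a hypothetical path): reading a file as a stream keeps heap use roughly constant, while materializing the whole file, or all of its lines, at once scales with the file size and is what produces an OutOfMemoryError on a multi-gigabyte input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class StreamingVsInMemory {

    // Streaming: one line on the heap at a time, regardless of file size.
    static long countErrorLinesStreaming(Path file) throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains("ERROR")) {
                    count++;
                }
            }
        }
        return count;
    }

    // In-memory: every line held at once -- this is the shape of work that
    // exhausts the heap on a multi-gigabyte file.
    static long countErrorLinesInMemory(Path file) throws IOException {
        List<String> allLines = Files.readAllLines(file);
        return allLines.stream().filter(l -> l.contains("ERROR")).count();
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get("/data/big.log"); // hypothetical path
        System.out.println("streaming: " + countErrorLinesStreaming(file));
        // countErrorLinesInMemory(file) computes the same answer but holds every line on the heap.
    }
}
```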

stream one large file, only once

2016-11-14 Thread Raf Huys
I would like to read in a large file (several gigs) of log data and route every line to a (potentially different) Kafka topic.
- I don't want this file to be in memory
- I want it to be read once, not more
Using `GetFile` seems to take the whole file into memory. Same with `FetchFile`, as far as I can see.