Sudhindra,

The current ListFile processor scans the configured directory, including any subdirectories, looking for files. It does this by generating a listing, comparing it to what it has already seen (largely based on modification time), and then sending out the resulting listings. These can be routed to a FetchFile processor, which pulls the files.
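For illustration, that list-then-fetch cycle, with state keyed largely to modification time, amounts to logic roughly like the following. This is a minimal Python sketch, not NiFi's actual implementation; the function name and the shape of the state dict are invented here:

```python
import os

def list_new_files(root, state):
    """Walk `root`, including subdirectories, and return files not yet
    seen, judged largely by modification time (a rough, invented sketch
    of how ListFile avoids re-listing the same files)."""
    threshold = state.get("last_mod_time", 0.0)
    latest = threshold
    new_files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)
            if mtime > threshold:  # strictly newer than anything seen before
                new_files.append(path)
                latest = max(latest, mtime)
    # Persist the high-water mark between runs, akin to NiFi processor state.
    state["last_mod_time"] = latest
    return new_files
```

Each run only emits paths newer than the stored high-water mark, which is why ListFile can run on a schedule without producing duplicates.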
We do not offer a facility to look for the presence of a given special 'success' file. We could, and at this point probably should since it is a common ask, file a JIRA to add a filter that only selects files in a folder once a file with a certain name, such as 'success', is present.

Thanks
Joe

On Mon, Jul 30, 2018 at 6:34 PM, Sudhindra Tirupati Nagaraj <sutir...@tetrationanalytics.com> wrote:
> Hi,
>
> We just came across NiFi as a possible option for backing up our data lake
> periodically into S3. We have pipelines that dump batches of data at some
> granularity. For example, our one-minute dumps are of the form
> “201807210617”, “201807210618”, “201807210619”, etc. We are looking for a
> simple configuration-based solution that reads these incoming batches
> periodically and creates a workflow for backing them up. These batches also
> contain a “success” marker inside them that indicates the batch is full and
> ready to be backed up. We came across the ListHDFS processor, which can do
> this without duplication, but we are not sure whether it can copy only
> batches in a particular state (that is, batches containing a success
> marker). We are also not sure whether it works on “folders” rather than
> just files.
>
> Can I get some recommendations on whether NiFi can be used for such an
> ingestion use case, or for alternatives? Thank you.
>
> Kind Regards,
> Sudhindra.
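P.S. For anyone following along, the filter proposed above (selecting a batch folder only once a file named 'success' appears inside it) boils down to logic like this. This is an illustrative Python sketch of the idea, not an existing NiFi feature; the function and marker names are invented:

```python
import os

def batches_ready(root, marker="success"):
    """Return batch folders directly under `root` that contain a `marker`
    file, i.e. only batches flagged complete and ready to back up.
    (Invented sketch of the filter proposed for ListFile/ListHDFS.)"""
    ready = []
    for entry in sorted(os.listdir(root)):
        batch_dir = os.path.join(root, entry)
        # A batch qualifies only if it is a directory and the marker exists.
        if os.path.isdir(batch_dir) and os.path.isfile(os.path.join(batch_dir, marker)):
            ready.append(batch_dir)
    return ready
```

With timestamped batch folders such as “201807210617”, incomplete batches would simply be skipped until their marker file lands.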