Sudhindra,

The current ListFile processor scans the configured directory, including
any subdirectories, looking for files. It does this by generating a
listing, comparing it to what it has already seen (largely based on
modification time), and then sending out the resulting listings. These
can be sent to a FetchFile processor, which pulls the files.
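
To make that concrete, here is a rough, hand-rolled sketch of that
listing-diff idea in plain Java. It is not NiFi's actual implementation,
just an illustration of tracking a modification-time high-water mark; the
class and method names are made up.

    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.attribute.FileTime;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    // Illustrative only: walk a directory tree and return files whose
    // modification time is newer than the last one already reported,
    // roughly what ListFile does before handing results to FetchFile.
    public class ListingDiffSketch {
        private FileTime lastSeen = FileTime.fromMillis(0);

        public List<Path> listNewFiles(Path root) throws IOException {
            try (Stream<Path> walk = Files.walk(root)) {   // includes subdirectories
                List<Path> fresh = walk
                        .filter(Files::isRegularFile)
                        .filter(p -> modTime(p).compareTo(lastSeen) > 0)
                        .collect(Collectors.toList());
                fresh.stream()
                        .map(this::modTime)
                        .max(FileTime::compareTo)
                        .ifPresent(t -> lastSeen = t);     // advance the high-water mark
                return fresh;                              // these would go to the fetch step
            }
        }

        private FileTime modTime(Path p) {
            try {
                return Files.getLastModifiedTime(p);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }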

We do not currently offer a facility to look for the presence of a given
special 'success' file. We could, and at this point probably should since
it is a common ask, file a JIRA to add a filter that only selects files in
a folder when a file with a certain name, such as 'success', is present.
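
For reference, such a filter would amount to something like the following
(again just a sketch with made-up names, not an existing NiFi property):

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;
    import java.util.stream.Collectors;

    // Illustrative only: keep a listed file just when its parent folder
    // also contains the marker file (e.g. "success"), which is the
    // behavior the proposed filter would add on top of ListFile/ListHDFS.
    public class SuccessMarkerFilter {
        public static List<Path> onlyCompletedBatches(List<Path> listed, String marker) {
            return listed.stream()
                    .filter(p -> p.getParent() != null
                            && Files.exists(p.getParent().resolve(marker)))
                    .collect(Collectors.toList());
        }
    }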

Thanks
Joe

On Mon, Jul 30, 2018 at 6:34 PM, Sudhindra Tirupati Nagaraj
<sutir...@tetrationanalytics.com> wrote:
> Hi,
>
>
>
> We just came across NiFi as a possible option for backing up our data lake
> periodically into S3. We have pipelines that dump batches of data at some
> granularity. For example, our one-minute dumps are of the form
> “201807210617”, “201807210618”, “201807210619”, etc. We are looking for a
> simple configuration-based solution that reads these incoming batches
> periodically and creates a workflow for backing them up. Also, these
> batches have a “success” marker inside them that indicates that the batches
> are full and ready to be backed up. We came across the ListHDFS processor
> that can do this without duplication, but we are not sure if it has the
> ability to only copy batches that have a particular state (that is, a
> success marker in them). We are not sure if it also works on “folders”
> and not files directly.
>
>
>
> Can I get some recommendations on whether NiFi can be used for such an
> ingestion use case, or an alternative? Thank you.
>
>
>
> Kind Regards,
>
> Sudhindra.
