Hi Martijn,

The request for the "list" processors to support incoming flow files comes up frequently. The issue is that the list processors are meant to continuously watch a given directory/bucket, maintain state about what has been seen, and only find newer entries. If the processor accepted incoming flow files, the directory could be different on every execution, which makes maintaining state problematic. How do we know whether another flow file will ever indicate the same directory, and whether we need to keep that state around? How much state can we actually store? And so on.
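To make the state problem concrete, here is a minimal Python sketch of a lister that only emits entries newer than the newest modification time it has already seen. It is not NiFi's implementation; the class and function names are hypothetical, and it ignores edge cases the real processors deal with, such as several files sharing the newest timestamp. With a fixed directory it needs one high-water mark; once the directory comes from each incoming flow file, the state has to be keyed by directory, and nothing tells it when an entry can be dropped.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

# Hypothetical listing entry: (path, last-modified timestamp in epoch millis)
Entry = Tuple[str, int]


@dataclass
class StatefulLister:
    """Sketch of a 'list' processor that only emits entries newer than what it has seen.

    State is keyed by directory: if the directory came from an incoming flow
    file, every distinct directory would need its own high-water mark, and the
    map would grow by one entry per directory with no obvious point to evict.
    """
    last_seen: Dict[str, int] = field(default_factory=dict)

    def list_new(self, directory: str, entries: Iterable[Entry]) -> List[str]:
        cutoff = self.last_seen.get(directory, -1)
        newer = [(path, ts) for path, ts in entries if ts > cutoff]
        if newer:
            # Advance the high-water mark for this directory only.
            self.last_seen[directory] = max(ts for _, ts in newer)
        return [path for path, _ in newer]


def list_stateless(entries: Iterable[Entry]) -> List[str]:
    """The stateless variant: a one-time listing that always returns everything."""
    return [path for path, _ in entries]
```

Listing `/data/a` twice returns only the files that appeared since the first call, while a later call for `/data/b` silently adds a second key to `last_seen`; `list_stateless` keeps nothing and would return the full listing every time.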
I don't know exactly what your use case is, but I think it would be reasonable to support a variation of each "list" processor that accepts incoming flow files but does NOT maintain state. It would perform a one-time listing based on the incoming flow file, and if another flow file came in later with the same directory/bucket, it would have no knowledge of the previous execution and would therefore list everything again.

-Bryan

On Tue, Sep 25, 2018 at 2:02 AM Martijn Dekkers <mart...@dekkers.org.uk> wrote:
>
> Hi Koji,
>
> Thanks, that is exactly the path we took in the end. This is a repeating
> pattern for us, and we would have preferred to keep it all contained in an
> ISP. Since the output of the listing is very large, we run into some memory
> issues at the SplitText step, so we use a few of those in sequence, which is
> all a bit hacky. When we have some time we will get back to this, and
> hopefully get it done "correctly".
>
> I am trying to work out why none of the List-type processors accept
> incoming connections; we use them frequently and have to resort to all
> kinds of acrobatics to work around this. In this instance we use an
> external script; in others we have to set up infrastructure outside of
> NiFi to set parameters via the API. It would be a lot easier and smoother
> if we could simply accept an incoming connection and use attributes.
>
> Thanks,
>
> Martijn
>
> On Tue, 25 Sep 2018 at 02:37, Koji Kawamura <ijokaruma...@gmail.com> wrote:
>>
>> Hi Martijn,
>>
>> I'm not an expert on Jython, but if you already have a Python script
>> using boto3 that works fine, then I'd suggest using ExecuteStreamCommand
>> instead.
>> For example:
>> - design the Python script to print out a JSON-formatted string
>>   describing the listed files
>> - then connect the output to SplitJson
>> - use EvaluateJsonPath to extract the required values into FlowFile
>>   attributes
>> - finally, use FetchS3Object
>>
>> Thanks,
>> Koji
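Koji's ExecuteStreamCommand approach could look roughly like the Python sketch below. The script name `list_s3.py`, its command-line arguments, and the JSON field names (`bucket`, `key`, `size`, `lastModified`) are assumptions for illustration; the only boto3 calls used are `boto3.client("s3")` and the `list_objects_v2` paginator. The script prints a single JSON array to stdout, which ExecuteStreamCommand routes to its output stream relationship.

```python
import json
import sys
from typing import Any, Dict, List


def list_objects(s3_client: Any, bucket: str, prefix: str = "") -> List[Dict[str, Any]]:
    """Collect every object under bucket/prefix using the list_objects_v2 paginator."""
    results: List[Dict[str, Any]] = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches the prefix.
        for obj in page.get("Contents", []):
            results.append({
                "bucket": bucket,
                "key": obj["Key"],
                "size": obj["Size"],
                "lastModified": obj["LastModified"].isoformat(),
            })
    return results


def main(argv: List[str]) -> int:
    if len(argv) < 2:
        print("usage: list_s3.py BUCKET [PREFIX]", file=sys.stderr)
        return 2
    # Imported here so list_objects can be exercised without boto3 installed.
    import boto3

    bucket = argv[1]
    prefix = argv[2] if len(argv) > 2 else ""
    listing = list_objects(boto3.client("s3"), bucket, prefix)
    # One JSON array on stdout; ExecuteStreamCommand turns stdout into flow file content.
    json.dump(listing, sys.stdout)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

ExecuteStreamCommand's Command Arguments property accepts Expression Language, so the bucket and prefix could come from attributes of the incoming flow file. Downstream, SplitJson with a JsonPath Expression such as `$[*]` would emit one flow file per object, EvaluateJsonPath (Destination set to flowfile-attribute) could copy `$.bucket` and `$.key` into attributes (for example a hypothetical `s3.bucket`, plus `filename`), and FetchS3Object could then use `${s3.bucket}` for Bucket and `${filename}` for Object Key. Because the script keeps no state, running it twice for the same bucket and prefix lists everything again, which is exactly the stateless behaviour Bryan describes.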