Typically I would not expect to schedule dataflows in NiFi, as it's not the ideal place for data to sit. For running scheduled batch jobs as you describe, I would expect the data to be constantly flowing to date/time-based directories on HDFS. This allows data to be stored in a place meant for storing data, and allows jobs to run for specified time periods with any data that arrived during that period.
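A minimal sketch of how such date/time-based directories could be derived, assuming a year/month/day/hour layout under a hypothetical base directory /data/landing. The function names bucket_path and batch_input_dirs are made up for illustration and are not part of NiFi or HDFS:

```python
from datetime import datetime, timedelta
from pathlib import PurePosixPath


def bucket_path(base_dir: str, ts: datetime) -> str:
    """Return base_dir/YYYY/MM/DD/HH for the hour that contains ts."""
    return str(PurePosixPath(base_dir) / ts.strftime("%Y/%m/%d/%H"))


def batch_input_dirs(base_dir: str, start: datetime, hours: int) -> list[str]:
    """List the hourly directories a batch job would read for a time window.

    The window starts at the hour containing `start` and covers `hours` hours.
    """
    first_hour = start.replace(minute=0, second=0, microsecond=0)
    return [
        bucket_path(base_dir, first_hour + timedelta(hours=offset))
        for offset in range(hours)
    ]


if __name__ == "__main__":
    # /data/landing is a hypothetical base directory, not taken from the thread.
    arrival = datetime(2018, 9, 24, 12, 30)
    print(bucket_path("/data/landing", arrival))
    print(batch_input_dirs("/data/landing", arrival, 3))
```

Whether ts is the collected time or the arrival time is a choice for the flow to make. Inside NiFi itself the same layout could probably be produced with Expression Language in the target directory property, for example something along the lines of ${now():format('yyyy/MM/dd/HH')}, but check that against the version you run.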
In the past I have used a directory structure of year/month/day/hour, e.g. 2018/09/24/12. Any data arriving during that time will be placed in those directories. Depending on your requirements, you can bucket files into these directories based on the collected date/time or the arrival time (when the data is received by NiFi). The scheduled batch jobs can then be configured to use the directory structure.

Let us know if this helps at all.

Nathan

On 9/24/18, 6:13 AM, "Vos, Walter" <walter....@ns.nl> wrote:

Hi,

I don't know what the etiquette on a mailing list is for this, but I'd like to bump my original question. Perhaps it's good to add that many of our flows are batch loads and therefore depend on a schedule to run, once. Does anyone have experience with remote scheduling in NiFi, or do you think you have a smart take on this? Please let me know :)

Cheers,
Walter

-----Original message-----
From: Vos, Walter
Sent: Wednesday 5 September 2018 10:02
To: users@nifi.apache.org
Subject: A sensible approach to scheduling via the API?

Hi,

In our big data environment one of the architectural principles is to schedule jobs with Azure Automation (runbooks). A scheduling database is used to decide when to start which jobs. NiFi flows, however, are currently being scheduled in NiFi itself. We're looking for a good approach to move this over to runbooks. I see a couple of options:

* Have each flow start with a timer-driven processor, where the run schedule is an hour or so. This processor will be stopped by default, and can be turned on via the API. It is then stopped at some point before the run schedule ends, preventing the processor from running twice.
* Use a ListenHTTP processor that we can POST a message to that specifies which flow to start. Do something like RouteOnAttribute to choose the right flow. I imagine this as being one ListenHTTP processor that is connected to all flows.
* Translate the schedule from the scheduling database to a CronTrigger expression.
  Check if the CRON schedule on the processor is indeed set to that schedule. If not, stop the processor, change the schedule and start it again. If it is, do nothing and assume it'll run. This one seems convoluted on the one hand, but requires the least architecture within NiFi itself, I imagine.

What do you think? Has anyone had to deal with something like this? How did you solve it? I can't find much information about this on the web, although I could be using the wrong terms.

Kind regards,
Walter Vos

________________________________

This e-mail, including any attachments, is intended solely for (use by) the addressee. The e-mail may contain personal or confidential information. Disclosure, reproduction, distribution and/or provision of (the contents of) this e-mail (and any attachments) to third parties is expressly prohibited. If you are not the intended recipient, you are kindly requested to notify the sender of the e-mail immediately and to destroy the e-mail (and any attachments). Company information <http://www.ns.nl/emaildisclaimer>
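For reference, the first option in the original message (a processor that is stopped by default and switched on via the API) could look roughly like the sketch below from the runbook side. It assumes an unsecured NiFi instance at a hypothetical http://localhost:8080, and it assumes that a processor's state can be changed with a PUT to /nifi-api/processors/{id} carrying the current revision version and the desired state. Please verify both against the REST API documentation for your NiFi version. All function names are made up for illustration.

```python
import json
import urllib.request

# Hypothetical base URL; a secured cluster would also need authentication.
NIFI_API = "http://localhost:8080/nifi-api"


def state_payload(processor_id: str, revision_version: int, state: str) -> dict:
    """Build the request body that asks NiFi to start or stop a processor."""
    if state not in ("RUNNING", "STOPPED"):
        raise ValueError(f"unsupported state: {state}")
    return {
        "revision": {"version": revision_version},
        "component": {"id": processor_id, "state": state},
    }


def build_state_request(processor_id: str, revision_version: int, state: str,
                        base_url: str = NIFI_API) -> urllib.request.Request:
    """Create (but do not send) the PUT request for the state change."""
    body = json.dumps(state_payload(processor_id, revision_version, state))
    return urllib.request.Request(
        url=f"{base_url}/processors/{processor_id}",
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def current_revision(processor_id: str, base_url: str = NIFI_API) -> int:
    """Fetch the processor and return its current revision version."""
    with urllib.request.urlopen(f"{base_url}/processors/{processor_id}") as resp:
        return json.load(resp)["revision"]["version"]


def set_processor_state(processor_id: str, state: str,
                        base_url: str = NIFI_API) -> dict:
    """Read the current revision, then request the new state."""
    version = current_revision(processor_id, base_url)
    request = build_state_request(processor_id, version, state, base_url)
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

A runbook would call set_processor_state(processor_id, "RUNNING") when the scheduling database says a flow is due, and set_processor_state(processor_id, "STOPPED") before the processor's run schedule elapses again. The revision is read immediately before each change because NiFi is expected to reject updates that carry a stale revision.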