Typically I would not schedule dataflows in NiFi, as it's not an ideal place 
for data to sit. For scheduled batch jobs like the ones you describe, I would 
instead expect the data to flow continuously into date/time-based directories 
on HDFS. That keeps the data in a place meant for storing data, and lets each 
job run over a specified time period against whatever data arrived during 
that period. 

In the past I have used a directory structure of year/month/day/hour, e.g. 
2018/09/24/12. Any data arriving during that hour is placed in that 
directory. Depending on your requirements, you can bucket files into these 
directories by collection date/time or by arrival time (when NiFi receives 
them). The scheduled batch jobs can then be configured to read from the 
directory for the period they cover.
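
As a rough illustration of that bucketing, here is a minimal Python sketch 
that maps a timestamp to a year/month/day/hour directory. The base path 
/data/landing and the function name bucket_dir are made up for the example; 
the timestamp can be either the collection time or the arrival time, whichever 
you decide to bucket on.

```python
from datetime import datetime
from pathlib import PurePosixPath


def bucket_dir(base: str, ts: datetime) -> PurePosixPath:
    """Return base/YYYY/MM/DD/HH for the given timestamp."""
    # strftime zero-pads month, day and hour, so directories sort correctly.
    return PurePosixPath(base) / ts.strftime("%Y/%m/%d/%H")


# Hypothetical base directory and arrival time, for illustration only.
arrival = datetime(2018, 9, 24, 12, 30)
print(bucket_dir("/data/landing", arrival))
```

A batch job covering a given hour would then only need to read the one 
directory computed for that hour.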

Let us know if this helps at all.
Nathan


On 9/24/18, 6:13 AM, "Vos, Walter" <walter....@ns.nl> wrote:

    Hi,
    
    I don't know what the etiquette on a mailing list is for this, but I'd like 
to bump my original question.
    
    Perhaps it's good to add that many of our flows are batch loads and 
therefore depend on a schedule to run, once.
    
    Does anyone have experience with remote scheduling in NiFi or do you think 
you have a smart take on this? Please let me know :)
    
    Cheers,
    
    Walter
    
    -----Original message-----
    From: Vos, Walter
    Sent: Wednesday, 5 September 2018 10:02
    To: users@nifi.apache.org
    Subject: A sensible approach to scheduling via the API?
    
    Hi,
    
    In our big data environment one of the architectural principles is to 
schedule jobs with Azure Automation (runbooks). A scheduling database is used 
to decide when to start which jobs. NiFi flows however are currently being 
scheduled in NiFi itself. We're looking for a good approach to move this over 
to runbooks. I see a couple of options:
    
    * Have each flow start with a timer driven processor, where the run 
schedule is an hour or so. This processor will be stopped by default, and can 
be turned on via the API. It is then stopped at some point before the run 
schedule ends, preventing the processor from running twice.
    * Use a ListenHTTP processor that we can POST a message to that specifies 
which flow to start. Do something like RouteOnAttribute to choose the right 
flow. I imagine this as being one ListenHTTP processor that is connected to all 
flows.
    * Translate the schedule from the scheduling database to a CronTrigger 
expression. Check if the CRON schedule on the processor is indeed set to that 
schedule. If not, stop the processor, change the schedule and start it again. 
If it is, do nothing and assume it'll run. This one seems convoluted on the one 
hand, but requires the least architecture within NiFi itself I imagine.
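
The comparison step in the third option could look roughly like the Python 
sketch below. It assumes a simple "run once a day at a fixed time" schedule 
and a Quartz-style cron expression (seconds, minutes, hours, day-of-month, 
month, day-of-week). The function names are hypothetical, and reading or 
updating the processor's actual schedule through the NiFi API is left out.

```python
from datetime import time


def daily_quartz_cron(run_at: time) -> str:
    """Build a Quartz-style cron expression that fires once a day at run_at."""
    # Fields: seconds minutes hours day-of-month month day-of-week.
    # '?' means "no specific value" for day-of-week.
    return f"{run_at.second} {run_at.minute} {run_at.hour} * * ?"


def needs_update(current_cron: str, desired_cron: str) -> bool:
    """Compare two cron expressions field by field, ignoring extra spaces."""
    return current_cron.split() != desired_cron.split()


# Hypothetical schedule taken from the scheduling database: daily at 10:02.
desired = daily_quartz_cron(time(10, 2))
print(desired)
print(needs_update("0 0 12 * * ?", desired))
```

Only when needs_update returns True would the processor have to be stopped, 
rescheduled and started again.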
    
    What do you think? Has anyone had to deal with something like this? How did 
you solve it? I can't find much information about this on the web, although I 
could be using the wrong terms.
    
    Kind regards,
    
    Walter Vos
    
    
    ________________________________
    
    This e-mail, including any attachments, is intended solely for (use by) 
the addressee. The e-mail may contain personal or confidential information. 
Disclosure, reproduction, distribution and/or provision of (the contents of) 
this e-mail (and any attachments) to third parties is expressly prohibited. 
If you are not the intended addressee, you are kindly requested to notify the 
sender of the e-mail immediately and to destroy the e-mail (and any 
attachments).
    
    Company information<http://www.ns.nl/emaildisclaimer>
    

