The only thing missing is a way to kick off a job when the ask is to use resources the batch way: "use and terminate once done". An operator that keeps watch and is able to kick off a job suffices. A batch job can be kicked off in either of the following ways:
1. Files
   -> Start post all data arrival. Usually a .done file in a dir, which triggers entire dir to be processed
   -> Start asap and end on .done
2. Message (a start message)

I think batch use cases are mainly #1. This technically is not a batch vs stream use case, just a scheduler (Oozie like) part of batch.

Thks
Amol

E:[email protected] | M: 510-449-2606 | Twitter: @amolhkekre
www.datatorrent.com

On Tue, Jun 13, 2017 at 11:47 PM, Ganelin, Ilya <[email protected]> wrote:

> I think it's a very relevant use case. In the Apex formulation this would
> work as follows. An operator runs continuously and maintains an internal
> state that tracks process files or an offset (e.g. in Kafka). As more data
> becomes available, the operator performs the appropriate operation and then
> returns to waiting. In this fashion, batched data is processed as soon as
> it becomes available but the process overall is still a batch process since
> it's limited by the production of the source batches.
>
> There are a couple of examples of this in Malhar, for example the
> AbstractFileInputOperator.
>
> Your earlier comment with regards to your motivation is interesting. Can
> you elaborate on the load reduction you get with your approach? A number of
> batched small writes to a DB may prove to be more efficient from a latency
> or database utilization standpoint when compared with infrequent large
> batch writes particularly if they involve index updates.
>
> ------------------------------
> From: [email protected] <[email protected]>
> Sent: Tuesday, June 13, 2017 6:36:29 PM
> To: [email protected]; [email protected]
> Subject: Re: Is there a way to schedule an operator?
>
> I have input operators that reach out to Google, Facebook, Bing, Yahoo
> etc. once a day or an hour and download marketing spend statistics. Apex
> promises batch and streaming to be equal class citizens. How is this
> equality achieved if there's no scheduler for batch jobs to rely on?
> I want the dag to take data stream from batch pipeline and affect streaming
> pipelines running alongside. Do you not see this as a valid use case?
>
> On Tue, Jun 13, 2017 at 5:29 PM, Guilherme Hott
> <[email protected]> wrote:
> Hi guys,
>
> Is there a way to schedule an operator? I need an operator to start the DAG
> once a day at 00am.
>
> Best
>
> --
> Guilherme Hott
> Software Engineer
> Skype: guilhermehott
> @guilhermehott
> https://www.linkedin.com/in/guilhermehott
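
For illustration, the ".done file" trigger mentioned at the top of this thread could be sketched roughly as below. This is plain Java and not Apex or Malhar code: the class name DoneFileTrigger, its method names, and the polling interval are all hypothetical, and the marker is assumed to be a file literally named ".done" placed in the data directory by whoever produces the batch. The watcher waits for the marker, then hands back every other regular file in the directory as the batch to process.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for illustration only; not part of Apex or Malhar. */
public class DoneFileTrigger {
    // Assumption: the producer signals "all data has arrived" with this marker file.
    public static final String DONE_MARKER = ".done";

    /** True once the marker file exists in the directory. */
    public static boolean isReady(Path dir) {
        return Files.exists(dir.resolve(DONE_MARKER));
    }

    /** Lists the regular files in the directory, excluding the marker, sorted by path. */
    public static List<Path> filesToProcess(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                .filter(Files::isRegularFile)
                .filter(p -> !p.getFileName().toString().equals(DONE_MARKER))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    /** Polls until the marker appears, then returns the whole directory as one batch. */
    public static List<Path> awaitBatch(Path dir, long pollMillis)
            throws IOException, InterruptedException {
        while (!isReady(dir)) {
            Thread.sleep(pollMillis);
        }
        return filesToProcess(dir);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java DoneFileTrigger <directory>");
            return;
        }
        List<Path> batch = awaitBatch(Paths.get(args[0]), 1000L);
        // A real deployment would launch the batch job here and terminate when it finishes.
        System.out.println("Batch ready: " + batch.size() + " file(s) to process");
    }
}
```

The "start asap and end on .done" variant would instead process files as they appear and treat the marker only as the signal to stop; that bookkeeping is left out of this sketch.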
