The only thing missing is a way to kick off a job, in case the ask is to
use resources the batch way: "use, then terminate once done". An operator
that keeps watch and is able to kick off a job suffices. Kicking off a
batch job can be done in either of the following ways:

1. Files
   -> Start after all data has arrived. Usually a .done file in a dir,
      which triggers the entire dir to be processed
   -> Start asap and end on .done
2. Message (a start message)
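
A minimal sketch of the file-based trigger in plain Java, to make the
first option concrete. It is not Apex or Malhar code: the class
DoneMarkerTrigger, its method names, and the assumption that the marker
is a file literally named ".done" are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of Apex or Malhar.
final class DoneMarkerTrigger {
    // Assumed marker name; the thread only says "a .done file in a dir".
    static final String MARKER = ".done";

    // True once the marker is present, i.e. all data has arrived.
    static boolean isComplete(Collection<String> fileNames) {
        return fileNames.contains(MARKER);
    }

    // Everything except the marker, in sorted order.
    static List<String> dataFiles(Collection<String> fileNames) {
        List<String> result = new ArrayList<>();
        for (String name : fileNames) {
            if (!MARKER.equals(name)) {
                result.add(name);
            }
        }
        Collections.sort(result);
        return result;
    }

    // Names of the entries directly under dir (not recursive).
    static List<String> list(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.map(p -> p.getFileName().toString())
                          .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> names = list(dir);
        if (isComplete(names)) {
            System.out.println("Marker found, processing: " + dataFiles(names));
        } else {
            System.out.println("No " + MARKER + " yet, waiting");
        }
    }
}
```

For the "start asap and end on .done" variant, the same two checks would
presumably run on every poll: process whatever dataFiles returns that has
not been seen yet, and stop once isComplete turns true.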

I think batch use cases are mainly #1. Technically this is not a batch vs.
stream question, just the scheduler (Oozie-like) part of batch.
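
For the scheduler part, here is a rough sketch of a once-a-day kick-off
at midnight using only the JDK. The class DailyKickoff and its methods
are hypothetical names, the job is a placeholder Runnable, and nothing
here is an Apex API. It also assumes a fixed 24-hour period, which
ignores daylight-saving shifts.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical launcher-side helper; not part of Apex.
final class DailyKickoff {
    // Time from `now` until the next 00:00, always strictly in the future.
    static Duration untilNextMidnight(LocalDateTime now) {
        LocalDateTime next = now.toLocalDate().plusDays(1).atStartOfDay();
        return Duration.between(now, next);
    }

    // Runs `job` at the next midnight and every 24 hours after that.
    static ScheduledFuture<?> scheduleDaily(ScheduledExecutorService timer,
                                            Runnable job,
                                            LocalDateTime now) {
        long initialDelayMs = untilNextMidnight(now).toMillis();
        long periodMs = TimeUnit.DAYS.toMillis(1);
        return timer.scheduleAtFixedRate(job, initialDelayMs, periodMs,
                                         TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        // Only prints the delay; a long-running launcher would call
        // scheduleDaily(...) and keep the executor alive.
        Duration delay = untilNextMidnight(LocalDateTime.now());
        System.out.println("Next kick-off in " + delay.toMinutes() + " minutes");
    }
}
```

The job passed to scheduleDaily would be whatever starts the batch run,
for example dropping a start message or launching the application.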

Thanks,
Amol



E:[email protected] | M: 510-449-2606 | Twitter: @*amolhkekre*

www.datatorrent.com


On Tue, Jun 13, 2017 at 11:47 PM, Ganelin, Ilya <[email protected]> wrote:

> I think it's a very relevant use case. In the Apex formulation this would
> work as follows. An operator runs continuously and maintains an internal
> state that tracks processed files or an offset (e.g. in Kafka). As more data
> becomes available, the operator performs the appropriate operation and then
> returns to waiting. In this fashion, batched data is processed as soon as
> it becomes available but the process overall is still a batch process since
> it's limited by the production of the source batches.
>
> There are a couple of examples of this in Malhar, for example the
> AbstractFileInputOperator.
>
> Your earlier comment with regard to your motivation is interesting. Can
> you elaborate on the load reduction you get with your approach? A number of
> small batched writes to a DB may prove to be more efficient from a latency
> or database utilization standpoint than infrequent large batch writes,
> particularly if they involve index updates.
>
>
>
>
> ------------------------------
> *From:* [email protected] <[email protected]>
> *Sent:* Tuesday, June 13, 2017 6:36:29 PM
> *To:* [email protected]; [email protected]
> *Subject:* Re: Is there a way to schedule an operator?
>
> I have input operators that reach out to Google, Facebook, Bing, Yahoo,
> etc. once a day or once an hour and download marketing spend statistics.
> Apex promises batch and streaming to be equal-class citizens. How is this
> equality achieved if there's no scheduler for batch jobs to rely on? I
> want the DAG to take a data stream from the batch pipeline and have it
> affect the streaming pipelines running alongside. Do you not see this as a
> valid use case?
>
> On Tue, Jun 13, 2017 at 5:29 PM, Guilherme Hott
> <[email protected]> wrote:
> Hi guys,
>
> Is there a way to schedule an operator? I need an operator to start the DAG
> once a day at 00:00 (midnight).
>
> Best
>
> --
> *Guilherme Hott*
> *Software Engineer*
> Skype: guilhermehott
> @guilhermehott
> https://www.linkedin.com/in/guilhermehott
>
>
