I think it's a very relevant use case. In the Apex formulation this would work 
as follows. An operator runs continuously and maintains internal state that 
tracks processed files or an offset (e.g. in Kafka). As more data becomes 
available, the operator performs the appropriate operation and then returns to 
waiting. In this fashion, batched data is processed as soon as it becomes 
available, but the process overall is still a batch process since it is limited 
by the rate at which the source produces batches.
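
To make the pattern concrete, here is a minimal sketch in plain Java. It 
deliberately does not use the Apex or Malhar API; the class and method names 
(PollingOperatorSketch, poll, processedCount) are made up for illustration, and 
the per-file "work" is just a placeholder. It only shows the idea of keeping 
state about what has been processed so that each pass handles new data only.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the "track state, process only new data" pattern.
// This is NOT the Apex operator API, just an illustration of the idea.
class PollingOperatorSketch {
    // Internal state: names of the files that have already been processed.
    private final Set<String> processed = new LinkedHashSet<>();

    // One polling pass: handle only files not seen before, then return
    // so the caller can go back to waiting until the next pass.
    List<String> poll(List<String> available) {
        List<String> handled = new ArrayList<>();
        for (String file : available) {
            // add() returns false when the file is already tracked
            if (processed.add(file)) {
                handled.add(file); // stand-in for the real per-file work
            }
        }
        return handled;
    }

    // Number of distinct files processed so far.
    int processedCount() {
        return processed.size();
    }
}
```

Calling poll repeatedly with the current directory listing would process each 
file once, no matter how often the source is checked. In a real operator this 
state would also need to be checkpointed so it survives a restart.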

Malhar has a couple of examples of this, such as the 
AbstractFileInputOperator.

Your earlier comment about your motivation is interesting. Can you elaborate 
on the load reduction you get with your approach? A number of small batched 
writes to a DB may prove more efficient from a latency or database 
utilization standpoint than infrequent large batch writes, particularly if 
they involve index updates.




________________________________
From: [email protected] <[email protected]>
Sent: Tuesday, June 13, 2017 6:36:29 PM
To: [email protected]; [email protected]
Subject: Re: Is there a way to schedule an operator?

I have input operators that reach out to Google, Facebook, Bing, Yahoo etc. 
once a day or once an hour and download marketing spend statistics. Apex 
promises that batch and streaming are equal-class citizens. How is this 
equality achieved if there's no scheduler for batch jobs to rely on? I want the 
DAG to take a data stream from the batch pipeline and have it affect the 
streaming pipelines running alongside. Do you not see this as a valid use case?


On Tue, Jun 13, 2017 at 5:29 PM, Guilherme Hott
<[email protected]> wrote:
Hi guys,

Is there a way to schedule an operator? I need an operator to start the DAG 
once a day at midnight (00:00).

Best

--
Guilherme Hott
Software Engineer
Skype: guilhermehott
@guilhermehott
https://www.linkedin.com/in/guilhermehott
