For something to add to Airflow itself: I would love a more flexible
mapping between data time and processing time. The default is "n-1" (day
over day, you're aiming to process yesterday's data) but people post other
use cases on this mailing list quite frequently.
On Fri, Oct 12, 2018 at 7:46 AM
What about an exponential back off on the poke interval?
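
For illustration only, here is a rough sketch of what an exponentially backed-off poke schedule could look like. It is plain Python, not an Airflow API: the helper name `backoff_intervals` and all of its parameters (`base`, `factor`, `cap`, `total_timeout`) are made up for this example, and wiring the resulting intervals into a sensor is left out.

```python
from datetime import timedelta


def backoff_intervals(base, factor, cap, total_timeout):
    """Yield successive wait times that grow geometrically up to `cap`.

    Hypothetical helper: stops once the cumulative wait reaches
    `total_timeout`, trimming the final step so it never overshoots.
    """
    waited = timedelta(0)
    interval = base
    while waited < total_timeout:
        # Never wait past the overall timeout.
        step = min(interval, total_timeout - waited)
        yield step
        waited += step
        # Grow the interval, but keep it bounded by the cap.
        interval = min(interval * factor, cap)


# Assumed values: start at 1 minute, double each time, cap at 1 hour,
# give up after 36 hours in total.
intervals = list(
    backoff_intervals(
        base=timedelta(minutes=1),
        factor=2,
        cap=timedelta(hours=1),
        total_timeout=timedelta(hours=36),
    )
)
```

With these assumed values the first pokes are 1, 2, 4, 8, 16 and 32 minutes apart, after which the interval stays at one hour until the 36-hour budget is used up.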
On Fri, 12 Oct 2018, 13:01 Ash Berlin-Taylor, wrote:
> That would work for some of our other use cases (and has been an idea in
> our backlog for months) but not this case as we're reading from someone
> else's bucket so can't set up
That would work for some of our other use cases (and has been an idea in our
backlog for months) but not this case as we're reading from someone else's
bucket so can't set up notifications etc. :(
-ash
> On 12 Oct 2018, at 11:57, Bolke de Bruin wrote:
>
> S3 Bucket notification that
A lot of our dags are ingesting data (usually daily or weekly) from suppliers,
and they are universally late.
In the case I'm setting up now the delivery lag is about 30 hours: data for
2018-10-10 turned up at 2018-10-12 05:43.
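
To make the arithmetic concrete, here is a small sketch using only the standard library. It assumes a daily schedule and Airflow's usual convention that the run for a given execution date becomes eligible once that day is over. The variable names are illustrative.

```python
from datetime import datetime, timedelta

# Dates taken from the example above; the daily interval is an assumption.
execution_date = datetime(2018, 10, 10)      # the day the data is for
schedule_interval = timedelta(days=1)
arrived_at = datetime(2018, 10, 12, 5, 43)   # when the file turned up

# Earliest moment a daily run for execution_date can start.
earliest_start = execution_date + schedule_interval

# How long a sensor would have had to wait before the data arrived.
delivery_lag = arrived_at - earliest_start

print(earliest_start)  # 2018-10-11 00:00:00
print(delivery_lag)    # 1 day, 5:43:00
```

So a sensor starting at midnight on 2018-10-11 would have been waiting for 29 hours 43 minutes, which is where the "about 30 hours" figure comes from.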
I was going to just set this up with an S3KeySensor and a daily