Yesterday I finished the draft of a new example on the "ETL with airflow"
site. This example explores the concept of a "Data vault" methodology on
top of Hive, 100% orchestrated by Airflow:
https://gtoonstra.github.io/etl-with-airflow/datavault2.html
The theory of the data vault is that you can
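For readers unfamiliar with the methodology, one of its core mechanics is deriving deterministic hash keys for hub records from normalized business keys, so the same entity always maps to the same hub row across loads. A minimal sketch (not taken from the example page above; the function name is mine):

```python
import hashlib


def hub_hash_key(business_key: str) -> str:
    """Derive a deterministic surrogate key for a data vault hub.

    The business key is normalized (trimmed, upper-cased) before
    hashing, so "cust001" and "  CUST001 " land on the same hub record.
    """
    normalized = business_key.strip().upper()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()
```

The same idea extends to link tables, where the hash is computed over the concatenated business keys of the hubs being related.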
Thanks for all the details! With a pluggable fetcher we would be able to
add our own logic for how to fetch, so that sounds like a good place to
start for something like this!
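To make the "pluggable fetcher" idea concrete, here is one possible shape for it. This is purely hypothetical — nothing named DagFetcher exists in Airflow; all class and method names below are made up for illustration:

```python
import abc
import glob
import os


class DagFetcher(abc.ABC):
    """Hypothetical interface: where the scheduler gets DAG files from."""

    @abc.abstractmethod
    def fetch(self):
        """Return local paths of DAG definition files, fetching if needed."""


class LocalDagFetcher(DagFetcher):
    """Mirrors today's behavior: DAG files already live on local disk."""

    def __init__(self, dags_folder):
        self.dags_folder = dags_folder

    def fetch(self):
        # Nothing to download; just enumerate the .py files already there.
        return sorted(glob.glob(os.path.join(self.dags_folder, "*.py")))
```

A GitDagFetcher or S3DagFetcher implementing the same interface would clone or download into a local cache directory and return the same kind of path list, which is where the custom fetch logic mentioned above would plug in.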
On Wed, Feb 28, 2018, 4:39 PM Joy Gao wrote:
+1 on DagFetcher abstraction, very airflow-esque :)
On Wed, Feb 28, 2018 at 11:25 AM, Maxime Beauchemin wrote:
Addressing a few of your questions / concerns:
* The scheduler uses a multiprocess queue to queue up tasks; each
subprocess is in charge of a single DAG "scheduler cycle", which triggers
what it can for active DagRuns. Currently it fills the DagBag from the
local file system, looking for a
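For context on that last (cut-off) sentence: when the DagBag is filled, the dags folder is walked for .py files, and in "safe mode" files whose text doesn't mention both "airflow" and "DAG" are skipped so unrelated scripts are never imported. A rough sketch of that discovery step — simplified, the real implementation also handles zip packages and ignore rules, and the function name here is mine:

```python
import os


def find_candidate_dag_files(dags_folder, safe_mode=True):
    """Walk a folder roughly the way the DagBag does.

    Collects .py files; in safe mode, only keeps files whose source text
    mentions both 'airflow' and 'DAG', a cheap pre-filter that avoids
    importing obviously unrelated Python scripts.
    """
    candidates = []
    for root, _dirs, files in os.walk(dags_folder):
        for name in files:
            if not name.endswith(".py"):
                continue
            path = os.path.join(root, name)
            if safe_mode:
                with open(path) as f:
                    content = f.read()
                if not ("airflow" in content and "DAG" in content):
                    continue
            candidates.append(path)
    return sorted(candidates)
```

A DagFetcher abstraction would essentially slot in just before this scan, making sure the folder being walked is up to date first.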
Welcome! :)
On Sun, Feb 25, 2018 at 5:16 PM, Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
> Congrats and welcome!
>
> On Sat, Feb 24, 2018 at 1:21 PM, Naik Kaxil wrote:
>
> > Congrats Ash (
> >
> > On 24/02/2018, 20:23, "fo...@driesprongen.nl on behalf of
I'll preface this with the fact that I'm relatively new to Airflow, and
haven't played around with a lot of the internals.
I find the idea of a DagFetcher interesting, but should we worry about
slowing down the scheduler significantly? If the scheduler is having to
"fetch" multiple different DAG