This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/airflow-site.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 048a9d4  Update asf-site to output generated at 73e3138
048a9d4 is described below

commit 048a9d4ef42b9143dc754201a49f74460cbe953b
Author: mik-laj <mik-...@users.noreply.github.com>
AuthorDate: Thu Sep 3 11:43:32 2020 +0000

    Update asf-site to output generated at 73e3138
---
 blog/airflow-1.10.10/index.html | 4 +-
 blog/airflow-1.10.12/index.html | 4 +-
 blog/airflow-1.10.8-1.10.9/index.html | 4 +-
 blog/airflow-survey/index.html | 4 +-
 blog/announcing-new-website/index.html | 4 +-
 blog/apache-airflow-for-newcomers/index.html | 4 +-
 .../index.html | 4 +-
 .../index.html | 4 +-
 .../index.html | 4 +-
 .../index.html | 4 +-
 .../index.html | 4 +-
 .../index.html | 4 +-
 index.html | 32 +++---
 search/index.html | 4 +-
 sitemap.xml | 115 +++++++++++---------
 use-cases/adobe/index.html | 4 +-
 use-cases/big-fish-games/index.html | 4 +-
 use-cases/dish/index.html | 4 +-
 use-cases/experity/index.html | 4 +-
 use-cases/index.html | 22 ++++
 use-cases/index.xml | 38 +++++++
 use-cases/onefootball/index.html | 8 +-
 use-cases/{onefootball => sift}/index.html | 51 ++++-----
 usecase-logos/sift_logo.png | Bin 0 -> 42233 bytes
 24 files changed, 202 insertions(+), 132 deletions(-)

diff --git a/blog/airflow-1.10.10/index.html b/blog/airflow-1.10.10/index.html
index 247c1dd..6f7f2b4 100644
--- a/blog/airflow-1.10.10/index.html
+++ b/blog/airflow-1.10.10/index.html
@@ -36,13 +36,13 @@
 <meta property="og:image" content="/images/feature-image.png" />
 <meta property="article:published_time" content="2020-04-09T00:00:00+00:00" />
-<meta property="article:modified_time" content="2020-08-30T12:04:33+02:00" /><meta property="og:site_name" content="Apache Airflow" />
+<meta property="article:modified_time" content="2020-09-03T04:39:51-07:00" /><meta property="og:site_name" content="Apache Airflow" />
 <meta itemprop="name" content="Apache Airflow 1.10.10">
 <meta itemprop="description" content="We are happy to present Apache Airflow 
1.10.10"> <meta itemprop="datePublished" content="2020-04-09T00:00:00+00:00" /> -<meta itemprop="dateModified" content="2020-08-30T12:04:33+02:00" /> +<meta itemprop="dateModified" content="2020-09-03T04:39:51-07:00" /> <meta itemprop="wordCount" content="1143"> diff --git a/blog/airflow-1.10.12/index.html b/blog/airflow-1.10.12/index.html index 6a43dbf..f7aee46 100644 --- a/blog/airflow-1.10.12/index.html +++ b/blog/airflow-1.10.12/index.html @@ -36,13 +36,13 @@ <meta property="og:image" content="/images/feature-image.png" /> <meta property="article:published_time" content="2020-08-25T00:00:00+00:00" /> -<meta property="article:modified_time" content="2020-08-30T12:04:33+02:00" /><meta property="og:site_name" content="Apache Airflow" /> +<meta property="article:modified_time" content="2020-09-03T04:39:51-07:00" /><meta property="og:site_name" content="Apache Airflow" /> <meta itemprop="name" content="Apache Airflow 1.10.12"> <meta itemprop="description" content="We are happy to present Apache Airflow 1.10.12"> <meta itemprop="datePublished" content="2020-08-25T00:00:00+00:00" /> -<meta itemprop="dateModified" content="2020-08-30T12:04:33+02:00" /> +<meta itemprop="dateModified" content="2020-09-03T04:39:51-07:00" /> <meta itemprop="wordCount" content="911"> diff --git a/blog/airflow-1.10.8-1.10.9/index.html b/blog/airflow-1.10.8-1.10.9/index.html index 0f57cde..c163dd3 100644 --- a/blog/airflow-1.10.8-1.10.9/index.html +++ b/blog/airflow-1.10.8-1.10.9/index.html @@ -36,13 +36,13 @@ <meta property="og:image" content="/images/feature-image.png" /> <meta property="article:published_time" content="2020-02-23T00:00:00+00:00" /> -<meta property="article:modified_time" content="2020-08-30T12:04:33+02:00" /><meta property="og:site_name" content="Apache Airflow" /> +<meta property="article:modified_time" content="2020-09-03T04:39:51-07:00" /><meta property="og:site_name" content="Apache Airflow" /> <meta itemprop="name" content="Apache Airflow 1.10.8 & 1.10.9"> <meta 
itemprop="description" content="We are happy to present the new 1.10.8 and 1.10.9 releases of Apache Airflow."> <meta itemprop="datePublished" content="2020-02-23T00:00:00+00:00" /> -<meta itemprop="dateModified" content="2020-08-30T12:04:33+02:00" /> +<meta itemprop="dateModified" content="2020-09-03T04:39:51-07:00" /> <meta itemprop="wordCount" content="437"> diff --git a/blog/airflow-survey/index.html b/blog/airflow-survey/index.html index d84a68e..3766a1b 100644 --- a/blog/airflow-survey/index.html +++ b/blog/airflow-survey/index.html @@ -36,13 +36,13 @@ <meta property="og:image" content="/images/feature-image.png" /> <meta property="article:published_time" content="2019-12-11T00:00:00+00:00" /> -<meta property="article:modified_time" content="2020-08-30T12:04:33+02:00" /><meta property="og:site_name" content="Apache Airflow" /> +<meta property="article:modified_time" content="2020-09-03T04:39:51-07:00" /><meta property="og:site_name" content="Apache Airflow" /> <meta itemprop="name" content="Airflow Survey 2019"> <meta itemprop="description" content="Receiving and adjusting to our users’ feedback is a must. 
Let’s see who Airflow users are, how they play with it, and what they miss."> <meta itemprop="datePublished" content="2019-12-11T00:00:00+00:00" /> -<meta itemprop="dateModified" content="2020-08-30T12:04:33+02:00" /> +<meta itemprop="dateModified" content="2020-09-03T04:39:51-07:00" /> <meta itemprop="wordCount" content="1775"> diff --git a/blog/announcing-new-website/index.html b/blog/announcing-new-website/index.html index ab8a10c..e3240bb 100644 --- a/blog/announcing-new-website/index.html +++ b/blog/announcing-new-website/index.html @@ -36,13 +36,13 @@ <meta property="og:image" content="/images/feature-image.png" /> <meta property="article:published_time" content="2019-12-11T00:00:00+00:00" /> -<meta property="article:modified_time" content="2020-08-30T12:04:33+02:00" /><meta property="og:site_name" content="Apache Airflow" /> +<meta property="article:modified_time" content="2020-09-03T04:39:51-07:00" /><meta property="og:site_name" content="Apache Airflow" /> <meta itemprop="name" content="New Airflow website"> <meta itemprop="description" content="We are thrilled about our new website!"> <meta itemprop="datePublished" content="2019-12-11T00:00:00+00:00" /> -<meta itemprop="dateModified" content="2020-08-30T12:04:33+02:00" /> +<meta itemprop="dateModified" content="2020-09-03T04:39:51-07:00" /> <meta itemprop="wordCount" content="282"> diff --git a/blog/apache-airflow-for-newcomers/index.html b/blog/apache-airflow-for-newcomers/index.html index f90ddc3..74085a2 100644 --- a/blog/apache-airflow-for-newcomers/index.html +++ b/blog/apache-airflow-for-newcomers/index.html @@ -37,14 +37,14 @@ Authoring Workflow in Apache Airflow. 
Airflow makes it easy to author workflows <meta property="og:image" content="/images/feature-image.png" /> <meta property="article:published_time" content="2020-08-17T00:00:00+00:00" /> -<meta property="article:modified_time" content="2020-08-30T12:04:33+02:00" /><meta property="og:site_name" content="Apache Airflow" /> +<meta property="article:modified_time" content="2020-09-03T04:39:51-07:00" /><meta property="og:site_name" content="Apache Airflow" /> <meta itemprop="name" content="Apache Airflow For Newcomers"> <meta itemprop="description" content="Apache Airflow is a platform to programmatically author, schedule, and monitor workflows. A workflow is a sequence of tasks that processes a set of data. You can think of workflow as the path that describes how tasks go from being undone to done. Scheduling, on the other hand, is the process of planning, controlling, and optimizing when a particular task should be done. Authoring Workflow in Apache Airflow. Airflow makes it easy to author workflows using python scripts."> <meta itemprop="datePublished" content="2020-08-17T00:00:00+00:00" /> -<meta itemprop="dateModified" content="2020-08-30T12:04:33+02:00" /> +<meta itemprop="dateModified" content="2020-09-03T04:39:51-07:00" /> <meta itemprop="wordCount" content="1070"> diff --git a/blog/apache-con-europe-2019-thoughts-and-insights-by-airflow-committers/index.html b/blog/apache-con-europe-2019-thoughts-and-insights-by-airflow-committers/index.html index 420e617..43f7b30 100644 --- a/blog/apache-con-europe-2019-thoughts-and-insights-by-airflow-committers/index.html +++ b/blog/apache-con-europe-2019-thoughts-and-insights-by-airflow-committers/index.html @@ -36,13 +36,13 @@ <meta property="og:image" content="/images/feature-image.png" /> <meta property="article:published_time" content="2019-11-22T00:00:00+00:00" /> -<meta property="article:modified_time" content="2020-08-30T12:04:33+02:00" /><meta property="og:site_name" content="Apache Airflow" /> +<meta 
property="article:modified_time" content="2020-09-03T04:39:51-07:00" /><meta property="og:site_name" content="Apache Airflow" /> <meta itemprop="name" content="ApacheCon Europe 2019 — Thoughts and Insights by Airflow Committers"> <meta itemprop="description" content="Here come some thoughts by Airflow committers and contributors from the ApacheCon Europe 2019. Get to know the ASF community!"> <meta itemprop="datePublished" content="2019-11-22T00:00:00+00:00" /> -<meta itemprop="dateModified" content="2020-08-30T12:04:33+02:00" /> +<meta itemprop="dateModified" content="2020-09-03T04:39:51-07:00" /> <meta itemprop="wordCount" content="138"> diff --git a/blog/documenting-using-local-development-environments/index.html b/blog/documenting-using-local-development-environments/index.html index 9f96d11..05382bf 100644 --- a/blog/documenting-using-local-development-environments/index.html +++ b/blog/documenting-using-local-development-environments/index.html @@ -36,13 +36,13 @@ <meta property="og:image" content="/images/feature-image.png" /> <meta property="article:published_time" content="2019-11-22T00:00:00+00:00" /> -<meta property="article:modified_time" content="2020-08-30T12:04:33+02:00" /><meta property="og:site_name" content="Apache Airflow" /> +<meta property="article:modified_time" content="2020-09-03T04:39:51-07:00" /><meta property="og:site_name" content="Apache Airflow" /> <meta itemprop="name" content="Documenting using local development environment"> <meta itemprop="description" content="The story behind documenting local development environment of Apache Airflow"> <meta itemprop="datePublished" content="2019-11-22T00:00:00+00:00" /> -<meta itemprop="dateModified" content="2020-08-30T12:04:33+02:00" /> +<meta itemprop="dateModified" content="2020-09-03T04:39:51-07:00" /> <meta itemprop="wordCount" content="256"> diff --git a/blog/experience-in-google-season-of-docs-2019-with-apache-airflow/index.html 
b/blog/experience-in-google-season-of-docs-2019-with-apache-airflow/index.html index a49f1ae..cfc3598 100644 --- a/blog/experience-in-google-season-of-docs-2019-with-apache-airflow/index.html +++ b/blog/experience-in-google-season-of-docs-2019-with-apache-airflow/index.html @@ -37,14 +37,14 @@ About Me I have been writing tech articles on medium as well as my blog for the <meta property="og:image" content="/images/feature-image.png" /> <meta property="article:published_time" content="2019-12-20T00:00:00+00:00" /> -<meta property="article:modified_time" content="2020-08-30T12:04:33+02:00" /><meta property="og:site_name" content="Apache Airflow" /> +<meta property="article:modified_time" content="2020-09-03T04:39:51-07:00" /><meta property="og:site_name" content="Apache Airflow" /> <meta itemprop="name" content="Experience in Google Season of Docs 2019 with Apache Airflow"> <meta itemprop="description" content="I came across Google Season of Docs (GSoD) almost by accident, thanks to my extensive HackerNews and Twitter addiction. I was familiar with the Google Summer of Code but not with this program. It turns out it was the inaugural phase. I read the details, and the process felt a lot like GSoC except that this was about documentation. 
About Me I have been writing tech articles on medium as well as my blog for the past 1."> <meta itemprop="datePublished" content="2019-12-20T00:00:00+00:00" /> -<meta itemprop="dateModified" content="2020-08-30T12:04:33+02:00" /> +<meta itemprop="dateModified" content="2020-09-03T04:39:51-07:00" /> <meta itemprop="wordCount" content="1521"> diff --git a/blog/experience-with-airflow-as-an-outreachy-intern/index.html b/blog/experience-with-airflow-as-an-outreachy-intern/index.html index cfd9900..daaaed6 100644 --- a/blog/experience-with-airflow-as-an-outreachy-intern/index.html +++ b/blog/experience-with-airflow-as-an-outreachy-intern/index.html @@ -37,14 +37,14 @@ Contribution Period The first thing I had to do was choose a project under an or <meta property="og:image" content="/images/feature-image.png" /> <meta property="article:published_time" content="2020-08-30T00:00:00+00:00" /> -<meta property="article:modified_time" content="2020-08-30T12:04:33+02:00" /><meta property="og:site_name" content="Apache Airflow" /> +<meta property="article:modified_time" content="2020-09-03T04:39:51-07:00" /><meta property="og:site_name" content="Apache Airflow" /> <meta itemprop="name" content="Journey with Airflow as an Outreachy Intern"> <meta itemprop="description" content="Outreachy is a program which organises three months paid internships with FOSS projects for people who are typically underrepresented in those projects. Contribution Period The first thing I had to do was choose a project under an organisation. 
After going through all the projects I chose “Extending the REST API of Apache Airflow”, because I had a good idea of what REST API(s) are, so I thought it would be easier to get started with the contributions."> <meta itemprop="datePublished" content="2020-08-30T00:00:00+00:00" /> -<meta itemprop="dateModified" content="2020-08-30T12:04:33+02:00" /> +<meta itemprop="dateModified" content="2020-09-03T04:39:51-07:00" /> <meta itemprop="wordCount" content="532"> diff --git a/blog/implementing-stable-api-for-apache-airflow/index.html b/blog/implementing-stable-api-for-apache-airflow/index.html index be30470..1c5d2a1 100644 --- a/blog/implementing-stable-api-for-apache-airflow/index.html +++ b/blog/implementing-stable-api-for-apache-airflow/index.html @@ -36,13 +36,13 @@ <meta property="og:image" content="/images/feature-image.png" /> <meta property="article:published_time" content="2020-07-19T00:00:00+00:00" /> -<meta property="article:modified_time" content="2020-08-30T12:04:33+02:00" /><meta property="og:site_name" content="Apache Airflow" /> +<meta property="article:modified_time" content="2020-09-03T04:39:51-07:00" /><meta property="og:site_name" content="Apache Airflow" /> <meta itemprop="name" content="Implementing Stable API for Apache Airflow"> <meta itemprop="description" content="An Outreachy intern's progress report on contributing to Apache Airflow REST API."> <meta itemprop="datePublished" content="2020-07-19T00:00:00+00:00" /> -<meta itemprop="dateModified" content="2020-08-30T12:04:33+02:00" /> +<meta itemprop="dateModified" content="2020-09-03T04:39:51-07:00" /> <meta itemprop="wordCount" content="703"> diff --git a/blog/its-a-breeze-to-develop-apache-airflow/index.html b/blog/its-a-breeze-to-develop-apache-airflow/index.html index 2047ec2..44f8169 100644 --- a/blog/its-a-breeze-to-develop-apache-airflow/index.html +++ b/blog/its-a-breeze-to-develop-apache-airflow/index.html @@ -36,13 +36,13 @@ <meta property="og:image" 
content="/images/feature-image.png" /> <meta property="article:published_time" content="2019-11-22T00:00:00+00:00" /> -<meta property="article:modified_time" content="2020-08-30T12:04:33+02:00" /><meta property="og:site_name" content="Apache Airflow" /> +<meta property="article:modified_time" content="2020-09-03T04:39:51-07:00" /><meta property="og:site_name" content="Apache Airflow" /> <meta itemprop="name" content="It's a "Breeze" to develop Apache Airflow"> <meta itemprop="description" content="A Principal Software Engineer's journey to developer productivity. Learn how Jarek and his team speeded up and simplified Airflow development for the community."> <meta itemprop="datePublished" content="2019-11-22T00:00:00+00:00" /> -<meta itemprop="dateModified" content="2020-08-30T12:04:33+02:00" /> +<meta itemprop="dateModified" content="2020-09-03T04:39:51-07:00" /> <meta itemprop="wordCount" content="127"> diff --git a/index.html b/index.html index 9ff4539..c6b86b2 100644 --- a/index.html +++ b/index.html @@ -1226,12 +1226,12 @@ if (!doNotTrack) { <div id="integrations-container" class="list-items"> - <a class="list-item" href="/docs/stable/integration.html#azure-microsoft-azure"> + <a class="list-item" href="/docs/stable/integration.html#gcp-google-cloud-platform"> <div class="card"> <div class="box-event box-event__integration"> - <span class="box-event__integration--name">Azure Files</span> + <span class="box-event__integration--name">Cloud Storage (GCS)</span> </div> </div> @@ -1239,12 +1239,12 @@ if (!doNotTrack) { - <a class="list-item" href="/docs/stable/integration.html#software-integrations"> + <a class="list-item" href="/docs/stable/integration.html#protocol-integrations"> <div class="card"> <div class="box-event box-event__integration"> - <span class="box-event__integration--name">MongoDB</span> + <span class="box-event__integration--name">Filesystem</span> </div> </div> @@ -1252,12 +1252,12 @@ if (!doNotTrack) { - <a class="list-item" 
href="/docs/stable/integration.html#software-integrations"> + <a class="list-item" href="/docs/stable/integration.html#gcp-google-cloud-platform"> <div class="card"> <div class="box-event box-event__integration"> - <span class="box-event__integration--name">Redis</span> + <span class="box-event__integration--name">Storage Transfer Service</span> </div> </div> @@ -1265,12 +1265,12 @@ if (!doNotTrack) { - <a class="list-item" href="/docs/stable/integration.html#asf-apache-software-foundation"> + <a class="list-item" href="/docs/stable/integration.html#aws-amazon-web-services"> <div class="card"> <div class="box-event box-event__integration"> - <span class="box-event__integration--name">Apache Spark</span> + <span class="box-event__integration--name">Amazon SageMaker</span> </div> </div> @@ -1278,12 +1278,12 @@ if (!doNotTrack) { - <a class="list-item" href="/docs/stable/integration.html#asf-apache-software-foundation"> + <a class="list-item" href="/docs/stable/integration.html#software-integrations"> <div class="card"> <div class="box-event box-event__integration"> - <span class="box-event__integration--name">Apache Sqoop</span> + <span class="box-event__integration--name">MongoDB</span> </div> </div> @@ -1291,12 +1291,12 @@ if (!doNotTrack) { - <a class="list-item" href="/docs/stable/integration.html#aws-amazon-web-services"> + <a class="list-item" href="/docs/stable/integration.html#gcp-google-cloud-platform"> <div class="card"> <div class="box-event box-event__integration"> - <span class="box-event__integration--name">Amazon Redshift</span> + <span class="box-event__integration--name">Cloud Data Loss Prevention (DLP)</span> </div> </div> @@ -1304,12 +1304,12 @@ if (!doNotTrack) { - <a class="list-item" href="/docs/stable/integration.html#service-integrations"> + <a class="list-item" href="/docs/stable/integration.html#aws-amazon-web-services"> <div class="card"> <div class="box-event box-event__integration"> - <span 
class="box-event__integration--name">Dingding</span> + <span class="box-event__integration--name">Amazon EMR</span> </div> </div> @@ -1317,12 +1317,12 @@ if (!doNotTrack) { - <a class="list-item" href="/docs/stable/integration.html#aws-amazon-web-services"> + <a class="list-item" href="/docs/stable/integration.html#gcp-google-cloud-platform"> <div class="card"> <div class="box-event box-event__integration"> - <span class="box-event__integration--name">Amazon Kinesis Data Firehose</span> + <span class="box-event__integration--name">Cloud Vision</span> </div> </div> diff --git a/search/index.html b/search/index.html index b93e313..4cbca79 100644 --- a/search/index.html +++ b/search/index.html @@ -35,12 +35,12 @@ <meta property="og:url" content="/search/" /> <meta property="og:image" content="/images/feature-image.png" /> -<meta property="article:modified_time" content="2020-08-30T12:04:33+02:00" /><meta property="og:site_name" content="Apache Airflow" /> +<meta property="article:modified_time" content="2020-09-03T04:39:51-07:00" /><meta property="og:site_name" content="Apache Airflow" /> <meta itemprop="name" content="Search Results"> <meta itemprop="description" content=""> -<meta itemprop="dateModified" content="2020-08-30T12:04:33+02:00" /> +<meta itemprop="dateModified" content="2020-09-03T04:39:51-07:00" /> <meta itemprop="wordCount" content="0"> diff --git a/sitemap.xml b/sitemap.xml index f0e94fa..2cf4022 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -4,217 +4,217 @@ <url> <loc>/docs/overview/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/docs/tasks/beds/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/docs/tasks/ponycopters/configuring-ponycopters/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/docs/getting-started/</loc> - 
<lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/docs/examples/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/docs/tasks/ponycopters/launching-ponycopters/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/docs/tutorials/multi-bear/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/docs/tasks/porridge/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/docs/concepts/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/docs/tasks/task/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/docs/tutorials/tutorial2/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/docs/tasks/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/docs/tutorials/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/docs/reference/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/docs/contribution-guidelines/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/docs/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/blog/tags/community/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/blog/experience-with-airflow-as-an-outreachy-intern/</loc> - 
<lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/tags/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/blog/airflow-1.10.12/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/blog/tags/release/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/blog/apache-airflow-for-newcomers/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/announcements/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/blog/implementing-stable-api-for-apache-airflow/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/blog/tags/rest-api/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/blog/airflow-1.10.10/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/blog/airflow-1.10.8-1.10.9/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/blog/tags/documentation/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/blog/experience-in-google-season-of-docs-2019-with-apache-airflow/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/blog/airflow-survey/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/blog/announcing-new-website/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> 
</url> <url> <loc>/blog/tags/survey/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/blog/tags/users/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/blog/apache-con-europe-2019-thoughts-and-insights-by-airflow-committers/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/blog/tags/development/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/blog/documenting-using-local-development-environments/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/blog/its-a-breeze-to-develop-apache-airflow/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/docs/getting-started/example-page/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/docs/reference/parameter-reference/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/docs/tasks/ponycopters/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/use-cases/adobe/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/use-cases/big-fish-games/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/blog/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> @@ -223,62 +223,67 @@ <url> <loc>/community/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> 
<loc>/use-cases/dish/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/ecosystem/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/use-cases/experity/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/install/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/meetups/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/use-cases/onefootball/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/privacy-notice/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/roadmap/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/search/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> + </url> + + <url> + <loc>/use-cases/sift/</loc> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> <url> <loc>/use-cases/</loc> - <lastmod>2020-08-30T12:04:33+02:00</lastmod> + <lastmod>2020-09-03T04:39:51-07:00</lastmod> </url> </urlset> \ No newline at end of file diff --git a/use-cases/adobe/index.html b/use-cases/adobe/index.html index 2727c6d..e6a0e9b 100644 --- a/use-cases/adobe/index.html +++ b/use-cases/adobe/index.html @@ -35,12 +35,12 @@ <meta property="og:url" content="/use-cases/adobe/" /> <meta property="og:image" content="/images/feature-image.png" /> -<meta property="article:modified_time" content="2020-08-30T12:04:33+02:00" /><meta property="og:site_name" content="Apache 
Airflow" /> +<meta property="article:modified_time" content="2020-09-03T04:39:51-07:00" /><meta property="og:site_name" content="Apache Airflow" /> <meta itemprop="name" content="Adobe"> <meta itemprop="description" content="What was the problem? Modern big data platforms need sophisticated data pipelines connecting to many backend services enabling complex workflows. These workflows need to be deployed, monitored, and run either on regular schedules or triggered by external events. Adobe Experience Platform component services architected and built an orchestration service to enable their users to author, schedule, and monitor complex hierarchical (including sequential a [...] -<meta itemprop="dateModified" content="2020-08-30T12:04:33+02:00" /> +<meta itemprop="dateModified" content="2020-09-03T04:39:51-07:00" /> <meta itemprop="wordCount" content="251"> diff --git a/use-cases/big-fish-games/index.html b/use-cases/big-fish-games/index.html index ab6b70b..898addf 100644 --- a/use-cases/big-fish-games/index.html +++ b/use-cases/big-fish-games/index.html @@ -35,12 +35,12 @@ <meta property="og:url" content="/use-cases/big-fish-games/" /> <meta property="og:image" content="/images/feature-image.png" /> -<meta property="article:modified_time" content="2020-08-30T12:04:33+02:00" /><meta property="og:site_name" content="Apache Airflow" /> +<meta property="article:modified_time" content="2020-09-03T04:39:51-07:00" /><meta property="og:site_name" content="Apache Airflow" /> <meta itemprop="name" content="Big Fish Games"> <meta itemprop="description" content="What was the problem? The main challenge is the lack of standardized ETL workflow orchestration tools. PowerShell and Python-based ETL frameworks built in-house are currently used for scheduling and running analytical workloads. However, there is no web UI through which we can monitor these workflows and it requires additional effort to maintain this framework. 
These scheduled jobs based on external dependencies are not well suited to modern Big Data [...] -<meta itemprop="dateModified" content="2020-08-30T12:04:33+02:00" /> +<meta itemprop="dateModified" content="2020-09-03T04:39:51-07:00" /> <meta itemprop="wordCount" content="336"> diff --git a/use-cases/dish/index.html b/use-cases/dish/index.html index 9dc510f..e3249b0 100644 --- a/use-cases/dish/index.html +++ b/use-cases/dish/index.html @@ -35,12 +35,12 @@ <meta property="og:url" content="/use-cases/dish/" /> <meta property="og:image" content="/images/feature-image.png" /> -<meta property="article:modified_time" content="2020-08-30T12:04:33+02:00" /><meta property="og:site_name" content="Apache Airflow" /> +<meta property="article:modified_time" content="2020-09-03T04:39:51-07:00" /><meta property="og:site_name" content="Apache Airflow" /> <meta itemprop="name" content="Dish"> <meta itemprop="description" content="What was the problem? We faced increasing complexity managing lengthy crontabs with scheduling being an issue, this required carefully planning timing due to resource constraints, usage patterns, and especially custom code needed for retry logic. In the last case, having to verify success of previous jobs and/or steps prior to running the next. Furthermore, time to results is important, but we were increasingly relying on buffers for processing, wher [...] -<meta itemprop="dateModified" content="2020-08-30T12:04:33+02:00" /> +<meta itemprop="dateModified" content="2020-09-03T04:39:51-07:00" /> <meta itemprop="wordCount" content="249"> diff --git a/use-cases/experity/index.html b/use-cases/experity/index.html index de9d432..ee4124c 100644 --- a/use-cases/experity/index.html +++ b/use-cases/experity/index.html @@ -36,13 +36,13 @@ How did Apache Airflow help to solve this problem? 
Ultimately we decided flexibl <meta property="og:url" content="/use-cases/experity/" /> <meta property="og:image" content="/images/feature-image.png" /> -<meta property="article:modified_time" content="2020-08-30T12:04:33+02:00" /><meta property="og:site_name" content="Apache Airflow" /> +<meta property="article:modified_time" content="2020-09-03T04:39:51-07:00" /><meta property="og:site_name" content="Apache Airflow" /> <meta itemprop="name" content="Experity"> <meta itemprop="description" content="What was the problem? We had to deploy our complex, flagship app to multiple nodes in multiple ways. This required tasks to communicate across Windows nodes and coordinate timing perfectly. We did not want to buy an expensive enterprise scheduling tool and needed ultimate flexibility. How did Apache Airflow help to solve this problem? Ultimately we decided flexible, multi-node, DAG capable tooling was key and airflow was one of the few tools that fit that bill."> -<meta itemprop="dateModified" content="2020-08-30T12:04:33+02:00" /> +<meta itemprop="dateModified" content="2020-09-03T04:39:51-07:00" /> <meta itemprop="wordCount" content="191"> diff --git a/use-cases/index.html b/use-cases/index.html index 0607fdf..b9d6f5b 100644 --- a/use-cases/index.html +++ b/use-cases/index.html @@ -536,6 +536,28 @@ if (!doNotTrack) { </a> + <a href="/use-cases/sift/" class="list-item"> + + + +<div class="card"> + <div class="box-event box-event__case-study hoverable-icon"> + <div class="box-event__case-study--logo"> + + <img src="/usecase-logos/sift_logo.png" alt="Sift logo" /> + + </div> + <p class="box-event__case-study--quote" + >Airflow helped us to define and organize our ML pipeline dependencies, and empowered us to introduce new, diverse batch …</p> + + +<button class="btn-hollow btn-blue bodytext__medium--cerulean-blue" >Learn more</button> + + </div> +</div> + + </a> + </div> <div class="show-more-button"> diff --git a/use-cases/index.xml b/use-cases/index.xml index 
87a68d9..db70030 100644 --- a/use-cases/index.xml +++ b/use-cases/index.xml @@ -147,5 +147,43 @@ </description> </item> + <item> + <title>Use-Cases: Sift</title> + <link>/use-cases/sift/</link> + <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate> + + <guid>/use-cases/sift/</guid> + <description> + + + + +<h5 id="what-was-the-problem">What was the problem?</h5> + +<p>At Sift, we’re constantly training machine learning models that feed into the core of Sift’s Digital Trust &amp; Safety platform. The platform gives our customers a way to discern suspicious online behavior from trustworthy behavior, allowing our customers to protect their online transactions, maintain the integrity of their content platforms, and keep their users’ accounts secure. To make this possible, we’ve built model training pipelines that consist of hundreds of steps [...] + +<p>When we built these workflows, we found that we needed a centralized way to organize the interactions between the many steps in each workflow. But before Airflow, we didn’t have an easy way to express those dependencies. And as we added steps to the workflows, it became increasingly difficult to coordinate their dependencies and keep ML experiments in sync.</p> + +<p>It soon became clear that we needed a way to orchestrate both the scheduled execution of our jobs and the dependencies between steps of not only a single workflow, but of multiple workflows. We needed a way to dynamically create several experimental ML workflows at once that could each have their own code, dependencies, and tasks. Additionally, we needed a way to be able to monitor the status of tasks, and re-run or restart tasks from any given point in a workflow with ease.</p> + +<h5 id="how-did-apache-airflow-help-to-solve-this-problem">How did Apache Airflow help to solve this problem?</h5> + +<p>Airflow makes it easy to clearly define the interactions between various jobs, expanding the scope of what we can do in our model training pipelines. 
We now have the ability to schedule and coordinate all jobs while managing the dependencies between them using DAGs. Each of our main workflows, including our model training pipeline and ETL pipelines, has its own DAG code that manages its tasks’ dependencies and the execution schedule for the pipeline. We even define dependencies [...] + +<p>As part of our custom Airflow setup, we’ve built out a separate Airflow ecosystem for short-lived experimental DAGs as well, so that we can test changes to our jobs or run separate model training pipelines in isolation. Using deployment scripts that edit our DAGs when we upload them to Airflow, the same code that powers an existing DAG can be deployed in a separate, isolated environment with experimental edits. This means that each experiment can have its own isolated code, runn [...] + +<p>Finally, Airflow has given us the ability to manage our tasks’ successes and failures through its user interface. Airflow allows us to track our tasks’ failures, duration, history, and logs in one central UI, and that same UI also allows us to easily retry single tasks, branches of a DAG, or entire DAGs.</p> + +<h5 id="what-are-the-results">What are the results?</h5> + +<p>Airflow initially gave us a way to solve our existing problems: we used Airflow to replace rigid crons with well-defined DAG dependencies, to build isolated ML experiments using short-lived DAGs, and to track our pipelines’ successes and failures.</p> + +<p>But even after that, Airflow helped us to grow beyond those initial challenges, and expanded the scope of what we could feasibly tackle. Airflow not only made it easier to manage our ever-expanding ML pipelines, but also allowed us to create entirely new pipelines, ranging from workflows that back up our production data to complex ETL pipelines that transform data into experimentation-ready formats.</p> + +<p>Airflow also allowed us to support a more diverse toolset. 
Shell scripts, Java, Python, Jupyter notebooks, and more - all of these can be managed from an Airflow DAG, allowing developers to utilize our data to test new ideas, generate insights, and improve our models with ease.</p> + + </description> + </item> + </channel> </rss> diff --git a/use-cases/onefootball/index.html b/use-cases/onefootball/index.html index 864b025..edc6acf 100644 --- a/use-cases/onefootball/index.html +++ b/use-cases/onefootball/index.html @@ -36,13 +36,13 @@ On top of that, new data tools appear each month: third party data sources, clou <meta property="og:url" content="/use-cases/onefootball/" /> <meta property="og:image" content="/images/feature-image.png" /> -<meta property="article:modified_time" content="2020-08-30T12:04:33+02:00" /><meta property="og:site_name" content="Apache Airflow" /> +<meta property="article:modified_time" content="2020-09-03T04:39:51-07:00" /><meta property="og:site_name" content="Apache Airflow" /> <meta itemprop="name" content="Onefootball"> <meta itemprop="description" content="What was the problem? With millions of daily active users, managing the complexity of data engineering at Onefootball is a constant challenge. Lengthy crontabs, multiplication of custom API clients, erosion of confidence in the analytics served, increasing heroism (“only one person can solve this issue”). Those are the challenges that most teams face unless they consciously invest in their tools and processes. 
On top of that, new data tools appear each month: third party data sources, cloud providers solutions, different storage technologies… Managing all those integrations is costly and brittle, especially for small data engineering teams that are trying to do more with less."> -<meta itemprop="dateModified" content="2020-08-30T12:04:33+02:00" /> +<meta itemprop="dateModified" content="2020-09-03T04:39:51-07:00" /> <meta itemprop="wordCount" content="294"> @@ -469,10 +469,10 @@ if (!doNotTrack) { <div class="pager"> - <a > + <a href="/use-cases/sift/"> -<button class="btn-hollow btn-blue bodytext__medium--cerulean-blue" disabled>Previous</button> +<button class="btn-hollow btn-blue bodytext__medium--cerulean-blue" >Previous</button> </a> <a href="/use-cases/experity/"> diff --git a/use-cases/onefootball/index.html b/use-cases/sift/index.html similarity index 87% copy from use-cases/onefootball/index.html copy to use-cases/sift/index.html index 864b025..8ce1b03 100644 --- a/use-cases/onefootball/index.html +++ b/use-cases/sift/index.html @@ -29,21 +29,19 @@ <meta name="msapplication-TileImage" content="/favicons/ms-icon-144x144.png"> <meta name="theme-color" content="#ffffff"> -<title>Onefootball | Apache Airflow</title><meta property="og:title" content="Onefootball" /> -<meta property="og:description" content="What was the problem? With millions of daily active users, managing the complexity of data engineering at Onefootball is a constant challenge. Lengthy crontabs, multiplication of custom API clients, erosion of confidence in the analytics served, increasing heroism (“only one person can solve this issue”). Those are the challenges that most teams face unless they consciously invest in their tools and processes. 
-On top of that, new data tools appear each month: third party data sources, cloud providers solutions, different storage technologies… Managing all those integrations is costly and brittle, especially for small data engineering teams that are trying to do more with less." /> +<title>Sift | Apache Airflow</title><meta property="og:title" content="Sift" /> +<meta property="og:description" content="What was the problem? At Sift, we’re constantly training machine learning models that feed into the core of Sift’s Digital Trust & Safety platform. The platform gives our customers a way to discern suspicious online behavior from trustworthy behavior, allowing our customers to protect their online transactions, maintain the integrity of their content platforms, and keep their users’ accounts secure. To make this possible, we’ve built model tra [...] <meta property="og:type" content="article" /> -<meta property="og:url" content="/use-cases/onefootball/" /> +<meta property="og:url" content="/use-cases/sift/" /> <meta property="og:image" content="/images/feature-image.png" /> -<meta property="article:modified_time" content="2020-08-30T12:04:33+02:00" /><meta property="og:site_name" content="Apache Airflow" /> -<meta itemprop="name" content="Onefootball"> -<meta itemprop="description" content="What was the problem? With millions of daily active users, managing the complexity of data engineering at Onefootball is a constant challenge. Lengthy crontabs, multiplication of custom API clients, erosion of confidence in the analytics served, increasing heroism (“only one person can solve this issue”). Those are the challenges that most teams face unless they consciously invest in their tools and processes. 
-On top of that, new data tools appear each month: third party data sources, cloud providers solutions, different storage technologies… Managing all those integrations is costly and brittle, especially for small data engineering teams that are trying to do more with less."> +<meta property="article:modified_time" content="2020-09-03T04:39:51-07:00" /><meta property="og:site_name" content="Apache Airflow" /> +<meta itemprop="name" content="Sift"> +<meta itemprop="description" content="What was the problem? At Sift, we’re constantly training machine learning models that feed into the core of Sift’s Digital Trust & Safety platform. The platform gives our customers a way to discern suspicious online behavior from trustworthy behavior, allowing our customers to protect their online transactions, maintain the integrity of their content platforms, and keep their users’ accounts secure. To make this possible, we’ve built model traini [...] -<meta itemprop="dateModified" content="2020-08-30T12:04:33+02:00" /> -<meta itemprop="wordCount" content="294"> +<meta itemprop="dateModified" content="2020-09-03T04:39:51-07:00" /> +<meta itemprop="wordCount" content="641"> @@ -51,9 +49,8 @@ On top of that, new data tools appear each month: third party data sources, clou <meta name="twitter:card" content="summary_large_image"/> <meta name="twitter:image" content="/images/feature-image.png"/> -<meta name="twitter:title" content="Onefootball"/> -<meta name="twitter:description" content="What was the problem? With millions of daily active users, managing the complexity of data engineering at Onefootball is a constant challenge. Lengthy crontabs, multiplication of custom API clients, erosion of confidence in the analytics served, increasing heroism (“only one person can solve this issue”). Those are the challenges that most teams face unless they consciously invest in their tools and processes. 
-On top of that, new data tools appear each month: third party data sources, cloud providers solutions, different storage technologies… Managing all those integrations is costly and brittle, especially for small data engineering teams that are trying to do more with less."/> +<meta name="twitter:title" content="Sift"/> +<meta name="twitter:description" content="What was the problem? At Sift, we’re constantly training machine learning models that feed into the core of Sift’s Digital Trust & Safety platform. The platform gives our customers a way to discern suspicious online behavior from trustworthy behavior, allowing our customers to protect their online transactions, maintain the integrity of their content platforms, and keep their users’ accounts secure. To make this possible, we’ve built model tr [...] <script type="application/javascript"> @@ -436,10 +433,10 @@ if (!doNotTrack) { <div class="quote"> - <p class="quote--text">Airflow is extensible enough for any business to define the custom operators they need. Airflow can help you in your DataOps journey: viewing analytics as code, monitoring, reusing components, being a catalyst of team interactions.</p> - <p class="quote--author">Louis Guitton</p> + <p class="quote--text">Airflow helped us to define and organize our ML pipeline dependencies, and empowered us to introduce new, diverse batch processes at increasing scale.</p> + <p class="quote--author">Handong Park</p> - <img src="/usecase-logos/onefootball-logo.svg" alt="Onefootball logo" class="quote--logo" /> + <img src="/usecase-logos/sift_logo.png" alt="Sift logo" class="quote--logo" /> </div> @@ -449,19 +446,27 @@ if (!doNotTrack) { <h5 id="what-was-the-problem">What was the problem?</h5> -<p>With millions of daily active users, managing the complexity of data engineering at Onefootball is a constant challenge. 
Lengthy crontabs, multiplication of custom API clients, erosion of confidence in the analytics served, increasing heroism (“only one person can solve this issue”). Those are the challenges that most teams face unless they consciously invest in their tools and processes.</p> +<p>At Sift, we’re constantly training machine learning models that feed into the core of Sift’s Digital Trust & Safety platform. The platform gives our customers a way to discern suspicious online behavior from trustworthy behavior, allowing our customers to protect their online transactions, maintain the integrity of their content platforms, and keep their users’ accounts secure. To make this possible, we’ve built model training pipelines that consist of hundreds of steps in MapRedu [...] -<p>On top of that, new data tools appear each month: third party data sources, cloud providers solutions, different storage technologies… Managing all those integrations is costly and brittle, especially for small data engineering teams that are trying to do more with less.</p> +<p>When we built these workflows, we found that we needed a centralized way to organize the interactions between the many steps in each workflow. But before Airflow, we didn’t have an easy way to express those dependencies. And as we added steps to the workflows, it became increasingly difficult to coordinate their dependencies and keep ML experiments in sync.</p> + +<p>It soon became clear that we needed a way to orchestrate both the scheduled execution of our jobs and the dependencies between steps of not only a single workflow, but of multiple workflows. We needed a way to dynamically create several experimental ML workflows at once that could each have their own code, dependencies, and tasks. 
Additionally, we needed a way to be able to monitor the status of tasks, and re-run or restart tasks from any given point in a workflow with ease.</p> <h5 id="how-did-apache-airflow-help-to-solve-this-problem">How did Apache Airflow help to solve this problem?</h5> -<p>Airflow had been on our radar for a while until one day we took the leap. We used the DAG paradigm to migrate the pipelines running on crontabs. We benefited from the community Hooks and Operators to remove parts of our code, or to refactor the API clients specific to our business. We use the alerts, SLAs and the web UI to regain confidence in our analytics. We use our airflow internal PRs as catalysts for team discussion and to challenge our technical designs.</p> +<p>Airflow makes it easy to clearly define the interactions between various jobs, expanding the scope of what we can do in our model training pipelines. We now have the ability to schedule and coordinate all jobs while managing the dependencies between them using DAGs. Each of our main workflows, including our model training pipeline and ETL pipelines, has its own DAG code that manages its tasks’ dependencies and the execution schedule for the pipeline. We even define dependencies betwe [...] + +<p>As part of our custom Airflow setup, we’ve built out a separate Airflow ecosystem for short-lived experimental DAGs as well, so that we can test changes to our jobs or run separate model training pipelines in isolation. Using deployment scripts that edit our DAGs when we upload them to Airflow, the same code that powers an existing DAG can be deployed in a separate, isolated environment with experimental edits. This means that each experiment can have its own isolated code, running in [...] 
-<p>We have DAGs orchestrating SQL transformations in our data warehouse, but also DAGs that are orchestrating functions ran against our Kubernetes cluster both for training Machine Learning models and sending daily analytics emails.</p> +<p>Finally, Airflow has given us the ability to manage our tasks’ successes and failures through its user interface. Airflow allows us to track our tasks’ failures, duration, history, and logs in one central UI, and that same UI also allows us to easily retry single tasks, branches of a DAG, or entire DAGs.</p> <h5 id="what-are-the-results">What are the results?</h5> -<p>The learning curve was steep but in about 100 days we were able to efficiently use Airflow to manage the complexity of our data engineering. We currently have 17 DAGs (adding on average 1 per week), we have 2 contributions on apache/airflow, we have 7 internal hooks and operators and are planning to add more as our migration efforts continue.</p> +<p>Airflow initially gave us a way to solve our existing problems: we used Airflow to replace rigid crons with well-defined DAG dependencies, to build isolated ML experiments using short-lived DAGs, and to track our pipelines’ successes and failures.</p> + +<p>But even after that, Airflow helped us to grow beyond those initial challenges, and expanded the scope of what we could feasibly tackle. Airflow not only made it easier to manage our ever-expanding ML pipelines, but also allowed us to create entirely new pipelines, ranging from workflows that back up our production data to complex ETL pipelines that transform data into experimentation-ready formats.</p> + +<p>Airflow also allowed us to support a more diverse toolset. 
Shell scripts, Java, Python, Jupyter notebooks, and more - all of these can be managed from an Airflow DAG, allowing developers to utilize our data to test new ideas, generate insights, and improve our models with ease.</p> </div> </div> @@ -475,7 +480,7 @@ if (!doNotTrack) { <button class="btn-hollow btn-blue bodytext__medium--cerulean-blue" disabled>Previous</button> </a> - <a href="/use-cases/experity/"> + <a href="/use-cases/onefootball/"> <button class="btn-hollow btn-blue bodytext__medium--cerulean-blue" >Next</button> @@ -490,7 +495,7 @@ if (!doNotTrack) { <div class="base-layout--button"> - <a href=https://github.com/apache/airflow-site/edit/master/landing-pages/site/content/en/use-cases/onefootball.md> + <a href=https://github.com/apache/airflow-site/edit/master/landing-pages/site/content/en/use-cases/sift.md> diff --git a/usecase-logos/sift_logo.png b/usecase-logos/sift_logo.png new file mode 100644 index 0000000..7bd6568 Binary files /dev/null and b/usecase-logos/sift_logo.png differ