This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/main by this push:
     new 36c5fd3  Move instructions for constraint/image refreshing to dev
36c5fd3 is described below

commit 36c5fd3df9b271702e1dd2d73c579de3f3bd5fc0
Author: Jarek Potiuk <ja...@potiuk.com>
AuthorDate: Mon Aug 23 11:35:44 2021 +0200

    Move instructions for constraint/image refreshing to dev
    
    When we have a prolonged issue with flaky tests or GitHub runner
    instabilities, our automated constraint and image refresh might
    not work, so we might need to refresh the constraints and images
    manually. Documentation about that was in CONTRIBUTING.rst,
    but it is more appropriate to keep it in ``dev``, as it only applies
    to committers.

    Also, while testing the parallel refresh without delays, an error
    was discovered which prevented the parallel check of the random image
    hash during the build. This has been fixed, and parallel image cache
    building should work flawlessly now.
---
 CONTRIBUTING.rst                        | 36 -------------
 dev/REFRESHING_CI_CACHE.md              | 94 +++++++++++++++++++++++++++++++++
 dev/refresh_images.sh                   | 38 +++++++++++++
 scripts/ci/libraries/_build_images.sh   | 68 +++++++++++++-----------
 scripts/ci/libraries/_initialization.sh | 11 ----
 5 files changed, 170 insertions(+), 77 deletions(-)

diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst
index b5a81ff..3a561f8 100644
--- a/CONTRIBUTING.rst
+++ b/CONTRIBUTING.rst
@@ -877,42 +877,6 @@ The ``constraints-<PYTHON_MAJOR_MINOR_VERSION>.txt`` and 
``constraints-no-provid
 will be automatically regenerated by CI job every time after the ``setup.py`` 
is updated and pushed
 if the tests are successful.
 
-Manually generating constraint files
-------------------------------------
-
-The constraint files are generated automatically by the CI job. Sometimes 
however it is needed to regenerate
-them manually (committers only). For example when main build did not succeed 
for quite some time).
-This can be done by running this (it utilizes parallel preparation of the 
constraints):
-
-.. code-block:: bash
-
-    export CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING="3.6 3.7 3.8 3.9"
-    for python_version in $(echo 
"${CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING}")
-    do
-      ./breeze build-image --upgrade-to-newer-dependencies --python 
${python_version} --build-cache-local
-    done
-
-    GENERATE_CONSTRAINTS_MODE="pypi-providers" 
./scripts/ci/constraints/ci_generate_all_constraints.sh
-    GENERATE_CONSTRAINTS_MODE="source-providers" 
./scripts/ci/constraints/ci_generate_all_constraints.sh
-    GENERATE_CONSTRAINTS_MODE="no-providers" 
./scripts/ci/constraints/ci_generate_all_constraints.sh
-
-    AIRFLOW_SOURCES=$(pwd)
-
-
-The constraints will be generated in 
"files/constraints-PYTHON_VERSION/constraints-*.txt files. You need to
-checkout the right 'constraints-' branch in a separate repository and then you 
can copy, commit and push the
-generated files:
-
-.. code-block:: bash
-
-    cd <AIRFLOW_WITH_CONSTRAINT_main_DIRECTORY>
-    git pull
-    cp ${AIRFLOW_SOURCES}/files/constraints-*/constraints*.txt .
-    git diff
-    git add .
-    git commit -m "Your commit message here" --no-verify
-    git push
-
 
 Documentation
 =============
diff --git a/dev/REFRESHING_CI_CACHE.md b/dev/REFRESHING_CI_CACHE.md
new file mode 100644
index 0000000..c5a27ee
--- /dev/null
+++ b/dev/REFRESHING_CI_CACHE.md
@@ -0,0 +1,94 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements.  See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership.  The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied.  See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+**Table of Contents**  *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [Automated cache refreshing in CI](#automated-cache-refreshing-in-ci)
+- [Manually generating constraint files](#manually-generating-constraint-files)
+- [Manually refreshing the images](#manually-refreshing-the-images)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+# Automated cache refreshing in CI
+
+Our [CI system](../CI.rst) is built so that it maintains itself. Regular scheduled builds and
+merges to the `main` branch have a separate maintenance step that takes care of refreshing the cache
+used to speed up our builds and the rebuilding of [Breeze](../BREEZE.rst) images for development
+purposes. This all happens automatically, usually:
+
+* The latest [constraints](../COMMITTERS.rst#pinned-constraint-files) are pushed to the appropriate branch
+  after all tests succeed in a `main` merge or in a `scheduled` build
+
+* The [images](../IMAGES.rst) in the `ghcr.io` registry are refreshed after every successful merge to `main`
+  or `scheduled` build and after pushing the constraints; this means that the latest image cache also
+  uses the latest tested constraints
+
+Sometimes however, when we have a prolonged period of fighting the flakiness of GitHub Actions runners
+or our tests, the refresh might not be triggered, because the tests will not succeed for some time.
+In this case a manual refresh might be needed.
+
+# Manually generating constraint files
+
+```bash
+export CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING="3.6 3.7 3.8 3.9"
+for python_version in $(echo "${CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING}")
+do
+  ./breeze build-image --upgrade-to-newer-dependencies --python ${python_version} --build-cache-local
+done
+
+GENERATE_CONSTRAINTS_MODE="pypi-providers" ./scripts/ci/constraints/ci_generate_all_constraints.sh
+GENERATE_CONSTRAINTS_MODE="source-providers" ./scripts/ci/constraints/ci_generate_all_constraints.sh
+GENERATE_CONSTRAINTS_MODE="no-providers" ./scripts/ci/constraints/ci_generate_all_constraints.sh
+
+AIRFLOW_SOURCES=$(pwd)
+```
+
+The constraints will be generated in `files/constraints-PYTHON_VERSION/constraints-*.txt` files. You need to
+check out the right 'constraints-' branch in a separate repository, and then you can copy, commit and push the
+generated files:
+
+```bash
+cd <AIRFLOW_WITH_CONSTRAINTS-MAIN_DIRECTORY>
+git pull
+cp ${AIRFLOW_SOURCES}/files/constraints-*/constraints*.txt .
+git diff
+git add .
+git commit -m "Your commit message here" --no-verify
+git push
+```
+
+# Manually refreshing the images
+
+The images can be rebuilt and refreshed after the constraints are pushed. Refreshing the image for a
+particular Python version is as simple as running the [refresh_images.sh](refresh_images.sh) script with
+the Python version as a parameter:
+
+```bash
+./dev/refresh_images.sh 3.9
+```
+
+If you have a fast network and a powerful computer, you can refresh the images in parallel by running
+[refresh_images.sh](refresh_images.sh) for all Python versions. You can do this manually with `tmux`,
+or with GNU parallel:
+
+```bash
+parallel -j 4 --linebuffer --tagstring '{}' ./dev/refresh_images.sh ::: 3.6 3.7 3.8 3.9
+```
diff --git a/dev/refresh_images.sh b/dev/refresh_images.sh
new file mode 100755
index 0000000..38e283a
--- /dev/null
+++ b/dev/refresh_images.sh
@@ -0,0 +1,38 @@
+#!/bin/bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+set -euo pipefail
+rm -rf docker-context-files/*.whl
+rm -rf docker-context-files/*.tgz
+export FORCE_ANSWER_TO_QUESTIONS="true"
+export CI="true"
+
+if [[ ${1:-} == "" ]]; then
+  echo
+  echo ERROR! Please specify python version as parameter
+  echo
+  exit 1
+fi
+
+python_version=$1
+
+./breeze build-image --python "${python_version}" --build-cache-pulled --check-if-base-python-image-updated --verbose
+./breeze build-image --python "${python_version}" --build-cache-pulled --production-image --verbose
+
+./breeze push-image --python "${python_version}"
+./breeze push-image --production-image --python "${python_version}"
diff --git a/scripts/ci/libraries/_build_images.sh 
b/scripts/ci/libraries/_build_images.sh
index bfdaf21..e4af330 100644
--- a/scripts/ci/libraries/_build_images.sh
+++ b/scripts/ci/libraries/_build_images.sh
@@ -232,16 +232,13 @@ function build_images::check_for_docker_context_files() {
     fi
 }
 
-# Builds local image manifest
-# It contains only one .json file - result of docker inspect - describing the 
image
-# We cannot use docker registry APIs as they are available only with 
authorisation
-# But this image can be pulled without authentication
+# Builds the local image manifest. It contains only one random file, generated during the Dockerfile.ci build
 function build_images::build_ci_image_manifest() {
     docker_v build \
         --tag="${AIRFLOW_CI_LOCAL_MANIFEST_IMAGE}" \
         -f- . <<EOF
 FROM scratch
-COPY "manifests/local-build-cache-hash" /build-cache-hash
+COPY "manifests/local-build-cache-hash-${PYTHON_MAJOR_MINOR_VERSION}" 
/build-cache-hash
 LABEL org.opencontainers.image.source="https://github.com/${GITHUB_REPOSITORY}";
 CMD ""
 EOF
@@ -249,9 +246,13 @@ EOF
 
 #
 # Retrieves information about build cache hash random file from the local image
+# The random file is generated during the build and is the best indicator of whether your local CI image
+# has been built using the same pulled image as the remote one
 #
 function build_images::get_local_build_cache_hash() {
     set +e
+    local local_image_build_cache_file
+    local_image_build_cache_file="${AIRFLOW_SOURCES}/manifests/local-build-cache-hash-${PYTHON_MAJOR_MINOR_VERSION}"
     # Remove the container just in case
     docker_v rm --force "local-airflow-ci-container" 2>/dev/null >/dev/null
     if ! docker_v inspect "${AIRFLOW_CI_IMAGE}" 2>/dev/null >/dev/null; then
@@ -260,34 +261,37 @@ function build_images::get_local_build_cache_hash() {
         verbosity::print_info
         LOCAL_MANIFEST_IMAGE_UNAVAILABLE="true"
         export LOCAL_MANIFEST_IMAGE_UNAVAILABLE
-        touch "${LOCAL_IMAGE_BUILD_CACHE_HASH_FILE}"
+        touch "${local_image_build_cache_file}"
         set -e
         return
 
     fi
     docker_v create --name "local-airflow-ci-container" "${AIRFLOW_CI_IMAGE}" 
2>/dev/null
     docker_v cp "local-airflow-ci-container:/build-cache-hash" \
-        "${LOCAL_IMAGE_BUILD_CACHE_HASH_FILE}" 2>/dev/null ||
-        touch "${LOCAL_IMAGE_BUILD_CACHE_HASH_FILE}"
+        "${local_image_build_cache_file}" 2>/dev/null ||
+        touch "${local_image_build_cache_file}"
     set -e
     verbosity::print_info
-    verbosity::print_info "Local build cache hash: '$(cat 
"${LOCAL_IMAGE_BUILD_CACHE_HASH_FILE}")'"
+    verbosity::print_info "Local build cache hash: '$(cat 
"${local_image_build_cache_file}")'"
     verbosity::print_info
 }
 
 # Retrieves information about the build cache hash random file from the remote 
image.
-# We actually use manifest image for that, which is a really, really small 
image to pull!
-# The problem is that inspecting information about remote image cannot be done 
easily with existing APIs
-# of Dockerhub because they require additional authentication even for public 
images.
-# Therefore instead we are downloading a specially prepared manifest image
-# which is built together with the main image and pushed with it. This special 
manifest image is prepared
-# during building of the main image and contains single file which is randomly 
built during the docker
-# build in the right place in the image (right after installing all 
dependencies of Apache Airflow
-# for the first time). When this random file gets regenerated it means that 
either base image has
-# changed or some of the earlier layers was modified - which means that it is 
usually faster to pull
-# that image first and then rebuild it - because this will likely be faster
+# We use the manifest image for that, which is a really, really small image to pull!
+# It is a specially prepared manifest image which is built together with the main image and
+# pushed with it. This special manifest image is prepared during the build of the CI image and contains
+# a single file which is generated with random content during the docker build, at the right step of
+# the image build (right after installing all dependencies of Apache Airflow for the first time).
+# When this random file gets regenerated, it means that either the base image has changed before that
+# step or some of the earlier layers were modified - which means that it is usually faster to pull
+# that image first and then rebuild it.
 function build_images::get_remote_image_build_cache_hash() {
     set +e
+    local remote_image_container_id_file
+    remote_image_container_id_file="${AIRFLOW_SOURCES}/manifests/remote-airflow-manifest-image-${PYTHON_MAJOR_MINOR_VERSION}"
+    local remote_image_build_cache_file
+    remote_image_build_cache_file="${AIRFLOW_SOURCES}/manifests/remote-build-cache-hash-${PYTHON_MAJOR_MINOR_VERSION}"
     # Pull remote manifest image
     if ! docker_v pull "${AIRFLOW_CI_REMOTE_MANIFEST_IMAGE}" 2>/dev/null 
>/dev/null; then
         verbosity::print_info
@@ -295,32 +299,36 @@ function 
build_images::get_remote_image_build_cache_hash() {
         verbosity::print_info
         REMOTE_DOCKER_REGISTRY_UNREACHABLE="true"
         export REMOTE_DOCKER_REGISTRY_UNREACHABLE
-        touch "${REMOTE_IMAGE_BUILD_CACHE_HASH_FILE}"
+        touch "${remote_image_build_cache_file}"
         set -e
         return
     fi
     set -e
-    rm -f "${REMOTE_IMAGE_CONTAINER_ID_FILE}"
+    rm -f "${remote_image_container_id_file}"
     # Create container dump out of the manifest image without actually running 
it
-    docker_v create --cidfile "${REMOTE_IMAGE_CONTAINER_ID_FILE}" 
"${AIRFLOW_CI_REMOTE_MANIFEST_IMAGE}"
+    docker_v create --cidfile "${remote_image_container_id_file}" 
"${AIRFLOW_CI_REMOTE_MANIFEST_IMAGE}"
     # Extract manifest and store it in local file
-    docker_v cp "$(cat "${REMOTE_IMAGE_CONTAINER_ID_FILE}"):/build-cache-hash" 
\
-        "${REMOTE_IMAGE_BUILD_CACHE_HASH_FILE}"
-    docker_v rm --force "$(cat "${REMOTE_IMAGE_CONTAINER_ID_FILE}")"
-    rm -f "${REMOTE_IMAGE_CONTAINER_ID_FILE}"
+    docker_v cp "$(cat "${remote_image_container_id_file}"):/build-cache-hash" 
\
+        "${remote_image_build_cache_file}"
+    docker_v rm --force "$(cat "${remote_image_container_id_file}")"
+    rm -f "${remote_image_container_id_file}"
     verbosity::print_info
-    verbosity::print_info "Remote build cache hash: '$(cat 
"${REMOTE_IMAGE_BUILD_CACHE_HASH_FILE}")'"
+    verbosity::print_info "Remote build cache hash: '$(cat 
"${remote_image_build_cache_file}")'"
     verbosity::print_info
 }
 
 # Compares layers from both remote and local image and set FORCE_PULL_IMAGES 
to true in case
-# More than the last NN layers are different.
+# the random hash in the remote image is different from the hash in the local image,
+# indicating that it is likely faster to pull the image from the cache rather than let the
+# image rebuild fully locally.
 function build_images::compare_local_and_remote_build_cache_hash() {
     set +e
+    local local_image_build_cache_file
+    local_image_build_cache_file="${AIRFLOW_SOURCES}/manifests/local-build-cache-hash-${PYTHON_MAJOR_MINOR_VERSION}"
+    local remote_image_build_cache_file
+    remote_image_build_cache_file="${AIRFLOW_SOURCES}/manifests/remote-build-cache-hash-${PYTHON_MAJOR_MINOR_VERSION}"
     local remote_hash
-    remote_hash=$(cat "${REMOTE_IMAGE_BUILD_CACHE_HASH_FILE}")
+    remote_hash=$(cat "${remote_image_build_cache_file}")
     local local_hash
-    local_hash=$(cat "${LOCAL_IMAGE_BUILD_CACHE_HASH_FILE}")
+    local_hash=$(cat "${local_image_build_cache_file}")
 
     if [[ ${remote_hash} != "${local_hash}" || -z ${local_hash} ]] \
         ; then
diff --git a/scripts/ci/libraries/_initialization.sh 
b/scripts/ci/libraries/_initialization.sh
index 3c45e14..9c6365c 100644
--- a/scripts/ci/libraries/_initialization.sh
+++ b/scripts/ci/libraries/_initialization.sh
@@ -583,12 +583,6 @@ function initialization::initialize_package_variables() {
 }
 
 
-function initialization::initialize_build_image_variables() {
-    
REMOTE_IMAGE_CONTAINER_ID_FILE="${AIRFLOW_SOURCES}/manifests/remote-airflow-manifest-image"
-    
LOCAL_IMAGE_BUILD_CACHE_HASH_FILE="${AIRFLOW_SOURCES}/manifests/local-build-cache-hash"
-    
REMOTE_IMAGE_BUILD_CACHE_HASH_FILE="${AIRFLOW_SOURCES}/manifests/remote-build-cache-hash"
-}
-
 function initialization::set_output_color_variables() {
     COLOR_BLUE=$'\e[34m'
     COLOR_GREEN=$'\e[32m'
@@ -622,7 +616,6 @@ function initialization::initialize_common_environment() {
     initialization::initialize_github_variables
     initialization::initialize_test_variables
     initialization::initialize_package_variables
-    initialization::initialize_build_image_variables
 }
 
 function initialization::set_default_python_version_if_empty() {
@@ -869,10 +862,6 @@ function initialization::make_constants_read_only() {
     readonly BUILT_CI_IMAGE_FLAG_FILE
     readonly INIT_SCRIPT_FILE
 
-    readonly REMOTE_IMAGE_CONTAINER_ID_FILE
-    readonly LOCAL_IMAGE_BUILD_CACHE_HASH_FILE
-    readonly REMOTE_IMAGE_BUILD_CACHE_HASH_FILE
-
     readonly INSTALLED_EXTRAS
     readonly INSTALLED_PROVIDERS
 

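[Editorial note] The `_build_images.sh` changes above give each Python version its own build-cache-hash files and compare them to decide whether pulling the remote image is likely faster than a full local rebuild. The sketch below illustrates that comparison only; it uses a temporary directory and simulated hash files as stand-ins for `${AIRFLOW_SOURCES}/manifests` and the files extracted from the manifest images, and is not the repository's code:

```bash
#!/usr/bin/env bash
# Simplified illustration of the per-Python-version cache-hash comparison.
set -uo pipefail

MANIFESTS_DIR="$(mktemp -d)"            # stand-in for ${AIRFLOW_SOURCES}/manifests
PYTHON_MAJOR_MINOR_VERSION="3.9"

local_file="${MANIFESTS_DIR}/local-build-cache-hash-${PYTHON_MAJOR_MINOR_VERSION}"
remote_file="${MANIFESTS_DIR}/remote-build-cache-hash-${PYTHON_MAJOR_MINOR_VERSION}"

# Simulate the random files extracted from the local and remote manifest images.
echo "abc123" > "${local_file}"
echo "def456" > "${remote_file}"

should_force_pull() {
    local local_hash remote_hash
    local_hash=$(cat "${local_file}" 2>/dev/null)
    remote_hash=$(cat "${remote_file}" 2>/dev/null)
    # An empty local hash (image never built) or a mismatch both mean: pull first.
    [[ -z "${local_hash}" || "${remote_hash}" != "${local_hash}" ]]
}

if should_force_pull; then
    FORCE_PULL_IMAGES="true"
else
    FORCE_PULL_IMAGES="false"
fi
echo "FORCE_PULL_IMAGES=${FORCE_PULL_IMAGES}"   # prints: FORCE_PULL_IMAGES=true
```

Because each hash file name embeds `PYTHON_MAJOR_MINOR_VERSION`, several such checks can run in parallel for different Python versions without overwriting each other's files, which is the error the commit message says was fixed.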