Re: [I] Modernize Docs Building workflows [airflow]

2025-01-16 Thread via GitHub


potiuk commented on issue #44373:
URL: https://github.com/apache/airflow/issues/44373#issuecomment-2597193486

   > [@potiuk](https://github.com/potiuk) Do you know how breeze serves the 
container when running `breeze build-docs`? I'm trying to use 
[sphinx-autobuild](https://github.com/sphinx-doc/sphinx-autobuild/tree/main) 
and port forward the docs page, but I can't quite figure out how.
   
   That's quite unlikely to be easy without quite a redesign. Note that I am not 
the author of it; I merely shuffled around the scripts and code that were there 
from the beginning, so I do not know more than what I can see by looking at it 
and reverse engineering how it works. And if someone would like to redesign this 
(on top of what is already planned for fixing the infrastructure pieces described 
here), they are absolutely welcome.
   
   In short (you can see the sources in the links):
   
   1) `breeze build-docs` takes the parameters passed via the breeze CLI and 
converts them into Build Params that are then used as arguments of a shell 
script, `/opt/airflow/scripts/in_container/run_docs_build.sh`, that is run inside 
the container.
   
   
https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/commands/developer_commands.py#L700
   
   That breeze command also makes sure that the image is ready (rebuilding it 
if needed). But essentially it calls this shell script:
   
   ```python
   cmd = "/opt/airflow/scripts/in_container/run_docs_build.sh " + " ".join(
       [shlex.quote(arg) for arg in doc_builder.args_doc_builder]
   )
   ```
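
   To illustrate why each argument goes through `shlex.quote` before being 
joined, here is a minimal, self-contained sketch. The argument values are 
made-up placeholders (not real `build_docs` flags) and `build_command` is a 
hypothetical helper, not a breeze function:

```python
import shlex


def build_command(script: str, args: list[str]) -> str:
    """Join a script path and its arguments into one shell-safe command string."""
    return script + " " + " ".join(shlex.quote(arg) for arg in args)


# Placeholder arguments for illustration only; not real build_docs flags.
example_args = ["--example-flag", "value with spaces", "semi;colon"]
cmd = build_command(
    "/opt/airflow/scripts/in_container/run_docs_build.sh", example_args
)
print(cmd)

# A POSIX shell would split the command back into the original arguments.
assert shlex.split(cmd)[1:] == example_args
```

   Without the quoting, an argument containing a space or `;` would be split 
or interpreted by the shell inside the container.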
   
   2) This script does some housekeeping and cleanup/removal of stuff at 
completion but essentially it calls (in container):
   
   ```
   python -m docs.build_docs "${@}" 
   ```
   
   So it passes the parameters on to the `build_docs` Python script defined in 
https://github.com/apache/airflow/blob/main/docs/build_docs.py
   
   3) This one can even be run directly with `--help` to show the parameters 
it can take (defined with argparse here: 
https://github.com/apache/airflow/blob/main/docs/build_docs.py#L465). That 
includes optional selection of the packages for which the documentation should 
be built.
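
   As a rough idea of what such an argparse definition with package selection 
can look like, here is a small sketch. The flag names `--package` and `--jobs` 
and the program name are assumptions made for illustration; run the real script 
with `--help` to see its actual parameters:

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Build a toy parser; flag names are hypothetical, not the real build_docs ones."""
    parser = argparse.ArgumentParser(
        prog="build_docs_sketch",
        description="Toy docs builder with optional package selection.",
    )
    parser.add_argument(
        "--package",
        action="append",
        default=[],
        metavar="NAME",
        help="Package to build; repeat the flag to select several. "
        "With no selection, everything would be built.",
    )
    parser.add_argument(
        "--jobs",
        type=int,
        default=1,
        help="Number of parallel build processes.",
    )
    return parser


args = make_parser().parse_args(
    ["--package", "pkg-a", "--package", "pkg-b", "--jobs", "4"]
)
print(args.package, args.jobs)
```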
   
   This script does quite a few more things:
   
   * It fetches public intersphinx inventories so that references to code from 
external libraries can be linked automatically by sphinx.
   
   * It also fetches "our" package inventories (prepared in the canary run and 
published in Airflow's Amazon S3 bucket), so that, for example, if you only 
build one provider and refer to another or to airflow, sphinx can properly 
build intersphinx links to those "external" documents. Each provider, airflow, 
helm, and the "providers index" are separate "sphinx packages" linked to each 
other via intersphinx inventories. When you build a package locally, the 
inventory is regenerated and produced as part of the build, so when you build 
several packages locally they can refer to each other's newly added APIs and 
pages. This, for example, allows us to see that some links are missing when 
pages or links are moved and we need to refer to them: with intersphinx we 
will see warnings (and error out) when such links are wrong.
   
   * It then selects the packages to build, prioritising those that do not have 
inventories, because those should be built first so that other packages can 
use their inventories when they are built together.
   
   * Then the packages are built. The build is parallelised to make use of 
multiple processors: each package is built by one of N processes (N = number 
of CPUs). This way, building the whole documentation on a 16-core machine takes 
less than 10 minutes rather than the 1.5h it would take if the builds were run 
sequentially.

   * Then there is an interesting mechanism that retries building packages 
so that they can link to other locally built packages. Sometimes a package that 
is being built has a new page that other packages refer to (refactors and such); 
such packages will fail until the inventories are built for the source 
package. As packages are built in parallel, some packages might fail in the 
first pass and need a 2nd or 3rd pass in order to succeed (depending on how 
many "circular" or transitive package dependencies we have). This happens up 
to 3 times. See here: 
https://github.com/apache/airflow/blob/main/docs/build_docs.py#L534

   * Building itself happens in the document builder class: 
https://github.com/apache/airflow/blob/main/docs/exts/docs_build/docs_builder.py
 - that class prepares all the parameters needed to run the sphinx command that 
builds such a package. This command is derived here (for each package 
separately):
   
   
https://github.com/apache/airflow/blob/main/docs/exts/docs_build/docs_builder.py#L237
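
   The prioritise / build-in-parallel / retry behaviour described in the 
bullets above can be sketched roughly as follows. This is a toy model under 
stated assumptions, not the code from `build_docs.py`: the package names and 
the `DEPENDS_ON` map are invented, `build_one` only pretends to build, and a 
thread pool stands in for the real per-CPU processes:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical packages, each mapped to the packages whose inventories it links to.
DEPENDS_ON = {
    "core": [],
    "provider-a": ["core"],
    "provider-b": ["provider-a"],
}


def build_one(package, available):
    """Pretend build: succeeds only if every linked inventory already exists."""
    return package, all(dep in available for dep in DEPENDS_ON[package])


def build_all(packages, inventories, max_passes=3, workers=4):
    """Build in parallel passes; return (passes used, packages still failing)."""
    available = set(inventories)
    # Packages without an inventory sort first (False < True); sorting is stable.
    pending = sorted(packages, key=lambda p: p in available)
    passes = 0
    while pending and passes < max_passes:
        passes += 1
        # Builds in one pass only see inventories that existed when it started.
        snapshot = frozenset(available)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(lambda p: build_one(p, snapshot), pending))
        available.update(p for p, ok in results if ok)
        pending = [p for p, ok in results if not ok]
    return passes, pending


print(build_all(list(DEPENDS_ON), inventories=[]))
print(build_all(list(DEPENDS_ON), inventories=list(DEPENDS_ON)))
```

   With no inventories available, the three-package chain in this toy model 
needs all three passes; when every inventory has already been fetched, a single 
pass is enough. None of this changes the actual per-package `sphinx-build` 
command that the `docs_builder.py` link above points to.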
   
   Essentially this:
   
   ```python
   build_cmd = [
   "sphinx-build",
   "-T",  # show full traceback on exception
   "--color",  # do emit colored output
  

Re: [I] Modernize Docs Building workflows [airflow]

2025-01-16 Thread via GitHub


RNHTTR commented on issue #44373:
URL: https://github.com/apache/airflow/issues/44373#issuecomment-2597094187

   @potiuk Do you know how breeze serves the container when running `breeze 
build-docs`? I'm trying to use 
[sphinx-autobuild](https://github.com/sphinx-doc/sphinx-autobuild/tree/main) 
and port forward the docs page, but I can't quite figure out how.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Modernize Docs Building workflows [airflow]

2025-01-10 Thread via GitHub


shahar1 commented on issue #44373:
URL: https://github.com/apache/airflow/issues/44373#issuecomment-2582747341

   > Here's an [Old dev list thread on the 
matter](https://lists.apache.org/list?d...@airflow.apache.org:2023-10:%22The%20GitHub%20Action%20for%20building%20docs%22).
 Maybe a good starting place is first developing all Airflow 3.0 (and new 2.x 
release) documentation using a static site generator like 
[Pelican](https://getpelican.com/). We actually have an [issue on airflow-site 
for this](https://getpelican.com/).
   
   I assume that you meant https://github.com/apache/airflow-site/issues/719 in 
the second link?





Re: [I] Modernize Docs Building workflows [airflow]

2025-01-09 Thread via GitHub


RNHTTR commented on issue #44373:
URL: https://github.com/apache/airflow/issues/44373#issuecomment-2580877699

   Here's an [Old dev list thread on the 
matter](https://lists.apache.org/list?d...@airflow.apache.org:2023-10:%22The%20GitHub%20Action%20for%20building%20docs%22).
 Maybe a good starting place is first developing all Airflow 3.0 (and new 2.x 
release) documentation using a static site generator like 
[Pelican](https://getpelican.com/). We actually have an [issue on airflow-site 
for this](https://getpelican.com/).

