potiuk commented on issue #44373:
URL: https://github.com/apache/airflow/issues/44373#issuecomment-2597193486
> [@potiuk](https://github.com/potiuk) Do you know how breeze serves the
container when running `breeze build-docs`? I'm trying to use
[sphinx-autobuild](https://github.com/sphinx-doc/sphinx-autobuild/tree/main)
and port forward the docs page, but I can't quite figure out how.
That's unlikely to be easy without a significant redesign. Note that I am not
the author of it; I merely shuffled around the scripts and code that were there
from the beginning, so I do not know more than what I learned by reading and
reverse-engineering how it works. If someone would like to redesign this (on
top of what is already planned for fixing the infrastructure pieces described
here), they are absolutely welcome.
In short (you can see the sources in the links below):
1) `breeze build-docs` takes the parameters passed via the breeze CLI and
converts them into Build Params that are then used as arguments of a shell
script, `/opt/airflow/scripts/in_container/run_docs_build.sh`, that is run
inside the container.
https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/commands/developer_commands.py#L700
That breeze command also makes sure that the image is ready and rebuilt if
needed. But essentially it calls this shell script:
```python
cmd = "/opt/airflow/scripts/in_container/run_docs_build.sh " + " ".join(
    [shlex.quote(arg) for arg in doc_builder.args_doc_builder]
)
```
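A self-contained sketch of how that command string gets assembled might look like this (`DocBuilderArgs` and `build_in_container_cmd` here are toy stand-ins for illustration, not the real breeze types):

```python
import shlex

# Toy stand-in for breeze's doc-builder params object -- illustrative only,
# not the real class from airflow_breeze.
class DocBuilderArgs:
    def __init__(self, args):
        self.args_doc_builder = args

def build_in_container_cmd(doc_builder):
    # Shell-quote every argument before joining, so values containing spaces
    # or special characters survive the trip into the container shell.
    return "/opt/airflow/scripts/in_container/run_docs_build.sh " + " ".join(
        shlex.quote(arg) for arg in doc_builder.args_doc_builder
    )

print(build_in_container_cmd(DocBuilderArgs(["--package-filter", "apache airflow"])))
# → /opt/airflow/scripts/in_container/run_docs_build.sh --package-filter 'apache airflow'
```

The quoting matters because the string is eventually interpreted by a shell inside the container, so an unquoted argument with spaces would be split into multiple arguments.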
2) This script does some housekeeping (cleanup/removal of artifacts on
completion), but essentially it runs (in the container):
```
python -m docs.build_docs "${@}"
```
This passes the parameters through to the `build_docs` Python script defined in
https://github.com/apache/airflow/blob/main/docs/build_docs.py
3) This one can even be run directly with `--help` to show the parameters it
can take (defined with argparse here:
https://github.com/apache/airflow/blob/main/docs/build_docs.py#L465). That
includes selecting the packages for which the documentation should be built.
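A rough sketch of the kind of argparse CLI such a build script defines is below; the flag names are illustrative guesses, so run the real script with `--help` for the actual ones:

```python
import argparse

def make_parser():
    # Hypothetical parser mimicking a docs-build CLI -- flag names are
    # assumptions for illustration, not the exact build_docs.py interface.
    parser = argparse.ArgumentParser(
        description="Build documentation for selected Airflow packages"
    )
    parser.add_argument(
        "--package-filter", action="append", default=[],
        help="restrict the build to the given package; may be repeated",
    )
    parser.add_argument(
        "--jobs", type=int, default=0,
        help="number of parallel build processes (0 means one per CPU)",
    )
    return parser

args = make_parser().parse_args(
    ["--package-filter", "apache-airflow-providers-amazon", "--jobs", "4"]
)
print(args.package_filter, args.jobs)
# → ['apache-airflow-providers-amazon'] 4
```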
This script does quite a few more things:
* it fetches inventories from public intersphinx inventories, so that links
to source code referenced from libraries can be automatically linked by sphinx
* it also fetches "our" package inventories (prepared in the canary run and
published in Airflow's Amazon S3 bucket), so that, for example, if you only
build one provider and refer to another one or to airflow, sphinx can properly
build intersphinx links to those "external" documents. Each provider, airflow,
helm, and the "providers index" are separate "sphinx packages" linked to each
other via intersphinx inventories. When you build a package locally, the
inventory is regenerated and produced as part of the build, so when you build
several packages locally they can refer to each other's newly added APIs and
pages. This, for example, allows us to see that some links are missing when
pages or links are moved and we still refer to them: with intersphinx we
will see warnings (and error out) when such links are wrong
* it then selects the packages to build, prioritising those that do not have
inventories, because those should be built first so that other packages can
use their inventories when they are built together
* then the packages are built. The build is parallelised to use multiple
processors: each package is built by one of N = CPU count processes. This way,
building the whole documentation on a 16-core machine takes less than
10 minutes rather than the ~1.5h it would take sequentially
* then there is an interesting mechanism that retries building packages to
allow them to link to other locally built packages. Sometimes, when a package
is being built, it has a new page that other packages refer to (after
refactors and such); those packages will fail until inventories are built for
the source package. As packages are built in parallel, some packages might
fail in the first pass and need a 2nd or 3rd pass in order to succeed
(depending on how many "circular" or transitive package dependencies we have).
This happens up to 3 times. See here:
https://github.com/apache/airflow/blob/main/docs/build_docs.py#L534
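The retry mechanism can be sketched like this (sequential for brevity, whereas the real build runs packages in parallel; `toy_build` and `run_build_passes` are made-up stand-ins, not the actual build_docs.py code):

```python
# Sketch of the "up to 3 passes" retry loop, simplified to run sequentially.
MAX_PASSES = 3

def run_build_passes(packages, build_package):
    """Build packages, retrying failures for up to MAX_PASSES passes.

    Returns the list of packages still failing after the last pass."""
    to_build = list(packages)
    for _ in range(MAX_PASSES):
        failed = [pkg for pkg in to_build if not build_package(pkg)]
        if not failed:
            return []
        to_build = failed  # only the failures get another pass
    return to_build

# Toy builder: "amazon" links to apache-airflow's inventory, so it fails
# until apache-airflow has been built and its inventory exists.
inventories = set()
DEPS = {"amazon": {"apache-airflow"}}

def toy_build(pkg):
    if DEPS.get(pkg, set()) - inventories:
        return False  # a needed intersphinx inventory is missing -> build fails
    inventories.add(pkg)
    return True

print(run_build_passes(["amazon", "apache-airflow"], toy_build))
# → []  ("amazon" fails in pass 1, succeeds in pass 2)
```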
* Building itself happens in the document builder class:
https://github.com/apache/airflow/blob/main/docs/exts/docs_build/docs_builder.py
- that class prepares all the parameters needed to run the sphinx command that
builds such a package. The command is derived here (for each package
separately):
https://github.com/apache/airflow/blob/main/docs/exts/docs_build/docs_builder.py#L237
Essentially this:
```python
build_cmd = [
"sphinx-build",
"-T", # show full traceback on exception
"--color", # do emit colored output