Repository: incubator-airflow
Updated Branches:
  refs/heads/master c6681681d -> b81bd08a3

[AIRFLOW-2538] Update faq doc on how to reduce airflow scheduler latency

Make sure you have checked _all_ steps below.

### JIRA
- [x] My PR addresses the following [Airflow JIRA](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
    - https://issues.apache.org/jira/browse/AIRFLOW-2538
    - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a JIRA issue.

### Description
- [x] Here are some details about my PR, including screenshots of any UI changes:

Update the faq doc on how to reduce airflow scheduler latency. This comes from our internal production setting which also aligns with Maxime's email(https://lists.apache.org/thread.html/%3CCAHE Ep7WFAivyMJZ0N+0Zd1T3nvfyCJRudL3XSRLM4utSigR3dQmai l.gmail.com%3E).

### Tests
- [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:

### Commits
- [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

### Documentation
- [ ] In case of new functionality, my PR adds documentation that describes how to use it.
    - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

### Code Quality
- [ ] Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

Closes #3434 from feng-tao/update_faq

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/b81bd08a
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/b81bd08a
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/b81bd08a

Branch: refs/heads/master
Commit: b81bd08a334efa5242af705743519be43346295e
Parents: c668168
Author: Tao feng <tf...@lyft.com>
Authored: Thu May 31 22:01:59 2018 -0700
Committer: Maxime Beauchemin <maximebeauche...@gmail.com>
Committed: Thu May 31 22:01:59 2018 -0700

----------------------------------------------------------------------
 docs/faq.rst | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/b81bd08a/docs/faq.rst
----------------------------------------------------------------------
diff --git a/docs/faq.rst b/docs/faq.rst
index d2c6188..33b4d5a 100644
--- a/docs/faq.rst
+++ b/docs/faq.rst
@@ -162,10 +162,18 @@ How can we reduce the airflow UI page load time?
 If your dag takes long time to load, you could reduce the value of ``default_dag_run_display_number`` configuration in ``airflow.cfg`` to a smaller value. This configurable controls the number of dag run to show in UI with default value 25.
 
+
 How to fix Exception: Global variable explicit_defaults_for_timestamp needs to be on (1)?
----------------------------------------------------------------------------------------------
+-----------------------------------------------------------------------------------------
 
 This means ``explicit_defaults_for_timestamp`` is disabled in your mysql server and you need to enable it by:
 
 #. Set ``explicit_defaults_for_timestamp = 1`` under the mysqld section in your my.cnf file.
 #. Restart the Mysql server.
+
+
+How to reduce airflow dag scheduling latency in production?
+-----------------------------------------------------------
+
+- ``max_threads``: Scheduler will spawn multiple threads in parallel to schedule dags. This is controlled by ``max_threads`` with default value of 2. User should increase this value to a larger value(e.g numbers of cpus where scheduler runs - 1) in production.
+- ``scheduler_heartbeat_sec``: User should consider to increase ``scheduler_heartbeat_sec`` config to a higher value(e.g 60 secs) which controls how frequent the airflow scheduler gets the heartbeat and updates the job's entry in database.
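
As a rough illustration of the two settings added to the FAQ in this commit, the sketch below derives candidate values on the scheduler host and renders them as an ``airflow.cfg`` fragment. The helper names `suggested_scheduler_settings` and `render_scheduler_section` are hypothetical and not part of Airflow. Placing both options under a `[scheduler]` section, and flooring `max_threads` at 1, are assumptions of this sketch rather than something the commit states. The 60 second heartbeat is simply the example value from the FAQ text.

```python
import multiprocessing


def suggested_scheduler_settings(cpu_count=None, heartbeat_sec=60):
    """Return illustrative scheduler values following the FAQ guidance.

    Hypothetical helper, not an Airflow API.
    """
    if cpu_count is None:
        # CPUs on the machine where this runs; run it on the scheduler host.
        cpu_count = multiprocessing.cpu_count()
    # FAQ suggestion: number of CPUs where the scheduler runs, minus one.
    # The floor of 1 is an assumption so single-core hosts stay valid.
    max_threads = max(1, cpu_count - 1)
    return {
        "max_threads": max_threads,
        "scheduler_heartbeat_sec": heartbeat_sec,
    }


def render_scheduler_section(settings):
    """Render the settings as an ini-style fragment (section name assumed)."""
    lines = ["[scheduler]"]
    lines += ["{} = {}".format(key, value) for key, value in sorted(settings.items())]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_scheduler_section(suggested_scheduler_settings()))
```

The printed fragment is only a starting point. Appropriate values depend on the number of DAGs and the load on the metadata database, so treat the output as a candidate to review before copying it into ``airflow.cfg``.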