Oh - don't use SequentialExecutor! It blocks the scheduler from heartbeating while a task is running, so if a task takes longer to run than the scheduler heartbeat interval, you'll see that message.
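For reference, the executor is chosen by the "executor" key in the [core] section of airflow.cfg, and the AIRFLOW__CORE__EXECUTOR environment variable takes precedence over the file. Below is a rough sketch (not taken from the thread) of checking which executor is in effect and overriding it for one shell session. The current_executor helper, the $HOME/airflow default and the PostgreSQL URL with its placeholder credentials are assumptions for illustration only. LocalExecutor does not work with the default SQLite database, so some real database connection is needed as well.

```shell
#!/usr/bin/env bash
# Sketch only: the AIRFLOW_HOME default and the Postgres URL are assumptions.
AIRFLOW_HOME="${AIRFLOW_HOME:-$HOME/airflow}"

# Print the executor that would take effect. The environment variable
# AIRFLOW__CORE__EXECUTOR wins over the `executor` key in airflow.cfg.
current_executor() {
    if [ -n "${AIRFLOW__CORE__EXECUTOR:-}" ]; then
        echo "$AIRFLOW__CORE__EXECUTOR"
    elif [ -f "$AIRFLOW_HOME/airflow.cfg" ]; then
        sed -n 's/^executor[[:space:]]*=[[:space:]]*//p' "$AIRFLOW_HOME/airflow.cfg" | head -n 1
    else
        echo "unknown (no airflow.cfg found)"
    fi
}

echo "before: $(current_executor)"

# Override for this shell session. LocalExecutor cannot run against SQLite,
# so point Airflow at a real database too (placeholder credentials).
export AIRFLOW__CORE__EXECUTOR=LocalExecutor
export AIRFLOW__CORE__SQL_ALCHEMY_CONN='postgresql+psycopg2://airflow:CHANGE_ME@localhost:5432/airflow'

echo "after: $(current_executor)"

# Afterwards, initialise the new database and restart the scheduler, e.g.:
#   airflow initdb
#   airflow scheduler -D
```

The same two settings can instead be written into airflow.cfg as executor and sql_alchemy_conn under [core]. Either way, the scheduler has to be restarted before the change takes effect.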
I would suggest switching to the LocalExecutor instead.

> On 10 Dec 2019, at 19:52, Reed Villanueva <[email protected]> wrote:
>
> I think that the multiple other scheduler DagFileProcessorManagers were just
> from previous times when I would run the dag, the task with the apparent
> offending code would run, then the scheduler heartbeat error would pop up,
> and I'd restart the scheduler via an "airflow scheduler -D" command (when I
> guess they were not really killed, just missing heartbeats for whatever
> reason). So it's not like starting the scheduler would do something weird
> like start multiple of them or anything like that.
>
> Haven't seen anything else unusual about the dag files (to me each of the
> tasks are all pretty simple and short) and since implementing the
> previously-mentioned change, have not seen the heartbeat error again. I do
> think it's weird too considering how unrelated I would think the scheduler
> heartbeat would be to an HDFS file not found error.
>
> Running airflow with the SequentialExecutor, but not sure what you mean by
> "process supervisor". Example?
>
> On Tue, Dec 10, 2019 at 3:54 AM Ash Berlin-Taylor <[email protected]> wrote:
> Hmm, having more than one DagFileProcessorManager alive at the same time does
> indicate something has gone wrong -- there should only be one of those.
>
> Are you using sub-dags or doing anything else "unusual" in any of your dag
> files?
>
> What executor are you using? What process supervisor (if any) are you using
> to run your scheduler?
>
> -ash
>
>> On 9 Dec 2019, at 20:48, Reed Villanueva <[email protected]> wrote:
>>
>> Have problem where the airflow (v1.10.5) webserver will complain...
>>
>> The scheduler does not appear to be running. Last heartbeat was received 45
>> minutes ago.
>> But checking the scheduler daemon process (started via airflow scheduler -D)
>> can see...
>>
>> [airflow@airflowetl airflow]$ cat airflow-scheduler.pid
>> 64186
>> [airflow@airflowetl airflow]$ ps -aux | grep 64186
>> airflow  64186  0.0  0.1 663340 67796 ?     S   15:03  0:00 /usr/bin/python3 /home/airflow/.local/bin/airflow scheduler -D
>> airflow  94305  0.0  0.0 112716   964 pts/4 R+  16:01  0:00 grep --color=auto 64186
>> (and after some period of time the error message goes away again).
>>
>> This happens very frequently off-and-on even after restarting both the
>> webserver and scheduler.
>>
>> The airflow-scheduler.err file is empty and the .out and .log files appear
>> innocuous (need more time to look through deeper).
>>
>> Running the scheduler in the terminal to see the feed live, everything seems
>> to run fine until I see this output in the middle of the dag execution
>>
>> [2019-11-29 15:51:57,825] {__init__.py:51} INFO - Using executor SequentialExecutor
>> [2019-11-29 15:51:58,259] {dagbag.py:90} INFO - Filling up the DagBag from /home/airflow/airflow/dags/my_dag_file.py
>> Once this pops up, I can see in the web UI that the scheduler heartbeat
>> error message appears. (Oddly, killing the scheduler process here does not
>> generate the heartbeat error message in the web UI). Checking for the
>> scheduler process, I see...
>>
>> [airflow@airflowetl airflow]$ ps -aux | grep scheduler
>> airflow    3409  0.2  0.1 523336 67384 ?     S   Oct24 115:06 airflow scheduler -- DagFileProcessorManager
>> airflow   25569  0.0  0.0 112716   968 pts/4 S+  16:00   0:00 grep --color=auto scheduler
>> airflow   56771  0.0  0.1 662560 67264 ?     S   Nov26   4:09 airflow scheduler -- DagFileProcessorManager
>> airflow   64187  0.0  0.1 662564 67096 ?     S   Nov27   0:00 airflow scheduler -- DagFileProcessorManager
>> airflow  153959  0.1  0.1 662568 67232 ?     S   15:01   0:06 airflow scheduler -- DagFileProcessorManager
>> IDK if this is normal or not.
>>
>> Thought the problem may have been that there were older scheduler processes
>> that were not deleted that were still running...
>>
>> [airflow@airflowetl airflow]$ kill -9 3409 36771
>> bash: kill: (36771) - No such process
>> [airflow@airflowetl airflow]$ ps -aux | grep scheduler
>> airflow   56771  0.0  0.1 662560 67264 ?     S   Nov26   4:09 airflow scheduler -- DagFileProcessorManager
>> airflow   64187  0.0  0.1 662564 67096 ?     S   Nov27   0:00 airflow scheduler -- DagFileProcessorManager
>> airflow  153959  0.0  0.1 662568 67232 ?     S   Nov29   0:06 airflow scheduler -- DagFileProcessorManager
>> airflow  155741  0.0  0.0 112712   968 pts/2 R+  15:54   0:00 grep --color=auto scheduler
>> Notice all the various start times in the output.
>>
>> Doing a kill -9 56771 64187 ... and then rerunning airflow scheduler -D does
>> not seem to have fixed the problem.
>>
>> Note: the scheduler seems to consistently stop running after a task fails to
>> move a file from an FTP location to an HDFS one...
>>
>> hadoop fs -Dfs.mapr.trace=debug -get \
>>     ftp://$FTP_CLIENT:$FTP_PASS@$FTP_IP/$FTP_DIR"$TABLENAME.TSV" \
>>     $PROJECT_HOME/tmp/"$TABLENAME.TSV" \
>>     | hadoop fs -moveFromLocal $PROJECT_HOME/tmp/"$TABLENAME.TSV" "$DATASTORE"
>> # see https://stackoverflow.com/a/46433847/8236733
>> Note there is a logic error in this line since $DATASTORE is an HDFS dir
>> path, not a file path, but either way I don't think that the airflow
>> scheduler should be missing heartbeats like this from something seemingly so
>> unrelated.
>>
>> Anyone know what could be going on here or how to fix?
>>
>>
>> This electronic message is intended only for the named
>> recipient, and may contain information that is confidential or
>> privileged. If you are not the intended recipient, you are
>> hereby notified that any disclosure, copying, distribution or
>> use of the contents of this message is strictly prohibited.
>> If you have received this message in error or are not the named
>> recipient, please notify us immediately by contacting the
>> sender at the electronic mail address noted above, and delete
>> and destroy all copies of this message. Thank you.
