I think the multiple other scheduler DagFileProcessorManagers were just
left over from previous runs: I would run the dag, the task with the
apparently offending code would run, the scheduler heartbeat error would
pop up, and I'd restart the scheduler via an "airflow scheduler -D" command
(when I guess the old ones had not really been killed, just missing
heartbeats for whatever reason). So it's not like starting the scheduler
does something weird like starting multiple of them.
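
In case it helps anyone checking for the same leftovers, here is a minimal
sketch of how they could be picked out of "ps aux" output. The function name
and the sample text are just for illustration (the sample lines are copied
from the ps output quoted below), and it assumes the usual 11-column
"ps aux" layout:

```python
def find_scheduler_processes(ps_output):
    """Return (pid, start, command) for lines that look like Airflow scheduler processes."""
    found = []
    for line in ps_output.splitlines():
        # "ps aux" prints 11 columns; the last one is the full command line.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header row or malformed line
        command = fields[10]
        # Skip the grep that shows up when filtering ps output by hand.
        if "airflow scheduler" in command and not command.startswith("grep"):
            found.append((int(fields[1]), fields[8], command))
    return found


# Sample lines taken from the ps output quoted later in this thread.
SAMPLE = """\
airflow    3409  0.2  0.1 523336 67384 ?        S    Oct24 115:06 airflow scheduler -- DagFileProcessorManager
airflow   25569  0.0  0.0 112716   968 pts/4    S+   16:00   0:00 grep --color=auto scheduler
airflow   56771  0.0  0.1 662560 67264 ?        S    Nov26   4:09 airflow scheduler -- DagFileProcessorManager
"""

for pid, start, command in find_scheduler_processes(SAMPLE):
    print(pid, start, command)
```

Anything listed with a START value older than the most recent restart is
probably one of the leftovers.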

I haven't seen anything else unusual about the dag files (to me, each of
the tasks is pretty simple and short), and since implementing the
previously-mentioned change I have not seen the heartbeat error again. I do
think it's weird too, considering how unrelated I would expect the scheduler
heartbeat to be to an HDFS file-not-found error.
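
For illustration only (this is not the change mentioned above, and I am not
claiming it explains the heartbeats), here is a minimal sketch of running the
fetch and the move as separate steps. Each step has a timeout and stops at the
first failure, so a failed fetch can't fall through into the move. The helper
name, the timeout value and the stand-in commands are all assumptions; the
real steps would be the "hadoop fs -get ..." and "hadoop fs -moveFromLocal ..."
argument lists:

```python
import subprocess
import sys


def run_steps(steps, timeout_s=600):
    """Run each command in order; stop at the first failure or timeout.

    Returns the index of the failed step, or None if every step succeeded.
    """
    for i, cmd in enumerate(steps):
        try:
            # check=True raises on a non-zero exit code; timeout bounds each step.
            subprocess.run(cmd, check=True, timeout=timeout_s)
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as exc:
            print(f"step {i} failed: {exc}", file=sys.stderr)
            return i
    return None


# Stand-in commands so the sketch runs anywhere (hypothetical, not the real task).
demo_steps = [
    [sys.executable, "-c", "print('fetch ok')"],
    [sys.executable, "-c", "import sys; sys.exit(1)"],  # simulated failing move
    [sys.executable, "-c", "print('never reached')"],
]

failed = run_steps(demo_steps)
print("failed step:", failed)
```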

Running airflow with the SequentialExecutor, but not sure what you mean by
"process supervisor". Example?

On Tue, Dec 10, 2019 at 3:54 AM Ash Berlin-Taylor <[email protected]> wrote:

> Hmm, having more than one DagFileProcessorManager alive at the same time
> does indicate something has gone wrong -- there should only be one of those.
>
> Are you using sub-dags or doing anything else "unusual" in any of your dag
> files?
>
> What executor are you using? What process supervisor (if any) are you
> using to run your scheduler?
>
> -ash
>
> On 9 Dec 2019, at 20:48, Reed Villanueva <[email protected]> wrote:
>
> I have a problem where the airflow (v1.10.5) webserver will complain...
>
> The scheduler does not appear to be running. Last heartbeat was received
> 45 minutes ago.
>
> But checking the scheduler daemon process (started via airflow scheduler
> -D) can see...
>
> [airflow@airflowetl airflow]$ cat airflow-scheduler.pid
> 64186
> [airflow@airflowetl airflow]$ ps -aux | grep 64186
> airflow   64186  0.0  0.1 663340 67796 ?        S    15:03   0:00 /usr/bin/python3 /home/airflow/.local/bin/airflow scheduler -D
> airflow   94305  0.0  0.0 112716   964 pts/4    R+   16:01   0:00 grep --color=auto 64186
>
> (and after some period of time the error message *goes away again*).
>
> This happens very frequently off-and-on even after restarting both the
> webserver and scheduler.
>
> The airflow-scheduler.err file is empty and the .out and .log files
> appear innocuous (need more time to look through deeper).
>
> Running the scheduler in the terminal to see the feed live, everything
> seems to run fine until I see this output in the middle of the dag execution
>
> [2019-11-29 15:51:57,825] {__init__.py:51} INFO - Using executor SequentialExecutor
> [2019-11-29 15:51:58,259] {dagbag.py:90} INFO - Filling up the DagBag from /home/airflow/airflow/dags/my_dag_file.py
>
> Once this pops up, I can see in the web UI that the scheduler heartbeat
> error message appears. (Oddly, killing the scheduler process here does not
> generate the heartbeat error message in the web UI). Checking for the
> scheduler process, I see...
>
> [airflow@airflowetl airflow]$ ps -aux | grep scheduler
> airflow    3409  0.2  0.1 523336 67384 ?        S    Oct24 115:06 airflow scheduler -- DagFileProcessorManager
> airflow   25569  0.0  0.0 112716   968 pts/4    S+   16:00   0:00 grep --color=auto scheduler
> airflow   56771  0.0  0.1 662560 67264 ?        S    Nov26   4:09 airflow scheduler -- DagFileProcessorManager
> airflow   64187  0.0  0.1 662564 67096 ?        S    Nov27   0:00 airflow scheduler -- DagFileProcessorManager
> airflow  153959  0.1  0.1 662568 67232 ?        S    15:01   0:06 airflow scheduler -- DagFileProcessorManager
>
> IDK if this is normal or not.
>
> Thought the problem may have been that there were older scheduler
> processes that were not deleted that were still running...
>
> [airflow@airflowetl airflow]$ kill -9 3409 36771
> bash: kill: (36771) - No such process
> [airflow@airflowetl airflow]$ ps -aux | grep scheduler
> airflow   56771  0.0  0.1 662560 67264 ?        S    Nov26   4:09 airflow scheduler -- DagFileProcessorManager
> airflow   64187  0.0  0.1 662564 67096 ?        S    Nov27   0:00 airflow scheduler -- DagFileProcessorManager
> airflow  153959  0.0  0.1 662568 67232 ?        S    Nov29   0:06 airflow scheduler -- DagFileProcessorManager
> airflow  155741  0.0  0.0 112712   968 pts/2    R+   15:54   0:00 grep --color=auto scheduler
>
> Notice all the various start times in the output.
>
> Doing a kill -9 56771 64187 ... and then rerunning airflow scheduler -D does
> not seem to have fixed the problem.
>
> Note: the scheduler seems to consistently stop running after a task fails
> to move a file from an FTP location to an HDFS one...
>
> hadoop fs -Dfs.mapr.trace=debug -get \
>         ftp://$FTP_CLIENT:$FTP_PASS@$FTP_IP/$FTP_DIR"$TABLENAME.TSV"; \
>         $PROJECT_HOME/tmp/"$TABLENAME.TSV" \
>         | hadoop fs -moveFromLocal $PROJECT_HOME/tmp/"$TABLENAME.TSV" "$DATASTORE"
> # see https://stackoverflow.com/a/46433847/8236733
>
> Note there *is* a logic error in this line since $DATASTORE is a hdfs dir
> path, not a file path, but either way I don't think that the airflow
> scheduler should be missing heartbeats like this from something seemingly
> so unrelated.
>
> Anyone know what could be going on here or how to fix?
>
> This electronic message is intended only for the named
> recipient, and may contain information that is confidential or
> privileged. If you are not the intended recipient, you are
> hereby notified that any disclosure, copying, distribution or
> use of the contents of this message is strictly prohibited. If
> you have received this message in error or are not the named
> recipient, please notify us immediately by contacting the
> sender at the electronic mail address noted above, and delete
> and destroy all copies of this message. Thank you.
>
>
>
