Oh - don't use SequentialExecutor! It blocks the scheduler from heartbeating 
while a task is running! So if a task takes longer to run than the scheduler 
heartbeat interval, you'll see that message.

I would suggest switching to the LocalExecutor instead.
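
To make the blocking concrete, here is a toy Python sketch. It is not Airflow's 
code: scheduler_loop, run_task and the two timing constants are made-up names 
for illustration, and a worker thread stands in for the separate processes that 
LocalExecutor uses. It only shows that a loop which runs the task itself cannot 
record a heartbeat until the task returns.

```python
import threading
import time

# Hypothetical values for illustration only, not Airflow defaults.
HEARTBEAT_INTERVAL = 0.05  # seconds between heartbeats in the toy loop
TASK_DURATION = 0.3        # a task that outlasts the heartbeat interval


def run_task():
    """Stand-in for a long-running task (e.g. a slow file transfer)."""
    time.sleep(TASK_DURATION)


def scheduler_loop(run_inline, loops=3):
    """Toy scheduler loop: record a heartbeat, then hand off one task.

    run_inline=True  -> the loop runs the task itself and blocks on it.
    run_inline=False -> the task runs in a worker thread, so the loop moves on.
    Returns the largest gap between consecutive heartbeats, in seconds.
    """
    beats = []
    workers = []
    for _ in range(loops):
        beats.append(time.monotonic())
        if run_inline:
            run_task()  # no heartbeat can be recorded until this returns
        else:
            worker = threading.Thread(target=run_task)
            worker.start()
            workers.append(worker)
            time.sleep(HEARTBEAT_INTERVAL)
    for worker in workers:
        worker.join()
    return max(later - earlier for earlier, later in zip(beats, beats[1:]))


blocking_gap = scheduler_loop(run_inline=True)
background_gap = scheduler_loop(run_inline=False)
print(f"blocking executor:   max heartbeat gap {blocking_gap:.2f}s")
print(f"background executor: max heartbeat gap {background_gap:.2f}s")
```

In the blocking case the gap between heartbeats should come out at roughly the 
task duration; in the background case it should stay close to the heartbeat 
interval.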

> On 10 Dec 2019, at 19:52, Reed Villanueva <[email protected]> wrote:
> 
> I think the multiple other scheduler DagFileProcessorManagers were just left 
> over from previous times I ran the dag: the task with the apparently 
> offending code would run, the scheduler heartbeat error would pop up, and 
> I'd restart the scheduler via an "airflow scheduler -D" command (I guess the 
> old ones were not really killed, just missing heartbeats for whatever 
> reason). So it's not like starting the scheduler does something weird like 
> start multiple of them.
> 
> Haven't seen anything else unusual about the dag files (to me each of the 
> tasks are all pretty simple and short) and since implementing the 
> previously-mentioned change, have not seen the heartbeat error again. I do 
> think it's weird too considering how unrelated I would think the scheduler 
> heartbeat would be to an HDFS file not found error.
> 
> Running airflow with the SequentialExecutor, but not sure what you mean by 
> "process supervisor". Example?
> 
> On Tue, Dec 10, 2019 at 3:54 AM Ash Berlin-Taylor <[email protected]> wrote:
> Hmm, having more than one DagFileProcessorManager alive at the same time does 
> indicate something has gone wrong -- there should only be one of those.
> 
> Are you using sub-dags or doing anything else "unusual" in any of your dag 
> files?
> 
> What executor are you using? What process supervisor (if any) are you using 
> to run your scheduler?
> 
> -ash
> 
>> On 9 Dec 2019, at 20:48, Reed Villanueva <[email protected]> wrote:
>> 
>> I have a problem where the airflow (v1.10.5) webserver will complain...
>> 
>> The scheduler does not appear to be running. Last heartbeat was received 45 
>> minutes ago.
>> But checking the scheduler daemon process (started via airflow scheduler -D) 
>> can see...
>> 
>> [airflow@airflowetl airflow]$ cat airflow-scheduler.pid
>> 64186
>> [airflow@airflowetl airflow]$ ps -aux | grep 64186
>> airflow   64186  0.0  0.1 663340 67796 ?        S    15:03   0:00 /usr/bin/python3 /home/airflow/.local/bin/airflow scheduler -D
>> airflow   94305  0.0  0.0 112716   964 pts/4    R+   16:01   0:00 grep --color=auto 64186
>> (and after some period of time the error message goes away again).
>> 
>> This happens very frequently off-and-on even after restarting both the 
>> webserver and scheduler.
>> 
>> The airflow-scheduler.err file is empty and the .out and .log files appear 
>> innocuous (need more time to look through deeper).
>> 
>> Running the scheduler in the terminal to see the feed live, everything seems 
>> to run fine until I see this output in the middle of the dag execution
>> 
>> [2019-11-29 15:51:57,825] {__init__.py:51} INFO - Using executor SequentialExecutor
>> [2019-11-29 15:51:58,259] {dagbag.py:90} INFO - Filling up the DagBag from /home/airflow/airflow/dags/my_dag_file.py
>> Once this pops up, I can see in the web UI that the scheduler heartbeat 
>> error message appears. (Oddly, killing the scheduler process here does not 
>> generate the heartbeat error message in the web UI). Checking for the 
>> scheduler process, I see...
>> 
>> [airflow@airflowetl airflow]$ ps -aux | grep scheduler
>> airflow    3409  0.2  0.1 523336 67384 ?        S    Oct24 115:06 airflow scheduler -- DagFileProcessorManager
>> airflow   25569  0.0  0.0 112716   968 pts/4    S+   16:00   0:00 grep --color=auto scheduler
>> airflow   56771  0.0  0.1 662560 67264 ?        S    Nov26   4:09 airflow scheduler -- DagFileProcessorManager
>> airflow   64187  0.0  0.1 662564 67096 ?        S    Nov27   0:00 airflow scheduler -- DagFileProcessorManager
>> airflow  153959  0.1  0.1 662568 67232 ?        S    15:01   0:06 airflow scheduler -- DagFileProcessorManager
>> IDK if this is normal or not.
>> 
>> Thought the problem may have been that there were older scheduler processes 
>> that were not deleted that were still running...
>> 
>> [airflow@airflowetl airflow]$ kill -9 3409 36771
>> bash: kill: (36771) - No such process
>> [airflow@airflowetl airflow]$ ps -aux | grep scheduler
>> airflow   56771  0.0  0.1 662560 67264 ?        S    Nov26   4:09 airflow scheduler -- DagFileProcessorManager
>> airflow   64187  0.0  0.1 662564 67096 ?        S    Nov27   0:00 airflow scheduler -- DagFileProcessorManager
>> airflow  153959  0.0  0.1 662568 67232 ?        S    Nov29   0:06 airflow scheduler -- DagFileProcessorManager
>> airflow  155741  0.0  0.0 112712   968 pts/2    R+   15:54   0:00 grep --color=auto scheduler
>> Notice all the various start times in the output.
>> 
>> Doing a kill -9 56771 64187 ... and then rerunning airflow scheduler -D does 
>> not seem to have fixed the problem.
>> 
>> Note: the scheduler seems to consistently stop running after a task fails to 
>> move a file from an FTP location to an HDFS one...
>> 
>> hadoop fs -Dfs.mapr.trace=debug -get \
>>         ftp://$FTP_CLIENT:$FTP_PASS@$FTP_IP/$FTP_DIR"$TABLENAME.TSV"; \
>>         $PROJECT_HOME/tmp/"$TABLENAME.TSV" \
>>         | hadoop fs -moveFromLocal $PROJECT_HOME/tmp/"$TABLENAME.TSV" "$DATASTORE"
>> # see https://stackoverflow.com/a/46433847/8236733
>> Note there is a logic error in this line since $DATASTORE is an HDFS dir 
>> path, not a file path, but either way I don't think that the airflow 
>> scheduler should be missing heartbeats like this from something seemingly so 
>> unrelated.
>> 
>> Anyone know what could be going on here or how to fix?
>> 
>> 
>> This electronic message is intended only for the named 
>> recipient, and may contain information that is confidential or 
>> privileged. If you are not the intended recipient, you are 
>> hereby notified that any disclosure, copying, distribution or 
>> use of the contents of this message is strictly prohibited. If 
>> you have received this message in error or are not the named
>> recipient, please notify us immediately by contacting the 
>> sender at the electronic mail address noted above, and delete 
>> and destroy all copies of this message. Thank you.