Airflow (v1.10.7 running in LocalExecutor mode) appears to be automatically creating publicly readable directories in /tmp for certain task processes. The files I've seen so far appear innocuous, but this seems like a security risk, and I would like to know why it may be happening and how to stop it.
I have an Airflow task that runs a sqoop <https://sqoop.apache.org/> job. It does this using a BashOperator that calls a bash script containing the sqoop job logic. I recently noticed that the server's /tmp dir has a publicly accessible folder called "sqoop-airflow" whose contents look like this:

    [root@airflowetl sqoop-airflow]# cd /tmp/sqoop-airflow/compile/
    [root@airflowetl compile]# ls
    drwxrwxrwx 2 airflow airflows 4.0K Feb 19 20:35 004c815bc9a978acd0093069eefff28a
    drwxrwxrwx 2 airflow airflows 4.0K Feb 20 21:35 58d38131dc0a3c433c27bf60570c0135
    drwxrwxrwx 2 airflow airflows 4.0K Feb 26 19:35 afe2b89410fee2b4467178eced9d40a8
    ...
    [root@airflowetl compile]# #selecting one of the folders here
    [root@airflowetl compile]# cd 82298635a8574abd7a55b967cbc1bb64/
    [root@airflowetl 82298635a8574abd7a55b967cbc1bb64]# ls
    QueryResult_MY_TABLE$1.class  QueryResult_MY_TABLE$7.class
    QueryResult_MY_TABLE$2.class  QueryResult_MY_TABLE$8.class
    QueryResult_MY_TABLE$3.class  QueryResult_MY_TABLE.class
    QueryResult_MY_TABLE$4.class  QueryResult_MY_TABLE$FieldSetterCommand.class
    QueryResult_MY_TABLE$5.class  MY_TABLE.jar
    QueryResult_MY_TABLE$6.class

Checking the scheduler logs for any reference to this folder shows nothing:

    [airflow@airflowetl airflow]$ cat airflow-scheduler.out | grep sqoop-airflow
    [airflow@airflowetl airflow]$ cat airflow-scheduler.log | grep sqoop-airflow

The reason I strongly suspect this is caused by Airflow, and not by something within the bash script itself, is that the folder being created in /tmp is called "sqoop-*airflow*". I don't know how this name is generated: it is not the name of the script or the Airflow task_id, nor is it a string anywhere in my own code ("sqoop" is only the name of one of the commands run within the script, among others). Does anyone know how this could be happening or where it comes from?
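The scheduler log is probably not where a task's subprocess output ends up: in Airflow 1.10 each task attempt writes its own log file under the configured base_log_folder, which defaults to $AIRFLOW_HOME/logs. A minimal sketch for searching those task logs too, assuming that default location (pass a different directory as the second argument if yours is configured elsewhere); the function name search_task_logs is hypothetical:

```bash
# Search every Airflow task log (not just the scheduler's) for a string.
# Assumption: task logs live under $AIRFLOW_HOME/logs (the default
# base_log_folder); AIRFLOW_HOME itself defaults to ~/airflow.
search_task_logs() {
    local needle="${1:-sqoop-airflow}"
    local log_root="${2:-${AIRFLOW_HOME:-$HOME/airflow}/logs}"
    # -r: recurse into subdirectories, -n: print line numbers,
    # -F: treat the needle as a fixed string rather than a regex
    grep -rnF -- "$needle" "$log_root"
}
```

After sourcing the file, `search_task_logs sqoop-airflow` prints `path:line:text` for each matching line; any hit shows which DAG, task, and run produced it.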
Any way to further debug for more clarity on this?
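One further debugging step is to list every world-writable directory under the suspicious path with its modification time, mode, and owner, so the timestamps (such as the Feb 19 20:35 entry in the listing above) can be lined up against the task's run history. A minimal sketch, assuming GNU find (for -printf) as shipped on most Linux distributions; the function name list_world_writable_dirs is hypothetical:

```bash
# List world-writable directories under a root, oldest first, showing
# modification time, symbolic mode, owner, and path.
# Assumption: GNU find, since -printf is a GNU extension.
list_world_writable_dirs() {
    local root="${1:-/tmp/sqoop-airflow}"
    # -perm -0002 matches entries whose "other" write bit is set.
    # Each line starts with a YYYY-MM-DD HH:MM timestamp, so a plain
    # lexical sort orders the output chronologically.
    find "$root" -type d -perm -0002 \
        -printf '%TY-%Tm-%Td %TH:%TM  %M  %u  %p\n' | sort
}
```

Running `list_world_writable_dirs /tmp/sqoop-airflow` shows whether new directories appear only at the times the sqoop task runs, and which user created them.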
