Airflow (v1.10.7 running in LocalExecutor mode) appears to be automatically
creating world-readable (and world-writable) dirs in /tmp for certain task
processes. The files I've seen so far appear innocuous, but this seems like a
security risk, and I would like to know why it may be happening and how to
stop it.

I have an airflow task that runs a sqoop <https://sqoop.apache.org/> job.
It does this using a BashOperator that calls a bash script with the sqoop
job logic. I recently noticed that the server's /tmp dir had a publicly
accessible folder called "sqoop-airflow" whose contents look like this:

    [root@airflowetl sqoop-airflow]# cd /tmp/sqoop-airflow/compile/
    [root@airflowetl compile]# ls
    drwxrwxrwx 2 airflow airflows 4.0K Feb 19 20:35 004c815bc9a978acd0093069eefff28a
    drwxrwxrwx 2 airflow airflows 4.0K Feb 20 21:35 58d38131dc0a3c433c27bf60570c0135
    drwxrwxrwx 2 airflow airflows 4.0K Feb 26 19:35 afe2b89410fee2b4467178eced9d40a8
    ...
    [root@airflowetl compile]# #selecting one of the folders here
    [root@airflowetl compile]# cd 82298635a8574abd7a55b967cbc1bb64/
    [root@airflowetl 82298635a8574abd7a55b967cbc1bb64]# ls
    QueryResult_MY_TABLE$1.class  QueryResult_MY_TABLE$7.class
    QueryResult_MY_TABLE$2.class  QueryResult_MY_TABLE$8.class
    QueryResult_MY_TABLE$3.class  QueryResult_MY_TABLE.class
    QueryResult_MY_TABLE$4.class  QueryResult_MY_TABLE$FieldSetterCommand.class
    QueryResult_MY_TABLE$5.class  MY_TABLE.jar
    QueryResult_MY_TABLE$6.class

Checking the scheduler logs for any reference to this folder shows
nothing...

    [airflow@airflowetl airflow]$ cat airflow-scheduler.out | grep sqoop-airflow
    [airflow@airflowetl airflow]$ cat airflow-scheduler.log | grep sqoop-airflow

The reason I strongly suspect this is caused by airflow and not by
something within the bash script itself is that the folder being created in
/tmp is called "sqoop-*airflow*", and I don't know how this name is created:
it is not the name of the script or the airflow task_id, nor is it a string
in any of my own code (the "sqoop" part is the name of one of the commands
the script runs).

Does anyone know how this could be happening, or where it comes from? Is
there any way to debug this further to get more clarity?
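One way to gather more data, offered as an untested sketch rather than something already run on this server: a small Unix-only Python script that lists world-writable directories under /tmp with their owner and last-modified time, so new ones can be lined up against task run times (e.g. the 19:35 and 21:35 timestamps above) in the Airflow logs. The function name find_world_writable_dirs and the max_depth limit of 3 are hypothetical choices made for illustration only.

```python
import os
import pwd
import stat
import time
from typing import List, Tuple


def find_world_writable_dirs(root: str = "/tmp", max_depth: int = 3) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: list directories under `root` (excluding root
    itself) that grant write permission to "other", with owner and mtime."""
    found: List[Tuple[str, str, str]] = []
    base_depth = os.path.abspath(root).count(os.sep)
    for dirpath, dirnames, _filenames in os.walk(root):
        depth = os.path.abspath(dirpath).count(os.sep) - base_depth
        if depth >= max_depth:
            dirnames[:] = []  # do not descend deeper than max_depth (assumed limit)
        if depth == 0:
            continue  # skip root itself; /tmp is normally mode 1777 by design
        try:
            st = os.lstat(dirpath)
        except OSError:
            continue  # directory vanished or is unreadable; ignore it
        if st.st_mode & stat.S_IWOTH:  # "other" has write permission
            try:
                owner = pwd.getpwuid(st.st_uid).pw_name
            except KeyError:
                owner = str(st.st_uid)  # uid with no passwd entry
            mtime = time.strftime("%Y-%m-%d %H:%M", time.localtime(st.st_mtime))
            found.append((dirpath, owner, mtime))
    return found


if __name__ == "__main__":
    # Sort by modification time so new directories can be matched to task runs.
    for path, owner, mtime in sorted(find_world_writable_dirs(), key=lambda r: r[2]):
        print(f"{mtime}  {owner:<10}  {path}")
```

Running it periodically (or before and after a manual trigger of the sqoop task) would show whether a new directory appears only when that task runs.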
