Looking through the code for Airflow 1.10.7, I can't see anything in Airflow that would create that folder, especially not one containing class files and a jar! There doesn't seem to be anything in the Sqoop hook or operator that would do it either.
Oh wait, BashOperator. The only files the BashOperator writes would be to /tmp/airflowtmp*/ -- so I don't know where "sqoop-airflow" is coming from, but it's not Airflow.

A possible guess: are you running things as the "airflow" linux user perhaps?

-ash

On Feb 28 2020, at 9:47 pm, Reed Villanueva wrote:
> Airflow (v1.10.7 running in LocalExecutor mode) appears to be automatically
> creating publicly readable dirs in /tmp for certain task processes. The
> files I've seen so far appear innocuous, but this seems like a security risk
> and I would like to know why this may be happening and how to stop it.
>
> I have an airflow task that runs a sqoop (https://sqoop.apache.org/) job. It
> does this using a BashOperator that calls a bash script with the sqoop job
> logic. I recently noticed that the server's /tmp dir had a public folder
> called "sqoop-airflow" whose contents look like...
>
> [root@airflowetl sqoop-airflow]# cd /tmp/sqoop-airflow/compile/
> [root@airflowetl compile]# ls
> drwxrwxrwx 2 airflow airflows 4.0K Feb 19 20:35 004c815bc9a978acd0093069eefff28a
> drwxrwxrwx 2 airflow airflows 4.0K Feb 20 21:35 58d38131dc0a3c433c27bf60570c0135
> drwxrwxrwx 2 airflow airflows 4.0K Feb 26 19:35 afe2b89410fee2b4467178eced9d40a8
> ...
> [root@airflowetl compile]# #selecting one of the folders here
> [root@airflowetl compile]# cd 82298635a8574abd7a55b967cbc1bb64/
> [root@airflowetl 82298635a8574abd7a55b967cbc1bb64]# ls
> QueryResult_MY_TABLE$1.class  QueryResult_MY_TABLE$7.class
> QueryResult_MY_TABLE$2.class  QueryResult_MY_TABLE$8.class
> QueryResult_MY_TABLE$3.class  QueryResult_MY_TABLE.class
> QueryResult_MY_TABLE$4.class  QueryResult_MY_TABLE$FieldSetterCommand.class
> QueryResult_MY_TABLE$5.class  MY_TABLE.jar
> QueryResult_MY_TABLE$6.class
>
> Checking the scheduler logs for any reference to this folder shows nothing...
>
> [airflow@airflowetl airflow]$ cat airflow-scheduler.out | grep sqoop-airflow
> [airflow@airflowetl airflow]$ cat airflow-scheduler.log | grep sqoop-airflow
>
> The reason I strongly suspect this is caused by airflow and not by something
> within the bash script itself is that the folder being created in /tmp is
> called "sqoop-airflow" and IDK how this name is created, because it is not the
> name of the script or the airflow task_id, nor is it a string in any of my own
> code (it is the name of the particular command being run within the script
> among others).
>
> Does anyone know how this could be happening / where this comes from? Any way
> to further debug for more clarity on this?
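
One way to test the guess about the linux user, and to answer the "any way to further debug" question, might be a small script that builds the folder name from the current user and lists any world-writable directories beneath it. This is only a sketch and not something from the thread: the "sqoop-" + username naming pattern is an assumption suggested by the folder name and the owner shown in the listing, and the helper names guessed_dir and world_writable_dirs are hypothetical.

```python
import getpass
import os
import stat


def world_writable_dirs(root):
    """Return sorted paths of directories at or under root that are world-writable."""
    found = []
    # os.walk yields nothing if root does not exist, so a missing folder is fine.
    for dirpath, _dirnames, _filenames in os.walk(root):
        mode = os.lstat(dirpath).st_mode
        if stat.S_ISDIR(mode) and mode & stat.S_IWOTH:
            found.append(dirpath)
    return sorted(found)


def guessed_dir(tmp="/tmp", prefix="sqoop-", user=None):
    """Build '<tmp>/<prefix><user>'.

    Assumption: the folder is named after the linux user running the job.
    """
    return os.path.join(tmp, prefix + (user or getpass.getuser()))


if __name__ == "__main__":
    candidate = guessed_dir()
    print("checking", candidate)
    for path in world_writable_dirs(candidate):
        print("world-writable:", path)
```

Running it as the same user that runs the Airflow tasks, and again as a different user, would show whether the folder name follows the account rather than anything in the DAG or task definition.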
