Looking through the code for Airflow 1.10.7 I can't see anything in Airflow 
that would create that folder, especially not containing class files and a jar! 
There doesn't seem to be anything in the Sqoop hook or operator that would do 
it either.

Oh wait, BashOperator. The only files the BashOperator writes would be to 
/tmp/airflowtmp*/, so I don't know where "sqoop-airflow" is coming from, but 
it's not Airflow.
A possible guess: are you running things as the "airflow" Linux user perhaps?
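
One way to check that guess is to list the world-writable directories directly under /tmp together with their owner. The snippet below is only a minimal sketch, not anything from Airflow or Sqoop; the function name world_writable_dirs is made up for illustration. If the guess holds, "sqoop-airflow" should show up owned by the uid of the "airflow" user, which would suggest the name is built from the tool name plus the OS user rather than from anything in the DAG.

```python
import os
import stat
import tempfile


def world_writable_dirs(root):
    """Return (path, owner_uid, mode_string) for every world-writable
    directory directly under `root` (hypothetical helper, not an Airflow API)."""
    found = []
    for entry in os.scandir(root):
        try:
            # Skip plain files and symlinks; only real directories matter here.
            if not entry.is_dir(follow_symlinks=False):
                continue
            info = entry.stat(follow_symlinks=False)
        except OSError:
            # Entry vanished or is unreadable; ignore it.
            continue
        # S_IWOTH is the "others may write" permission bit.
        if info.st_mode & stat.S_IWOTH:
            found.append((entry.path, info.st_uid, stat.filemode(info.st_mode)))
    return sorted(found)


if __name__ == "__main__":
    # Scan the system temp directory (usually /tmp on Linux).
    for path, uid, mode in world_writable_dirs(tempfile.gettempdir()):
        print(mode, uid, path)
```

Comparing the printed uid against the output of `id -u airflow` on the server shows whether the directory belongs to the user the task runs as.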
-ash
On Feb 28 2020, at 9:47 pm, Reed Villanueva <[email protected]> wrote:
> Airflow (v1.10.7 running in LocalExecutor mode) appears to be automatically 
> creating publicly readable dirs in /tmp for certain task processes. The 
> files I've seen so far appear innocuous, but this seems like a security risk 
> and I would like to know why it may be happening and how to stop it.
> I have an airflow task that runs a sqoop (https://sqoop.apache.org/) job. It 
> does this using a BashOperator that calls a bash script with the sqoop job 
> logic. I recently noticed that the server's /tmp dir had a public folder 
> called "sqoop-airflow" whose contents look like...
> [root@airflowetl sqoop-airflow]# cd /tmp/sqoop-airflow/compile/
> [root@airflowetl compile]# ls
> drwxrwxrwx 2 airflow airflows 4.0K Feb 19 20:35 004c815bc9a978acd0093069eefff28a
> drwxrwxrwx 2 airflow airflows 4.0K Feb 20 21:35 58d38131dc0a3c433c27bf60570c0135
> drwxrwxrwx 2 airflow airflows 4.0K Feb 26 19:35 afe2b89410fee2b4467178eced9d40a8
> ...
> [root@airflowetl compile]# #selecting one of the folders here
> [root@airflowetl compile]# cd 82298635a8574abd7a55b967cbc1bb64/
> [root@airflowetl 82298635a8574abd7a55b967cbc1bb64]# ls
> QueryResult_MY_TABLE$1.class QueryResult_MY_TABLE$7.class
> QueryResult_MY_TABLE$2.class QueryResult_MY_TABLE$8.class
> QueryResult_MY_TABLE$3.class QueryResult_MY_TABLE.class
> QueryResult_MY_TABLE$4.class QueryResult_MY_TABLE$FieldSetterCommand.class
> QueryResult_MY_TABLE$5.class MY_TABLE.jar
> QueryResult_MY_TABLE$6.class
> Checking the scheduler logs for any reference to this folder shows nothing...
> [airflow@airflowetl airflow]$ cat airflow-scheduler.out | grep sqoop-airflow
> [airflow@airflowetl airflow]$ cat airflow-scheduler.log | grep sqoop-airflow
> The reason I strongly suspect this is caused by airflow and not by something 
> within the bash script itself is that the folder being created in /tmp is 
> called "sqoop-airflow" and I don't know how this name is created, because it 
> is not the name of the script or the airflow task_id, nor is it a string in 
> any of my own code (it is the name of the particular command being run within 
> the script, among others).
> Does anyone know how this could be happening / where this comes from? Any way 
> to further debug for more clarity on this?
