Hello All,

The issue SPARK-23153 <https://issues.apache.org/jira/browse/SPARK-23153> lets us copy any file to the pod/container by first staging it on a Hadoop-supported filesystem, e.g. HDFS, S3, COS, etc. This is especially useful when the files have to be copied to a large number of pods/nodes. However, in most cases we need the file copied only to the driver, and it is not always convenient (especially on clusters with a small number of nodes or limited resources) to set up additional intermediate storage just for this; the SPARK-23153 approach cannot work without an intermediate distributed storage of some sort.

While going through the code of the kubectl cp command <https://github.com/kubernetes/kubernetes/blob/master/pkg/kubectl/cmd/cp/cp.go>, it appears that we can use the same technique:

    tar cf - /tmp/foo | kubectl exec -i -n <some-namespace> <some-pod> -- tar xf - -C /tmp/bar

This copies files in a more secure way, because the data goes through the Kubernetes API server, which has its own security in place. It also lets us compress the file while sending.
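For anyone who wants to try the core idea without a cluster: the tar-pipe technique can be sketched locally by replacing the kubectl exec side with a plain pipe (in Kubernetes, the right-hand tar would run inside the pod instead). The paths below are throwaway temp directories, not anything from Spark or kubectl itself.

```shell
# Create a throwaway source and destination directory.
src=$(mktemp -d)
dst=$(mktemp -d)
echo "hello" > "$src/foo.txt"

# Stream a tar archive of the source dir on stdout and unpack it from stdin
# into the destination -- the same pattern as
#   tar cf - /tmp/foo | kubectl exec -i ... -- tar xf - -C /tmp/bar
# except that here both ends of the pipe run on the same machine.
tar cf - -C "$src" . | tar xf - -C "$dst"

# The file arrives intact at the destination.
cat "$dst/foo.txt"
```

With kubectl, the left-hand tar runs on the client and the right-hand tar runs inside the container, so the archive stream travels over the authenticated API-server connection; adding `z` to both tar invocations would compress the stream in transit.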
If there is any interest in this sort of feature, I am ready to open an issue and work on it. Let us discuss whether this has already been explored and whether there are any known issues with this approach.

Thank you,
Prashant.