Hi list,

Suppose you have a private Python package that contains some code people want to share when writing their pipelines. Typically, the installation process for the package would be either

    pip install git+ssh://[email protected]/mycompany/mypackage#egg=mypackage

or

    git clone git://[email protected]/mycompany/mypackage
    python mypackage/setup.py install

Now, the problem starts when we want to get that package into Dataflow. Right now, to my understanding, DataflowRunner supports 3 approaches:

1. Specifying a requirements_file parameter in the pipeline options. This basically must be a requirements.txt file.
2. Specifying an extra_packages parameter in the pipeline options. This must be a list of tarballs, each of which contains a Python package built with distutils.
3. Specifying a setup_file parameter in the pipeline options. This will build a source distribution from path/to/my/setup.py and then send the resulting files over the wire.
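For concreteness, here is a minimal sketch of how those three knobs can be set from code, assuming the Beam Python SDK's SetupOptions view (the project, bucket and file paths are just placeholders):

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

    options = PipelineOptions(
        runner='DataflowRunner',
        project='my-project',                # placeholder
        temp_location='gs://my-bucket/tmp',  # placeholder
    )

    setup = options.view_as(SetupOptions)
    # 1. plain requirements.txt (works for pip-installable dependencies):
    # setup.requirements_file = 'requirements.txt'
    # 2. pre-built sdist tarballs, e.g. produced by "python setup.py sdist":
    # setup.extra_packages = ['dist/mypackage-0.1.0.tar.gz']
    # 3. a setup.py that gets built and shipped to the workers:
    setup.setup_file = './setup.py'

    with beam.Pipeline(options=options) as p:
        ...  # pipeline definition goes here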
The best approach I could come up with was including an *additional* setup.py in the package itself, so that when the package is installed, that setup.py gets installed along with it. Then I point the setup_file option at that file. This gist <https://gist.github.com/doubleyou/be01226352372491babda7602022c506> shows the basic approach in code; both setup.py and options.py are supposed to be present in the installed package.
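In spirit, the layout looks something like the sketch below. This is not the gist verbatim, just the general shape of the idea; "mypackage", the version number and the helper name dataflow_options are illustrative placeholders:

    # mypackage/setup.py -- the *extra* setup.py that ships inside the
    # installed package (e.g. via package_data), not the one used to
    # install mypackage in the first place.
    import setuptools

    setuptools.setup(
        name='mypackage',
        version='0.1.0',
        # the directory this setup.py lives in *is* the package,
        # so map it explicitly instead of using find_packages()
        packages=['mypackage'],
        package_dir={'mypackage': '.'},
        install_requires=[
            # whatever mypackage needs on the Dataflow workers
        ],
    )

    # mypackage/options.py -- helper that points setup_file at the
    # setup.py installed next to this module.
    import os

    from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

    def dataflow_options(**kwargs):
        options = PipelineOptions(**kwargs)
        bundled_setup = os.path.join(os.path.dirname(__file__), 'setup.py')
        options.view_as(SetupOptions).setup_file = bundled_setup
        return options

That way a pipeline author just calls mypackage.options.dataflow_options() and the private package gets shipped to the workers without them caring where it lives.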
It kind of works for me, with some caveats, but I just wanted to find out whether there's a more decent way to handle my situation. I'm not keen on specifying that private package as a git dependency, because then I'd have to worry about git credentials, but maybe there are other ways?

Thanks!

--
Best regards,
Dmitry Demeshchuk.