Hey all, our team has an interesting problem. We have a set of Pig code we developed a few years ago that, for various reasons, I’d prefer not to convert to PySpark immediately. I would like to share some UDF code between Pig and PySpark for a little while. We can do this if we wrap our pure Python functions with shim scripts for Spark and Pig. Where we ran into issues was in using a specific Python version and specific Python libraries in a virtualenv.
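To make the shim idea concrete, here is a minimal sketch of what I mean, not our actual code. The function `normalize_name`, the wrapper names, and the single-file layout are all made up for illustration; in practice the pure function and each shim would live in separate files. The `pig_util` import is what Pig makes available to streaming_python scripts, and the fallback decorator is only there so the sketch also loads outside of Pig.

```python
# Hypothetical shared module: pure Python, no Pig or Spark imports needed.
def normalize_name(raw):
    """Trim, collapse internal whitespace, and lowercase a name."""
    if raw is None:
        return None
    return " ".join(raw.split()).lower()


# --- Pig shim (would normally be its own script registered with streaming_python) ---
try:
    from pig_util import outputSchema  # supplied by Pig at runtime
except ImportError:
    # Assumption: outside Pig we substitute a decorator that only records the schema.
    def outputSchema(schema):
        def wrap(func):
            func.output_schema = schema
            return func
        return wrap


@outputSchema("name:chararray")
def pig_normalize_name(raw):
    # Thin wrapper: all logic stays in the shared pure function.
    return normalize_name(raw)


# --- PySpark shim ---
def make_spark_udf():
    # Imported lazily so this module still loads where Spark is not installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType
    return udf(normalize_name, StringType())
```

The point of keeping the shims this thin is that the shared function can be unit tested with plain Python, and only the wrappers need to know about Pig or Spark.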
Does anyone know how to influence which Python executable Pig’s streaming_python calls? We know how to ship a Python installation with a virtualenv around the cluster with Oozie, so it is now just a matter of figuring out how to point Pig at the venv’s executable to run our UDF wrapper script, instead of whatever is in /usr/bin/python on the datanodes. Will update the thread for posterity if we figure it out. Thanks!

Notes:

The examples of streaming_python I have seen seem to use Python with dependencies installed directly on each node of the cluster. This would work, but it is definitely not how we want to distribute Python code.

For reasons, I don’t want to use STREAM. I got this working with the STREAM operator, and it is trivial to do what I want that way, since you explicitly control the invocation of the script. But it is not fun to lose all the features of streaming_python, and it makes me sad to have to manually join script results back to the parent relation every time I need to send a field into my Python script.