Many times, when I'm setting up a cluster, I have to use an operating system (such as Red Hat or CentOS 6.5) that ships with an old version of Python (Python 2.6). This happens, for example, when using a Hadoop distribution that only supports those operating systems (such as Hortonworks' HDP or Cloudera).
That also makes it difficult to install additional advanced Python packages (such as NumPy, IPython, etc.). So I tend to use Anaconda Python, an open source distribution of Python with many of those packages pre-built and pre-installed.

But installing Anaconda on each node of the cluster can be tedious, so I made a simple script that makes it easier to install Anaconda Python on the machines of a cluster.

I wanted to share it here, in case it helps someone who wants to use PySpark.

https://github.com/tiangolo/anaconda_cluster_install

*Sebastián Ramírez*
Head of Software Development
<http://www.senseta.com>
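
P.S. For anyone who just wants the general idea, here is a rough sketch of installing Anaconda on every node over SSH. It is not the script from the repository, only an illustration. The host names, the installer path and the install prefix are hypothetical placeholders, and it assumes Python 3 on the machine you run it from plus password-less SSH to each node. The `-b` (batch mode) and `-p` (install prefix) options are the ones the Anaconda installer accepts for unattended installs. The function only builds the commands; it does not run anything.

```python
import shlex


def build_install_commands(hosts,
                           installer="/tmp/Anaconda-installer.sh",  # hypothetical path
                           prefix="/opt/anaconda"):                 # hypothetical prefix
    """Return one (copy, run) command pair per host. Nothing is executed here."""
    commands = []
    for host in hosts:
        # Copy the installer to the same path on the remote node.
        copy = ["scp", installer, "%s:%s" % (host, installer)]
        # Run the installer unattended: -b is batch mode, -p sets the prefix.
        remote = "bash %s -b -p %s" % (shlex.quote(installer), shlex.quote(prefix))
        run = ["ssh", host, remote]
        commands.append((copy, run))
    return commands


if __name__ == "__main__":
    # Hypothetical node names; print the commands instead of running them.
    for copy, run in build_install_commands(["node1", "node2"]):
        print(" ".join(copy))
        print(" ".join(run))
```

Each pair could then be passed to `subprocess.run(..., check=True)` to actually perform the install. Afterwards, PySpark can be pointed at the new interpreter by setting the `PYSPARK_PYTHON` environment variable to the `bin/python` inside the chosen prefix.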