In case anyone runs into this issue in the future, we got it working: the
following variable must be set on the edge node:
export PYSPARK_PYTHON=/your/path/to/whatever/python/you/want/to/run/bin/python
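As a concrete sketch of the fix (the Anaconda path below is an example — substitute whichever interpreter you actually want the workers to run):

```shell
# Point PySpark at one specific interpreter on every node.
# /opt/anaconda2/bin/python is a hypothetical path for illustration.
export PYSPARK_PYTHON=/opt/anaconda2/bin/python

# Optionally pin the driver-side interpreter as well, so the
# driver and workers are guaranteed the same minor version.
export PYSPARK_DRIVER_PYTHON=/opt/anaconda2/bin/python

# Then submit as usual, e.g.:
# spark-submit my_job.py
```

Setting this in the shell profile on the edge node (or in spark-env.sh) keeps it from silently reverting to the OS default Python.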
I didn't realize that variable gets passed to every worker node. All I saw
when searching for
There was a similar problem reported on this list before.
Weird python errors like this generally mean you have different
versions of python in the nodes of your cluster. Can you check that?
From the error stack, you are using 2.7.10 (Anaconda 2.3.0),
while the OS/CDH version of Python is probably 2.6.
--
Is it possible that you have Python 2.7 on the driver, but Python 2.6
on the workers?
PySpark requires that you have the same minor version of Python on
both the driver and the workers. In PySpark 1.4+, it performs this
check before running any tasks.
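A quick way to see what that check compares is to look at the major.minor tag on each side. This is a sketch, not PySpark's actual implementation; on a real cluster you would collect the worker's tag via a small Spark job (e.g. sc.parallelize([0]).map(...).first()), which isn't runnable here:

```python
import sys

def version_tag(info=None):
    """Return the 'major.minor' tag that must match between
    driver and workers, e.g. '2.7'."""
    if info is None:
        info = sys.version_info
    return "%d.%d" % (info[0], info[1])

# Locally both tags come from the same interpreter, so this passes;
# on a mismatched cluster PySpark raises a similar error at task launch.
driver_tag = version_tag()
worker_tag = version_tag()  # placeholder for the tag a worker would report
assert driver_tag == worker_tag, (
    "Python in worker has different version %s than that in driver %s"
    % (worker_tag, driver_tag))
```

Note it is the minor version that matters: 2.6.6 vs 2.7.5 fails, while two different 2.7.x patch releases are accepted.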
On Mon, Aug 10, 2015 at 2:53 PM, YaoPau wrote:
We did have 2.7 on the driver, 2.6 on the edge nodes and figured that was
the issue, so we've tried many combinations since then with all three of
2.6.6, 2.7.5, and Anaconda's 2.7.10 on each node with different PATHs and
PYTHONPATHs each time. Every combination has produced the same error.
We