So that means I have to pass that environment variable to the EMR clusters when I spin them up, not afterwards. I'll give that a go.
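For reference, here is a rough, untested sketch of what passing the variable at spin-up might look like using EMR configuration classifications. The `spark-env`/`export` and `spark-defaults` classifications and Spark's `spark.executorEnv.<NAME>` property are real. Whether a given EMR release needs both settings is an assumption, and the seed value is arbitrary.

```python
import json

# Arbitrary fixed seed (an assumption); what matters is that every node uses the same value.
HASH_SEED = "0"

# EMR configuration classifications, supplied when the cluster is created.
configurations = [
    {
        # Exports PYTHONHASHSEED from spark-env.sh on the cluster nodes.
        "Classification": "spark-env",
        "Configurations": [
            {
                "Classification": "export",
                "Properties": {"PYTHONHASHSEED": HASH_SEED},
            }
        ],
    },
    {
        # spark.executorEnv.<NAME> sets <NAME> in each executor's environment;
        # assumed here to be inherited by the Python workers the executor starts.
        "Classification": "spark-defaults",
        "Properties": {"spark.executorEnv.PYTHONHASHSEED": HASH_SEED},
    },
]

# Print the JSON so it can be saved to a file for the AWS CLI.
print(json.dumps(configurations, indent=2))
```

Saved to a file, this could be passed as `aws emr create-cluster ... --configurations file://spark-hashseed.json` (the file name is made up). The same list could also go to boto3's `run_job_flow(Configurations=...)`.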
Thanks!

Henry

On Tue, Apr 4, 2017 at 7:49 AM, Eike von Seggern <eike.segg...@sevenval.com> wrote:

> 2017-04-01 21:54 GMT+02:00 Paul Tremblay <paulhtremb...@gmail.com>:
>
>> When I try to do a groupByKey() in my Spark environment, I get the
>> error described here:
>>
>> http://stackoverflow.com/questions/36798833/what-does-exception-randomness-of-hash-of-string-should-be-disabled-via-pythonh
>>
>> In order to attempt to fix the problem, I set up my ipython environment
>> with the additional line:
>>
>> PYTHONHASHSEED=1
>>
>> When I fire up my ipython shell, and do:
>>
>> In [7]: hash("foo")
>> Out[7]: -2457967226571033580
>>
>> In [8]: hash("foo")
>> Out[8]: -2457967226571033580
>>
>> So my hash function is now seeded so it returns consistent values. But
>> when I do a groupByKey(), I get the same error:
>>
>> Exception: Randomness of hash of string should be disabled via
>> PYTHONHASHSEED
>>
>> Anyone know how to fix this problem in Python 3.4?
>
> Independent of the Python version, you have to ensure that Python on
> spark-master and -workers is started with PYTHONHASHSEED set, e.g. by
> adding it to the environment of the Spark processes.
>
> Best
>
> Eike

--
Paul Henry Tremblay
Robert Half Technology
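To see Eike's point concretely: the hash seed is read once, when an interpreter starts. Setting it in the ipython shell therefore only affects that one process, while each Spark Python worker is a separate interpreter that needs the variable in its own environment. Below is a minimal sketch using only the standard library. `child_hash` is a hypothetical helper, and each child process stands in for a worker.

```python
import os
import subprocess
import sys


def child_hash(seed=None):
    """Start a fresh interpreter and return hash("foo") as computed there.

    If seed is None, PYTHONHASHSEED is removed from the child's environment,
    so the child picks its own random seed (the default since Python 3.3).
    """
    env = dict(os.environ)
    if seed is None:
        env.pop("PYTHONHASHSEED", None)
    else:
        env["PYTHONHASHSEED"] = str(seed)
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash('foo'))"],
        env=env,
        universal_newlines=True,
    )
    return int(out)


# Two "workers" started with the same fixed seed agree on the hash: prints True.
print(child_hash(1) == child_hash(1))
# Two "workers" started without a seed almost always disagree.
print(child_hash() == child_hash())
```

With the same fixed seed the children agree. Without one, each picks its own random seed, and the same key can then hash to different partitions on different workers. PySpark's partitioning code guards against this by raising the exception above whenever PYTHONHASHSEED is missing from the worker's environment.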