Re: Hi all,

2017-11-04 Thread אורן שמון
Hi Jean, We prepare the data for all the other jobs. We have a lot of jobs that are scheduled at different times, but all of them need to read the same raw data. On Fri, Nov 3, 2017 at 12:49 PM Jean Georges Perrin wrote: > Hi Oren, > > Why don’t you want to use a GroupBy? You can cache or checkpoint the > re

Re: pyspark configuration with Juyter

2017-11-04 Thread Marco Mistroni
Hi, this is probably not what you are looking for, but if you get stuck with conda, Jupyter and Spark: if you get an account @ community.cloudera you will get Jupyter and Spark out of the box. Good luck, and hope this helps. Kind regards. On Nov 4, 2017 4:59 PM, "makoto" wrote: > I set up environment variables in my ~/.bashrc as follows:

Re: pyspark configuration with Juyter

2017-11-04 Thread makoto
I set up environment variables in my ~/.bashrc as follows:
export PYSPARK_PYTHON=/usr/local/oss/anaconda3/bin/python3.6
export PYTHONPATH=$(ls -a ${SPARK_HOME}/python/lib/py4j-*-src.zip):${SPARK_HOME}/python:$PYTHONPATH
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='noteboo
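
For anyone checking a setup like the one above, here is a minimal Python sketch that rebuilds the same PYTHONPATH value from SPARK_HOME and reports obvious problems. The function names pyspark_python_path and check_pyspark_env are hypothetical helpers made up for this illustration, not part of Spark or Jupyter, and the checks only cover the variables visible in the message.

```python
import glob
import os


def pyspark_python_path(spark_home, existing=""):
    """Build a PYTHONPATH value shaped like the export line in the message.

    Hypothetical helper: mirrors py4j zip + SPARK_HOME/python + old PYTHONPATH.
    """
    # The py4j version in the file name differs between Spark releases,
    # so match it with a wildcard, as the shell snippet does.
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
    parts = sorted(glob.glob(pattern))
    parts.append(os.path.join(spark_home, "python"))
    if existing:
        parts.append(existing)
    return os.pathsep.join(parts)


def check_pyspark_env(env):
    """Return a list of problems found in an environment mapping (hypothetical helper)."""
    problems = []
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
    else:
        pattern = os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
        if not glob.glob(pattern):
            problems.append("no py4j-*-src.zip under SPARK_HOME/python/lib")
    python = env.get("PYSPARK_PYTHON")
    if python and not os.path.isfile(python):
        problems.append("PYSPARK_PYTHON does not point to a file: " + python)
    return problems


if __name__ == "__main__":
    # Print one line per problem found in the current shell environment.
    for problem in check_pyspark_env(os.environ):
        print(problem)
```

Running the script in the same shell that sources ~/.bashrc prints nothing when SPARK_HOME and PYSPARK_PYTHON look consistent, and one line per problem otherwise.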