Hi Jean,
We prepare the data for all the other jobs. We have a lot of jobs that
are scheduled at different times, but all of them need to read the same raw data.
On Fri, Nov 3, 2017 at 12:49 PM Jean Georges Perrin
wrote:
> Hi Oren,
>
> Why don’t you want to use a GroupBy? You can cache or checkpoint the
> re
Hi, this is probably not what you are looking for, but if you get stuck with
conda, Jupyter, and Spark: if you get an account @ community.cloudera, you will
get Jupyter and Spark working out of the box.
Good luck, and hope this helps.
Kind regards
On Nov 4, 2017 4:59 PM, "makoto" wrote:
> I set up environment variables in my ~/.bashrc as follows:
I set up environment variables in my ~/.bashrc as follows:
export PYSPARK_PYTHON=/usr/local/oss/anaconda3/bin/python3.6
export PYTHONPATH=$(ls -a ${SPARK_HOME}/python/lib/py4j-*-src.zip):${SPARK_HOME}/python:$PYTHONPATH
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='noteboo