I have a rather simple Avro schema to serialize tweets (message, username,
timestamp).
Kryo and Twitter's chill library are used to do so.
For my dev environment the Spark context is configured as below:
val conf: SparkConf = new SparkConf()
conf.setAppName("kryo_test")
conf.setMaster("local[4]")
conf.set(
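The post is cut off at that last line, so as a sketch only: a typical Kryo setup for this kind of schema switches the serializer to Kryo and registers the tweet class. `Tweet` below is a hypothetical stand-in for the Avro-generated class (not shown in the post), and `registerKryoClasses` assumes Spark 1.2 or later:

```scala
import org.apache.spark.SparkConf

// Hypothetical stand-in for the Avro-generated tweet class.
case class Tweet(message: String, username: String, timestamp: Long)

val conf: SparkConf = new SparkConf()
  .setAppName("kryo_test")
  .setMaster("local[4]")
  // Use Kryo instead of default Java serialization.
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Registering classes up front avoids writing full class names
  // into every serialized record.
  .registerKryoClasses(Array(classOf[Tweet]))
```

On older Spark versions the equivalent is a custom `KryoRegistrator` passed via `spark.kryo.registrator`.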
Hey,
I am about to implement a Spark app which will need to use both PySpark and
Spark on Scala.
Data should be read from AWS S3 (compressed CSV files), and must be
pre-processed by an existing Python codebase. However, our final goal is to
make those datasets available for Spark apps written in Scala.
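One workable bridge for this setup (a sketch under assumed paths, not something from the post) is to have the Python pre-processing stage persist its output in a language-neutral columnar format such as Parquet, which the Scala apps can then read directly. `s3a://my-bucket/tweets-clean/` is a hypothetical location, and the `SparkSession` API assumes Spark 2.x:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("read_preprocessed")
  .getOrCreate()

// Parquet carries its own schema, so the Scala side needs no
// serialization agreement with the Python side beyond the file format.
// Assumes the PySpark job already wrote to this (hypothetical) S3 path.
val df = spark.read.parquet("s3a://my-bucket/tweets-clean/")
df.printSchema()
```

The same files are readable from PySpark via `spark.read.parquet(...)`, so both codebases share one dataset without any cross-language serialization layer.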
Hey,
I just tried to submit a job to my Spark cluster using the following command:
./spark/bin/spark-submit --py-files file:///root/abc.zip --master
spark://xxx.xxx.xxx.xxx:7077 test.py
It seems like the dependency I’ve added gets loaded:
14/08/05 23:07:00 INFO spark.SparkContext: Added file fi