Hi, I'm new to Spark. I've begun reading up on Spark's RDDs as well as Spark SQL. My question is about how to build out the RDDs and what the best practices are. I have data on HDFS in Avro format, broken down into one file per hour. Do I need to create a separate RDD for each file, or, if I use Spark SQL, a separate SchemaRDD per file?
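To make the question concrete, here is a rough sketch of the two approaches I'm weighing, using a made-up directory layout of /data/events/YYYY-MM-DD/HH.avro (the paths and app name are just for illustration):

```scala
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.{AvroInputFormat, AvroWrapper}
import org.apache.hadoop.io.NullWritable
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("avro-analytics"))

// Approach 1: one RDD per hourly file -- the per-file approach I'm asking about.
val hour00 = sc.hadoopFile[AvroWrapper[GenericRecord], NullWritable,
                           AvroInputFormat[GenericRecord]](
  "hdfs:///data/events/2014-10-01/00.avro")

// Approach 2: one RDD for the whole day, using a glob over all 24 hourly files.
val day = sc.hadoopFile[AvroWrapper[GenericRecord], NullWritable,
                        AvroInputFormat[GenericRecord]](
  "hdfs:///data/events/2014-10-01/*.avro")

// Unwrap to the underlying Avro records for analysis.
// (Hadoop input formats reuse record objects, so copy them before caching/collecting.)
val records = day.map { case (wrapper, _) => wrapper.datum() }
```

Is the glob-style load in the second approach the right way to go, or is there a reason to keep the hourly files as separate RDDs?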
I want to be able to pull, let's say, an entire day of data into Spark and run some analytics on it, then possibly a week, a month, etc. If there is documentation on this procedure, or on best practices for building RDDs, please point me to it.

Thanks for your time,
Sam