On Fri, Aug 26, 2016 at 10:54 PM, Benjamin Kim <[email protected]> wrote:
> // Create a text file stream on an S3 bucket
> val csv = ssc.textFileStream("s3a://" + awsS3BucketName + "/")
>
> csv.foreachRDD(rdd => {
>   if (!rdd.partitions.isEmpty) {
>     // process data
>   }
> })
Hi Benjamin,
I barely remember the case now, but I'd recommend moving the above
snippet to *after* the point where you getOrCreate the context, i.e.
> // Get streaming context from checkpoint data or create a new one
> val context = StreamingContext.getOrCreate(checkpoint,
>   () => createContext(interval, checkpoint, bucket, database,
>     table, partitionBy))
Please place the first snippet here, so that you only create the
pipeline after you have the context.
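Something like the following is what I have in mind. It is only a rough
sketch under assumptions: `createContext` here is a simplified stand-in
for yours (it takes just the checkpoint directory), the object name, app
name and 60-second batch interval are made up, and the processing step is
a placeholder. Also note that the Spark Streaming programming guide sets
up DStreams inside the function passed to getOrCreate, so please test this
ordering against a restart from an existing checkpoint before relying on it.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical driver; names and values below are assumptions for illustration.
object S3StreamSketch {

  // Simplified stand-in for the original createContext(...):
  // it only builds the context and enables checkpointing.
  def createContext(checkpointDir: String): StreamingContext = {
    val conf = new SparkConf().setAppName("S3StreamSketch") // assumed app name
    val ssc = new StreamingContext(conf, Seconds(60))       // assumed batch interval
    ssc.checkpoint(checkpointDir)
    ssc
  }

  def main(args: Array[String]): Unit = {
    val checkpointDir = args(0)   // checkpoint directory
    val awsS3BucketName = args(1) // S3 bucket to watch

    // Get streaming context from checkpoint data or create a new one
    val context = StreamingContext.getOrCreate(
      checkpointDir, () => createContext(checkpointDir))

    // Only now create the text file stream on the S3 bucket
    val csv = context.textFileStream("s3a://" + awsS3BucketName + "/")

    csv.foreachRDD { rdd =>
      if (!rdd.partitions.isEmpty) {
        // Placeholder for the real processing: report the record count
        println(s"Records in batch: ${rdd.count()}")
      }
    }

    context.start()
    context.awaitTermination()
  }
}
```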
I might be mistaken, though. Sorry.
Regards,
Jacek Laskowski