Hello,
So I have about 500 Spark streams, and I want to know the fastest and most
reliable way to process each of them. Right now, I am creating and processing
them from a parallel collection of (table name, path) pairs:
val ssc = new StreamingContext(sc, Minutes(10))

// paths is a Seq[(String, String)] of (table name, input directory)
val streams = paths.par.map { case (name, path) =>
  (name, ssc.textFileStream(path))
}

// streams is already a parallel collection from paths.par.map
streams.foreach { case (name, stream) =>
  stream.foreachRDD { rdd =>
    val df = sqlContext.jsonRDD(rdd)
    df.insertInto(name)
  }
}
ssc.start()
Is this the best way to do this? Are there any better, faster methods?