Is it practical to maintain a hardware context on each of the worker hosts in Spark? In my particular problem I have an OpenCL (or JavaCL) context which has two things associated with it:

- Data stored on the GPU
- Code compiled for the GPU

If the context goes away, the data is lost and the code must be recompiled.
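Concretely, the state that lives and dies with the context looks roughly like this in raw JavaCL (just an illustration: the kernel source, kernel name and buffer size are placeholders, and the exact buffer-creation overload may differ between JavaCL versions):

import com.nativelibs4java.opencl.{JavaCL, CLMem}

// Both the compiled program and the device buffer hang off this context;
// release (or lose) the context and they have to be rebuilt from scratch.
val context = JavaCL.createBestContext()

// "Code compiled for the GPU" -- built against this context's devices.
val kernelSource = "__kernel void classify(__global const float* in) {}"
val program = context.createProgram(kernelSource).build()
val kernel  = program.createKernel("classify")

// "Data stored on the GPU" -- a device-side buffer owned by the same context.
val inputOnGpu = context.createFloatBuffer(CLMem.Usage.Input, 1024)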
The code that calls into it is quite basic and is intended to be used in both batch and streaming modes. Here is the batch version:

object Classify {

  def run(sparkContext: SparkContext, config: com.infoblox.Config) {
    val subjects = Subject.load(sparkContext, config)
    val classifications = subjects
      .mapPartitions(subjectIter => classify(config.gpu, subjectIter))
      .reduceByKey(_ + _)
    classifications.saveAsTextFile(config.output)
  }

  private def classify(gpu: Option[String], subjects: Iterator[Subject]): Iterator[(String, Long)] = {
    val javaCLContext = JavaCLContext.build(gpu)      // <--
    val classifier = Classifier.build(javaCLContext)  // <--
    subjects.foreach(subject => classifier.classifyInBatches(subject))
    classifier.classifyRemaining
    val results = classifier.results
    classifier.release
    results.result.iterator
  }
}

The two lines marked with <-- are where the JavaCL/OpenCL context is currently created and used, which is wrong: the JavaCL context is specific to the host, not to the map. How do I keep this context alive between maps, and over a longer duration for a streaming job?

Thanks,
Chris...
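P.S. The closest I have come up with so far is a per-JVM holder object along the lines below (an untested sketch reusing the JavaCLContext type from the snippet above; JavaCLContextHolder is just a name I made up, and it assumes one context can safely be shared by the tasks running concurrently in the same executor):

object JavaCLContextHolder {

  // All access goes through get(), so the cached context is guarded by the
  // synchronized block below.
  private var cached: Option[JavaCLContext] = None

  // Built at most once per executor JVM. Every partition (and, in streaming,
  // every batch) that runs in this JVM reuses the same context, so the GPU
  // buffers and compiled kernels survive between mapPartitions calls.
  def get(gpu: Option[String]): JavaCLContext = synchronized {
    cached.getOrElse {
      val context = JavaCLContext.build(gpu)
      cached = Some(context)
      context
    }
  }
}

classify() would then start with JavaCLContextHolder.get(gpu) instead of JavaCLContext.build(gpu), and the per-partition release would have to move elsewhere. I am not sure whether this is the right approach, particularly for a long-running streaming job, or when and how the context should eventually be released.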