Is it practical to maintain a hardware context on each of the worker hosts in
Spark? In my particular problem I have an OpenCL (or JavaCL) context which has
two things associated with it:
- Data stored on a GPU
- Code compiled for the GPU
If the context goes away, the data is lost and the code must be recompiled.
The calling code is quite basic and is intended to be used in both batch and
streaming modes. Here is the batch version:
object Classify {
  def run(sparkContext: SparkContext, config: com.infoblox.Config) {
    val subjects = Subject.load(sparkContext, config)
    val classifications = subjects.mapPartitions(subjectIter =>
      classify(config.gpu, subjectIter)).reduceByKey(_ + _)
    classifications.saveAsTextFile(config.output)
  }

  private def classify(gpu: Option[String], subjects: Iterator[Subject]):
      Iterator[(String, Long)] = {
    val javaCLContext = JavaCLContext.build(gpu) // <--
    val classifier = Classifier.build(javaCLContext) // <--
    subjects.foreach(subject => classifier.classifyInBatches(subject))
    classifier.classifyRemaining
    val results = classifier.results
    classifier.release
    results.result.iterator
  }
}
The two lines marked <-- are where the JavaCL/OpenCL context is currently
created and used, which is wrong: the JavaCL context is specific to the
host, not to the map call. How do I keep this context alive between maps, and
over a longer duration for a streaming job?
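
To make the question concrete, here is a rough sketch of the kind of thing I
have in mind: a singleton object that builds the context lazily and hands the
same instance back on every later call in that JVM. I am assuming (and have
not verified) that such an object would live for the lifetime of a Spark
executor and so be shared by successive tasks on that host. FakeGpuContext and
GpuContextHolder are made-up names standing in for the real JavaCL context and
classifier; no Spark or JavaCL calls are used here.

```scala
// Hypothetical stand-in for the expensive JavaCL context and compiled code.
final class FakeGpuContext(val device: Option[String]) {
  // Count constructions so that reuse can be observed.
  FakeGpuContext.built.incrementAndGet()
}

object FakeGpuContext {
  val built = new java.util.concurrent.atomic.AtomicInteger(0)
}

// Hypothetical holder: at most one context per device name per JVM.
object GpuContextHolder {
  private val contexts =
    scala.collection.mutable.Map.empty[Option[String], FakeGpuContext]

  // Build the context on first use, then return the cached instance.
  def get(device: Option[String]): FakeGpuContext = synchronized {
    contexts.getOrElseUpdate(device, new FakeGpuContext(device))
  }
}

object ClassifySketch {
  // Shape of a per-partition function that looks the context up
  // instead of building it; the "classification" is a placeholder.
  def classify(gpu: Option[String],
               subjects: Iterator[String]): Iterator[(String, Long)] = {
    val context = GpuContextHolder.get(gpu) // reused across calls
    subjects.map(subject => (subject + "@" + context.device.getOrElse("cpu"), 1L))
  }
}
```

With this shape, classify would be passed to mapPartitions as before, and what
I do not know is whether the holder really survives between maps, or when (if
ever) the context could be released safely.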
Thanks,
Chris...