Is it practical to maintain a hardware context on each of the worker hosts in 
Spark?  In my particular problem I have an OpenCL (or JavaCL) context which has 
two things associated with it:
  - Data stored on a GPU
  - Code compiled for the GPU
If the context goes away, the data is lost and the code must be recompiled.
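
Concretely, the per-host state looks roughly like the following in plain
JavaCL.  This is only an illustration -- the kernel source, kernel name and
buffer size are placeholders -- but everything below has to be rebuilt
whenever the CLContext is released:

import com.nativelibs4java.opencl._

object GpuStateSketch {
  // Placeholder kernel standing in for the real classification kernel.
  val kernelSource = "__kernel void classify(__global const float* in) {}"

  val context = JavaCL.createBestContext()           // host-level OpenCL context
  val queue   = context.createDefaultQueue()
  val program = context.createProgram(kernelSource)  // code compiled for the GPU
  val kernel  = program.createKernel("classify")
  val data    = context.createFloatBuffer(CLMem.Usage.Input, 1024L)  // data stored on the GPU
}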

The calling code is quite basic and is intended to be used in both batch and 
streaming modes.  Here is the batch version:

object Classify {
  def run(sparkContext: SparkContext, config: com.infoblox.Config) {
    val subjects = Subject.load(sparkContext, config)
    // Classify each partition on the GPU, then sum the counts per label.
    val classifications = subjects
      .mapPartitions(subjectIter => classify(config.gpu, subjectIter))
      .reduceByKey(_ + _)
    classifications.saveAsTextFile(config.output)
  }

  private def classify(gpu: Option[String], subjects: Iterator[Subject]): Iterator[(String, Long)] = {
    val javaCLContext = JavaCLContext.build(gpu)      // <-- context rebuilt for every partition
    val classifier = Classifier.build(javaCLContext)  // <-- GPU code recompiled for every partition
    subjects.foreach(subject => classifier.classifyInBatches(subject))
    classifier.classifyRemaining
    val results = classifier.results
    classifier.release
    results.result.iterator
  }
}

The two lines marked with <-- are where the JavaCL/OpenCL context is currently 
created and used, and that is the wrong place: the JavaCL context is specific 
to the host, not to a single map.  How do I keep this context alive between 
maps, and over a longer duration for a streaming job?
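
The only pattern I can think of is a lazily-initialised singleton per worker
JVM, roughly as sketched below.  GpuContext is a hypothetical name, and I do
not know whether this is the recommended approach, or where release should
ever be called, especially for a long-running streaming job:

// Hypothetical sketch: build the JavaCL context and classifier once per
// executor JVM and reuse them across every partition scheduled on that host.
object GpuContext {
  @volatile private var cached: Option[Classifier] = None

  def classifier(gpu: Option[String]): Classifier = synchronized {
    cached.getOrElse {
      val javaCLContext = JavaCLContext.build(gpu)    // built once per worker JVM
      val c = Classifier.build(javaCLContext)
      cached = Some(c)
      c
    }
  }
}

classify would then call GpuContext.classifier(gpu) instead of building and
releasing the classifier itself, but I am not sure how that interacts with
Spark's executor lifecycle.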

Thanks,
Chris...
