Re: Job initialization performance of Spark standalone mode vs YARN

2014-03-06 Thread polkosity
We're not using Ooyala's job server. We are holding the Spark context for reuse within our own REST server (with a service to run each job). Our low-latency job now reads all its data from a memory-cached RDD instead of from an HDFS SequenceFile (upstream jobs cache resultant RDDs for downstream jobs t
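A minimal sketch of the pattern described above, written in plain Python so it runs without a cluster. `JobContext`, `JobServer`, `upstream_job`, and `downstream_job` are hypothetical stand-ins, not Spark APIs: `JobContext` plays the role of an expensive-to-start Spark context, and its `cached` dictionary plays the role of a memory-cached RDD that an upstream job leaves behind for a downstream one. The point is only that the context is created once and every job reuses it.

```python
import threading
import time


class JobContext:
    """Hypothetical stand-in for an expensive-to-create context (e.g. a Spark context)."""

    def __init__(self):
        time.sleep(0.01)  # simulate slow startup, paid only once per server
        self.cached = {}  # name -> in-memory dataset shared between jobs

    def cache(self, name, data):
        self.cached[name] = list(data)

    def get_cached(self, name):
        return self.cached[name]


class JobServer:
    """Hypothetical server that holds one context and reuses it for every job."""

    def __init__(self):
        self._lock = threading.Lock()
        self._context = None
        self.contexts_created = 0

    def context(self):
        # Lazily create the context on first use; later jobs get the same instance.
        with self._lock:
            if self._context is None:
                self._context = JobContext()
                self.contexts_created += 1
            return self._context

    def run(self, job):
        return job(self.context())


def upstream_job(ctx):
    # Upstream job computes a result and caches it for downstream jobs.
    result = [x * x for x in range(5)]
    ctx.cache("squares", result)
    return len(result)


def downstream_job(ctx):
    # Low-latency job reads the cached data instead of re-reading from storage.
    return sum(ctx.get_cached("squares"))


if __name__ == "__main__":
    server = JobServer()
    print(server.run(upstream_job))    # 5
    print(server.run(downstream_job))  # 30
    print(server.contexts_created)     # 1
```

In real Spark, the rough equivalent is keeping one Spark context alive in the server process and calling `cache()` on upstream RDDs; cached RDDs are only visible to jobs that run against that same context, which is why reusing the context and caching go together.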

Re: Job initialization performance of Spark standalone mode vs YARN

2014-03-05 Thread polkosity
After changing to reuse the Spark context and cache RDDs in memory, performance is four times better. We didn't expect that much of an improvement! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Job-initialization-performance-of-Spark-standalone-mode-vs-YARN-tp20

Re: Job initialization performance of Spark standalone mode vs YARN

2014-03-03 Thread polkosity
That's exciting! I'll be looking into that, thanks Andrew. On a related topic, has anyone had experience running Spark on the Tachyon in-memory filesystem, and could offer their views on using it?

Re: Job initialization performance of Spark standalone mode vs YARN

2014-03-03 Thread polkosity
We're thinking of creating a Spark job server with a REST API, which, besides managing jobs, would let us reuse the Spark context as you suggest. Thanks Koert!

Re: Job initialization performance of Spark standalone mode vs YARN

2014-03-02 Thread polkosity
Thanks for the advice, Mayur. I thought I'd report back on the performance difference: Spark standalone mode has executors processing at capacity in under a second :)

Spark performance optimization

2014-02-24 Thread polkosity
As mentioned in a previous post, I have an application that relies on a quick response. The application matches a client's image against a set of stored images. Image features are stored in a SequenceFile and passed over JNI to OpenCV for matching, along with the features for the client's image. An

Job initialization performance of Spark standalone mode vs YARN

2014-02-24 Thread polkosity
Is there any difference in the performance of Spark standalone mode and YARN when it comes to initializing a new Spark job? In my application, response time is absolutely critical, and I'm hoping to have the executors working within a few seconds of submitting the job. Both options ran quickly