Re: Standardized Spark dev environment

2015-01-20 Thread Sean Owen
If the goal is a reproducible test environment, then I think that is what Jenkins is. Granted, you can only ask it for a test. But presumably you get the same result if you start from the same VM image as Jenkins and run the same steps. I bet it is not hard to set up and maintain. I bet it is easier

Re: Standardized Spark dev environment

2015-01-20 Thread Patrick Wendell
To respond to the original suggestion by Nick: I always thought it would be useful to have a Docker image on which we run the tests and build releases, so that we could have a consistent environment that other packagers or people trying to exhaustively run Spark tests could replicate (or at least l

Re: Standardized Spark dev environment

2015-01-20 Thread Paolo Platter
Hi all, I also tried the Docker way and it works well. I suggest looking at the sequenceiq/spark Docker images; they are very active in that field. Paolo Sent from my Windows Phone From: jay vyas Sent: 21/01/2015 04:45 To: Nicholas

KNN for large data set

2015-01-20 Thread DEVAN M.S.
Hi all, Please help me find the best way to do K-nearest neighbors on large data sets using Spark.
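
A minimal brute-force sketch of one possible approach, assuming dense Double vectors keyed by id and a single query point (all names and data here are illustrative, not from the thread): broadcast the query, score every point, and take the k smallest distances.

```
import org.apache.spark.{SparkConf, SparkContext}

object BruteForceKNN {
  // Squared Euclidean distance between two dense vectors.
  def sqDist(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("knn").setMaster("local"))
    val k = 2
    // Illustrative data: (id, feature vector) pairs.
    val points = sc.parallelize(Seq(
      (1L, Array(0.0, 0.0)), (2L, Array(1.0, 1.0)), (3L, Array(5.0, 5.0))))
    // Broadcast the query so each partition can score locally.
    val query = sc.broadcast(Array(0.5, 0.5))
    // Score every point against the query, then take the k nearest;
    // the tuple ordering sorts by distance first.
    val neighbors = points
      .map { case (id, v) => (sqDist(v, query.value), id) }
      .takeOrdered(k)
    neighbors.foreach(println)
    sc.stop()
  }
}
```

This scans the full data set per query, so for truly large data an approximate method (e.g. locality-sensitive hashing) would be the next step.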

Re: not found: type LocalSparkContext

2015-01-20 Thread Reynold Xin
You don't need the LocalSparkContext. It is only for Spark's own unit tests. You can just create a SparkContext and use it in your unit tests, e.g. val sc = new SparkContext("local", "my test app", new SparkConf) On Tue, Jan 20, 2015 at 7:27 PM, James wrote: > I could not correctly import org.a
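
A minimal sketch of that suggestion, assuming ScalaTest (which the thread's FunSuite usage implies): manage the SparkContext lifecycle in the suite itself instead of mixing in LocalSparkContext.

```
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, FunSuite}

class HyperANFSuite extends FunSuite with BeforeAndAfterAll {
  @transient private var sc: SparkContext = _

  override def beforeAll(): Unit = {
    super.beforeAll()
    // Create one local context for the whole suite.
    sc = new SparkContext("local", "my test app", new SparkConf)
  }

  override def afterAll(): Unit = {
    // Always stop the context so later suites can create their own.
    if (sc != null) sc.stop()
    super.afterAll()
  }

  test("a simple job runs locally") {
    assert(sc.parallelize(1 to 10).count() === 10)
  }
}
```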

Re: GraphX ShortestPaths backwards?

2015-01-20 Thread Michael Malak
I created https://issues.apache.org/jira/browse/SPARK-5343 for this. - Original Message - From: Michael Malak To: "dev@spark.apache.org" Cc: Sent: Monday, January 19, 2015 5:09 PM Subject: GraphX ShortestPaths backwards? GraphX ShortestPaths seems to be following edges backwards inste
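
A hedged workaround sketch while that JIRA is open (tiny illustrative graph, not from the thread): if ShortestPaths propagates against the intended direction, running it on graph.reverse should recover distances along the original edge orientation.

```
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.graphx.lib.ShortestPaths

object ForwardShortestPaths {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sssp").setMaster("local"))
    // Tiny directed graph: 1 -> 2 -> 3.
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1)))
    val graph = Graph.fromEdges(edges, defaultValue = 0)
    // If ShortestPaths follows edges backwards, reversing the graph
    // first yields distances measured along the original direction.
    val result = ShortestPaths.run(graph.reverse, Seq(3L))
    result.vertices.collect().foreach(println)
    sc.stop()
  }
}
```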

Re: Standardized Spark dev environment

2015-01-20 Thread jay vyas
I can comment on both... hi will and nate :) 1) Will's Dockerfile solution is the simplest, most direct solution to the dev environment question: it's an efficient way to build and develop Spark environments for dev/test. It would be cool to put that Dockerfile (and/or maybe a shell script which

Re: not found: type LocalSparkContext

2015-01-20 Thread James
I could not correctly import org.apache.spark.LocalSparkContext. I use sbt in IntelliJ for development; here is my build.sbt. ``` libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0" libraryDependencies += "org.apache.spark" %% "spark-graphx" % "1.2.0" libraryDependencies += "com.c

Re: Standardized Spark dev environment

2015-01-20 Thread Nicholas Chammas
How many profiles (hadoop / hive / scala) would this development environment support? As many as we want. We probably want to cover a good chunk of the build matrix that Spark officially supports. What does this provide, concretely? It provides

Re: Standardized Spark dev environment

2015-01-20 Thread Will Benton
Hey Nick, I did something similar with a Docker image last summer; I haven't updated the images to cache the dependencies for the current Spark master, but it would be trivial to do so: http://chapeau.freevariable.com/2014/08/jvm-test-docker.html best, wb - Original Message - > From

RE: Standardized Spark dev environment

2015-01-20 Thread nate
If there is some interest in more standardization and setup of dev/test environments, the Spark community might be interested in starting to participate in the Apache Bigtop effort: http://bigtop.apache.org/ While the project had its start and initial focus on packaging, testing, and deploying Hadoop/HDFS

Re: Standardized Spark dev environment

2015-01-20 Thread Sean Owen
My concern would mostly be maintenance. It adds to an already very complex build. It only assists developers, who are a small audience. What does this provide, concretely? On Jan 21, 2015 12:14 AM, "Nicholas Chammas" wrote: > What do y'all think of creating a standardized Spark development > envir

Re: Standardized Spark dev environment

2015-01-20 Thread Ted Yu
How many profiles (hadoop / hive / scala) would this development environment support? Cheers On Tue, Jan 20, 2015 at 4:13 PM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > What do y'all think of creating a standardized Spark development > environment, perhaps encoded as a Vagrantfile,

Re: Standardized Spark dev environment

2015-01-20 Thread shenyan zhen
Great suggestion. On Jan 20, 2015 7:14 PM, "Nicholas Chammas" wrote: > What do y'all think of creating a standardized Spark development > environment, perhaps encoded as a Vagrantfile, and publishing it under > `dev/`? > > The goal would be to make it easier for new developers to get started with

Standardized Spark dev environment

2015-01-20 Thread Nicholas Chammas
What do y'all think of creating a standardized Spark development environment, perhaps encoded as a Vagrantfile, and publishing it under `dev/`? The goal would be to make it easier for new developers to get started with all the right configs and tools pre-installed. If we use something like Vagran

Re: Spectral clustering

2015-01-20 Thread Andrew Musselman
Awesome, thanks On Tue, Jan 20, 2015 at 12:56 PM, Xiangrui Meng wrote: > Fan and Stephen (cc'ed) are working on this feature. They will update > the JIRA page and report progress soon. -Xiangrui > > On Fri, Jan 16, 2015 at 12:04 PM, Andrew Musselman > wrote: > > Hi, thinking of picking up this

Re: Spectral clustering

2015-01-20 Thread Xiangrui Meng
Fan and Stephen (cc'ed) are working on this feature. They will update the JIRA page and report progress soon. -Xiangrui On Fri, Jan 16, 2015 at 12:04 PM, Andrew Musselman wrote: > Hi, thinking of picking up this Jira ticket: > https://issues.apache.org/jira/browse/SPARK-4259 > > Anyone done any w

Re: Is there any way to support multiple users executing SQL on thrift server?

2015-01-20 Thread Cheng Lian
Hey Yi, I'm quite unfamiliar with Hadoop/HDFS auth mechanisms for now, but would like to investigate this issue later. Would you please open a JIRA for it? Thanks! Cheng On 1/19/15 1:00 AM, Yi Tian wrote: Is there any way to support multiple users executing SQL on one thrift server? I

Re: Spark client reconnect to driver in yarn-cluster deployment mode

2015-01-20 Thread Andrew Or
Hi Preeze, > Is there any designed way that the client connects back to the driver (still running in YARN) for collecting results at a later stage? No, there is no support built into Spark for this. For this to happen seamlessly, the driver will have to start a server (pull model) or send the res
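
One common workaround, sketched here under an assumed output path (not a Spark feature): have the driver push its results to a well-known location that the client reads after the YARN application finishes.

```
import org.apache.spark.{SparkConf, SparkContext}

object PushResultsToHdfs {
  def main(args: Array[String]): Unit = {
    // In yarn-cluster mode the driver runs inside YARN, so it writes
    // results somewhere the client can fetch them later; the master is
    // supplied by spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("push-results"))
    val results = sc.parallelize(1 to 100).map(_ * 2)
    results.saveAsTextFile("hdfs:///tmp/my-app/results") // hypothetical path
    sc.stop()
  }
}
```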

Re: not found: type LocalSparkContext

2015-01-20 Thread Will Benton
It's declared here: https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/LocalSparkContext.scala I assume you're already importing LocalSparkContext, but since the test classes aren't included in Spark packages, you'll also need to package them up in order to use
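
A hedged build.sbt sketch of that packaging step, assuming spark-core's test classes are published under the "tests" classifier (versions mirror the thread; adjust as needed):

```
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0"
// Spark's test classes (including LocalSparkContext) ship in a
// separate artifact with the "tests" classifier.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0" % "test" classifier "tests"
libraryDependencies += "org.scalatest" %% "scalatest" % "2.2.1" % "test"
```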

not found: type LocalSparkContext

2015-01-20 Thread James
Hi all, When I was trying to write a test for my Spark application I got ``` Error:(14, 43) not found: type LocalSparkContext class HyperANFSuite extends FunSuite with LocalSparkContext { ``` In the source code of spark-core I could not find "LocalSparkContext", so I wonder how to write a test

Re: Will Spark-SQL support vectorized query engine someday?

2015-01-20 Thread Reynold Xin
I don't know if there is a list, but in general running a performance profiler can identify a lot of things... On Tue, Jan 20, 2015 at 12:30 AM, Xuelin Cao wrote: > > Thanks, Reynold > > Regarding the "lower hanging fruits", can you give me some example? > Where can I find them in JIRA? > >

Re: Will Spark-SQL support vectorized query engine someday?

2015-01-20 Thread Xuelin Cao
Thanks, Reynold Regarding the "lower hanging fruits", can you give me some example? Where can I find them in JIRA? On Tue, Jan 20, 2015 at 3:55 PM, Reynold Xin wrote: > It will probably eventually make its way into part of the query engine, > one way or another. Note that there are in ge