Re: Multitenancy in Spark - within/across spark context

2014-10-23 Thread Evan Chan
Ashwin, I would say the strategies in general are: 1) Have each user submit a separate Spark app (each with its own SparkContext), with its own resource settings, and share data through HDFS or something like Tachyon for speed. 2) Share a single SparkContext amongst multiple users, using fair schedul

Re: label points with a given index

2014-10-23 Thread Lochana Menikarachchi
Figured out that the constructor can be used for this purpose. On 10/24/14 7:57 AM, Lochana Menikarachchi wrote: SparkConf conf = new SparkConf().setAppName("LogisticRegression").setMaster("local[4]"); JavaSparkContext sc = new JavaSparkContext(conf); JavaRDD lines = sc.textFile("some.csv");
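
The constructor approach mentioned in this reply could look roughly like the sketch below. It is not the poster's actual code: the `labelIndex` parameter, the `ParsedPoint` holder and the comma-separated numeric input are all assumptions made for illustration, and the class is kept free of Spark dependencies so it can be run on its own. It implements `Serializable` because Spark ships function objects to executors.

```java
import java.io.Serializable;

// Hypothetical holder for one parsed row (stands in for an MLlib labeled point).
class ParsedPoint {
    final double label;
    final double[] features;

    ParsedPoint(double label, double[] features) {
        this.label = label;
        this.features = features;
    }
}

// Hypothetical parser: the index of the label column is supplied through the
// constructor and stored in a field, so every call can see it.
class CSVLineParser implements Serializable {
    private final int labelIndex;

    CSVLineParser(int labelIndex) {
        this.labelIndex = labelIndex;
    }

    // Parses one comma-separated line of numbers. The column at labelIndex
    // becomes the label; all other columns become features, in order.
    ParsedPoint call(String line) {
        String[] tokens = line.split(",");
        if (labelIndex < 0 || labelIndex >= tokens.length) {
            throw new IllegalArgumentException(
                "label index " + labelIndex + " out of range for line: " + line);
        }
        double label = Double.parseDouble(tokens[labelIndex].trim());
        double[] features = new double[tokens.length - 1];
        int next = 0;
        for (int i = 0; i < tokens.length; i++) {
            if (i != labelIndex) {
                features[next++] = Double.parseDouble(tokens[i].trim());
            }
        }
        return new ParsedPoint(label, features);
    }
}
```

In a Spark job the same class would presumably also implement `org.apache.spark.api.java.function.Function` and be passed as `lines.map(new CSVLineParser(2))`, with the index chosen by the caller.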

label points with a given index

2014-10-23 Thread Lochana Menikarachchi
SparkConf conf = new SparkConf().setAppName("LogisticRegression").setMaster("local[4]"); JavaSparkContext sc = new JavaSparkContext(conf); JavaRDD lines = sc.textFile("some.csv"); JavaRDD lPoints = lines.map(new CSVLineParser()); Is there any way to pass an index to a function?

Re: scalastyle annoys me a little bit

2014-10-23 Thread Ted Yu
Created SPARK-4066 and attached patch there. On Thu, Oct 23, 2014 at 1:07 PM, Koert Kuipers wrote: > great thanks i will do that > > On Thu, Oct 23, 2014 at 3:55 PM, Ted Yu wrote: > >> Koert: >> If you have time, you can try this diff - with which you would be able to >> specify the following o

Re: scalastyle annoys me a little bit

2014-10-23 Thread Koert Kuipers
great thanks i will do that On Thu, Oct 23, 2014 at 3:55 PM, Ted Yu wrote: > Koert: > If you have time, you can try this diff - with which you would be able to > specify the following on the command line: > -Dscalastyle.failonviolation=false > > diff --git a/pom.xml b/pom.xml > index 687cc63..10

Re: scalastyle annoys me a little bit

2014-10-23 Thread Ted Yu
Koert: If you have time, you can try this diff - with which you would be able to specify the following on the command line: -Dscalastyle.failonviolation=false diff --git a/pom.xml b/pom.xml index 687cc63..108585e 100644 --- a/pom.xml +++ b/pom.xml @@ -123,6 +123,7 @@ 1.2.17 1.0.4 2.

Re: PR for Hierarchical Clustering Needs Review

2014-10-23 Thread Xiangrui Meng
Hi RJ, We are close to the v1.2 feature freeze deadline, so I'm busy with the pipeline feature and a couple of bugs. I will ask other developers to help review the PR. Thanks for working with Yu and helping the code review! Best, Xiangrui On Thu, Oct 23, 2014 at 2:58 AM, RJ Nowling wrote: > Hi all,

Re: scalastyle annoys me a little bit

2014-10-23 Thread Koert Kuipers
Hey Ted, i tried: mvn clean package -DskipTests -Dscalastyle.failOnViolation=false no luck, still get [ERROR] Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.4.0:check (default) on project spark-core_2.10: Failed during scalastyle execution: You have 3 Scalastyle violation(s). -> [

Spark 1.2 feature freeze on November 1

2014-10-23 Thread Patrick Wendell
Hey All, Just a reminder that as planned [1] we'll go into a feature freeze on November 1. On that date I'll cut a 1.2 release branch and make the up-or-down call on any patches that go into that branch, along with individual committers. It is common for us to receive a very large volume of patch

Re: scalastyle annoys me a little bit

2014-10-23 Thread Ted Yu
Koert: Have you tried adding the following on your command line? -Dscalastyle.failOnViolation=false Cheers On Thu, Oct 23, 2014 at 11:07 AM, Patrick Wendell wrote: > Hey Koert, > > I think disabling the style checks in maven package could be a good > idea for the reason you point out. I was so

Re: scalastyle annoys me a little bit

2014-10-23 Thread Marcelo Vanzin
I know this is all very subjective, but I find long lines difficult to read. I also like how 100 characters fit in my editor setup fine (split wide screen), while a longer line length would mean I can't have two buffers side-by-side without horizontal scrollbars. I think it's fine to add a switch

Re: scalastyle annoys me a little bit

2014-10-23 Thread Patrick Wendell
Hey Koert, I think disabling the style checks in maven package could be a good idea for the reason you point out. I was sort of mixed on that when it was proposed for this exact reason. It's just annoying to developers. In terms of changing the global limit, this is more religion than anything el

scalastyle annoys me a little bit

2014-10-23 Thread Koert Kuipers
100 max width seems very restrictive to me. even the most restrictive environment i have for development (ssh with emacs) i get a lot more characters to work with than that. personally i find the code harder to read, not easier. like i kept wondering why there are weird newlines in the middle of

Re: Multitenancy in Spark - within/across spark context

2014-10-23 Thread Marcelo Vanzin
You may want to take a look at https://issues.apache.org/jira/browse/SPARK-3174. On Thu, Oct 23, 2014 at 2:56 AM, Jianshi Huang wrote: > Upvote for the multitenancy requirement. > > I'm also building a data analytic platform and there'll be multiple users > running queries and computations simult

Receiver/DStream storage level

2014-10-23 Thread Michael Allman
I'm implementing a custom ReceiverInputDStream and I'm not sure how to initialize the Receiver with the storage level. The storage level is set on the DStream, but there doesn't seem to be a way to pass it to the Receiver. At the same time, setting the storage level separately on the Receiver se

Re: reading/writing parquet decimal type

2014-10-23 Thread Michael Allman
Hi Matei, Another thing occurred to me. Will the binary format you're writing sort the data in numeric order? Or would the decimals have to be decoded for comparison? Cheers, Michael > On Oct 12, 2014, at 10:48 PM, Matei Zaharia wrote: > > The fixed-length binary type can hold fewer bytes t

Re: Exception while running unit tests that makes use of local-cluster mode

2014-10-23 Thread Varadharajan Mukundan
Hi All, I just figured out that it fails whenever it's run from IntelliJ. I think it relates to the classpath issues mentioned in https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ScalaTestIssues . -- Thanks, M. Varadharajan --

Memory

2014-10-23 Thread Tom Hubregtsen
Hi all, I would like to validate my understanding of memory regions in Spark. Any comments on my description below would be appreciated! Execution is split up into stages, based on wide dependencies between RDDs and actions such as save. All transformations involving narrow dependencies before th

PR for Hierarchical Clustering Needs Review

2014-10-23 Thread RJ Nowling
Hi all, A few months ago, I collected feedback on what the community was looking for in clustering methods. A number of the community members requested a divisive hierarchical clustering method. Yu Ishikawa has stepped up to implement such a method. I've been working with him to communicate wha

Re: Multitenancy in Spark - within/across spark context

2014-10-23 Thread Jianshi Huang
Upvote for the multitenancy requirement. I'm also building a data analytic platform and there'll be multiple users running queries and computations simultaneously. One of the pain points is control of resource size. Users don't really know how many nodes they need, they always use as much as possi