Ashwin,
I would say the strategies in general are:
1) Have each user submit a separate Spark app (each with its own
SparkContext), with its own resource settings, and share data through
HDFS or something like Tachyon for speed.
2) Share a single Spark context amongst multiple users, using fair
scheduling.
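For strategy 2, a sketch of what the fair-scheduler setup could look like (the pool names and weights below are illustrative assumptions, not from this thread): enable fair scheduling on the shared context, point it at an allocation file, and have each user's jobs run in their own pool.

```xml
<!-- fairscheduler.xml: example pool definitions; names/weights are illustrative -->
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>2</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="adhoc">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>
```

With spark.scheduler.mode=FAIR and spark.scheduler.allocation.file pointing at this file, each user thread can select its pool via sc.setLocalProperty("spark.scheduler.pool", "adhoc").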
Figured the constructor can be used for this purpose.
On 10/24/14 7:57 AM, Lochana Menikarachchi wrote:
SparkConf conf = new
SparkConf().setAppName("LogisticRegression").setMaster("local[4]");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> lines = sc.textFile("some.csv");
JavaRDD lPoints = lines.map(new CSVLineParser());
Is there any way to pass an index to a function?
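A minimal sketch of the constructor approach mentioned above, using plain Java streams instead of Spark so it stands alone (ColumnExtractor is a hypothetical stand-in for CSVLineParser): the index is fixed when the function object is constructed, so the map call itself needs no extra argument.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical parser that picks one column out of each CSV line.
// The column index is supplied once, through the constructor, mirroring
// how a Spark Function subclass could take it.
class ColumnExtractor implements Function<String, String> {
    private final int index;

    ColumnExtractor(int index) {
        this.index = index;
    }

    @Override
    public String apply(String line) {
        return line.split(",")[index];
    }
}

public class ConstructorIndexDemo {
    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a,b,c", "1,2,3");
        List<String> secondColumn = lines.stream()
                .map(new ColumnExtractor(1))  // index baked in at construction
                .collect(Collectors.toList());
        System.out.println(secondColumn); // [b, 2]
    }
}
```

The same pattern applies to a Spark Function passed to JavaRDD.map: store the index in a final field and use it inside call().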
Created SPARK-4066 and attached patch there.
On Thu, Oct 23, 2014 at 1:07 PM, Koert Kuipers wrote:
> great thanks i will do that
>
> On Thu, Oct 23, 2014 at 3:55 PM, Ted Yu wrote:
>
>> Koert:
>> If you have time, you can try this diff - with which you would be able to
>> specify the following on the command line:
>> -Dscalastyle.failonviolation=false
great thanks i will do that
On Thu, Oct 23, 2014 at 3:55 PM, Ted Yu wrote:
> Koert:
> If you have time, you can try this diff - with which you would be able to
> specify the following on the command line:
> -Dscalastyle.failonviolation=false
>
> diff --git a/pom.xml b/pom.xml
> index 687cc63..108585e 100644
Koert:
If you have time, you can try this diff - with which you would be able to
specify the following on the command line:
-Dscalastyle.failonviolation=false
diff --git a/pom.xml b/pom.xml
index 687cc63..108585e 100644
--- a/pom.xml
+++ b/pom.xml
@@ -123,6 +123,7 @@
1.2.17
1.0.4
2.
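For context, the shape of the change Ted's diff makes is roughly the following: add a Maven property with a default, and have the scalastyle plugin's failOnViolation setting read it, so that -Dscalastyle.failonviolation=false can override it from the command line. The snippet below is a sketch, not the actual patch (see SPARK-4066 for the real diff).

```xml
<!-- sketch, not the actual patch: a property with a default... -->
<properties>
  <scalastyle.failonviolation>true</scalastyle.failonviolation>
</properties>

<!-- ...referenced from the scalastyle-maven-plugin configuration -->
<plugin>
  <groupId>org.scalastyle</groupId>
  <artifactId>scalastyle-maven-plugin</artifactId>
  <configuration>
    <failOnViolation>${scalastyle.failonviolation}</failOnViolation>
  </configuration>
</plugin>
```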
Hi RJ,
We are close to the v1.2 feature freeze deadline, so I'm busy with the
pipeline feature and a couple of bugs. I will ask other developers to help
review the PR. Thanks for working with Yu and helping the code review!
Best,
Xiangrui
On Thu, Oct 23, 2014 at 2:58 AM, RJ Nowling wrote:
> Hi all,
Hey Ted,
i tried:
mvn clean package -DskipTests -Dscalastyle.failOnViolation=false
no luck, still get
[ERROR] Failed to execute goal
org.scalastyle:scalastyle-maven-plugin:0.4.0:check (default) on project
spark-core_2.10: Failed during scalastyle execution: You have 3 Scalastyle
violation(s). -> [Help 1]
Hey All,
Just a reminder that as planned [1] we'll go into a feature freeze on
November 1. On that date I'll cut a 1.2 release branch and make the
up-or-down call on any patches that go into that branch, along with
individual committers.
It is common for us to receive a very large volume of patch
Koert:
Have you tried adding the following on your commandline ?
-Dscalastyle.failOnViolation=false
Cheers
On Thu, Oct 23, 2014 at 11:07 AM, Patrick Wendell
wrote:
> Hey Koert,
>
> I think disabling the style checks in maven package could be a good
> idea for the reason you point out. I was sort of mixed on that when it
> was proposed for this exact reason.
I know this is all very subjective, but I find long lines difficult to read.
I also like how 100 characters fit in my editor setup fine (split wide
screen), while a longer line length would mean I can't have two
buffers side-by-side without horizontal scrollbars.
I think it's fine to add a switch
Hey Koert,
I think disabling the style checks in maven package could be a good
idea for the reason you point out. I was sort of mixed on that when it
was proposed for this exact reason. It's just annoying to developers.
In terms of changing the global limit, this is more religion than
anything else.
100 max width seems very restrictive to me.
even in the most restrictive environment i have for development (ssh with
emacs) i get a lot more characters to work with than that.
personally i find the code harder to read, not easier. like i kept
wondering why there are weird newlines in the middle of
You may want to take a look at https://issues.apache.org/jira/browse/SPARK-3174.
On Thu, Oct 23, 2014 at 2:56 AM, Jianshi Huang wrote:
> Upvote for the multitenancy requirement.
>
> I'm also building a data analytic platform and there'll be multiple users
> running queries and computations simultaneously.
I'm implementing a custom ReceiverInputDStream and I'm not sure how to
initialize the Receiver with the storage level. The storage level is set on the
DStream, but there doesn't seem to be a way to pass it to the Receiver. At the
same time, setting the storage level separately on the Receiver se
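One way this is commonly handled (a sketch against the Spark Streaming API, with MyReceiver as a hypothetical name; it won't compile without Spark on the classpath): thread the DStream's storage level through the receiver's constructor, since Receiver takes a StorageLevel there.

```java
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.receiver.Receiver;

// Sketch: the storage level is handed to the Receiver at construction
// time; Receiver keeps it and uses it when storing received blocks.
class MyReceiver extends Receiver<String> {
    public MyReceiver(StorageLevel storageLevel) {
        super(storageLevel);
    }

    @Override
    public void onStart() {
        // start a background thread here and call store(record) as data arrives
    }

    @Override
    public void onStop() {
        // signal the background thread to stop
    }
}
```

The custom ReceiverInputDStream's getReceiver() can then construct the receiver with whatever storage level was set on the DStream side.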
Hi Matei,
Another thing occurred to me. Will the binary format you're writing sort the
data in numeric order? Or would the decimals have to be decoded for comparison?
Cheers,
Michael
> On Oct 12, 2014, at 10:48 PM, Matei Zaharia wrote:
>
> The fixed-length binary type can hold fewer bytes t
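On the sort-order question, one encoding that makes byte-wise comparison agree with numeric order is big-endian two's-complement with the sign bit flipped; this is an illustration of the idea, not a claim about what the format under discussion actually does. A small self-contained sketch:

```java
// Illustration (an assumption, not the actual binary format): write the
// unscaled value as fixed-length big-endian two's-complement with the sign
// bit flipped. Unsigned byte-wise comparison of the encodings then matches
// numeric comparison, so sorting needs no decoding.
public class OrderPreservingEncoding {
    static byte[] encode(long unscaled) {
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) unscaled; // low byte goes last: big-endian
            unscaled >>= 8;
        }
        out[0] ^= 0x80; // flip sign bit so negatives sort before positives
        return out;
    }

    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) return cmp;
        }
        return 0;
    }

    public static void main(String[] args) {
        // -5 encodes below 3, so bytewise order matches numeric order
        System.out.println(compareUnsigned(encode(-5), encode(3)) < 0); // true
    }
}
```

If the format does not use an order-preserving layout like this, the decimals would have to be decoded before comparison.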
Hi All,
I just figured out that it fails whenever it's run from IntelliJ. I
think it relates to the classpath issues mentioned in
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ScalaTestIssues
.
--
Thanks,
M. Varadharajan
--
Hi all,
I would like to validate my understanding of memory regions in Spark. Any
comments on my description below would be appreciated!
Execution is split up into stages, based on wide dependencies between RDDs
and actions such as save. All transformations involving narrow dependencies
before th
Hi all,
A few months ago, I collected feedback on what the community was looking
for in clustering methods. A number of the community members requested a
divisive hierarchical clustering method.
Yu Ishikawa has stepped up to implement such a method. I've been working
with him to communicate wha
Upvote for the multitenancy requirement.
I'm also building a data analytic platform and there'll be multiple users
running queries and computations simultaneously. One of the pain points is
control of resource size. Users don't really know how many nodes they need,
they always use as much as possible