Re: Signal/Noise Ratio

2014-02-21 Thread Reynold Xin
FYI I submitted an ASF INFRA ticket on granting the AMPLab Jenkins permission to use the github commit status API. If that goes through, we can configure Jenkins to use the commit status API without leaving comments on the pull requests. https://issues.apache.org/jira/browse/INFRA-7367 On Fri,

Re: coding style discussion: explicit return type in public APIs

2014-02-19 Thread Reynold Xin
to specify exposed interfaces, imo, > outweighs potential gains in brevity of code. > Btw this is a degenerate contrived example already stretching its use ... > > Regards > Mridul > > Regards > Mridul > On Feb 19, 2014 1:49 PM, "Reynold Xin" wrote: > > >

Re: coding style discussion: explicit return type in public APIs

2014-02-19 Thread Reynold Xin
hristopher T. Nguyen > > Co-founder & CEO, Adatao <http://adatao.com> > > linkedin.com/in/ctnguyen > > > > > > > > On Tue, Feb 18, 2014 at 10:40 PM, Reynold Xin > wrote: > > > >> +1 Christopher's suggestion. > >> > >> M

Re: coding style discussion: explicit return type in public APIs

2014-02-18 Thread Reynold Xin
ll > cause signature change. > > > Regards, > Mridul > > > On Wed, Feb 19, 2014 at 11:52 AM, Reynold Xin wrote: > > Hi guys, > > > > Want to bring to the table this issue to see what other members of the > > community think and then we can codify it

Re: coding style discussion: explicit return type in public APIs

2014-02-18 Thread Reynold Xin
Case 2 should probably be expanded to cover most primitive types. On Tue, Feb 18, 2014 at 10:22 PM, Reynold Xin wrote: > Hi guys, > > Want to bring to the table this issue to see what other members of the > community think and then we can codify it in the Spark coding style guide.

coding style discussion: explicit return type in public APIs

2014-02-18 Thread Reynold Xin
Hi guys, Want to bring to the table this issue to see what other members of the community think and then we can codify it in the Spark coding style guide. The topic is about declaring return types explicitly in public APIs. In general I think we should favor explicit type declaration in public AP
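
A minimal sketch of the distinction under discussion, not taken from the Spark codebase: the object and method names below are hypothetical. With an inferred return type, the public signature follows whatever the body happens to produce; with an explicit one, the declared type is the contract.

```scala
object ReturnTypeStyle {
  // Inferred return type: the compiler picks List[Int] from the body.
  // If the body later builds a Vector instead, the public signature
  // changes without anyone having edited the declaration.
  def evensInferred(upTo: Int) = (0 to upTo).filter(_ % 2 == 0).toList

  // Explicit return type: callers only ever see Seq[Int], so the body
  // can switch collection implementations without changing the API.
  def evensExplicit(upTo: Int): Seq[Int] = (0 to upTo).filter(_ % 2 == 0).toList
}
```

Both methods return the same elements; only the type exposed to callers differs.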

Re: Fast Serialization

2014-02-13 Thread Reynold Xin
The perf difference between that and Kryo is pretty small according to their own benchmark. However, if they can provide better compatibility than Kryo, we should definitely give it a shot! Would you like to do some testing? On Thu, Feb 13, 2014 at 12:27 AM, Evan Chan wrote: > Any interest in

Re: Could someone with karma to add my userid hsaputra so I could assign issue in https://spark-project.atlassian.net?

2014-02-11 Thread Reynold Xin
I added you to the dev list on jira for spark. On Tue, Feb 11, 2014 at 2:58 PM, Henry Saputra wrote: > Hi Guys, > > With ASF JIRA still in transfer mode, could someone with permission to > add my userid "hsaputra" in https://spark-project.atlassian.net so I > could assign issues and resolve them

Re: [VOTE] Graduation of Apache Spark from the Incubator

2014-02-10 Thread Reynold Xin
and be it further > >> RESOLVED, that the persons listed immediately below be and > >> hereby are appointed to serve as the initial members of the > >> Apache Spark Project: > >> > >> * Mosharaf Chowdhury > >> * Jason Dai > >> * Tath

Re: [VOTE] Graduation of Apache Spark from the Incubator

2014-02-10 Thread Reynold Xin
> > * Tathagata Das > > * Ankur Dave > > * Aaron Davidson > > * Thomas Dudziak > > * Robert Evans > > * Thomas Graves > > * Andy Konwinski > > * Stephen Haberman > > * Mark Hamstra > > * Shane Huang > > * Ryan LeCompte >

Re: Proposal: Clarifying minor points of Scala style

2014-02-10 Thread Reynold Xin
+1 on both On Mon, Feb 10, 2014 at 1:34 AM, Aaron Davidson wrote: > There are a few bits of the Scala style that are underspecified by > both the Scala > style guide and our own supplemental > notes< > https://cwiki.apache.org/confluence/display/SPARK/Spark+C

Re: [GitHub] incubator-spark pull request: Improved NetworkReceiver in Spark St...

2014-02-07 Thread Reynold Xin
test On Fri, Feb 7, 2014 at 3:23 PM, AmplabJenkins wrote: > Github user AmplabJenkins commented on the pull request: > > > https://github.com/apache/incubator-spark/pull/559#issuecomment-34518332 > > All automated tests passed. > Refer to this link for build results: > https://amplab.cs

Re: [GitHub] incubator-spark pull request:

2014-02-07 Thread Reynold Xin
> > > > -Original Message- > > > From: Andrew Ash mailto:and...@andrewash.com)> > > > Reply-To: "dev@spark.incubator.apache.org (mailto: > dev@spark.incubator.apache.org)" dev@spark.incubator.apache.org)> > > > Date: Friday, February

Re: [GitHub] incubator-spark pull request:

2014-02-07 Thread Reynold Xin
I concur wholeheartedly ... On Fri, Feb 7, 2014 at 4:55 PM, Dean Wampler wrote: > This SPAM is not doing anyone any good. How about another mailing list for > people who want to see this? > > Sent from my rotary phone. > > > > On Feb 7, 2014, at 10:33 AM, mridulm wrote: > > > > Github user mri

Re: Discussion on strategy or roadmap should happen on dev@ list

2014-02-06 Thread Reynold Xin
We can try it on dev, but I personally find the JIRA notifications pretty spammy ... It will clutter the dev list, and make it harder to search for useful information here. On Thu, Feb 6, 2014 at 6:27 PM, Matei Zaharia wrote: > Henry (or anyone else), do you have any preference on sending these

Re: Is there any way to make a quick test on some pre-commit code?

2014-02-06 Thread Reynold Xin
You can do sbt/sbt assemble-deps and then just run sbt/sbt package each time. You can even do sbt/sbt ~package for automatic incremental compilation. On Thu, Feb 6, 2014 at 4:46 PM, Nan Zhu wrote: > Hi, all > > Is it always necessary to run sbt assembly when you want to test some code

Re: Proposal for Spark Release Strategy

2014-02-06 Thread Reynold Xin
+1 for 1.0 The point of 1.0 is for us to self-enforce API compatibility in the context of longer term support. If we continue down the 0.xx road, we will always have an excuse for breaking APIs. That said, a major focus of 0.9 and some of the work that is happening for 1.0 (e.g. configuration, Java

Re: [0.9.0] Possible deadlock in shutdown hook?

2014-02-06 Thread Reynold Xin
Is it safe if we interrupt the running thread during shutdown? On Thu, Feb 6, 2014 at 3:27 AM, Andrew Ash wrote: > Per the book Java Concurrency in Practice the already-running threads > continue running while the shutdown hooks run. So I think the race between > the writing thread and the d

Re: Not closing the merged PRs anymore from Spark github mirror?

2014-02-03 Thread Reynold Xin
It was a transient thing. There's a script that we are using to automatically fetch diffs from a PR and apply the diff against the git repo. Patrick changed the way it works last week, and a regression there was that PRs were no longer closed automatically. I believe he has fixed it. Patrick will also w

Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5)

2014-01-30 Thread Reynold Xin
Thanks. That does go out of the scope of the Spark release. The EC2 script starts instances and uses some scripts to set up this version. For that to work, we need to have a release first. On Thu, Jan 30, 2014 at 11:47 AM, bkrouse wrote: > I just tried the EC2 scripts as a part of this rc5, and i

Re: Problems while moving from 0.8.0 to 0.8.1

2014-01-27 Thread Reynold Xin
Do you mind pasting the whole stack trace for the NPE? On Mon, Jan 27, 2014 at 6:44 AM, Archit Thakur wrote: > Hi, > > Implementation of aggregation logic has been changed with 0.8.1 > (Aggregator.scala) > > It is now using AppendOnlyMap as compared to java.util.HashMap in 0.8.0 > release. > >

Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5)

2014-01-26 Thread Reynold Xin
'm wondering if anyone else has seen this problem or I'm missing something > to run the test correctly. > > Thanks, > Taka > > > > > On Sat, Jan 25, 2014 at 5:00 PM, Sean McNamara > wrote: > > > +1 > > > > On 1/25/14, 4:04 PM,

Re: GroupByKey implementation.

2014-01-26 Thread Reynold Xin
While I echo Mark's sentiment, versioning has nothing to do with this problem. It has been the case even in Spark 0.8.0. Note that mapSideCombine is turned off for groupByKey, so there is no need to merge any combiners. On Sun, Jan 26, 2014 at 12:22 PM, Archit Thakur wrote: > Hi, > > Below is t

Re: Any suggestion about JIRA 1006 "MLlib ALS gets stack overflow with too many iterations"?

2014-01-25 Thread Reynold Xin
inherently recursive properties. > > > On Sat, Jan 25, 2014 at 9:57 PM, Reynold Xin wrote: > > > It seems to me fixing DAGScheduler to make it not recursive is the better > > solution here, given the cost of checkpointing. > > > > On Sat, Jan 25, 2014 at 9:49 PM,

Re: Any suggestion about JIRA 1006 "MLlib ALS gets stack overflow with too many iterations"?

2014-01-25 Thread Reynold Xin
It seems to me fixing DAGScheduler to make it not recursive is the better solution here, given the cost of checkpointing. On Sat, Jan 25, 2014 at 9:49 PM, Xia, Junluan wrote: > Hi all > > The description about this Bug submitted by Matei is as following > > > The tipping point seems to be around
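
To illustrate why a recursive walk over a long lineage overflows the stack while an iterative one does not, here is a rough sketch. It is not the DAGScheduler code: `Node`, `chain`, and both counting methods are hypothetical stand-ins for an RDD and its parent dependencies.

```scala
import scala.collection.mutable

object LineageWalk {
  // Hypothetical stand-in for an RDD and its parent dependencies.
  final class Node(val id: Int, val parents: List[Node])

  // Builds a linear lineage containing `length` nodes.
  def chain(length: Int): Node =
    (1 until length).foldLeft(new Node(0, Nil)) { (child, i) =>
      new Node(i, List(child))
    }

  // Recursive walk: one JVM stack frame per level of lineage, so a
  // sufficiently long chain can fail with a StackOverflowError.
  def countRecursive(node: Node,
                     visited: mutable.Set[Int] = mutable.Set.empty[Int]): Int =
    if (!visited.add(node.id)) 0
    else 1 + node.parents.map(p => countRecursive(p, visited)).sum

  // Iterative walk: the pending nodes live in a heap-allocated list,
  // so the depth of the lineage no longer affects the call stack.
  def countIterative(root: Node): Int = {
    val visited = mutable.Set.empty[Int]
    var pending: List[Node] = List(root)
    var count = 0
    while (pending.nonEmpty) {
      val node = pending.head
      pending = pending.tail
      if (visited.add(node.id)) {
        count += 1
        pending = node.parents ::: pending
      }
    }
    count
  }
}
```

Both methods count the distinct nodes reachable from the root; the iterative version trades stack depth for heap usage and avoids the checkpointing cost mentioned above.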

Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc5)

2014-01-25 Thread Reynold Xin
+1 > On Jan 25, 2014, at 12:07 PM, Hossein wrote: > > +1 > > Compiled and tested on Mavericks. > > --Hossein > > > On Sat, Jan 25, 2014 at 11:38 AM, Patrick Wendell wrote: > >> I'll kick off the voting with a +1. >> >> On Thu, Jan 23, 2014 at 11:33 PM, Patrick Wendell >> wrote: >>> Please vote on

Re: JavaRDD.collect()

2014-01-24 Thread Reynold Xin
The likely reason is that first() is executed entirely on the driver node in the same process, while collect() needs to connect to worker nodes. Usually the first time you run an action, most of the JVM code is not yet optimized, and the classloader also needs to load a lot of things on the fly.

Re: [DISCUSS] Graduating as a TLP

2014-01-23 Thread Reynold Xin
+1 supporting Matei as the VP. On Thu, Jan 23, 2014 at 4:11 PM, Chris Mattmann wrote: > +1 from me. > > I'll throw Matei's name into the hat for VP. He's done a great job > and has stood out to me with his report filing and tenacity and > would make an excellent chair. > > Being a chair entails

Re: [DISCUSS] Graduating as a TLP

2014-01-23 Thread Reynold Xin
+1 On Thu, Jan 23, 2014 at 2:45 PM, Matei Zaharia wrote: > Hi folks, > > We’ve been working on the transition to Apache for a while, and our last > shepherd’s report says the following: > > > Spark > > Alan Cabrera (acabrera): > > Seems like a nice active project. IMO, th

Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc3)

2014-01-20 Thread Reynold Xin
That's a perm gen issue - you need to adjust the perm gen size. In sbt it should've been set automatically, but I think for Maven, you need to set the maven opts, which is documented in the build instructions. On Sun, Jan 19, 2014 at 11:35 PM, Ewen Cheslack-Postava wrote: > I can't get the tests

Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc2)

2014-01-19 Thread Reynold Xin
+1 On Sat, Jan 18, 2014 at 11:11 PM, Patrick Wendell wrote: > I'll kick off the voting with a +1. > > On Sat, Jan 18, 2014 at 11:05 PM, Patrick Wendell > wrote: > > Please vote on releasing the following candidate as Apache Spark > > (incubating) version 0.9.0. > > > > A draft of the release not

Re: Config properties broken in master

2014-01-18 Thread Reynold Xin
I also just went over the config options to see how pervasive this is. In addition to speculation, there is one more "conflict" of this kind: spark.locality.wait spark.locality.wait.node spark.locality.wait.process spark.locality.wait.rack spark.speculation spark.speculation.interval spark.specu

Re: Is there any plan to develop an application level fair scheduler?

2014-01-17 Thread Reynold Xin
It does. There are two scheduling levels here. The first level is what the cluster manager does. The standalone cluster manager for Spark only supports FIFO at the moment at the level of applications. Regarding Spark itself. Within a single Spark application, both FIFO and fair scheduling are su

Re: [VOTE] Release Apache Spark 0.9.0-incubating (rc1)

2014-01-16 Thread Reynold Xin
+1 On Thu, Jan 16, 2014 at 3:23 PM, Matei Zaharia wrote: > +1 for me as well. > > I built and tested this on Mac OS X, and looked through the new docs. > > Matei > > On Jan 15, 2014, at 5:48 PM, Patrick Wendell wrote: > > > Please vote on releasing the following candidate as Apache Spark > > (i

Re: Contribute SimRank algorithm to mllib

2014-01-10 Thread Reynold Xin
Hi Jerry, Why don't you submit a pull request and then we can discuss there? If SimRank is not common enough, we might take the matrix multiplication method in and merge that. At the very least, even if SimRank doesn't get merged into Spark, we can include a contrib package or a Wiki page that lin

Re: spark code formatter?

2014-01-08 Thread Reynold Xin
really nice to see everyone has the same spacing and indentation. > > > > Sincerely, > > > > DB Tsai > > Machine Learning Engineer > > Alpine Data Labs > > -- > > Web: http://alpinenow.com/ > > > > > >

Re: spark code formatter?

2014-01-08 Thread Reynold Xin
We have a Scala style configuration file in Shark: https://github.com/amplab/shark/blob/master/scalastyle-config.xml However, the scalastyle project is still pretty primitive and doesn't cover most of the use cases. It is still great to include it to cover basic checks such as 100-char wide lines.

Re: multinomial logistic regression

2014-01-06 Thread Reynold Xin
Thanks. Why don't you submit a pr and then we can work on it? > On Jan 6, 2014, at 6:15 PM, Michael Kun Yang wrote: > > Hi Hossein, > > I can still use LabeledPoint with little modification. Currently I convert > the category into {0, 1} sequence, but I can do the conversion in the body > of meth

Re: Build Changes for SBT Users

2014-01-05 Thread Reynold Xin
Why is it not possible? You always update the script; just can't update scripts for released versions. On Sat, Jan 4, 2014 at 9:07 PM, Patrick Wendell wrote: > I agree TD - I was just saying that Reynold's proposal that we could > update the release post-hoc is unfortunately not possible. > >

Re: Build Changes for SBT Users

2014-01-04 Thread Reynold Xin
Doesn't Apache do redirection from incubation. to the normal website also? By the time that happens, we can also update the URL in the script? On Sat, Jan 4, 2014 at 4:13 PM, Patrick Wendell wrote: > Hey Holden, > > That sounds reasonable to me. Where would we get a url we can control > though

Re: Terminology: "worker" vs "slave"

2014-01-02 Thread Reynold Xin
It is historic. I think we are converging towards worker: the "slave" daemon in the standalone cluster manager executor: the jvm process that is launched by the worker that executes tasks On Thu, Jan 2, 2014 at 10:39 PM, Andrew Ash wrote: > The terms worker and slave seem to be used interch

Re: Disallowing null mergeCombiners

2013-12-31 Thread Reynold Xin
I added the option that doesn't require the caller to specify the mergeCombiner closure a while ago when I wanted to disable mapSideCombine. In virtually all use cases I know of, it is fine & easy to specify a mergeCombiner, so I'm all for this given it simplifies the codebase. On Tue, Dec 31, 20

Re: Spark graduate project ideas

2013-12-31 Thread Reynold Xin
There is a recent discussion on academic projects on Spark. Take a look at the replies to that email (unfortunately you have to dig through the archive to find the replies): http://mail-archives.apache.org/mod_mbox/spark-dev/201312.mbox/%3CCAHH8_ON-2y69fBfVtt6pngWtEPOZdsmvt4hZ=doe-dzsk6k...@mail.g

Re: Systematically performance diagnose

2013-12-30 Thread Reynold Xin
The application web ui is pretty useful. We have been adding more and more information to the web ui for easier performance analysis. Look at Patrick Wendell's two talks at the Spark Summit for more information: http://spark-summit.org/summit-2013/ On Sat, Dec 28, 2013 at 8:12 PM, Hao Lin wrote

Re: test suite results in OOME

2013-12-30 Thread Reynold Xin
the following on the commandline: > -Dtest=TaskResultGetterSuite > > Many other test suites were run. > > How can I run one suite ? > > Thanks > > > On Sat, Dec 28, 2013 at 3:13 PM, Reynold Xin wrote: > > > I usually use sbt. i.e. sbt/sbt test > > > >

Re: test suite results in OOME

2013-12-28 Thread Reynold Xin
I usually use sbt. i.e. sbt/sbt test On Sat, Dec 28, 2013 at 7:00 AM, Ted Yu wrote: > Hi, > I used the following setting to run test suite: > export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=812M > -XX:ReservedCodeCacheSize=512m" > > I got: > > [ERROR] [12/28/2013 08:34:03.747] > [sparkWorker1-akka.

Re: Option folding idiom

2013-12-26 Thread Reynold Xin
I'm not strongly against Option.fold, but I find the readability getting worse for the use case you brought up. For the use case of if/else, I find Option.fold pretty confusing because it reverses the order of Some vs None. Also, when code gets long, the lack of an obvious boundary (the only bound
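
For reference, a small sketch of the ordering concern. The object and method names are hypothetical; `Option.fold` itself is the standard library method, which takes the empty (None) case first and the Some case second.

```scala
object OptionFoldStyle {
  // fold: the value for None comes first, the function for Some second,
  // which is the reverse of how an if/else on isDefined usually reads.
  def describeWithFold(port: Option[Int]): String =
    port.fold("no port configured")(p => s"port $p")

  // Pattern match: each branch is labelled, and Some can come first.
  def describeWithMatch(port: Option[Int]): String = port match {
    case Some(p) => s"port $p"
    case None    => "no port configured"
  }
}
```

The two methods behave identically; the difference is purely in how the branches are ordered and delimited on the page.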

Re: Akka problem when using scala command to launch Spark applications in the current 0.9.0-SNAPSHOT

2013-12-24 Thread Reynold Xin
Spark all use the assembly jar, > and thus java, right? > > -Evan > > > > On Fri, Dec 20, 2013 at 11:36 PM, Reynold Xin wrote: > > > It took me hours to debug a problem yesterday on the latest master branch > > (0.9.0-SNAPSHOT), and I would like to share with th

Akka problem when using scala command to launch Spark applications in the current 0.9.0-SNAPSHOT

2013-12-20 Thread Reynold Xin
It took me hours to debug a problem yesterday on the latest master branch (0.9.0-SNAPSHOT), and I would like to share with the dev list in case anybody runs into this Akka problem. A little background for those of you who haven't followed closely the development of Spark and YARN 2.2: YARN 2.2 use

Re: spark.task.maxFailures

2013-12-16 Thread Reynold Xin
I just merged your pull request https://github.com/apache/incubator-spark/pull/245 On Mon, Dec 16, 2013 at 2:12 PM, Grega Kešpret wrote: > Any news regarding this setting? Is this expected behaviour? Is there some > other way I can have Spark fail-fast? > > Thanks! > > On Mon, Dec 9, 2013 at 4:

Re: Sorry about business lately and general unavailability

2013-12-04 Thread Reynold Xin
Thanks for the update Chris. We do need to graduate soon. People have been asking me whether "incubating" means the project is very immature. :( One thing we need to do is to import the JIRA tickets from AMPLab's JIRA. That INFRA ticket hasn't moved along much. Can you help push that? On Wed, Dec

Re: PySpark / scikit-learn integration sprint at Cloudera - Strata Conference Friday 14th Feb 2014

2013-12-02 Thread Reynold Xin
Definitely some people will get confused. It's up to you. If we post it, we can mark it in the title that this is a hackathon. On Mon, Dec 2, 2013 at 1:43 PM, Olivier Grisel wrote: > 2013/12/2 Reynold Xin : > > Including the link to the meetup group: > http://www.meetup.com/s

Re: PySpark / scikit-learn integration sprint at Cloudera - Strata Conference Friday 14th Feb 2014

2013-12-02 Thread Reynold Xin
Including the link to the meetup group: http://www.meetup.com/spark-users/ On Mon, Dec 2, 2013 at 1:22 PM, Reynold Xin wrote: > Olivier, > > Do you want us to create a Spark user meetup event for this hackathon? > > On Mon, Dec 2, 2013 at 1:12 PM, Olivier Grisel >

Re: PySpark / scikit-learn integration sprint at Cloudera - Strata Conference Friday 14th Feb 2014

2013-12-02 Thread Reynold Xin
Olivier, Do you want us to create a Spark user meetup event for this hackathon? On Mon, Dec 2, 2013 at 1:12 PM, Olivier Grisel wrote: > Hi all, > > Just a quick reply to say that I would be glad to meet some of you to > hack on some prototype scikit-learn / PySpark integration. > > Cloudera just

Re: spark.task.maxFailures

2013-11-29 Thread Reynold Xin
Looks like a bug to me. Can you submit a pull request? On Fri, Nov 29, 2013 at 2:02 AM, Grega Kešpret wrote: > Looking at > http://spark.incubator.apache.org/docs/latest/configuration.html > docs says: > Number of individual task failures before giving up on the job. Should be > greater than o

Re: Problem with tests

2013-11-24 Thread Reynold Xin
in it. This > seems to be what is causing my errors. > > > > On Sat, Nov 23, 2013 at 8:00 AM, Nathan Kronenfeld < > nkronenf...@oculusinfo.com> wrote: > > > https://github.com/apache/incubator-spark/pull/18 > > > > > > On Fri, Nov 22, 2013 at 6:35

Re: Problem with tests

2013-11-22 Thread Reynold Xin
Can you provide a link to your pull request? On Sat, Nov 23, 2013 at 5:02 AM, Nathan Kronenfeld < nkronenf...@oculusinfo.com> wrote: > Actually, looking into recent commits, it looks like my hunch may be > exactly correct: > > https://github.com/apache/incubator-spark/commit/f639b65eabcc8666b74a

Re: issue regarding akka, protobuf and Hadoop version

2013-11-06 Thread Reynold Xin
for someone who want to > > take a look. > > > > Best Regards, > > Raymond Liu > > > > > > -Original Message- > > From: Reynold Xin [mailto:r...@apache.org] > > Sent: Tuesday, November 05, 2013 10:07 AM > > To: dev@spark.incubator.apache.org

Re: appId is no longer in the command line args for StandaloneExecutor

2013-11-05 Thread Reynold Xin
+aaron on this one since he changed the executor runner. (I think it is probably an oversight but Aaron should confirm.) On Tue, Nov 5, 2013 at 10:44 AM, Imran Rashid wrote: > Hi, > > a while back, ExecutorRunner was changed so the command line args included > the appId. > > https://github.co

Re: issue regarding akka, protobuf and Hadoop version

2013-11-04 Thread Reynold Xin
and which will surely bring extra > works on future code merging/rebase. So again, what's the code sync > strategy and what's the plan of merge back into master? > > Best Regards, > Raymond Liu > > > -Original Message- > From: Reynold Xin [mailto:r...@

Re: issue regarding akka, protobuf and Hadoop version

2013-11-04 Thread Reynold Xin
Adding in a few guys so they can chime in. On Mon, Nov 4, 2013 at 4:33 PM, Reynold Xin wrote: > I chatted with Matt Massie about this, and here are some options: > > 1. Use dependency injection in google-guice to make Akka use one version > of protobuf, and YARN use the other ve

Re: issue regarding akka, protobuf and Hadoop version

2013-11-04 Thread Reynold Xin
I chatted with Matt Massie about this, and here are some options: 1. Use dependency injection in google-guice to make Akka use one version of protobuf, and YARN use the other version. 2. Look into OSGi to accomplish the same goal. 3. Rewrite the messaging part of Spark to use a simple, custom RP

Re: SPARK-942

2013-11-03 Thread Reynold Xin
It's not a very elegant solution, but one possibility is for the CacheManager to check whether it will have enough space. If it is running out of space, skips buffering the output of the iterator & directly write the output of the iterator to disk (if storage level allows that). But it is still tr

Re: Are we moving too fast or too far on 0.8.1-SNAPSHOT?

2013-10-28 Thread Reynold Xin
Hi Mark, I can't comment much on the Spark part right now (because I have to run in 3 mins), but we will make Shark 0.8.1 work with Spark 0.8.1 for sure. Some of the changes will get cherry picked into branch-0.8 of Shark. On Mon, Oct 28, 2013 at 6:22 PM, Mark Hamstra wrote: > Or more to the po

Re: help me with setting up IntelliJ Idea development IDE for Spark

2013-10-27 Thread Reynold Xin
Just generate the IntelliJ project file using sbt/sbt gen-idea And then open the folder in IntelliJ (no need to import anything). On Sun, Oct 27, 2013 at 8:31 PM, dachuan wrote: > Hi, all, > > Could anybody help me set up the dev IDE for spark in IntelliJ idea IDE? > > I have already install

Re: Documentation of Java API and PySpark internals

2013-10-23 Thread Reynold Xin
Thanks, Josh. These are very useful for people to understand the APIs and to write new language bindings. On Wed, Oct 23, 2013 at 8:57 PM, Josh Rosen wrote: > I've created two new pages on the Spark wiki to document the internals of > the Java and Python APIs: > > https://cwiki.apache.org/confl

Re: Is there any MLlib SVM Reference Paper

2013-10-21 Thread Reynold Xin
It is fairly simple and just runs mini-batch sgd. You can actually just look at the code. https://github.com/apache/incubator-spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/classification/SVM.scala On Mon, Oct 21, 2013 at 10:37 PM, Sarath P R wrote: > Hi All, > > I Would like to

Re: SPARK-883

2013-10-17 Thread Reynold Xin
Thanks. I just closed the issue. On Thu, Oct 17, 2013 at 12:43 AM, karthik tunga wrote: > Hi, > > Is SPARK-883 still > open ? I already see lift-json dependency in pom.xml and didn't find any > reference to "scala.util.parsing.json". > > Che

Re: Experimental Scala-2.10.3 branch based on master

2013-10-04 Thread Reynold Xin
Hi Martin, Thanks for updating us. Prashant has also been updating the scala 2.10 branch at https://github.com/mesos/spark/tree/scala-2.10 Did you take a look at his work? On Fri, Oct 4, 2013 at 8:01 AM, Martin Weindel wrote: > Here you can find an experimental branch of Spark for Scala 2.10.

Re: Spark 0.8.0: bits need to come from ASF infrastructure

2013-09-26 Thread Reynold Xin
>>>> bandwidth than the mirror network. > >>>> > >>>> These are my concerns, that basically we're causing our users to have > >>>> a much worse experience. I've identified these concerns with moving to > >>>> the apache mirror, but perhaps I've overlooked some benefits that > >>>> would counteract these. Are there benefits? > >>>> > >>>> I completely agree that we need to send users to the signatures and > >>>> hashes at the Apache release site (to verify the release). So I did > >>>> add the link to this directly adjacent to the download. > >>>> > >>>> - Patrick > >>>> > >>>> On Thu, Sep 26, 2013 at 3:50 PM, Chris Mattmann > >>> wrote: > >>>>> Hey Guys, > >>>>> > >>>>> Yep the link should be the dyn/closer.cgi link on the website and +1 > >>>>> to Roman's comment about auditing spark-project.org links to be > >>> replaced > >>>>> with ASF counterparts. > >>>>> > >>>>> Cheers, > >>>>> Chris > >>>>> > >>>>> > >>>>> > >>>>> -Original Message -- -- Reynold Xin, AMPLab, UC Berkeley http://rxin.org

Re: Propose to Re-organize the scripts and configurations

2013-09-21 Thread Reynold Xin
Thanks, Shane. Can you also link to this mailing list discussion from the JIRA ticket? -- Reynold Xin, AMPLab, UC Berkeley http://rxin.org On Sat, Sep 21, 2013 at 9:01 PM, Shane Huang wrote: > I summarized the opinions about Config in this post and added a comment on > SPARK-544. >

Fwd: JVMs on single cores but parallel JVMs.

2013-09-21 Thread Reynold Xin
FYI -- Forwarded message -- From: Kevin Burton Date: Sat, Sep 21, 2013 at 9:30 AM Subject: Re: JVMs on single cores but parallel JVMs. To: mechanical-sympa...@googlegroups.com ok... so I'll rephrase this a bit. You're essentially saying that GC and background threads will need

Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC6)

2013-09-18 Thread Reynold Xin
+1 -- Reynold Xin, AMPLab, UC Berkeley http://rxin.org On Wed, Sep 18, 2013 at 11:06 AM, Konstantin Boudnik wrote: > Maven package could be run with -DskipTests that will simply build... well, > the package. > > +1 on the RC. The nits are indeed minor. > > Cos > >

Re: [VOTE] Release Apache Spark 0.8.0-incubating (RC5)

2013-09-16 Thread Reynold Xin
+1 -- Reynold Xin, AMPLab, UC Berkeley http://rxin.org On Sun, Sep 15, 2013 at 11:09 PM, Patrick Wendell wrote: > I also wrote an audit script [1] to verify various aspects of the > release binaries and ran it on this RC. People are welcome to run this > themselves, but I haven

Re: how to debug spark core code?

2013-09-10 Thread Reynold Xin
Among the folks in Berkeley, most of us use IntelliJ / Vim / Sublime Text. You can generate the IntelliJ project for Spark using sbt/sbt gen-idea On Tue, Sep 10, 2013 at 2:23 PM, Mingxi Wu wrote: > thanks Cesar. > > > On Mon, Sep 9, 2013 at 11:08 PM, Cesar Arevalo >wrote: > > > Hi Mingxi, >

Re: Needs a matrix library

2013-09-06 Thread Reynold Xin
is a decent library. -- Reynold Xin, AMPLab, UC Berkeley http://rxin.org On Sat, Sep 7, 2013 at 6:13 AM, Dmitriy Lyubimov wrote: > keep forgetting this: what is graphx release roadmap? > > On Fri, Sep 6, 2013 at 3:04 PM, Konstantin Boudnik wrote: > > Would it be more l

Re: Apache account

2013-09-06 Thread Reynold Xin
Copying Chris on this one. -- Reynold Xin, AMPLab, UC Berkeley http://rxin.org On Fri, Sep 6, 2013 at 2:17 PM, Nick Pentreath wrote: > Hi > > I submitted my license agreement and account name request a while back, but > still haven't received any correspondence. Just wonderi

Re: [Licensing check] Spark 0.8.0-incubating RC1

2013-09-03 Thread Reynold Xin
That seems substantially more overhead than generating github pull requests. Is there any particular reason we want to do that? On Wed, Sep 4, 2013 at 11:01 AM, Henry Saputra wrote: > Thanks Guys. > > In other ASF projects I also allow people to attach the git diff to the > JIRA itself (once we

Re: off-heap RDDs

2013-08-27 Thread Reynold Xin
On Tue, Aug 27, 2013 at 1:37 AM, Imran Rashid wrote: > > Reynold Xin wrote: > > This is especially attractive if the application can read directly from > a byte > > buffer without generic serialization (like Shark). > > interesting -- can you explain how this works in

Re: off-heap RDDs

2013-08-25 Thread Reynold Xin
Mark - you don't necessarily need to construct a separate storage level. One simple way to accomplish this is for the user application to pass Spark a DirectByteBuffer. On Sun, Aug 25, 2013 at 6:06 PM, Mark Hamstra wrote: > I'd need to see a clear and significant advantage to using off-heap RD
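
As a rough illustration of the direct-buffer idea, the sketch below allocates memory outside the Java heap with the standard `java.nio.ByteBuffer.allocateDirect` and reads primitives back without any generic serialization. The object and method names are hypothetical, and nothing here reflects how Spark would actually accept or account for such a buffer.

```scala
import java.nio.ByteBuffer

object OffHeapBuffer {
  // Writes the values into a direct (off-heap) buffer, 4 bytes per Int,
  // and returns it positioned for reading.
  def writeInts(values: Seq[Int]): ByteBuffer = {
    val buf = ByteBuffer.allocateDirect(values.length * 4)
    values.foreach(v => buf.putInt(v))
    buf.flip()
    buf
  }

  // Reads the Ints straight from the buffer. duplicate() gives an
  // independent position, so the caller's buffer is left untouched.
  def sum(buf: ByteBuffer): Long = {
    val view = buf.duplicate()
    var total = 0L
    while (view.remaining() >= 4) total += view.getInt()
    total
  }
}
```

Because the data is stored as raw bytes, the garbage collector only tracks the small buffer object, not the contents.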

Re: off-heap RDDs

2013-08-25 Thread Reynold Xin
This can be a good idea, especially for large heaps, and the changes to Spark are potentially fairly small (need to make BlockManager aware of off heap size and direct byte buffers in its size accounting). This is especially attractive if the application can read directly from a byte buffer without

Re: RDDs with no partitions

2013-08-23 Thread Reynold Xin
But is there any reason to do the handling of those beyond runJob? On Fri, Aug 23, 2013 at 11:04 AM, Charles Reiss wrote: > On 8/22/13 22:57 , Reynold Xin wrote: > > I actually don't think there is any reason to have 0 partition stages, > be it > > either result stage or

Re: RDDs with no partitions

2013-08-22 Thread Reynold Xin
that we actually > have to deal with? > > > > On Thu, Aug 22, 2013 at 9:20 PM, Reynold Xin wrote: > > > Being the guy that added the empty partition rdd, I second your idea that > > we should just short-circuit those in DAGScheduler.runJob. > > > > > >

Re: RDDs with no partitions

2013-08-22 Thread Reynold Xin
Being the guy that added the empty partition rdd, I second your idea that we should just short-circuit those in DAGScheduler.runJob. On Thu, Aug 22, 2013 at 8:26 PM, Mark Hamstra wrote: > So how do these get created, and are we really handling them correctly? > What is prompting my questions
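
A simplified sketch of what short-circuiting could look like. This is not the DAGScheduler.runJob signature: here a dataset is modelled as a plain sequence of partitions, and `runJob` is a hypothetical stand-in.

```scala
object EmptyJobShortCircuit {
  // Hypothetical model: each inner Seq is one partition, and func is
  // applied once per partition to produce that partition's result.
  def runJob[T, U](partitions: Seq[Seq[T]], func: Seq[T] => U): Seq[U] =
    if (partitions.isEmpty) {
      // No partitions means no tasks, so return immediately rather than
      // building stages that would contain zero tasks.
      Seq.empty[U]
    } else {
      partitions.map(func)
    }
}
```

The point of the early return is that everything downstream of it can assume at least one partition exists.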

Re: Bagel and partitioning

2013-08-16 Thread Reynold Xin
Hi Denis, Thanks for the email. I didn't look at the paper yet so I don't fully understand your use case. But here are some answers: 1. Do you plan to continue development of Bagel? Bagel will be subsumed by GraphX when GraphX comes out. We will try to provide a Bagel API on top of GraphX so exi

Re: Machine Learning on Spark [long rambling discussion email]

2013-07-24 Thread Reynold Xin
y hard to debug / optimize performance. -- Reynold Xin, AMPLab, UC Berkeley http://rxin.org

Re: Mailing list transition (was Re: Apache Spark podling: Created!)

2013-06-28 Thread Reynold Xin
uthern California, Los Angeles, CA 90089 USA > >++ > > > > > > > >> > >> > >>On Fri, Jun 21, 2013 at 5:03 PM, Mattmann, Chris A (398J) < > >>chris.a.mattm...@jpl.nasa

Re: Mailing list transition (was Re: Apache Spark podling: Created!)

2013-06-28 Thread Reynold Xin
search for archived messages. -- Reynold Xin, AMPLab, UC Berkeley http://rxin.org On Fri, Jun 28, 2013 at 2:04 PM, Andy Konwinski wrote: > + spark-develop...@googlegroups.com to loop in those who haven't > subscribed to dev@spark.i.a.o yet, (also because my emails are getting > bounc