Re: SPARK-942 patch review

2014-02-25 Thread Patrick Wendell
Hey Andrew, Ah, I just meant to say that in cases like this it's usually a mistake... and we try to (in general) be inclusive about merging patches :) Definitely appreciate you calling this one out... this is what people should do in cases like this. - Patrick On Tue, Feb 25, 2014 at 8:00 PM, A

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Mridul Muralidharan
The problem is, the complete spark dependency graph is fairly large, and there are a lot of conflicting versions in there. This is particularly painful when we bump versions of dependencies, which makes managing this messy at best. Now, I have not looked in detail at how maven manages this - it might just be accident

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Chester Chen
@Sandy Yes, in sbt with a multi-project setup, you can easily set a variable in the build.scala and reference the version number from all dependent projects. Regarding the mix of Java and Scala projects, in my workplace we have both Java and Scala code. sbt can be used to build both with
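A minimal sketch of the shared-version idea described above, written as a build.sbt fragment rather than a build.scala file. The project names, directory layout, and version strings are hypothetical and are not taken from the Spark build.

```scala
// build.sbt: hypothetical two-module layout; all names and versions are illustrative.

// Single place to bump the version for every sub-project.
val sharedVersion = "0.1.0-SNAPSHOT"

// Settings applied to every sub-project.
lazy val commonSettings = Seq(
  version := sharedVersion,
  scalaVersion := "2.10.3"
)

// A module that may contain both src/main/java and src/main/scala sources.
lazy val core = (project in file("core"))
  .settings(commonSettings: _*)

// A second module that depends on the first and reuses the same version.
lazy val examples = (project in file("examples"))
  .dependsOn(core)
  .settings(commonSettings: _*)
```

With the default source layout, sbt compiles Java sources under src/main/java alongside Scala sources under src/main/scala, so a mixed module needs no extra configuration in this sketch.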

Re: SPARK-942 patch review

2014-02-25 Thread Andrew Ash
I've always felt that the Spark team was extremely responsive to PRs and I've been very impressed over the past year with your output. As Matei said, probably the best thing to do here is to be more diligent about closing PRs that are old/abandoned so that every PR is active. Whenever I comment I

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Qiuzhuang Lian
We use the jarjar Ant plugin task to assemble everything into one fat jar. Qiuzhuang On Wed, Feb 26, 2014 at 11:26 AM, Evan chan wrote: > Actually you can control exactly how sbt assembly merges or resolves > conflicts. I believe the default settings however lead to an order that > cannot be controlled. > > I

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Evan chan
Actually you can control exactly how sbt assembly merges or resolves conflicts. I believe the default settings however lead to an order that cannot be controlled. I do wish for a smarter fat jar plugin. -Evan To be free is not merely to cast off one's chains, but to live in a way that respec

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Mridul Muralidharan
On Wed, Feb 26, 2014 at 5:31 AM, Patrick Wendell wrote: > Evan - this is a good thing to bring up. Wrt the shader plug-in - > right now we don't actually use it for bytecode shading - we simply > use it for creating the uber jar with excludes (which sbt supports > just fine via assembly). Not re

Re: [HELP] ask for some information about public data set

2014-02-25 Thread Evan R. Sparks
Hi hyqgod, This is probably a better question for the spark user's list than the dev list (cc'ing user and bcc'ing dev on this reply). To answer your question, though: Amazon's Public Datasets Page is a nice place to start: http://aws.amazon.com/datasets/ - these work well with spark because the

[HELP] ask for some information about public data set

2014-02-25 Thread 黄远强
Hi all: I am a newcomer to the Spark community. I dream of being an expert in the field of big data, but I have no idea where to start after going through the published documents on the Spark website and the examples in the Spark source code. I want to know if there are some public data sets in the inte

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-02-25 Thread Evan R. Sparks
Hi everyone, Sorry I'm late to the thread here, but I want to point out a few things. This is, of course, a most welcome contribution and it will be immediately useful to everything currently using the stochastic gradient optimizers! 1) I'm all for refactoring the optimization methods to make the

Re: SPARK-942 patch review

2014-02-25 Thread Patrick Wendell
Hey Andrew, Indeed, sometimes there are patches that sit around a while and in this case it can be because it's unclear to the reviewers whether they are features worth having - or just by accident. To put things in perspective, Spark merges about 80% of the proposed patches (if you look we are o

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-02-25 Thread Debasish Das
Hi DB, Could you please point me to your spark PR ? Thanks. Deb On Tue, Feb 25, 2014 at 5:03 PM, DB Tsai wrote: > Hi Deb, Xiangrui > > I just moved the LBFGS code to maven central, and cleaned up the code > a little bit. > > https://github.com/AlpineNow/incubator-spark/commits/dbtsai-LBFGS >

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-02-25 Thread DB Tsai
Hi Deb, Xiangrui I just moved the LBFGS code to maven central, and cleaned up the code a little bit. https://github.com/AlpineNow/incubator-spark/commits/dbtsai-LBFGS After looking at Mallet, the API is pretty simple, and it can probably be easily tested based on my PR. It will be tricky to j

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Evan Chan
Sandy, I believe the sbt-pom-reader plugin might work very well for this exact use case. Otherwise, the SBT build file is just Scala code, so it can easily read the pom XML directly if needed and parse stuff out. On Tue, Feb 25, 2014 at 4:36 PM, Sandy Ryza wrote: > To perhaps restate what some
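A rough sketch of the "read the pom XML directly" option mentioned above. It assumes a pom.xml sits next to build.sbt, that the pom declares its own top-level version element, and that scala.xml is available on the build classpath; none of this is confirmed by the thread, and it does not use the sbt-pom-reader plugin.

```scala
// build.sbt: sketch only; assumes ./pom.xml exists and declares <version> directly.

// Load the pom once and pull out the project's own <version> element.
// The single-backslash selector only looks at direct children of <project>,
// so a <version> nested under <parent> is not picked up here.
val pomVersion: String = {
  val pom = scala.xml.XML.loadFile(file("pom.xml"))
  (pom \ "version").text.trim
}

// Reuse the Maven-declared version in the sbt build.
version := pomVersion
```

If the pom inherits its version from a parent, the value above would be empty, so a real build would need a fallback.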

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Sandy Ryza
To perhaps restate what some have said, Maven is by far the most common build tool for the Hadoop / JVM data ecosystem. While Maven is less pretty than SBT, expertise in it is abundant. SBT requires contributors to projects in the ecosystem to learn yet another tool. If we think of Spark as a pr

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Evan Chan
Hi Patrick, If you include shaded dependencies inside of the main Spark jar, such that it would have combined classes from all dependencies, wouldn't you end up with a sub-assembly jar? It would be dangerous in that since it is a single unit, it would break normal packaging assumptions that the j

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Patrick Wendell
What I mean is this. AFAIK the shader plug-in is primarily designed for creating uber jars which contain spark and all dependencies. But since Spark is something people depend on in Maven, what I actually want is to create the normal old Spark jar [1], but then include shaded versions of some of ou

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Evan Chan
Patrick -- not sure I understand your request, do you mean - somehow creating a shaded jar (eg with maven shader plugin) - then including it in the spark jar (which would then be an assembly)? On Tue, Feb 25, 2014 at 4:01 PM, Patrick Wendell wrote: > Evan - this is a good thing to bring up. Wrt t

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Patrick Wendell
Evan - this is a good thing to bring up. Wrt the shader plug-in - right now we don't actually use it for bytecode shading - we simply use it for creating the uber jar with excludes (which sbt supports just fine via assembly). I was wondering actually, do you know if it's possible to add shaded a

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread yao
Hi Patrick, > (b) You have downloaded Spark and forked its maven build to change around > the dependencies. We go with this approach. We've cloned the Spark repo and currently maintain our own branch. The idea is to fix Spark issues found in our production system first and contribute back to commu

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Patrick Wendell
Hey Yao, Would you mind explaining exactly how your company extends the Spark maven build? For instance: (a) You are depending on Spark in your build and your build is using Maven. (b) You have downloaded Spark and forked its maven build to change around the dependencies. (c) You are writing pom

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Evan Chan
The problem is that plugins are not equivalent. There is AFAIK no equivalent to the maven shader plugin for SBT. There is an SBT plugin which can apparently read POM XML files (sbt-pom-reader). However, it can't possibly handle plugins, which is still problematic. On Tue, Feb 25, 2014 at 3:31 P

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread yao
I would prefer to keep both of them; that would be better even if it means pom.xml is generated using sbt. Some companies, like my current one, have their own build infrastructure built on top of Maven. It is not easy to support sbt for these potential Spark clients. But I do agree to only keep on

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Sravya Tirukkovalur
I am no sbt guru, but I could exclude transitive dependencies this way: libraryDependencies += "log4j" % "log4j" % "1.2.15" exclude("javax.jms", "jms") Thanks! On Tue, Feb 25, 2014 at 3:20 PM, Evan Chan wrote: > The correct way to exclude dependencies in SBT is actually to declare > a depe

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Evan Chan
The correct way to exclude dependencies in SBT is actually to declare a dependency as "provided". I'm not familiar with Maven or its dependencySet, but provided will mark the entire dependency tree as excluded. It is also possible to exclude jar by jar, but this is pretty error prone and messy.
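A short sketch contrasting the two approaches mentioned above. It assumes the fat jar is built with the sbt-assembly plugin, which leaves provided-scope dependencies out of the assembly by default; the hadoop-client coordinates are only an illustration, and the log4j line is the example quoted elsewhere in this thread.

```scala
// build.sbt: sketch only; coordinates are illustrative.

// "provided": available at compile time, expected to be supplied by the
// runtime environment, and therefore left out of the assembled jar.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "1.0.4" % "provided"

// Per-artifact exclusion: keep log4j but drop one of its transitive dependencies.
libraryDependencies += ("log4j" % "log4j" % "1.2.15").exclude("javax.jms", "jms")
```

The first form keeps the exclusion in one place; the second has to be repeated for every unwanted transitive artifact, which is where the "error prone and messy" concern comes from.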

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Koert Kuipers
yes in sbt assembly you can exclude jars (although i never had a need for this) and files in jars. for example i frequently remove log4j.properties, because for whatever reason hadoop decided to include it making it very difficult to use our own logging config. On Tue, Feb 25, 2014 at 4:24 PM,
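A sketch of dropping a file such as log4j.properties from the assembled jar, as described above. It uses the key names and slash syntax of recent sbt-assembly releases on sbt 1.x, which is an assumption; the plugin version in use in 2014 spelled this setting differently.

```scala
// build.sbt: sketch only; assumes a recent sbt-assembly plugin is enabled.

assembly / assemblyMergeStrategy := {
  // Drop any log4j.properties found at the root of a dependency jar so the
  // application's own logging configuration can be supplied separately.
  case "log4j.properties" => MergeStrategy.discard
  // Everything else falls back to the plugin's existing strategy.
  case other =>
    val fallback = (assembly / assemblyMergeStrategy).value
    fallback(other)
}
```

The same setting is where conflict resolution between duplicate files is configured, for example by returning MergeStrategy.first or MergeStrategy.concat for a given path.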

Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark

2014-02-25 Thread Konstantin Boudnik
On Fri, Feb 21, 2014 at 11:11AM, Patrick Wendell wrote: > Kos - thanks for chiming in. Could you be more specific about what is > available in maven and not in sbt for these issues? I took a look at > the bigtop code relating to Spark. As far as I could tell [1] was the > main point of integration

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-02-25 Thread Debasish Das
Hi DB, I am considering building on your PR and adding Mallet as a dependency so that we can run some basic comparison tests on the large-scale sparse datasets that I have. In the meantime, let's discuss if there are other optimization packages that we should try. My wishlist has bounded bfgs as well

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-02-25 Thread DB Tsai
I found a comparison between the Mallet and Fortran versions. The results are close but not the same. http://t3827.ai-mallet-development.aitalk.info/help-with-l-bfgs-t3827.html Here is LBFGS-B Cost: 0.6902411220175793 Gradient: -5.453609E-007, -2.858372E-008, -1.369706E-007 Theta: -0.01418621010217140

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-02-25 Thread DB Tsai
Hi Deb, On Tue, Feb 25, 2014 at 7:07 AM, Debasish Das wrote: > Continuation on last email sent by mistake: > > Is the CPL license compatible with Apache? > > http://opensource.org/licenses/cpl1.0.php Based on what I read here, there is no problem including CPL code in an Apache project as long as

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-02-25 Thread Xiangrui Meng
Hi Deb, CPL 1.0 is compatible if the inclusion is appropriately labeled (https://www.apache.org/legal/3party.html). I think it is great to have an L-BFGS optimizer in mllib, but we need to invest some time to figure out which one to use. I'm not sure whether jblas or netlib-java will make a b

Re: Github emails

2014-02-25 Thread Daniel Gruno
On 02/25/2014 07:55 AM, Matei Zaharia wrote: > This is probably a snafu because we had a GitHub hook that was sending > messages to d...@spark.incubator.apache.org, and that list was recently moved > (or is in the process of being moved?) to dev@spark.apache.org. Unfortunately > there’s nothing

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-02-25 Thread Debasish Das
Continuation on last email sent by mistake: Is the CPL license compatible with Apache? http://opensource.org/licenses/cpl1.0.php Mallet jars are available on maven. They have hessian based solvers which looked interesting along with bfgs and cg. Definitely the lbfgs f2j looks promising as the b

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-02-25 Thread Debasish Das
Hi DB, Xiangrui, Mallet from cmu also has bfgs cg and a good optimization package. Do you know if cpl license si On Feb 22, 2014 11:50 AM, "Xiangrui Meng" wrote: > Hi DB, > > It is great to have the L-BFGS optimizer in MLlib and thank you for taking > care of the license issue. I looked through