Hey Andrew,
Ah, I just meant to say that in cases like this it's usually a
mistake... and we try to (in general) be inclusive about merging
patches :) Definitely appreciate you calling this one out... this is
what people should do in cases like this.
- Patrick
On Tue, Feb 25, 2014 at 8:00 PM, A
The problem is that the complete Spark dependency graph is fairly large,
and there are a lot of conflicting versions in there.
In particular, when we bump versions of dependencies, managing
this becomes messy at best.
Now, I have not looked in detail at how Maven manages this - it might
just be accident
@Sandy
Yes, in an sbt multi-project setup, you can easily set a variable in the
build.scala and reference the version number from all dependent projects.
Regarding mixing Java and Scala projects: at my workplace, we have both Java
and Scala code. sbt can be used to build both with
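A minimal sketch of the shared-version pattern described above, assuming sbt 0.13-era syntax; the module names, versions, and file layout here are hypothetical:

```scala
// project/Build.scala sketch: one version variable shared by every module
import sbt._
import Keys._

object MyBuild extends Build {
  // Single place to bump the version for all projects
  val sharedVersion = "0.9.0-SNAPSHOT"

  val commonSettings = Defaults.defaultSettings ++ Seq(
    version := sharedVersion,
    scalaVersion := "2.10.3"
  )

  lazy val core = Project("core", file("core"), settings = commonSettings)

  lazy val examples = Project("examples", file("examples"),
    settings = commonSettings).dependsOn(core)
}
```

Both Java and Scala sources under src/main/java and src/main/scala are compiled by sbt out of the box with this setup.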
I've always felt that the Spark team was extremely responsive to PRs and
I've been very impressed over the past year with your output. As Matei
said, probably the best thing to do here is to be more diligent about
closing PRs that are old/abandoned so that every PR is active. Whenever I
comment I
We use the jarjar Ant plugin task to assemble everything into one fat jar.
Qiuzhuang
On Wed, Feb 26, 2014 at 11:26 AM, Evan Chan wrote:
> Actually you can control exactly how sbt assembly merges or resolves
> conflicts. I believe, however, that the default settings lead to a merge
> order which cannot be controlled.
>
> I
Actually you can control exactly how sbt assembly merges or resolves conflicts.
I believe, however, that the default settings lead to a merge order which
cannot be controlled.
I do wish for a smarter fat jar plugin.
-Evan
To be free is not merely to cast off one's chains, but to live in a way that
respec
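The merge control Evan describes can be sketched with sbt-assembly's per-path strategies (0.13-era plugin imports; the path patterns chosen here are hypothetical examples):

```scala
// build.sbt sketch: override sbt-assembly's conflict resolution per path
import sbtassembly.Plugin._
import AssemblyKeys._

mergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard // drop jar metadata
  case "reference.conf"              => MergeStrategy.concat  // merge Typesafe configs
  case _                             => MergeStrategy.first   // otherwise take the first jar's copy
}
```

MergeStrategy.first/last make the otherwise-arbitrary ordering explicit, which is the control the default settings lack.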
On Wed, Feb 26, 2014 at 5:31 AM, Patrick Wendell wrote:
> Evan - this is a good thing to bring up. Wrt the shader plug-in -
> right now we don't actually use it for bytecode shading - we simply
> use it for creating the uber jar with excludes (which sbt supports
> just fine via assembly).
Not re
Hi hyqgod,
This is probably a better question for the Spark user list than the dev
list (cc'ing user and bcc'ing dev on this reply).
To answer your question, though:
Amazon's Public Datasets page is a nice place to start:
http://aws.amazon.com/datasets/ - these work well with Spark because
the
Hi all:
I am a freshman in the Spark community. I dream of being an expert in the
field of big data. But I have no idea where to start after going through
the published documents on the Spark website and the examples in the Spark
source code. I want to know if there are some public data sets on the inte
Hi everyone,
Sorry I'm late to the thread here, but I want to point out a few things.
This is, of course, a most welcome contribution and it will be immediately
useful to everything currently using the stochastic gradient optimizers!
1) I'm all for refactoring the optimization methods to make the
Hey Andrew,
Indeed, sometimes there are patches that sit around for a while, and in
this case it can be because it's unclear to the reviewers whether the
features are worth having - or just by accident.
To put things in perspective, Spark merges about 80% of the proposed
patches (if you look we are o
Hi DB,
Could you please point me to your Spark PR?
Thanks.
Deb
On Tue, Feb 25, 2014 at 5:03 PM, DB Tsai wrote:
> Hi Deb, Xiangrui
>
> I just moved the LBFGS code to maven central, and cleaned up the code
> a little bit.
>
> https://github.com/AlpineNow/incubator-spark/commits/dbtsai-LBFGS
>
Hi Deb, Xiangrui
I just moved the LBFGS code to maven central, and cleaned up the code
a little bit.
https://github.com/AlpineNow/incubator-spark/commits/dbtsai-LBFGS
After looking at Mallet, the API is pretty simple, and it can probably
be easily tested
based on my PR.
It will be tricky to j
Sandy, I believe the sbt-pom-reader plugin might work very well for
this exact use case. Otherwise, the SBT build file is just Scala
code, so it can easily read the pom XML directly if needed and parse
stuff out.
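As a sketch of reading a value straight out of a POM from Scala code, here is a self-contained example using only the JDK's built-in XML parser (the POM content is inlined and hypothetical; a real build would load the project's pom.xml file):

```scala
import javax.xml.parsers.DocumentBuilderFactory
import java.io.ByteArrayInputStream

object PomVersion {
  // Hypothetical POM fragment standing in for a pom.xml on disk
  val pomXml =
    """<project>
      |  <groupId>org.example</groupId>
      |  <artifactId>demo</artifactId>
      |  <version>1.2.3</version>
      |</project>""".stripMargin

  // Parse the XML and pull out the <version> element's text
  def version: String = {
    val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
      .parse(new ByteArrayInputStream(pomXml.getBytes("UTF-8")))
    doc.getElementsByTagName("version").item(0).getTextContent
  }

  def main(args: Array[String]): Unit =
    println(version)
}
```

Since an SBT build definition is ordinary Scala, logic like this can run inside build.scala to keep versions in sync with an existing Maven build.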
On Tue, Feb 25, 2014 at 4:36 PM, Sandy Ryza wrote:
> To perhaps restate what some
To perhaps restate what some have said, Maven is by far the most common
build tool for the Hadoop / JVM data ecosystem. While Maven is less pretty
than SBT, expertise in it is abundant. SBT requires contributors to
projects in the ecosystem to learn yet another tool. If we think of Spark
as a pr
Hi Patrick,
If you include shaded dependencies inside of the main Spark jar, such
that it would have combined classes from all dependencies, wouldn't
you end up with a sub-assembly jar? It would be dangerous in that,
since it is a single unit, it would break the normal packaging assumptions
that the j
What I mean is this. AFAIK the shader plug-in is primarily designed
for creating uber jars which contain spark and all dependencies. But
since Spark is something people depend on in Maven, what I actually
want is to create the normal old Spark jar [1], but then include
shaded versions of some of ou
Patrick -- not sure I understand your request, do you mean
- somehow creating a shaded jar (eg with maven shader plugin)
- then including it in the spark jar (which would then be an assembly)?
On Tue, Feb 25, 2014 at 4:01 PM, Patrick Wendell wrote:
> Evan - this is a good thing to bring up. Wrt t
Evan - this is a good thing to bring up. Wrt the shader plug-in -
right now we don't actually use it for bytecode shading - we simply
use it for creating the uber jar with excludes (which sbt supports
just fine via assembly).
I was wondering actually, do you know if it's possible to add shaded
a
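The excludes that the message above says sbt supports "just fine via assembly" can be sketched like this (0.13-era sbt-assembly keys; the jar-name filter is a hypothetical example):

```scala
// build.sbt sketch: keep specific jars out of the uber jar at assembly time
import sbtassembly.Plugin._
import AssemblyKeys._

excludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  // Hypothetical filter: leave Hadoop jars to be provided by the cluster
  cp.filter(_.data.getName.startsWith("hadoop-"))
}
```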
Hi Patrick,
> (b) You have downloaded Spark and forked its Maven build to change around
> the dependencies.
We go with this approach. We've cloned the Spark repo and currently maintain
our own branch. The idea is to fix Spark issues found in our production
system first and contribute back to commu
Hey Yao,
Would you mind explaining exactly how your company extends the Spark
maven build? For instance:
(a) You are depending on Spark in your build and your build is using Maven.
(b) You have downloaded Spark and forked its Maven build to change
around the dependencies.
(c) You are writing pom
The problem is that plugins are not equivalent. There is AFAIK no
equivalent to the maven shader plugin for SBT.
There is an SBT plugin which can apparently read POM XML files
(sbt-pom-reader). However, it can't possibly handle plugins, which
is still problematic.
On Tue, Feb 25, 2014 at 3:31 P
I would prefer to keep both of them; it would be better even if that means
pom.xml will be generated using sbt. Some companies, like my current one,
have their own build infrastructure built on top of Maven. It is not easy
to support sbt for these potential Spark clients. But I do agree to only
keep on
I am no sbt guru, but I could exclude transitive dependencies this way:
libraryDependencies +=
"log4j" % "log4j" % "1.2.15" exclude("javax.jms", "jms")
Thanks!
On Tue, Feb 25, 2014 at 3:20 PM, Evan Chan wrote:
> The correct way to exclude dependencies in SBT is actually to declare
> a depe
The correct way to exclude dependencies in SBT is actually to declare
a dependency as "provided". I'm not familiar with Maven or its
dependencySet, but provided will mark the entire dependency tree as
excluded. It is also possible to exclude jar by jar, but this is
pretty error prone and messy.
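A sketch of the "provided" scope described above (the artifact and version here are hypothetical):

```scala
// build.sbt sketch: compile against hadoop-client but leave it (and its
// transitive dependency tree) out of the assembly jar
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0" % "provided"
```

At runtime the cluster (or the launching classpath) is then expected to supply these classes.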
Yes, in sbt assembly you can exclude jars (although I never had a need for
this) and files in jars.
For example, I frequently remove log4j.properties, because for whatever
reason Hadoop decided to include it, making it very difficult to use our own
logging config.
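Removing a file from inside dependency jars, as described above, can be sketched with a merge strategy (0.13-era sbt-assembly keys):

```scala
// build.sbt sketch: discard any log4j.properties found in dependency jars
// so the application's own logging config is the only one on the classpath
import sbtassembly.Plugin._
import AssemblyKeys._

mergeStrategy in assembly := {
  case "log4j.properties" => MergeStrategy.discard
  case _                  => MergeStrategy.first
}
```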
On Tue, Feb 25, 2014 at 4:24 PM,
On Fri, Feb 21, 2014 at 11:11AM, Patrick Wendell wrote:
> Kos - thanks for chiming in. Could you be more specific about what is
> available in maven and not in sbt for these issues? I took a look at
> the bigtop code relating to Spark. As far as I could tell [1] was the
> main point of integration
Hi DB,
I am considering building on your PR and adding Mallet as the dependency so
that we can run some basic comparison tests on the large-scale sparse
datasets that I have.
In the meantime, let's discuss if there are other optimization packages
that we should try.
My wishlist has bounded bfgs as well
I found some comparisons between Mallet and the Fortran version. The results
are close but not the same.
http://t3827.ai-mallet-development.aitalk.info/help-with-l-bfgs-t3827.html
Here is LBFGS-B
Cost: 0.6902411220175793
Gradient: -5.453609E-007, -2.858372E-008, -1.369706E-007
Theta: -0.01418621010217140
Hi Deb,
On Tue, Feb 25, 2014 at 7:07 AM, Debasish Das wrote:
> Continuation of the last email, which was sent by mistake:
>
> Is the CPL license compatible with Apache?
>
> http://opensource.org/licenses/cpl1.0.php
Based on what I read here, there is no problem including CPL code in an
Apache project
as long as
Hi Deb,
CPL 1.0 is compatible if the inclusion is appropriately labeled
(https://www.apache.org/legal/3party.html). I think it is great to
have an L-BFGS optimizer in MLlib, but we need to invest some
time to figure out which one to use. I'm not sure whether jblas or
netlib-java will make a b
On 02/25/2014 07:55 AM, Matei Zaharia wrote:
> This is probably a snafu because we had a GitHub hook that was sending
> messages to d...@spark.incubator.apache.org, and that list was recently moved
> (or is in the process of being moved?) to dev@spark.apache.org. Unfortunately
> there’s nothing
Continuation of the last email, which was sent by mistake:
Is the CPL license compatible with Apache?
http://opensource.org/licenses/cpl1.0.php
Mallet jars are available on Maven. They have Hessian-based solvers, which
looked interesting, along with BFGS and CG.
Definitely the L-BFGS f2j looks promising as the b
Hi DB, Xiangrui,
Mallet from CMU also has BFGS, CG, and a good optimization package. Do you
know if the CPL license si
On Feb 22, 2014 11:50 AM, "Xiangrui Meng" wrote:
> Hi DB,
>
> It is great to have the L-BFGS optimizer in MLlib and thank you for taking
> care of the license issue. I looked through