le in play!
> So play/json has just one big dep which is Jackson!
>
> I agree that Jackson is the right way to go as a beginning.
> But for Scala developers, a thin higher-level layer like play/json is useful to
> bring type safety...
>
> Pascal
>
> On Tue, Feb 11, 2014 at 1:31 AM,
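To make the type-safety point concrete, here is a minimal play-json sketch
(the case class and values are invented for illustration; parse failures
surface as values rather than exceptions deep inside Jackson):

    import play.api.libs.json._

    case class User(name: String, age: Int)
    // Macro-generated Reads[User] (play-json 2.1+):
    implicit val userReads: Reads[User] = Json.reads[User]

    Json.parse("""{"name": "Pascal", "age": 42}""").validate[User] match {
      case JsSuccess(user, _) => println(user)
      case err: JsError       => println("invalid input: " + err)
    }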
Any interest in adding Fast Serialization (or possibly replacing the
default of Java Serialization)?
https://code.google.com/p/fast-serialization/
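For context, Spark's serializer is already pluggable, so FST could in
principle be wired in the same way Kryo is today. A sketch of the existing
switch, assuming the SparkConf-era API (the FST-backed class name below is
hypothetical):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // Built-in alternative to Java serialization:
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // A hypothetical FST backend would plug in the same way:
      // .set("spark.serializer", "org.example.spark.FstSerializer")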
--
Evan Chan
Staff Engineer
e...@ooyala.com
>> 1) Line-wrapped method return type is indented with two spaces:
>> >> >> def longMethodName(... long param list ...)
>> >> >>   : Long = {
>> >> >>   2
>> >> >> }
>> >> >>
>> >> >> *Justification:* I think this is the most commonly used style in
>> Spark
>> >> >> today. It's also similar to the "extends" style used in classes, with
>> >> >> the
>> >> >> same justification: it is visually distinguished from the 4-indented
>> >> >> parameter list.
>> >> >>
>> >> >> 2) URLs and code examples in comments should not be line-wrapped.
>> >> >> Here
>> >> >> <https://github.com/apache/incubator-spark/pull/557/files#diff-c338f10f3567d4c1d7fec4bf9e2677e1L29>
>> >> >> is an example of the latter.
>> >> >>
>> >> >> *Justification*: Line-wrapping can cause confusion when trying to
>> >> >> copy-paste a URL or command. It can additionally cause IDE issues or
>> >> >> avoidable Javadoc issues.
>> >> >>
>> >> >> Any thoughts on these, or additional style issues not explicitly
>> >> >> covered in
>> >> >> either the Scala style guide or Spark wiki?
>> >> >>
>> >
>> >
>>
--
Evan Chan
Staff Engineer
e...@ooyala.com
to me like a pretty straightforward change (although
>> > > JsonProtocol.scala would be a little more verbose since it couldn't use
>> > > the Lift JSON DSL), and I'd like to do it. I'm writing now to ask for
>> > > some community feedback before making the change (and submitting a JIRA
>> > > and PR). If no one has any serious objections (to the effort in
>> general
>> > > or to the choice of spark-json in particular), I'll go ahead and do
>> it,
>> > > but if anyone has concerns, I'd be happy to discuss and address them
>> > > before getting started.
>> > >
>> > >
>> > > thanks,
>> > > wb
>> >
>> >
>>
--
Evan Chan
Staff Engineer
e...@ooyala.com
On Thu, Feb 6, 2014 at 11:54 AM, Evan Chan wrote:
> +1 for 0.10.0.
>
> It would give more time to study things (such as the new SparkConf)
> and let the community decide if any breaking API changes are needed.
>
> Also, a +1 for minor revisions not breaking code compatibility,
ixes) and their urgency. In general
>>>> >>>> these releases are designed to patch bugs. However, higher level
>>>> >>>> libraries may introduce small features, such as a new algorithm,
>>>> >>>> provided they are entirely additive and isolated from existing code
>>>> >>>> paths. Spark core may not introduce any features.
>>>> >>>>
>>>> >>>> When new components are added to Spark, they may initially be marked
>>>> >>>> as "alpha". Alpha components do not have to abide by the above
>>>> >>>> guidelines, however, to the maximum extent possible, they should try
>>>> >>>> to. Once they are marked "stable" they have to follow these
>>>> >>>> guidelines. At present, GraphX is the only alpha component of Spark.
>>>> >>>>
>>>> >>>> [1] API compatibility:
>>>> >>>>
>>>> >>>> An API is any public class or interface exposed in Spark that is not
>>>> >>>> marked as semi-private or experimental. Release A is API compatible
>>>> >>>> with release B if code compiled against release A *compiles cleanly*
>>>> >>>> against B. This does not guarantee that a compiled application that
>>>> is
>>>> >>>> linked against version A will link cleanly against version B without
>>>> >>>> re-compiling. Link-level compatibility is something we'll try to
>>>> >>>> guarantee as well, and we might make it a requirement in the
>>>> >>>> future, but challenges with things like Scala versions have made
>>>> this
>>>> >>>> difficult to guarantee in the past.
>>>> >>>>
>>>> >>>> == Merging Pull Requests ==
>>>> >>>> To merge pull requests, committers are encouraged to use this tool
>>>> [2]
>>>> >>>> to collapse the request into one commit rather than manually
>>>> >>>> performing git merges. It will also format the commit message nicely
>>>> >>>> in a way that can be easily parsed later when writing credits.
>>>> >>>> Currently it is maintained in a public utility repository, but we'll
>>>> >>>> merge it into mainline Spark soon.
>>>> >>>>
>>>> >>>> [2]
>>>> >>>
>>>> https://github.com/pwendell/spark-utils/blob/master/apache_pr_merge.py
>>>> >>>>
>>>> >>>> == Tentative Release Window for 1.0.0 ==
>>>> >>>> Feb 1st - April 1st: General development
>>>> >>>> April 1st: Code freeze for new features
>>>> >>>> April 15th: RC1
>>>> >>>>
>>>> >>>> == Deviations ==
>>>> >>>> For now, the proposal is to consider these tentative guidelines. We
>>>> >>>> can vote to formalize these as project rules at a later time after
>>>> >>>> some experience working with them. Once formalized, any deviation from
>>>> >>>> these guidelines will be subject to a lazy majority vote.
>>>> >>>>
>>>> >>>> - Patrick
>>>> >>>
>>>>
>>>>
>>>
>>
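A hypothetical Scala illustration of the compile-vs-link distinction in [1]
(the method and names are invented):

    // Release A
    def count(path: String): Long = ???

    // Release B adds a defaulted parameter:
    def count(path: String, recursive: Boolean = false): Long = ???

    // Source calling count("/data") still compiles cleanly against B, so the
    // releases are API compatible in the sense above. But bytecode compiled
    // against A invokes count(String), which no longer exists in B, so the
    // already-built application fails at link time (NoSuchMethodError) until
    // it is re-compiled.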
--
Evan Chan
Staff Engineer
e...@ooyala.com
I might have missed it earlier, but is anybody planning to present at
ApacheCon? I think it's in Denver this year, April 7-9.
Thinking of submitting a talk about how we use Spark and Cassandra.
-Evan
--
Evan Chan
Staff Engineer
e...@ooyala.com
ot be recursive.
>
>
> regards,
> Andrew
>
--
Evan Chan
Staff Engineer
e...@ooyala.com
> > > >
>> > > > > > > > >We've been working on the transition to Apache for a while,
>> > and
>> > > > our
>> > > > > > last
>> > > > > > > > >shepherd's report says the following:
>> >
> >>> cover
> >>> >> most of the use cases. It is still great to include it to cover
> basic
> >>> >> checks such as 100-char wide lines.
> >>> >>
> >>> >>
> >>> >> On Wed, Jan 8, 2014 at 8:02 PM
lop an application level fair
> scheduler?
> >>
> >> Hi, All
> >>
> >> Is there any plan to develop an application level fair scheduler?
> >>
> >> I think it will have more value than a fair scheduler within the
> application (actually I d
ges/environment) to certain location when job
> finished
>
> 2. But it is not easy for users to review the job info with #1; we
> could build an extra job history service for developers
>
> 3. But where will we build this history service? On the Driver node or
> the Master node?
http://www.scala-graph.org/
Have you guys seen the above site? I wonder if it will ever be
merged into the Scala standard library, but it might be interesting to
see if it fits into GraphX at all, or to add a Spark backend to it.
--
Evan Chan
Staff Engineer
e...@ooyala.com
The basic pattern (and my proposed formatting standard) for folding over
> an `Option[A]` from which you need to produce a B (which may be Unit if
> you're only interested in side effects) is:
>
> anOption.fold
> {
> // something that evaluates to a B if anOp
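The quoted example is cut off; a completed sketch of the pattern as I read
it (Option.fold has the shape fold[B](ifEmpty: => B)(f: A => B): B; the
defaultB and makeB names are invented stand-ins):

    anOption.fold
    {
      // something that evaluates to a B if anOption is empty
      defaultB
    }
    { a =>
      // something that evaluates to a B from the contained value
      makeB(a)
    }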
ccess.createInstanceFor(DynamicAccess.scala:85)
> at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:546)
> at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
> at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
> at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:79)
group:
> >>>>> and also activity on the apache mailing list (which is a really
> horrible
> >>>>> experience!). Is it a firm policy on apache's front to disallow
> external
> >>>>> groups? I'm going to be ramping up on spark
r several
> >>>>> months. This branch is current with master and has been reviewed for
> >>>>> merging:
> >>>>>
> >>>>> https://github.com/apache/incubator-spark/tree/scala-2.10
> >>>>>
> >>>>> Scala
jetbrains.jps.cmdline.BuildSession.run(BuildSession.java:113)
> at
>
> org.jetbrains.jps.cmdline.BuildMain$MyMessageHandler$1.run(BuildMain.java:133)
> at
>
> org.jetbrains.jps.service.impl.SharedThreadPoolImpl$1.run(SharedThreadPoolImpl.java:41)
> at java.util.concurrent.Exec
be
> found
> > >>> at:
> > >>>>>>> http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4/
> > >>>>>>>
> > >>>>>>> Release artifacts are signed with the following key:
> > >>>>>>> https://people.apache.org/keys/committer/pwendell.asc
> > >>>>>>>
> > >>>>>>> The staging repository for this release can be found at:
> > >>>>>>>
> > >>>>
> > https://repository.apache.org/content/repositories/orgapachespark-040/
> > >>>>>>>
> > >>>>>>> The documentation corresponding to this release can be found at:
> > >>>>>>>
> > http://people.apache.org/~pwendell/spark-0.8.1-incubating-rc4-docs/
> > >>>>>>>
> > >>>>>>> For information about the contents of this release see:
> > >>>>>>>
> > >>>>>>
> > >>>>
> > >>>
> >
> https://git-wip-us.apache.org/repos/asf?p=incubator-spark.git;a=blob;f=CHANGES.txt;h=ce0aeab524505b63c7999e0371157ac2def6fe1c;hb=branch-0.8
> > >>>>>>>
> > >>>>>>> Please vote on releasing this package as Apache Spark
> > >>>> 0.8.1-incubating!
> > >>>>>>>
> > >>>>>>> The vote is open until Saturday, December 14th at 01:00 UTC and
> > >>>>>>> passes if a majority of at least 3 +1 PPMC votes are cast.
> > >>>>>>>
> > >>>>>>> [ ] +1 Release this package as Apache Spark 0.8.1-incubating
> > >>>>>>> [ ] -1 Do not release this package because ...
> > >>>>>>>
> > >>>>>>> To learn more about Apache Spark, please see
> > >>>>>>> http://spark.incubator.apache.org/
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> s
> > >>>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> s
> > >>>>
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> s
> > >
> >
>
--
Evan Chan
Staff Engineer
e...@ooyala.com
think?
(I've cc'ed two folks from Mesosphere, esp on setting up a test suite)
-Evan
--
Evan Chan
Staff Engineer
e...@ooyala.com
n from disk?
>
> I believe I should look in HadoopRDD.scala, where there is the
> getRecordReader call, and the headers show that it should be
> in org.apache.hadoop.mapred.RecordReader, but I can't find that file
> anywhere.
>
> Any help would be appreciated.
>
> thanks!
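For what it's worth, org.apache.hadoop.mapred.RecordReader lives in the
Hadoop jars, not in Spark. The read loop follows the usual old-API
InputFormat pattern, roughly (a simplified sketch from memory, not the
exact Spark code; inputFormat, split, and jobConf stand in for values
HadoopRDD already has):

    import org.apache.hadoop.mapred.{InputFormat, JobConf, Reporter}

    val reader = inputFormat.getRecordReader(split, jobConf, Reporter.NULL)
    val key = reader.createKey()
    val value = reader.createValue()
    while (reader.next(key, value)) {
      // each (key, value) pair becomes an element of the RDD iterator
    }
    reader.close()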
probably a big PITA to implement, and
> likely not as worthwhile as I naively think it would be.
>
> - Stephen
--
Evan Chan
Staff Engineer
e...@ooyala.com
ming. The Scala 2.10 branch will be merged into
>> master soon.
>>
>> We’re very excited to have both Tom and Prashant join the project as
>> committers.
>>
>> The Apache Spark PPMC
>>
>>
--
Evan Chan
Staff Engineer
e...@ooyala.com
it set or not, I believe.) We could
potentially run a git bisect starting roughly 2-3 weeks ago.
On Sun, Nov 3, 2013 at 3:55 PM, Evan Chan wrote:
> Mark: I'm using JDK 1.6.
>
> On Wed, Oct 30, 2013 at 1:44 PM, Mark Hamstra wrote:
>> What JDK version are you using, Evan?
>>
>
rverSuite, oddly enough.) Upgrading the JDK didn't
> improve the results of the ClearStory test suite I was looking at, so my
> misery isn't over; but yours might be with a newer JDK
>
>
>
> On Wed, Oct 30, 2013 at 12:44 PM, Evan Chan wrote:
>
>> Must be
er/201310.mbox/%3C64474308D680D540A4D8151B0F7C03F7025EF289%40SHSMSX104.ccr.corp.intel.com%3E
>
>
> On Wed, Oct 30, 2013 at 9:40 AM, Evan Chan wrote:
>
>> I'm at the latest
>>
>> commit f0e23a023ce1356bc0f04248605c48d4d08c2d05
>> Merge: aec9bf9 a197137
>> Author: Reynol
918)
at java.lang.Thread.run(Thread.java:680)
Anybody else seen this yet?
I have a really simple PR and this fails without my change, so I may
go ahead and submit it anyway.
--
Evan Chan
Staff Engineer
e...@ooyala.com
emote server,
> and let the server invoke it. I don’t care much about security. What I’m
> unable to figure out is how to get it serialized.
>
> Thanks in advance.
>
> --Bochun
>
>
--
Evan Chan
Staff Engineer
e...@ooyala.com
ith
>>> > supporting a new programming language on Spark. I can see that Spark
>>> > already supports Java and Python. Would someone provide me some
>>> > suggestions/references to start with? I think this would be a great
>>> learning
>>> > experience for me. Thank you in advance.
>>> >
>>> > --
>>> > - Laksh Gupta
>>>
--
Evan Chan
Staff Engineer
e...@ooyala.com
park itself -- there were some messages on this earlier. For now I'd
> just recommend doing it in a RAMFS if possible (symlink the assembly/target
> directory to be a RAMFS).
>
> Matei
>
> On Oct 9, 2013, at 12:45 AM, Evan Chan wrote:
>
> > Once you have compiled ever
ave run "sbt assembly" on the command line. However, this takes an
> impractically long time (843 s when I last ran it on my workstation with an
> Intel Core 2 Quad Q9400 and 8 GB of RAM). Is there any faster way?
>
> Best regards,
> Markus Losoi (markus.lo...@gmail.com)
>
>
I wanted them separate was that people might use Akka in
> their own application for other things. As such, the Spark Akka properties
> only affect Akka instances started by Spark. I think we should keep them
> separate to avoid messing with users' applications.
>
> Matei
>
>
> I think this is an old property that isn't used anymore, so it would be
> good to clean it up and get rid of it.
>
> Matei
>
> On Sep 28, 2013, at 6:23 PM, Evan Chan wrote:
>
> > Hey guys,
> >
> > Does anyone see a reason to keep the "spark.hostP
hich calls it, but _never_ uses the resulting output!)
- MapOutputTracker (which passes it as a value, not sure where)
Just trying to clean house on properties.
thanks,
Evan
--
Evan Chan
Staff Engineer
e...@ooyala.com
nal Message-
> From: Evan Chan [mailto:e...@ooyala.com]
> Sent: Thursday, September 26, 2013 2:43 PM
> To: dev@spark.incubator.apache.org
> Subject: Re: Propose to Re-organize the scripts and configurations
>
> Shane, and others,
>
> Let's work together on the c
rk application. A Configuration instance can be
> >> de-/serialized
> >> > > > from/to a JSON-formatted file.
> >> > > > 2. Each application (SparkContext) has one Configuration
> instance
> >> and
> >> > > it
> >> > > > is initialized by the application which creates it (either read
> >> from
> >> > > file
> >> > > > or passed from command line options or env SPARK_JAVA_OPTS).
> >> > > > 3. When launching an Executor on a node, the Configuration is
> >> firstly
> >> > > > initialized using the node-local configuration file as default.
> >> The
> >> > > > Configuration passed from application driver context will
> >> override any
> >> > > > options specified in default.
> >> > >
> >> > > This sounds great to me! The one thing I'll add is that we might
> want
> >> to
> >> > > prevent applications from overriding certain settings on each node,
> >> such as
> >> > > work directories. The best way is to probably just ignore the app's
> >> version
> >> > > of those settings in the Executor.
> >> > >
> >> > > If you guys would like, feel free to write up this design on
> >> SPARK-544 and
> >> > > start working on it. I think it looks good.
> >> > >
> >> > > Matei
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > *Shane Huang *
> >> > *Intel Asia-Pacific R&D Ltd.*
> >> > *Email: shengsheng.hu...@intel.com*
> >>
> >
> >
> >
> > --
> > *Shane Huang *
> > *Intel Asia-Pacific R&D Ltd.*
> > *Email: shengsheng.hu...@intel.com*
> >
> >
>
>
> --
> *Shane Huang *
> *Intel Asia-Pacific R&D Ltd.*
> *Email: shengsheng.hu...@intel.com*
>
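A rough sketch of the precedence described above, with Matei's caveat about
settings the node must protect (all names here are invented, this is not a
real API, just the merge rule):

    // Node-local defaults first, then driver-supplied overrides,
    // except for keys the node refuses to let applications override.
    val protectedKeys = Set("spark.local.dir")  // e.g. work directories
    val nodeDefaults: Map[String, String] = loadJsonConf("/path/to/node.json")
    val fromDriver: Map[String, String] = driverConf
    val effective = nodeDefaults ++
      fromDriver.filterKeys(k => !protectedKeys.contains(k))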
--
Evan Chan
Staff Engineer
e...@ooyala.com
must be set *before* initializing your SparkContext.
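For reference, the 0.8-era pattern this refers to (the property key is
illustrative):

    // System properties are read when the SparkContext is created,
    // so they must be set first:
    System.setProperty("spark.executor.memory", "4g")
    val sc = new org.apache.spark.SparkContext("local[4]", "demo")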
--
Evan Chan
Staff Engineer
e...@ooyala.com
node N. In
> the future, when Spark asks Tachyon the location of X, Tachyon will return
> node N. There is no network I/O involved in the whole process. Let me know
> if I misunderstood something.
>
> Haoyuan
>
>
> On Fri, Aug 30, 2013 at 10:00 AM, Evan Chan wrote:
>
> >> 5. Create an apache FOAF file with the key signature
> >>
> >> However, this doesn't seem sufficient to get my key on this page (at
> >> least, not yet):
> >> http://people.apache.org/keys/group/spark.asc
> >>
> >> Chris - are there other steps I missed? Is there a manual way to
> >> augment this file?
> >>
> >> - Patrick
> >>
>
--
Evan Chan
Staff Engineer
e...@ooyala.com
do the PRs against the GitHub repo for now.
>
> Matei
>
> On Sep 3, 2013, at 4:57 PM, Evan Chan wrote:
>
>> Sorry one more clarification.
>>
>> For doc pull requests for 0.8 release, should these be done against
>> the existing mesos/spark repo, or against th
s also going to document public methods in SparkContext that have
not been documented before, such as getPersistentRdds,
getExecutorStatus, etc. Some folks on my team didn't realize that such
methods existed, as they were not in the docs.
-Evan
>
> Matei
>
> On Sep 3, 2013, at 4:
bout
>> >> signing.
>> >>
>> >>> Are we "locking" pull requests to github repo by tomorrow?
>> >>> Meaning no more push to GitHub repo for Spark.
>> >>>
>> >>> From your email seems like there will be more potential pull requests
>> for
>> >>> github repo to be merged back to ASF Git repo.
>> >>
>> >> We'll probably use the GitHub repo for the last few changes in this
>> >> release and then switch. The reason is that there's a bit of work to do
>> >> pull requests against the Apache one.
>> >>
>> >> Matei
>>
>>
--
Evan Chan
Staff Engineer
e...@ooyala.com
Cross-posting a thank you.
On Fri, Aug 30, 2013 at 3:44 PM, Evan Chan wrote:
> Wanted to take a moment to say a big THANK YOU to the AMP Lab guys for
> organizing an awesome AMP Camp 2013!!!
> Very well organized. The EC2 clusters for trying things out was especially
> great.
>
>
Hey guys,
What is the schedule for the 0.8 release?
In general, will the dev community be notified of code freeze, testing
deadlines, doc deadlines, etc.?
I'm specifically looking to know when is the deadline for submitting doc
pull requests. :)
thanks,
Evan
--
Evan Chan
Staff Engineer
> >
> > >> >
> > >> >
> > >> > I also have two related questions:
> > >> >
> > >> > 1) Can JVM’s heap use virtual memory or just use physical
> > memory?
> > >> >
> > >> > 2) Can direct memory use virtual memory or just use physical
> > >> memory?
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > On Mon, Aug 26, 2013 at 8:06 AM, Haoyuan Li
> > >> wrote:
> > >> >
> > >> >> Hi Imran,
> > >> >>
> > >> >> One possible solution is that you can use
> > >> >> Tachyon<https://github.com/amplab/tachyon>.
> > >> >> When data is in Tachyon, Spark jobs will read it from off-heap
> > memory.
> > >> >> Internally, it uses direct byte buffers to store memory-serialized
> > RDDs
> > >> as
> > >> >> you mentioned. Also, different Spark jobs can share the same data
> in
> > >> >> Tachyon's memory. Here is a presentation
> > >> >> (slide<
> > >> >>
> > >>
> >
> https://docs.google.com/viewer?url=http%3A%2F%2Ffiles.meetup.com%2F3138542%2FTachyon_2013-05-09_Spark_Meetup.pdf
> > >> >> >)
> > >> >> we did in May.
> > >> >>
> > >> >> Haoyuan
> > >> >>
> > >> >>
> > >> >> On Sun, Aug 25, 2013 at 3:26 PM, Imran Rashid <
> im...@therashids.com>
> > >> >> wrote:
> > >> >>
> > >> >> > Hi,
> > >> >> >
> > >> >> > I was wondering if anyone has thought about putting cached data
> in
> > an
> > >> >> > RDD into off-heap memory, eg. w/ direct byte buffers. For really
> > >> >> > long-lived RDDs that use a lot of memory, this seems like a huge
> > >> >> > improvement, since all the memory is now totally ignored during
> GC.
> > >> >> > (and reading data from direct byte buffers is potentially faster
> as
> > >> >> > well, but that's just a nice bonus).
> > >> >> >
> > >> >> > The easiest thing to do is to store memory-serialized RDDs in
> > direct
> > >> >> > byte buffers, but I guess we could also store the serialized RDD
> on
> > >> >> > disk and use a memory mapped file. Serializing into off-heap
> > buffers
> > >> >> > is a really simple patch, I just changed a few lines (I haven't
> > done
> > >> >> > any real tests w/ it yet, though). But I don't really have a ton
> > of
> > >> >> > experience w/ off-heap memory, so I thought I would ask what
> others
> > >> >> > think of the idea, if it makes sense or if there are any gotchas
> I
> > >> >> > should be aware of, etc.
> > >> >> >
> > >> >> > thanks,
> > >> >> > Imran
> > >> >> >
> > >> >>
> > >>
> >
>
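A tiny sketch of the direct-buffer idea for anyone following along
(serializeToBytes and blockData are hypothetical stand-ins for whatever
serializer and data are in use):

    import java.nio.ByteBuffer

    val bytes: Array[Byte] = serializeToBytes(blockData)
    // Allocated outside the Java heap, so invisible to the GC:
    val buf = ByteBuffer.allocateDirect(bytes.length)
    buf.put(bytes)
    buf.flip()

    // Reading it back still means copying on-heap before deserializing:
    val onHeap = new Array[Byte](buf.remaining())
    buf.get(onHeap)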
--
Evan Chan
Staff Engineer
e...@ooyala.com
When is the upgrade to 2.10 planned?
thx,
Evan
eciate it.
>
> Thanks,
> Grega
>
> On Fri, Aug 9, 2013 at 10:00 AM, Evan Chan wrote:
>
> > Hey Patrick,
> >
> > A while back I posted an SBT recipe allowing users to build Scala job
> > assemblies that excluded Spark and its deps, which is what most people
>
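For anyone searching the archives, the heart of that recipe is marking
Spark "provided" so sbt-assembly leaves it (and its transitive deps) out of
the fat jar. A minimal build.sbt sketch (requires the sbt-assembly plugin;
version strings are illustrative):

    libraryDependencies +=
      "org.apache.spark" %% "spark-core" % "0.9.0-incubating" % "provided"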
Matei,
How about documentation updates, such as for the binary distribution?
On Friday, August 9, 2013 10:42:15 AM UTC-7, Matei Zaharia wrote:
>
> Hi folks,
>
> In order to make the 0.8 release soon, I've created a new branch for it,
> on which we'll merge only bug fixes and a few of the new de
r maven exec with some way to pass the correct
> environment variables?
> - Do people use a modified version of spark's own `run` script?
> - Do you have some other way of submitting jobs?
>
> Any notes would be helpful in compiling this!
>
> https://spark-project.atlassi
mainstream Scala
> projects.
>
>
> On Tue, Jul 16, 2013 at 4:19 PM, Evan Chan wrote:
>
>> If the APIs for those libraries such as Akka stay the same, you don't
>> need different branches. In SBT you can easily support two different sets
>> of deps dependin
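For example, in sbt you can branch on the Scala version (a sketch in
sbt 0.13 syntax; the Akka versions are illustrative -- Akka 2.0.x was not
cross-versioned, hence the plain %):

    libraryDependencies ++= {
      scalaBinaryVersion.value match {
        case "2.10" => Seq("com.typesafe.akka" %% "akka-actor" % "2.2.3")
        case _      => Seq("com.typesafe.akka" %  "akka-actor" % "2.0.5")
      }
    }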