Re: IMPORTANT: Spark mailing lists moving to Apache by September 1st

2013-12-19 Thread Matei Zaharia
Yes, I agree that we should close down the existing Google group on Jan 1st. While it’s more convenient to use, it’s created confusion. I hope that we can get the ASF to support better search interfaces in the future too. I think we just have to drive this from within. The Google Group should b

Re: IMPORTANT: Spark mailing lists moving to Apache by September 1st

2013-12-19 Thread Andy Konwinski
I've set up two new unofficial google groups to mirror the Apache Spark user and dev lists: https://groups.google.com/forum/#!forum/apache-spark-dev-mirror https://groups.google.com/forum/#!forum/apache-spark-user-mirror Basically these lists each subscribe to the corresponding Apache list. They

Re: IMPORTANT: Spark mailing lists moving to Apache by September 1st

2013-12-19 Thread Ted Yu
You may have noticed that the counter of searchable items for the last 7 days on search-hadoop.com is 0 and the counter for the last 30 days is declining quickly. Cheers On Dec 19, 2013, at 10:10 PM, Nick Pentreath wrote: > One option that is 3rd party that works nicely for the Hadoop project and > its

Re: Spark development for undergraduate project

2013-12-19 Thread Nick Pentreath
Another option would be: 1. Add another recommendation model based on mrec's sgd based model: https://github.com/mendeley/mrec 2. Look at the streaming K-means from Mahout and see if that might be integrated or adapted into MLlib 3. Work on adding to or refactoring the existing linear model frame

Re: IMPORTANT: Spark mailing lists moving to Apache by September 1st

2013-12-19 Thread Nick Pentreath
One option that is 3rd party that works nicely for the Hadoop project and its related projects is http://search-hadoop.com, managed by Sematext. Perhaps we can plead with Otis to add Spark lists to search-spark.com, or the existing site? Just throwing it out there as a potential solution to a

Re: IMPORTANT: Spark mailing lists moving to Apache by September 1st

2013-12-19 Thread Aaron Davidson
I'd be fine with one-way mirrors here (Apache threads being reflected in Google groups) -- I have no idea how one is supposed to navigate the Apache list to look for historic threads. On Thu, Dec 19, 2013 at 7:58 PM, Mike Potts wrote: > Thanks very much for the prompt and comprehensive reply!

Re: IMPORTANT: Spark mailing lists moving to Apache by September 1st

2013-12-19 Thread Mike Potts
Thanks very much for the prompt and comprehensive reply! I appreciate the overarching desire to integrate with Apache. I'm very happy to hear that there's a move to use the existing groups as mirrors; that will overcome all of my objections, particularly if it's bidirectional! :) On Thursday,

Re: IMPORTANT: Spark mailing lists moving to Apache by September 1st

2013-12-19 Thread Andy Konwinski
Hey Mike, As you probably noticed when you CC'd spark-develop...@googlegroups.com, that list has already been reconfigured so that it no longer allows posting (and bounces emails sent to it). We will be doing the same thing to the spark-us...@googlegroups.com list too (we'll announce a date for tha

Re: Spark 0.8.1 Released

2013-12-19 Thread Matei Zaharia
Thanks Patrick for coordinating this release! Matei On Dec 19, 2013, at 5:15 PM, Patrick Wendell wrote: > Hi everyone, > > We've just posted Spark 0.8.1, a new maintenance release that contains > some bug fixes and improvements to the 0.8 branch. The full release > notes are available at [1].

Re: Spark development for undergraduate project

2013-12-19 Thread Tathagata Das
+1 to that (assuming that by 'online' Andrew meant an MLlib algorithm usable from Spark Streaming). Something you can look into is implementing a streaming KMeans. Maybe you can reuse a lot of the offline KMeans code in MLlib. TD On Thu, Dec 19, 2013 at 5:33 PM, Andrew Ash wrote: > Sounds like a great choic

Re: Spark development for undergraduate project

2013-12-19 Thread Andrew Ash
Sounds like a great choice. It would be particularly impressive if you could add the first online learning algorithm (all the current ones are offline, I believe) to pave the way for future contributions. On Thu, Dec 19, 2013 at 8:27 PM, Matthew Cheah wrote: > Thanks a lot everyone! I'm looking

Re: Spark development for undergraduate project

2013-12-19 Thread Matthew Cheah
Thanks a lot everyone! I'm looking into adding an algorithm to MLlib for the project. Nice and self-contained. -Matt Cheah On Thu, Dec 19, 2013 at 12:52 PM, Christopher Nguyen wrote: > +1 to most of Andrew's suggestions here, and while we're in that > neighborhood, how about generalizing someth

Spark 0.8.1 Released

2013-12-19 Thread Patrick Wendell
Hi everyone, We've just posted Spark 0.8.1, a new maintenance release that contains some bug fixes and improvements to the 0.8 branch. The full release notes are available at [1]. Apart from various bug fixes, 0.8.1 includes support for YARN 2.2, a high availability mode for the standalone schedul

Re: How to contribute to the spark project

2013-12-19 Thread Azuryy Yu
Hi Gill, please read here: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark On Dec 20, 2013 5:39 AM, "Gill" wrote: > Hi, > > I attended the Spark Summit and have been curious to know how I can > contribute to the Spark project. I'm working on query engine optimization

How to contribute to the spark project

2013-12-19 Thread Gill
Hi, I attended the Spark Summit and have been curious to know how I can contribute to the Spark project. I'm working on query engine optimizations, so I can help with Spark query engine optimizations or with other query engine features. Thanks Gurbir

Re: Spark development for undergraduate project

2013-12-19 Thread Christopher Nguyen
+1 to most of Andrew's suggestions here, and while we're in that neighborhood, how about generalizing something like "wtf-spark" (from the Bizo team: http://youtu.be/6Sn1xs5DN1Y?t=38m36s)? It may not be of high academic interest, but it's something people would use many times a debugging day. Or a

Re: Spark development for undergraduate project

2013-12-19 Thread Andrew Ash
Wow yes, that PR#230 looks like exactly what I outlined in #2! I'll leave some comments on there. Anything going on for service reliability (#3) since apparently someone is reading my mind? On Thu, Dec 19, 2013 at 2:02 PM, Nick Pentreath wrote: > Some good things to look at though hopefully #2

Re: Spark development for undergraduate project

2013-12-19 Thread Nick Pentreath
Some good things to look at, though hopefully #2 will be largely addressed by: https://github.com/apache/incubator-spark/pull/230 On Thu, Dec 19, 2013 at 8:57 PM, Andrew Ash wrote: > I think there are also some improvements that could be made to > deployability in a

Re: [PySpark]: reading arbitrary Hadoop InputFormats

2013-12-19 Thread Nick Pentreath
Hi, I managed to find the time to put together a PR on this: https://github.com/apache/incubator-spark/pull/263 Josh has had a look over it; if anyone else with an interest could give some feedback, that would be great. As mentioned in the PR, it's more of an RFC and certainly still needs

Re: Spark development for undergraduate project

2013-12-19 Thread Andrew Ash
I think there are also some improvements that could be made to deployability in an enterprise setting. From my experience: 1. Most places I deploy Spark in don't have internet access, so I can't build from source, compile against a different version of Hadoop, etc., without doing it locally and th

Re: Spark development for undergraduate project

2013-12-19 Thread Nick Pentreath
Or, if you're extremely ambitious, work on implementing Spark Streaming in Python. On Thu, Dec 19, 2013 at 8:30 PM, Matei Zaharia wrote: > Hi Matt, > If you want to get started looking at Spark, I recommend the following > resources: > - Our issue tracker at http://sp

Re: View bound deprecation (Scala 2.11+)

2013-12-19 Thread Matei Zaharia
We can open a JIRA but let’s wait to see what the Scala guys decide. I’m sure they’ll recommend some alternatives. Matei On Dec 19, 2013, at 9:27 AM, Marek Kolodziej wrote: > All, > > Apparently view bounds will be deprecated going forward. Hopefully they'll > be around for a while after depr

Re: Spark development for undergraduate project

2013-12-19 Thread Matei Zaharia
Hi Matt, If you want to get started looking at Spark, I recommend the following resources: - Our issue tracker at http://spark-project.atlassian.net contains some issues marked “Starter” that are good places to jump into. You might be able to take one of those and extend it into a bigger proje

View bound deprecation (Scala 2.11+)

2013-12-19 Thread Marek Kolodziej
All, Apparently view bounds will be deprecated going forward. Hopefully they'll be around for a while after deprecation, but I wanted to raise this issue for consideration. Here's the SIP: https://issues.scala-lang.org/browse/SI-7629 Shall I file a JIRA for that? Thanks! Marek