MLlib mission and goals

2017-01-23 Thread Joseph Bradley
This thread is split off from the "Feedback on MLlib roadmap process proposal" thread for discussing the high-level mission and goals for MLlib. I hope this thread will collect feedback and ideas, not necessarily lead to huge decisions. Copying from the previous thread: *Seth:* """ I would love

Re: MLlib mission and goals

2017-01-23 Thread Stephen Boesch
Along the lines of #1: Spark Packages seemed to get off to a good start about two years ago, but now no more than a handful are in general use, e.g. Databricks CSV. Browsing the available packages shows that the majority are incomplete, empty, unmaintained, or unclear. Any ideas on how to

Re: MLlib mission and goals

2017-01-24 Thread Sean Owen
My $0.02, which shouldn't be weighted too much. I believe the mission as of Spark ML has been to provide the framework, and then implementation of 'the basics' only. It should have the tools that cover ~80% of use cases, out of the box, in a pretty well-supported and tested way. It's not a goal t

Re: MLlib mission and goals

2017-01-24 Thread Jörn Franke
I also agree with Joseph and Sean. With respect to spark-packages, I think the issue is that you have to add it manually, although it basically fetches the package from Maven Central (or a custom upload). From an organizational perspective there are other issues. E.g., you have to download it from

Re: MLlib mission and goals

2017-01-24 Thread Stephen Boesch
re: spark-packages.org and "Would these really be better in the core project?" That was not at all the intent of my input; rather, it was to ask "how and where should deployment-quality code that is *not* part of the distribution be structured and placed?" Spark Packages has no curation whatsoever: no

Re: MLlib mission and goals

2017-01-24 Thread Miao Wang
I have been working on ML/MLLIB/R since last year. Here are some of my thoughts from a beginner's perspective: The current ML/MLLIB core algorithms can serve as good implementation examples, which makes adding new algorithms easier. Even a beginner like me can pick it up quickly and learn how to add n

Re: MLlib mission and goals

2017-01-24 Thread Asher Krim
> Another related area is SparkR. API parity between SparkR and ML/MLLIB is important. We should also pay attention to R users' habits and experiences when maintaining API parity. > > Miao > > - Original message ----- > From: Stephen Boesch > To: Sean

Re: MLlib mission and goals

2017-01-24 Thread Saikat Kanjilal
On the topic of usability, I think more effort should be put into large-scale testing. We've encountered issues with building large models that are not apparent in small models, and these issues have made productizing ML/MLLIB

Re: MLlib mission and goals

2017-01-24 Thread bradc
nsity-1993.pdf> John McCalpin. Memory Bandwidth and Machine Balance in Current High Performance Computers, 1995. <https://www.researchgate.net/publication/213876927_Memory_Bandwidth_and_Machine_Balance_in_Current_High_Performance_Computers>

Re: MLlib mission and goals

2017-01-24 Thread Joseph Bradley
puting'93, November 1993. > <https://blogs.oracle.com/BestPerf/resource/Carlile-app_compute-intensity-1993.pdf> > > John McCalpin. Memory Bandwidth and Machine Balance in Current High Performance Computers, 1995. > <https://www.researchgate.net/publication/213876927_Memory_B

Re: MLlib mission and goals

2017-01-31 Thread Seth Hendrickson
racle.com/BestPerf/entry/improving_algorithms_in_spark_ml >> Background: >> >> Brad Carlile. Parallelism, compute intensity, and data vectorization. SuperComputing'93, November 1993. >> <https://blogs.oracle.com/BestPerf/resource/Carlile-app_compute-