There are several high bars to getting a new algorithm adopted.

* It needs to be deemed widely useful to the community by the MLlib
committers/shepherds. Algorithms offered by larger companies, after having
demonstrated usefulness at scale for use cases likely to be encountered
by many other companies, stand a better chance.
* There is quite limited bandwidth for consideration of new algorithms:
there has been a dearth of new ones accepted since early 2015, so
prioritization is a challenge.
* The code must demonstrate high quality standards, especially with respect
to testability, maintainability, computational performance, and scalability.
* The chosen algorithms and options should be well documented and include
comparisons/tradeoffs with the state of the art described in relevant papers.
These questions will typically be asked during design/code reviews, e.g.
"Did you consider the approach shown *here*?"
* There is also luck and timing involved. The review process might start in
a given month, but then reviewers become busy or higher priorities intervene,
and it is unclear when the reviewing will continue.
* Once all of the above is complete, there are still intricacies with
integrating into a particular Spark release.

On Mon, 5 Aug 2019 at 05:58, chagas <cha...@gta.ufrj.br> wrote:

> Hi,
>
> After searching the machine learning library for streaming algorithms, I
> found two that fit the criteria: Streaming Linear Regression
> (https://spark.apache.org/docs/latest/mllib-linear-methods.html#streaming-linear-regression)
> and Streaming K-Means
> (https://spark.apache.org/docs/latest/mllib-clustering.html#streaming-k-means).
>
> However, both use the RDD-based API MLlib instead of the DataFrame-based
> API ML; are there any plans for bringing them both to ML?
>
> Also, is there any technical reason why there are so few incremental
> algorithms in the machine learning library? There is only one algorithm
> each for regression and clustering, with nothing for classification,
> dimensionality reduction or feature extraction.
>
> If there is a reason, how were those two algorithms implemented? If
> there isn't, what is the general consensus on adding new online machine
> learning algorithms?
>
> Regards,
> Lucas Chagas
