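As far as I can tell, your requirements are pretty straightforward and doable with simple SQL queries. Both of your criteria are per-unit counts, so a GROUP BY aggregation followed by an ORDER BY should cover them without any MLlib algorithm. Take a look at Spark SQL in the Spark documentation.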
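To make that concrete, here is a rough sketch of the kind of query I mean. The table name `activations` and the columns `unit_id` and `had_problem` are made up for illustration, since I don't know your actual schema. It runs against Python's built-in sqlite3 only so that it works without a cluster; with Spark you would register your DataFrame as a temporary view and pass the same SELECT to the SQL entry point, and I would expect it to work unchanged, though I have not run it there.

```python
import sqlite3

# Hypothetical schema: one row per activation attempt.
# had_problem is 1 if the unit ran into an issue during that activation, else 0.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activations (unit_id TEXT, had_problem INTEGER)")
conn.executemany(
    "INSERT INTO activations VALUES (?, ?)",
    [
        ("A", 0), ("A", 0), ("A", 1),
        ("B", 0), ("B", 1), ("B", 1),
        ("C", 0), ("C", 0),
    ],
)

# Criterion 1: more activations ranks higher.
# Criterion 2: fewer problems breaks ties between equally active units.
RANKING_SQL = """
SELECT unit_id,
       COUNT(*)         AS activations,
       SUM(had_problem) AS problems
FROM activations
GROUP BY unit_id
ORDER BY activations DESC, problems ASC, unit_id ASC
"""

ranking = conn.execute(RANKING_SQL).fetchall()
for rank, (unit_id, activations, problems) in enumerate(ranking, start=1):
    print(rank, unit_id, activations, problems)
```

The tie-breaking order (activations first, then problems) is just one reading of your two criteria. Adding more criteria later would mean adding more aggregate columns and ORDER BY terms.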
Prashant Sharma

On Tue, Apr 12, 2016 at 8:13 PM, Joe San <codeintheo...@gmail.com> wrote:

> <http://datascience.stackexchange.com/questions/11167/algorithm-suggestion-for-a-specific-problem/11174?noredirect=1#>
>
> I'm working on a problem wherein I have some data sets about some power
> generating units. Each of these units has been activated to run in the
> past, and during activation some units ran into issues. I now have all
> this data and I would like to come up with some sort of ranking for these
> generating units. The criteria for ranking would be pretty simple to start
> with. They are:
>
> 1. Maximum number of times a particular generating unit was activated
> 2. How many times the generating unit ran into problems during
>    activation
>
> Later on I would expand on this ranking algorithm by adding more criteria.
> I will be using the Apache Spark MLlib library and I can already see that
> there are quite a few algorithms already in place.
>
> http://spark.apache.org/docs/latest/mllib-guide.html
>
> I'm just not sure which algorithm would fit my purpose. Any suggestions?