Hi Shane,

I've successfully used

    import org.apache.spark.ml.classification.{RandomForestClassificationModel, RandomForestClassifier}

with PIO. You can also access feature importance there, via featureImportances on the fitted RandomForestClassificationModel. It is very simple to convert RDDs to DFs, as Pat mentioned. Something like:

    val RDD_2_DF = sqlContext.createDataFrame(yourRDD).toDF("col1", "col2")

On Thu, 4 Jan 2018 at 23:10 Pat Ferrel <p...@occamsmachete.com> wrote:

> Actually there are libs that will read DFs from HBase
> https://svn.apache.org/repos/asf/hbase/hbase.apache.org/trunk/_chapters/spark.html
>
> This is out of band with PIO and should not be used IMO because the schema
> of the EventStore is not guaranteed to remain as-is. The safest way is to
> translate or get DFs integrated to PIO. I think there is an existing Jira
> that requests Spark ML support, which assumes DFs.
>
>
> On Jan 4, 2018, at 12:25 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
> Funny you should ask this. Yes, we are working on a DF based Universal
> Recommender but you have to convert the RDD into a DF since PIO does not
> read out data in the form of a DF (yet). This is a fairly simple step of
> maybe one line of code but would be better supported in PIO itself. The
> issue is that the EventStore uses libs that may not read out DFs, but RDDs.
> This is certainly the case with Elasticsearch, which provides an RDD lib. I
> haven't seen one from them that reads out DFs though it would make a lot of
> sense for ES especially.
>
> So TLDR; yes, just convert the RDD into a DF for now.
>
> Also please add a feature request as a PIO Jira ticket to look into this.
> I for one would +1
>
>
> On Jan 4, 2018, at 11:55 AM, Shane Johnson <shanewaldenjohn...@gmail.com>
> wrote:
>
> Hello group, Happy new year! Does anyone have a working example or
> template using the DataFrame API vs. the RDD based APIs? We are wanting to
> migrate to using the new DataFrame APIs to take advantage of the *Feature
> Importance* function for our Regression Random Forest Models.
>
> We are wanting to move from
>
> import org.apache.spark.mllib.tree.RandomForest
> import org.apache.spark.mllib.tree.model.RandomForestModel
> import org.apache.spark.mllib.util.MLUtils
>
> to
>
> import org.apache.spark.ml.regression.{RandomForestRegressionModel, RandomForestRegressor}
>
>
> Is this something that should be fairly straightforward by adjusting
> parameters and calling new classes within DASE or is it much more involved
> development.
>
> Thank You!
>
> *Shane Johnson | 801.360.3350*
> LinkedIn <https://www.linkedin.com/in/shanewjohnson> | Facebook
> <https://www.facebook.com/shane.johnson.71653>