Hi Jean,

A DataFrame is tied to a SQLContext, which in turn is tied to a SparkContext, so I think it's impossible to run `model.transform` without touching Spark. What you need is for the model to support prediction on a single instance; then you could make predictions without Spark. You can track the progress of that work at https://issues.apache.org/jira/browse/SPARK-10413.
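
Until then, the per-instance approach from your mllib snippet is the pattern to keep. Here is a minimal sketch of it in plain Java with no Spark classes at all. Everything in it is an assumption made for illustration: `LocalPrediction`, `Example` and the threshold rule are hypothetical stand-ins, not Spark API. With the old mllib API you would presumably pass something like `features -> model.predict(Vectors.dense(features))` as the predictor instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;

final class LocalPrediction {

    /** Hypothetical stand-in for LabeledPoint: a feature array plus its label. */
    static final class Example {
        final double[] features;
        final double label;

        Example(double[] features, double label) {
            this.features = features;
            this.label = label;
        }
    }

    /**
     * Applies a single-instance predictor to every example in a local list.
     * Each returned array holds {prediction, label}. Nothing here touches a
     * SparkContext, so it can run inside a web service.
     */
    static List<double[]> predictLocally(ToDoubleFunction<double[]> predictOne,
                                         List<Example> data) {
        return data.stream()
                .map(e -> new double[] {predictOne.applyAsDouble(e.features), e.label})
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical predictor standing in for a trained model.
        ToDoubleFunction<double[]> rule = f -> f[0] > 0.5 ? 1.0 : 0.0;

        List<Example> data = Arrays.asList(
                new Example(new double[] {0.9}, 1.0),
                new Example(new double[] {0.2}, 0.0));

        for (double[] pair : predictLocally(rule, data)) {
            System.out.println("prediction=" + pair[0] + " label=" + pair[1]);
        }
    }
}
```

Taking the predictor as a function keeps the calling code independent of which model class eventually offers single-instance prediction.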
Thanks
Yanbo

2016-02-27 8:52 GMT+08:00 Eugene Morozov <evgeny.a.moro...@gmail.com>:

> Hi everyone.
>
> I have a requirement to run prediction for random forest model locally on
> a web-service without touching spark at all in some specific cases. I've
> achieved that with previous mllib API (java 8 syntax):
>
> public List<Tuple2<Double, Double>> predictLocally(RandomForestModel model, List<LabeledPoint> data) {
>     return data.stream()
>             .map(point -> new Tuple2<>(model.predict(point.features()), point.label()))
>             .collect(Collectors.toList());
> }
>
> So I have instance of trained model and can use it any way I want.
> The question is whether it's possible to run this on the driver itself
> with the following:
> DataFrame predictions = model.transform(test);
> because AFAIU test has to be a DataFrame, which means it's going to be run
> on the cluster.
>
> The use case to run it on driver is very small amount of data for
> prediction - much faster to handle it this way, than using spark cluster.
> Thank you.
> --
> Be well!
> Jean Morozov