Hi, I'm implementing a semi-supervised algorithm in Spark, and I'd like it to implement the interfaces provided by MLlib so that it can use facilities like model selection.
My problem is that, as far as I can tell, the provided interfaces are meant for supervised algorithms (for example, they assume all the training data is labeled). The other problem is that this method is transductive: it would receive a dataframe with features and label columns, where the label column is mostly null, and the algorithm would fill in the null entries using the labeled ones. What I mean by this is that a separate `fit` stage doesn't really make sense. But if I want to do model selection, I need an Estimator with configurable parameters.

Is anyone aware of work already done in Spark with these characteristics? Are there plans to support this kind of algorithm in the future? Thanks.
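To make the question concrete, this is roughly the shape I have in mind. It is only a minimal sketch: the LabelPropagation / LabelPropagationModel names and the sigma parameter are placeholders I made up, and fit() is a stub rather than a real algorithm.

```scala
import org.apache.spark.ml.{Estimator, Model}
import org.apache.spark.ml.param.{DoubleParam, ParamMap}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.types.StructType

// Hypothetical wrapper: fit() would run the transductive algorithm over the
// partially labeled input, and the resulting Model just holds the completed labels.
class LabelPropagation(override val uid: String)
    extends Estimator[LabelPropagationModel] {

  def this() = this(Identifiable.randomUID("labelProp"))

  // Example tunable parameter, so ParamGridBuilder/CrossValidator have something to vary.
  val sigma = new DoubleParam(this, "sigma", "kernel width for the affinity graph")
  def setSigma(value: Double): this.type = set(sigma, value)
  setDefault(sigma -> 1.0)

  override def fit(dataset: Dataset[_]): LabelPropagationModel = {
    // Placeholder: the real algorithm would use the rows whose label is non-null
    // as seeds and fill in the null labels of the remaining rows.
    val completed: DataFrame = dataset.toDF()
    copyValues(new LabelPropagationModel(uid, completed).setParent(this))
  }

  override def transformSchema(schema: StructType): StructType = schema

  override def copy(extra: ParamMap): LabelPropagation = defaultCopy(extra)
}

// The "model" is just the completed labelling; transform() ignores its input
// and returns it, which is where the transductive setting clashes with the API.
class LabelPropagationModel(override val uid: String, val completedLabels: DataFrame)
    extends Model[LabelPropagationModel] {

  override def transform(dataset: Dataset[_]): DataFrame = completedLabels

  override def transformSchema(schema: StructType): StructType = schema

  override def copy(extra: ParamMap): LabelPropagationModel =
    copyValues(new LabelPropagationModel(uid, completedLabels), extra).setParent(parent)
}
```

With that shape a ParamGridBuilder can at least search over sigma, but it's unclear to me what CrossValidator would actually evaluate, since transform() here ignores the dataset it is given.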
