ML classifier and data format for dataset with variable number of features

SK Fri, 11 Jul 2014 17:13:28 -0700

Hi,

I need to perform binary classification on an image dataset. Each image is a
data point described by a JSON object. The feature set for each image is a
set of feature vectors, each feature vector corresponding to a distinct
object in the image. For example, if an image has 5 objects, its feature set
will have 5 feature vectors, whereas an image that has 3 objects will have a
feature set consisting of 3 feature vectors. So the number of feature
vectors may differ from image to image, although each feature
vector has the same number of attributes. The classification depends on the
features of the individual objects, so I cannot aggregate them all into a
flat vector.
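
To make the shape of the data concrete, here is a minimal sketch of what two
such records might look like. The field names "id", "label" and "objects" and
all of the values are made up for illustration; they are not the actual
schema. The first image has 5 objects and the second has 3, and every object
has the same number of attributes (3 here).

```python
import json

# Hypothetical records: the field names and values are illustrative only.
raw = """
[
  {"id": "img1", "label": 1,
   "objects": [[0.1, 0.5, 2.0], [0.3, 0.1, 1.5], [0.9, 0.2, 0.7],
               [0.6, 0.6, 1.2], [0.2, 0.7, 0.4]]},
  {"id": "img2", "label": 0,
   "objects": [[0.4, 0.4, 1.1], [0.2, 0.8, 3.0], [0.5, 0.3, 0.9]]}
]
"""
images = json.loads(raw)


def describe(image):
    """Return (number of feature vectors, set of distinct vector lengths)."""
    vectors = image["objects"]
    return len(vectors), {len(v) for v in vectors}


for image in images:
    count, widths = describe(image)
    print(image["id"], "feature vectors:", count,
          "attributes per vector:", sorted(widths))
```

The number of feature vectors varies per image, while the length of each
vector stays fixed.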


I have looked through the MLlib examples, and it appears that the LIBSVM data
format and the LabeledPoint format that MLlib uses both require all the points
to have the same number of features, and both read in a flat feature vector.
I would like to know whether any of the MLlib supervised learning classifiers
can be used with JSON input, and whether they can be used to classify
points with differing numbers of features, as described above.
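
For reference, the flat format in question puts one label and one feature
vector on each line, written as "label index:value ..." with one-based,
ascending indices. The sketch below is only an illustration of the mismatch:
to_libsvm_line and flatten are hypothetical helpers, not MLlib functions, and
the values are made up. Naively concatenating the per-object vectors gives
lines of different lengths for different images, and loses the boundaries
between objects.

```python
def to_libsvm_line(label, features):
    """Format one flat vector as a LIBSVM-style text line (hypothetical helper).

    Indices are one-based and ascending; zero entries are omitted.
    """
    pairs = [f"{i}:{v}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([str(label)] + pairs)


def flatten(vectors):
    """Concatenate per-object feature vectors into a single flat list."""
    return [x for v in vectors for x in v]


# Illustrative values only: one image with 2 objects, one with 1 object.
image_a = [[0.1, 0.0, 2.0], [0.3, 0.1, 1.5]]
image_b = [[0.4, 0.4, 1.1]]

line_a = to_libsvm_line(1, flatten(image_a))
line_b = to_libsvm_line(0, flatten(image_b))
print(line_a)  # 1 1:0.1 3:2.0 4:0.3 5:0.1 6:1.5
print(line_b)  # 0 1:0.4 2:0.4 3:1.1
```

The two lines have 6 and 3 feature slots respectively, so the points no longer
share a common dimension.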

thanks
 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/ML-classifier-and-data-format-for-dataset-with-variable-number-of-features-tp9486.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
