Hi, I need to perform binary classification on an image dataset. Each image is a data point described by a JSON object. The feature set for each image is a set of feature vectors, with each feature vector corresponding to a distinct object in the image. For example, an image with 5 objects has a feature set of 5 feature vectors, whereas an image with 3 objects has a feature set of 3 feature vectors. So the number of feature vectors may differ between images, although every feature vector has the same number of attributes. The classification depends on the features of the individual objects, so I cannot aggregate them all into a single flat vector.
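To make the data shape concrete, here is a minimal sketch in plain Python. The field names "label" and "objects", the helper functions, and all the numbers are made up for illustration; they are not the real dataset. It parses two records with different numbers of objects and shows that naively concatenating the per-object vectors gives flat vectors of different lengths.

```python
import json

# Hypothetical records: the field names "label" and "objects" and all values
# are assumptions for illustration, not the actual dataset.
raw_records = [
    '{"label": 1, "objects": [[0.2, 1.5, 3.0], [0.7, 0.1, 2.2], [1.1, 0.4, 0.9]]}',
    '{"label": 0, "objects": [[0.5, 0.3, 1.8], [0.9, 1.2, 0.6], [0.1, 0.8, 2.4],'
    ' [1.3, 0.2, 0.7], [0.4, 1.9, 1.0]]}',
]


def parse_record(line):
    """Return (label, list of per-object feature vectors) for one JSON line."""
    record = json.loads(line)
    return record["label"], record["objects"]


def flatten(vectors):
    """Concatenate the per-object vectors into one flat list."""
    return [value for vector in vectors for value in vector]


parsed = [parse_record(line) for line in raw_records]

# Number of objects (feature vectors) per image varies.
object_counts = [len(vectors) for _, vectors in parsed]
# Every feature vector has the same number of attributes.
attribute_counts = {len(vector) for _, vectors in parsed for vector in vectors}
# Naive flattening therefore yields vectors of different lengths.
flat_lengths = [len(flatten(vectors)) for _, vectors in parsed]

print(object_counts)     # [3, 5]
print(attribute_counts)  # {3}
print(flat_lengths)      # [9, 15]
```

The differing flat lengths (9 versus 15 here) are the core of the problem: the per-image feature set is a variable-size collection of fixed-size vectors rather than one fixed-size vector.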
I have looked through the MLlib examples, and it appears that both the LIBSVM data format and the LabeledPoint format that MLlib uses require all points to have the same number of features, and both read in a flat feature vector. I would like to know whether any of the MLlib supervised learning classifiers can be used with the JSON data format, and whether they can classify points with a different number of features as described above.

Thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/ML-classifier-and-data-format-for-dataset-with-variable-number-of-features-tp9486.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.