Hello,

I am using SparkNLP for NER. After training and classification, the result is a Dataset<Row> with one column each for labels and predictions. To evaluate the model, I would like to use the Spark ML class org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator. However, this evaluator expects the labels and predictions as doubles. For the NER task, the results in my case are of type array<struct<annotatorType:string,begin:int,end:int,result:string,metadata:map<string,string>,embeddings:array<float>>>.

It would be possible, of course, to convert this format to the required doubles. But is there an easy way to apply MulticlassClassificationEvaluator to the NER task, or is there perhaps a better-suited evaluator? So far I have not found anything in either Spark ML or SparkNLP.
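
For illustration, the manual conversion mentioned above could look roughly like the following plain-Java sketch. It deliberately leaves Spark out: it assumes the `result` strings of each annotation have already been extracted into one list of tags per sentence, and that labels and predictions line up token for token. The class name `NerLabelIndexer` and its methods are hypothetical helpers, not part of Spark ML or SparkNLP.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not a Spark ML or SparkNLP class.
final class NerLabelIndexer {

    // Assign every distinct tag a double index, in order of first appearance.
    // Both columns are scanned so that a tag that only occurs in the
    // predictions still receives an index.
    static Map<String, Double> buildIndex(List<List<String>> labels,
                                          List<List<String>> predictions) {
        Map<String, Double> index = new LinkedHashMap<>();
        for (List<List<String>> column : List.of(labels, predictions)) {
            for (List<String> tags : column) {
                for (String tag : tags) {
                    if (!index.containsKey(tag)) {
                        index.put(tag, (double) index.size());
                    }
                }
            }
        }
        return index;
    }

    // Flatten the per-sentence tag lists into one double per token.
    static List<Double> toDoubles(List<List<String>> sentences,
                                  Map<String, Double> index) {
        List<Double> out = new ArrayList<>();
        for (List<String> tags : sentences) {
            for (String tag : tags) {
                Double id = index.get(tag);
                if (id == null) {
                    throw new IllegalArgumentException("Unknown tag: " + tag);
                }
                out.add(id);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up example tags, two sentences each.
        List<List<String>> labels =
            List.of(List.of("B-PER", "I-PER", "O"), List.of("O", "B-LOC"));
        List<List<String>> predictions =
            List.of(List.of("B-PER", "O", "O"), List.of("O", "B-LOC"));

        Map<String, Double> index = buildIndex(labels, predictions);
        System.out.println(index);
        System.out.println(toDoubles(labels, index));
        System.out.println(toDoubles(predictions, index));
    }
}
```

Each pair of doubles at the same position would then become one row with a label column and a prediction column, which is the shape MulticlassClassificationEvaluator expects. Note that this scores individual tokens, not whole entities.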

Thanks a lot.

Cheers,

Martin
