Hello,
I am using SparkNLP for NER. After training and classification, the
result is a Dataset<Row> with one column each for labels and
predictions. To evaluate the model, I would like to use
the Spark ML class
org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator.
However, this evaluator expects labels as doubles. In my NER task, the
results are of type
array<struct<annotatorType:string,begin:int,end:int,result:string,metadata:map<string,string>,embeddings:array<float>>>.
It would be possible, of course, to convert this format to the required
doubles. But is there a way to easily apply
MulticlassClassificationEvaluator to the NER task or is there maybe a
better evaluator? I haven't found anything yet (either in Spark ML or
in SparkNLP).
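
To make concrete what I mean by converting to doubles, here is a rough
sketch. It is plain Java without any Spark dependency, and it assumes the
`result` strings have already been pulled out of each annotation into one
list of tags per sentence. The class and method names (NerLabelIndexer,
buildIndex, toDoubles) are made up for illustration and are not part of
Spark ML or SparkNLP.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: maps NER tag strings to double indices, token by token.
class NerLabelIndexer {

    // Assigns each distinct tag a double index in order of first appearance.
    static Map<String, Double> buildIndex(List<List<String>> sentences) {
        Map<String, Double> index = new LinkedHashMap<>();
        for (List<String> tags : sentences) {
            for (String tag : tags) {
                if (!index.containsKey(tag)) {
                    index.put(tag, (double) index.size());
                }
            }
        }
        return index;
    }

    // Flattens per-sentence tags into one token-level list of doubles.
    // Throws if a tag is missing from the index.
    static List<Double> toDoubles(List<List<String>> sentences, Map<String, Double> index) {
        List<Double> out = new ArrayList<>();
        for (List<String> tags : sentences) {
            for (String tag : tags) {
                Double id = index.get(tag);
                if (id == null) {
                    throw new IllegalArgumentException("Unknown tag: " + tag);
                }
                out.add(id);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up example data, not output from a real model.
        List<List<String>> labels = List.of(
                List.of("B-PER", "I-PER", "O"),
                List.of("O", "B-LOC"));
        List<List<String>> predictions = List.of(
                List.of("B-PER", "O", "O"),
                List.of("O", "B-LOC"));

        // Index is built from the gold labels only in this sketch.
        Map<String, Double> index = buildIndex(labels);
        System.out.println(toDoubles(labels, index));      // [0.0, 1.0, 2.0, 2.0, 3.0]
        System.out.println(toDoubles(predictions, index)); // [0.0, 2.0, 2.0, 2.0, 3.0]
    }
}
```

Note that this works at token level, so every O tag counts as its own
class and nothing is scored per entity. I would also still have to get
the doubles back into two Dataset<Row> columns before the evaluator could
use them, which is the part I was hoping to avoid.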
Thanks a lot.
Cheers,
Martin