OK, thank you, Gourav. I didn't realize that Spark works with numerical
formats only by design.
What I am trying to achieve is rather straightforward: evaluate a
trained model using the standard metrics provided by
MulticlassClassificationEvaluator. Since this isn't possible for text
Hi Martin,
Okay, so you will, of course, need to translate the NER string output to a
numerical format, as you would with any text data, before feeding it to
SPARK ML. Please read the SPARK ML documentation on this; I think it is
quite clear on how to do that.
But more importantly, please try to
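As a minimal sketch of that translation step (plain Python rather than Spark; the helper names `build_label_index` and `encode` and the sample tags are hypothetical), the point to get right is that the gold labels and the predictions share one string-to-number mapping. Otherwise the same NER tag could end up as different numbers in the two columns. In Spark ML, `StringIndexer` is the usual tool for this kind of encoding, but whichever tool is used, the mapping should be built once over the tags from both columns. This sketch orders tags alphabetically, whereas `StringIndexer` orders by frequency by default, so the actual numbers would differ.

```python
def build_label_index(gold_tags, predicted_tags):
    """Map every NER tag seen in either column to a float index."""
    # One shared, sorted vocabulary so a tag gets the same number in both columns.
    vocabulary = sorted(set(gold_tags) | set(predicted_tags))
    return {tag: float(i) for i, tag in enumerate(vocabulary)}


def encode(tags, index):
    """Replace each tag string with its float index (evaluators expect doubles)."""
    return [index[tag] for tag in tags]


# Hypothetical token-level tags for one short text.
gold = ["B-PER", "I-PER", "O", "B-LOC", "O"]
pred = ["B-PER", "O", "O", "B-LOC", "B-ORG"]

index = build_label_index(gold, pred)
print(index)               # {'B-LOC': 0.0, 'B-ORG': 1.0, 'B-PER': 2.0, 'I-PER': 3.0, 'O': 4.0}
print(encode(gold, index))  # [2.0, 3.0, 4.0, 0.0, 4.0]
print(encode(pred, index))  # [2.0, 4.0, 4.0, 0.0, 1.0]
```

Note that `B-ORG` only occurs in the predictions; building the vocabulary from the gold column alone would make `encode(pred, index)` fail with a `KeyError`.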
Hi Gourav,
Mostly correct. The output of SparkNLP here is a trained
pipeline/model/transformer. I am feeding this trained pipeline to the
MulticlassClassificationEvaluator for evaluation, and the
MulticlassClassificationEvaluator only accepts floats or doubles as the
labels (instead of NER
Hi Martin,
Just to confirm: you are taking the output of SPARKNLP and then trying to
feed it to SPARK ML to run algorithms on the NER output generated by
SPARKNLP, right?
Regards,
Gourav Sengupta
On Thu, Nov 11, 2021 at 8:00 AM wrote:
Hi Sean,
Apologies for the delayed reply. I've been away on vacation and then
busy catching up afterwards.
Regarding the evaluation using MulticlassClassificationEvaluator: this is
about a sequence labeling task to identify specific non-standard named
entities. The training and evaluation
I don't think the question is the representation as a double. The question
is how this output represents a label. This looks like the result of an
annotator. What are you classifying? First, you need both ground truth and
predictions somewhere before you can use any utility to assess
classification metrics.
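To make the "ground truth and prediction" requirement concrete, here is a minimal plain-Python sketch. It assumes the gold tags and predicted tags are already aligned token by token (one list per sentence), which is an assumption about the data, not something the thread confirms. It flattens them into (gold, predicted) pairs, one per token, which is the row shape a label/prediction evaluator works on, and computes accuracy and a support-weighted F1. As far as I know, the default `f1` metric of `MulticlassClassificationEvaluator` is this weighted F-measure, but treat that as something to verify. The helper names `token_pairs`, `accuracy` and `weighted_f1` are hypothetical.

```python
from collections import Counter


def token_pairs(gold_sentences, pred_sentences):
    """Flatten per-sentence tag lists into aligned (gold, predicted) pairs."""
    if len(gold_sentences) != len(pred_sentences):
        raise ValueError("need one prediction list per gold sentence")
    pairs = []
    for gold, pred in zip(gold_sentences, pred_sentences):
        if len(gold) != len(pred):
            raise ValueError("gold and predicted tags must align token by token")
        pairs.extend(zip(gold, pred))
    return pairs


def accuracy(pairs):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    return sum(g == p for g, p in pairs) / len(pairs)


def weighted_f1(pairs):
    """Per-tag F1, averaged with weights equal to each tag's gold count."""
    gold_counts = Counter(g for g, _ in pairs)
    pred_counts = Counter(p for _, p in pairs)
    true_pos = Counter(g for g, p in pairs if g == p)
    total = 0.0
    for tag, support in gold_counts.items():
        precision = true_pos[tag] / pred_counts[tag] if pred_counts[tag] else 0.0
        recall = true_pos[tag] / support
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += support * f1
    return total / len(pairs)


# Hypothetical aligned tags for two sentences.
gold_sentences = [["B-PER", "I-PER", "O"], ["B-LOC", "O"]]
pred_sentences = [["B-PER", "O", "O"], ["B-LOC", "B-ORG"]]
pairs = token_pairs(gold_sentences, pred_sentences)
print(accuracy(pairs))     # 0.6
print(weighted_f1(pairs))  # 0.6
```

These are token-level scores: each tag is scored independently, so a multi-token entity that is only partly correct still earns partial credit.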
On Mon, Oct
Hello,
I am using SparkNLP to do some NER. The resulting data structure after
training and classification is a Dataset, with one column each for
labels and predictions. For evaluating the model, I would like to use
the Spark ML class