Re: Feature (?): Setting custom parameters for a Spark MLlib pipeline

2021-10-25 Thread Sean Owen
You can write a custom Transformer or Estimator?

On Mon, Oct 25, 2021 at 7:37 AM Sonal Goyal wrote:
> Hi Martin,
>
> Agree, if you don't need the other features of MLFlow then it is likely
> overkill.
>
> Cheers,
> Sonal
> https://github.com/zinggAI/zingg
>
> On Mon, Oct 25, 2021 at 4:06 PM

Re: Feature (?): Setting custom parameters for a Spark MLlib pipeline

2021-10-25 Thread Sonal Goyal
Hi Martin,

Agree, if you don't need the other features of MLFlow then it is likely overkill.

Cheers,
Sonal
https://github.com/zinggAI/zingg

On Mon, Oct 25, 2021 at 4:06 PM wrote:
> Hi Sonal,
>
> Thanks a lot for this suggestion. I presume it might indeed be possible to
> use MLFlow for this

Re: Using MulticlassClassificationEvaluator for NER evaluation

2021-10-25 Thread Sean Owen
I don't think the question is representation as double. The question is how this output represents a label. This looks like the result of an annotator. What are you classifying? You first need ground truth and predictions somewhere before you can use any utility to assess classification metrics. On Mon, Oct

Using MulticlassClassificationEvaluator for NER evaluation

2021-10-25 Thread martin
Hello, I am using SparkNLP to do some NER. The resulting data structure after training and classification is a Dataset, with one column each for labels and predictions. For evaluating the model, I would like to use the Spark ML class

Re: Feature (?): Setting custom parameters for a Spark MLlib pipeline

2021-10-25 Thread martin
Hi Sonal, Thanks a lot for this suggestion. I presume it might indeed be possible to use MLFlow for this purpose, but at present it seems a bit too much to introduce another framework only for storing arbitrary meta-data with trained ML pipelines. I was hoping there might be a way to do this

[ANNOUNCE] Release Apache Kyuubi (Incubating) 1.3.1-incubating

2021-10-25 Thread XiDuo You
Hi Spark Community, The Apache Kyuubi (Incubating) community is pleased to announce that Apache Kyuubi (Incubating) 1.3.1-incubating has been released! Apache Kyuubi (Incubating) is a distributed multi-tenant JDBC server for large-scale data processing and analytics, built on top of Apache

SQLMetric & MetricsSystem

2021-10-25 Thread wu shaoj
Hi folks, I’m creating a new sink that extends org.apache.spark.metrics.sink.Sink, following https://spark.apache.org/docs/latest/monitoring.html, but I find that there are no query plan metrics at all. SQLMetric is used in a SQL query plan, so my questions are: 1. How can I get the query