Hello,

This is regarding https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala#L244
    require(vocab.length > 0, "The vocabulary size should be > 0. Lower minDF as necessary.")

Currently, training `CountVectorizer` on an empty dataset results in the exception shown below. But sending it empty data is a perfectly valid use case (the same failure occurs if minDF filters out everything). HashingTF works fine in such scenarios; CountVectorizer doesn't. Can we remove this constraint? Happy to send a pull request.

    java.lang.IllegalArgumentException: requirement failed: The vocabulary size should be > 0. Lower minDF as necessary.
        at scala.Predef$.require(Predef.scala:224)
        at org.apache.spark.ml.feature.CountVectorizer.fit(CountVectorizer.scala:236)
        at org.apache.spark.ml.feature.CountVectorizer.fit(CountVectorizer.scala:149)
        at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:153)
        at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:149)
        at scala.collection.Iterator$class.foreach(Iterator.scala:891)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
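
To make the two failure paths concrete, here is a rough sketch in plain Scala with no Spark dependency. It is only a toy model of vocabulary building, not Spark's actual implementation: `VocabSketch`, `buildVocab`, `fitStrict`, `fitRelaxed`, and `transform` are all hypothetical names made up for illustration. It assumes minDF is an absolute document count and shows that both an empty dataset and an overly high minDF end up with an empty vocabulary, which the strict check then rejects.

```scala
// Plain-Scala toy model of vocabulary building; NOT Spark's actual code.
// All names here are hypothetical and exist only for illustration.
object VocabSketch {

  // Keep the terms that appear in at least minDF documents
  // (assumption: minDF is an absolute document count).
  def buildVocab(docs: Seq[Seq[String]], minDF: Int): Array[String] =
    docs
      .flatMap(_.distinct)   // count each term at most once per document
      .groupBy(identity)     // term -> one entry per containing document
      .collect { case (term, hits) if hits.size >= minDF => term }
      .toArray
      .sorted

  // Same check as the quoted require(...): an empty vocabulary throws
  // IllegalArgumentException.
  def fitStrict(docs: Seq[Seq[String]], minDF: Int): Array[String] = {
    val vocab = buildVocab(docs, minDF)
    require(vocab.length > 0,
      "The vocabulary size should be > 0. Lower minDF as necessary.")
    vocab
  }

  // Hypothetical relaxed variant: an empty vocabulary is simply returned.
  def fitRelaxed(docs: Seq[Seq[String]], minDF: Int): Array[String] =
    buildVocab(docs, minDF)

  // Term counts against the vocabulary; zero-length if the vocabulary is empty.
  def transform(vocab: Array[String], doc: Seq[String]): Array[Int] =
    vocab.map(term => doc.count(_ == term))
}
```

In this model, `fitStrict(Seq.empty, 1)` and `fitStrict(docs, minDF)` with a minDF larger than any term's document frequency both throw, while `fitRelaxed` returns an empty vocabulary and `transform` then yields zero-length count vectors. Whether downstream stages in a real pipeline handle zero-length vectors gracefully is a separate question that this sketch does not answer.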