Hi Marc,
I'm not very familiar with Spark on Java, but according to the doc
<https://spark.apache.org/docs/latest/sql-getting-started.html#creating-datasets>,
it should be:
Encoder<String> stringEncoder = Encoders.STRING();
dataset.as(stringEncoder);
For the record, it is much simpler in Scala:
dataset.as[String]
Of course, this will only work if your DataFrame contains a single column of
type String, e.g.:
val df = spark.read.parquet("Cyrano_de_Bergerac_Acte_V.parquet")
df.printSchema
root
|-- line: string (nullable = true)
df.as[String]
Otherwise, you will have to somehow convert each Row to a String, e.g. in Scala:
case class Data(f1: String, f2: Int, f3: Long)
val df = Seq(Data("a", 1, 1L), Data("b", 2, 2L), Data("c", 3, 3L),
  Data("d", 4, 4L), Data("e", 5, 5L)).toDF
val ds = df.map(_.mkString(",")).as[String]
ds.show
+-----+
|value|
+-----+
|a,1,1|
|b,2,2|
|c,3,3|
|d,4,4|
|e,5,5|
+-----+
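Since you are on the Java API, the same map-to-String approach might look like the sketch below. The `joinValues` helper mirrors Scala's `Row.mkString(",")`; the Spark wiring in the comment is an assumption on my part (it presumes a `Dataset<Row> df` and the usual Spark imports), so treat it as a sketch rather than tested code:

```java
import java.util.StringJoiner;

// In Spark you would call the helper from a MapFunction<Row, String>
// together with Encoders.STRING(), roughly like this (untested sketch):
//
//   Dataset<String> ds = df.map(
//       (MapFunction<Row, String>) row ->
//           RowToString.joinValues(row.get(0), row.get(1), row.get(2)),
//       Encoders.STRING());
//
public class RowToString {
    // Join arbitrary values with commas, like Scala's Row.mkString(",")
    static String joinValues(Object... values) {
        StringJoiner sj = new StringJoiner(",");
        for (Object v : values) {
            sj.add(String.valueOf(v));
        }
        return sj.toString();
    }
}
```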
Regards,
Christophe.
On 6/4/22 14:38, marc nicole wrote:
> Hi,
> How to convert a Dataset<Row> to a Dataset<String>?
> What I have tried is:
>
> List<String> list = dataset.as(Encoders.STRING()).collectAsList();
> Dataset<String>
> datasetSt = spark.createDataset(list, Encoders.STRING()); // But this line
> raises a org.apache.spark.sql.AnalysisException: Try to map struct... to
> Tuple1, but failed as the number of fields does not line up
>
> Type of columns being String
> How to solve this?