It sounds like you want to interpret the input as strings, do some processing, and then infer the schema. That has nothing to do with rendering the entire row as a single string like "Row[foo=bar, baz=1]".
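One way to follow that outline, as a minimal Java sketch under simplifying assumptions: read the raw file as lines of text, blank out every field that is exactly "+", then hand the cleaned lines to the CSV reader with inferSchema. The class name PlusCleaner and the method cleanLine are hypothetical. The helper assumes a plain comma-separated file with no quoted fields or embedded commas; real CSV quoting would need a proper parser. The Spark wiring appears only in comments and assumes spark-sql 2.2 or later, where DataFrameReader.csv accepts a Dataset<String>.

```java
public class PlusCleaner {
    // Placeholder value to discard before schema inference (the "+" from this thread).
    static final String PLACEHOLDER = "+";

    // Replaces every field that is exactly "+" (ignoring surrounding spaces) with an empty field.
    // Assumption: simple comma-separated lines with no quoted fields or escaped commas.
    static String cleanLine(String line) {
        String[] fields = line.split(",", -1); // limit -1 keeps trailing empty fields
        for (int i = 0; i < fields.length; i++) {
            if (fields[i].trim().equals(PLACEHOLDER)) {
                fields[i] = "";
            }
        }
        return String.join(",", fields);
    }

    // Hypothetical Spark wiring (not compiled here; needs spark-sql on the classpath):
    //   Dataset<String> raw = spark.read().textFile("data.csv");
    //   Dataset<String> cleaned = raw.map(
    //       (MapFunction<String, String>) PlusCleaner::cleanLine, Encoders.STRING());
    //   Dataset<Row> df = spark.read()
    //       .option("header", "true")
    //       .option("inferSchema", "true")
    //       .csv(cleaned);

    public static void main(String[] args) {
        // Prints the cleaned version of one sample line.
        System.out.println(cleanLine("1.5,+,abc"));
    }
}
```

Empty fields are read as null by default, so they should not force a column back to StringType during inference. A possibly simpler alternative, worth verifying against your Spark version, is to skip the preprocessing and pass .option("nullValue", "+") to the CSV reader, since values matching nullValue are ignored when column types are inferred.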
On Sat, Jun 4, 2022 at 10:32 AM marc nicole <mk1853...@gmail.com> wrote:
> Hi Sean,
>
> Thanks. Actually, I have a dataset where I want to inferSchema after
> discarding the specific String value "+". I do this because the column
> would be considered StringType, while if I remove that "+" value it would
> be considered DoubleType, for example, or something else. Basically, I want
> to remove "+" from all dataset rows and then infer the schema.
> My idea is to filter out the values equal to "+" in the target columns
> (potentially all of them) and then use spark.read().csv() to read the new
> filtered dataset with the inferSchema option, which would then yield the
> correct column types.
> What do you think?
>
> On Sat, Jun 4, 2022 at 15:56, Sean Owen <sro...@gmail.com> wrote:
>
>> I don't think you want to do that. At best, you get a string
>> representation of structured data without the structure. This is part of
>> the reason it doesn't work directly this way.
>> You can use a UDF to call .toString on the Row, of course, but again,
>> what are you really trying to do?
>>
>> On Sat, Jun 4, 2022 at 7:35 AM marc nicole <mk1853...@gmail.com> wrote:
>>
>>> Hi,
>>> How do I convert a Dataset<Row> to a Dataset<String>?
>>> What I have tried is:
>>>
>>> List<String> list = dataset.as(Encoders.STRING()).collectAsList();
>>> Dataset<String> datasetSt = spark.createDataset(list, Encoders.STRING());
>>> // But this line raises an org.apache.spark.sql.AnalysisException: Try to map struct... to Tuple1, but failed as the number of fields does not line up
>>>
>>> All the columns are of type String.
>>> How can I solve this?
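For the original question, here is a minimal sketch of turning a multi-column Dataset<Row> into a Dataset<String> without collecting to the driver. The error above is consistent with dataset.as(Encoders.STRING()) expecting exactly one column (selecting a single string column first, e.g. dataset.select("someColumn").as(Encoders.STRING()), would work); mapping each Row to a string sidesteps that. The class RowsToStrings, the comma separator, and the in-memory sample data are illustrative assumptions, and spark-sql must be on the classpath.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class RowsToStrings {
    // Joins all column values of each row into one comma-separated string.
    static Dataset<String> toCsvLines(Dataset<Row> dataset) {
        return dataset.map(
            (MapFunction<Row, String>) row -> row.mkString(","),
            Encoders.STRING());
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local[1]")
            .appName("rows-to-strings")
            .getOrCreate();
        // Hypothetical two-column sample standing in for the real dataset.
        List<String> lines = Arrays.asList("a,1", "b,+");
        Dataset<Row> dataset = spark.read().csv(spark.createDataset(lines, Encoders.STRING()));
        // Prints the rows as plain strings.
        System.out.println(toCsvLines(dataset).collectAsList());
        spark.stop();
    }
}
```

Row.mkString joins only the values, whereas Row.toString would produce bracketed output such as "[a,1]". Null column values come out as the literal text "null", so a custom formatter may be preferable if the strings are going to be parsed again.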