It sounds like you want to interpret the input as strings, do some
processing, then infer the schema. That has nothing to do with construing
the entire row as a string like "Row[foo=bar, baz=1]".
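
For example, something along these lines (an untested sketch; it assumes a
CSV input with a header row, and "data.csv" / "/tmp/cleaned" are just
placeholder paths):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.*;

SparkSession spark = SparkSession.builder().getOrCreate();

// 1. Read every column as a plain string; no schema inference yet.
Dataset<Row> raw = spark.read().option("header", "true").csv("data.csv");

// 2. Null out the "+" placeholder in every column.
Dataset<Row> cleaned = raw;
for (String c : raw.columns()) {
  cleaned = cleaned.withColumn(c,
      when(col(c).equalTo("+"), lit(null)).otherwise(col(c)));
}

// 3. Round-trip through CSV so Spark infers the real types this time.
cleaned.write().option("header", "true").mode("overwrite").csv("/tmp/cleaned");
Dataset<Row> typed = spark.read()
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/tmp/cleaned");
typed.printSchema();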

On Sat, Jun 4, 2022 at 10:32 AM marc nicole <mk1853...@gmail.com> wrote:

> Hi Sean,
>
> Thanks. Actually I have a dataset where I want to infer the schema after
> discarding the specific String value "+". I do this because with "+"
> present a column would be considered StringType, whereas if I remove that
> "+" value it would be considered DoubleType, for example, or something
> else. Basically I want to remove "+" from all dataset rows and then infer
> the schema.
> My idea here is to keep only the rows not equal to "+" for the target
> columns (potentially all of them) and then use spark.read().csv() to read
> the new filtered dataset with the option inferSchema, which would then
> yield the correct column types. Something like the sketch below.
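>
> (Rough, untested; "data.csv" stands in for my actual file, and it assumes
> a header line and no quoted commas:)
>
> import java.util.Arrays;
> import org.apache.spark.api.java.function.FilterFunction;
>
> Dataset<String> lines = spark.read().textFile("data.csv");
> String header = lines.first();
> // Keep the header plus every row with no "+" field (naive comma split).
> Dataset<String> filtered = lines.filter(
>     (FilterFunction<String>) l -> l.equals(header)
>         || !Arrays.asList(l.split(",", -1)).contains("+"));
> // spark.read().csv(...) also accepts a Dataset<String> of CSV lines,
> // so the filtered text can be re-parsed with schema inference.
> Dataset<Row> typed = spark.read()
>     .option("header", "true")
>     .option("inferSchema", "true")
>     .csv(filtered);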
> What do you think?
>
> On Sat, Jun 4, 2022 at 3:56 PM Sean Owen <sro...@gmail.com> wrote:
>
>> I don't think you want to do that. At best you get a string
>> representation of structured data without the structure. This is part of
>> the reason it doesn't work directly this way.
>> You can of course use a UDF to call .toString on the Row, but, again,
>> what are you really trying to do?
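>>
>> For instance, mapping over the rows (untested sketch; not literally a
>> UDF, but the same idea):
>>
>> import org.apache.spark.api.java.function.MapFunction;
>> import org.apache.spark.sql.Encoders;
>>
>> // Each Row becomes its toString form; the structure is lost.
>> Dataset<String> asString = dataset.map(
>>     (MapFunction<Row, String>) Row::toString, Encoders.STRING());
>>
>> // If you need strings that keep the structure, toJSON() gives one JSON
>> // document per row instead:
>> Dataset<String> asJson = dataset.toJSON();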
>>
>> On Sat, Jun 4, 2022 at 7:35 AM marc nicole <mk1853...@gmail.com> wrote:
>>
>>> Hi,
>>> How to convert a Dataset<Row> to a Dataset<String>?
>>> What I have tried is:
>>>
>>> List<String> list = dataset.as(Encoders.STRING()).collectAsList();
>>> Dataset<String> datasetSt = spark.createDataset(list, Encoders.STRING());
>>> // But this raises: org.apache.spark.sql.AnalysisException: Try to
>>> // map struct... to Tuple1, but failed as the number of fields does
>>> // not line up
>>>
>>> The columns are all of type String.
>>> How to solve this?
>>>
>>
