RE: Difference between Typed and untyped transformation in dataset API

email Sat, 23 Feb 2019 16:24:40 -0800

>From what I understand, if the transformation is untyped it will return a 
>DataFrame; otherwise it will return a Dataset. In the source code you will 
>see that the return type is a DataFrame instead of a Dataset, and such methods 
>should also be annotated with @group untypedrel. Thus, you could check the 
>signature of the method to determine whether it is untyped or not.


 

In general, anything that changes the type of a column or adds a new column to 
a Dataset will be untyped. The idea of a Dataset is that its schema stays 
constant. The moment you try to modify the schema, we need to fall back to a 
DataFrame. 

 

For example, withColumn is untyped because it transforms the typed Dataset into 
an untyped structure (a DataFrame). 
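
A minimal sketch of the distinction, assuming Spark 2.x with Scala on the classpath and a local master. The case classes Person and PersonWithFlag and the helper addAdultFlag are hypothetical names made up for this illustration; filter, map, withColumn and as are the Dataset methods being compared.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical record types, used only for this illustration.
case class Person(name: String, age: Int)
case class PersonWithFlag(name: String, age: Int, isAdult: Boolean)

object TypedVsUntyped {
  // Untyped: withColumn changes the schema, so the declared result
  // is a DataFrame (Dataset[Row]) rather than a Dataset[Person].
  def addAdultFlag(people: Dataset[Person]): DataFrame =
    people.withColumn("isAdult", col("age") >= 18)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("typed-vs-untyped")
      .master("local[*]") // assumption: run locally for the demo
      .getOrCreate()
    import spark.implicits._

    val people: Dataset[Person] = Seq(Person("Ann", 34), Person("Bob", 12)).toDS()

    // Typed: the result is still a Dataset of a known element type.
    val adults: Dataset[Person] = people.filter((p: Person) => p.age >= 18)
    val names: Dataset[String] = people.map((p: Person) => p.name)

    // Untyped: the result is a DataFrame.
    val flagged: DataFrame = addAdultFlag(people)

    // To get back to the typed API, supply a type that matches the new schema.
    val typedAgain: Dataset[PersonWithFlag] = flagged.as[PersonWithFlag]

    adults.show()
    names.show()
    flagged.printSchema()
    typedAgain.show()

    spark.stop()
  }
}
```

The declared types on the left-hand side are what the compiler checks, so they mirror the method signatures mentioned above: filter and map keep a Dataset of a known type, while withColumn hands back a DataFrame.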

 

From: Akhilanand <akhilanand...@gmail.com> 
Sent: Thursday, February 21, 2019 7:35 PM
To: user <user@spark.apache.org>
Subject: Difference between Typed and untyped transformation in dataset API

 

What is the key difference between typed and untyped transformations in the 
Dataset API?

How do I determine whether a transformation is typed or untyped?

Are there any gotchas about when to use which, apart from whether it does the 
job for me?
