To: Antoaneta Marinova
Cc: user
Subject: Re: Spark 2.0 - DataFrames vs Dataset performance
Hi Antoaneta,
I believe the difference is not due to Datasets being slower (DataFrames
are just an alias for Datasets now), but rather to using a user-defined
function for filtering instead of Spark built-ins. The built-ins can use tricks
from Project Tungsten, such as only deserializing the "event_type" co
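
To illustrate the distinction, here is a minimal sketch of the two filtering styles. The `Event` case class, its `user_id` field, and the sample values are hypothetical stand-ins (the real schema is nested and is not shown in this thread); only the "event_type" column name comes from the discussion above. The sketch assumes Spark 2.x on the classpath and a local master.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical, flattened stand-in for the real (nested) event schema.
case class Event(event_type: String, user_id: Long)

object FilterComparison {
  // Typed filter: the lambda is opaque to the optimizer, so each row is
  // deserialized into an Event object before the predicate is evaluated.
  def countWithLambda(events: Dataset[Event], eventType: String): Long =
    events.filter((e: Event) => e.event_type == eventType).count()

  // Column-expression filter: the predicate is visible to the optimizer,
  // which can evaluate it on Spark's internal row format and read only
  // the columns it needs.
  def countWithColumn(events: Dataset[Event], eventType: String): Long =
    events.filter(col("event_type") === eventType).count()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("filter-comparison")
      .getOrCreate()
    import spark.implicits._

    // Tiny in-memory sample; the thread itself reads events from Parquet.
    val events = Seq(Event("click", 1L), Event("view", 2L), Event("click", 3L)).toDS()

    println(countWithLambda(events, "click"))
    println(countWithColumn(events, "click"))

    // Print both physical plans to compare how each filter is executed.
    events.filter((e: Event) => e.event_type == "click").explain()
    events.filter(col("event_type") === "click").explain()

    spark.stop()
  }
}
```

Both functions return the same count; the difference should show up in the plans printed by `explain()` and, on large Parquet inputs, in run time. Actual gains depend on the data and schema, so treat this as a starting point for measurement rather than a benchmark.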
Hello,
I am using Spark 2.0 to perform filtering, grouping and counting
operations on events data saved in Parquet files. As the events schema has
a deeply nested structure, I wanted to read them as Scala beans to simplify the
code, but I noticed a severe performance degradation. Below you can find