subject:"Spark 2.0 \- DataFrames vs Dataset performance"

RE: Spark 2.0 - DataFrames vs Dataset performance

2016-10-24 Thread Mendelson, Assaf

PM To: Antoaneta Marinova; Cc: user; Subject: Re: Spark 2.0 - DataFrames vs Dataset performance. Hi Antoaneta, I believe the difference is not due to Datasets being slower (DataFrames are now just an alias for Datasets), but rather to using a user-defined function for filtering instead of Spark built-ins

Re: Spark 2.0 - DataFrames vs Dataset performance

2016-10-24 Thread Daniel Darabos

Hi Antoaneta, I believe the difference is not due to Datasets being slower (DataFrames are now just an alias for Datasets), but rather to using a user-defined function for filtering instead of Spark built-ins. The built-in filter can use tricks from Project Tungsten, such as deserializing only the "event_type" co
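The contrast described in this reply can be sketched roughly as follows. This is a minimal illustration, not the code from the original question: the `Event` case class, its fields other than `event_type`, the `"click"` value, and the temporary Parquet path are all assumptions made up for the example. It filters the same Dataset once with a Scala lambda (opaque to the optimizer) and once with a column expression (visible to the optimizer), and prints both physical plans for comparison.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Hypothetical, flattened stand-in for the nested event schema in the thread.
case class Event(event_type: String, user_id: Long, payload: String)

object FilterComparison {
  // Returns (rows kept by the lambda filter, rows kept by the column filter).
  def run(): (Long, Long) = {
    val spark = SparkSession.builder()
      .appName("filter-comparison")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Write a tiny Parquet file so both filters read from columnar storage.
    // The temporary location is an assumption for this sketch.
    val path = Files.createTempDirectory("events").resolve("data").toString
    Seq(
      Event("click", 1L, "a"),
      Event("view", 2L, "b"),
      Event("click", 3L, "c")
    ).toDS().write.parquet(path)

    val events = spark.read.parquet(path).as[Event]

    // Typed filter: Spark cannot look inside the function, so each row has
    // to be turned into an Event object before the predicate can run.
    val viaLambda = events.filter((e: Event) => e.event_type == "click")

    // Column expression: the predicate is part of the query plan, so the
    // optimizer can reason about which column it touches.
    val viaColumn = events.filter($"event_type" === "click")

    // Print the physical plans to compare how each filter is executed.
    viaLambda.explain()
    viaColumn.explain()

    val counts = (viaLambda.count(), viaColumn.count())
    spark.stop()
    counts
  }

  def main(args: Array[String]): Unit = {
    val (lambdaCount, columnCount) = run()
    println(s"lambda filter: $lambdaCount, column filter: $columnCount")
  }
}
```

Both filters select the same rows; the difference, if any, should show up in the plans printed by `explain()` and in run time on larger data, which this sketch does not measure.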

Spark 2.0 - DataFrames vs Dataset performance

2016-10-24 Thread Antoaneta Marinova

Hello, I am using Spark 2.0 to perform filtering, grouping and counting operations on event data saved in Parquet files. As the event schema has a deeply nested structure, I wanted to read the events as Scala beans to simplify the code, but I noticed a severe performance degradation. Below you can find

3 matches