Re: Latency with cross operation on Datasets

2018-05-11 Thread Fabian Hueske
Hi Varun, The focus of the DataSet execution is on robustness. The smaller DataSet is stored serialized in memory. Also, most of the communication happens via serialization (instead of passing object references). Serialization should add significant overhead compared to a thread-loc

Latency with cross operation on Datasets

2018-05-10 Thread Varun Dhore
Hello Flink community, I am trying to understand the latency involved in the cross operation. Below are my tests. In plain Java: 1. Create 2D array 1 - populated with 1 million rows and 3 columns with randomly generated double values 2. Create 2D array 2 - populated with 100 rows and 3 columns with
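
For readers who want to reproduce the plain Java baseline described above, here is a minimal sketch. The message preview is cut off, so the contents of the second array and the work done per pair are not known. The sketch assumes both arrays hold random doubles and uses a squared Euclidean distance as placeholder per-pair work. The class name PlainJavaCross and the methods randomMatrix and crossCount are hypothetical names chosen for illustration, not taken from the original test.

```java
import java.util.Random;

public class PlainJavaCross {

    // Build a rows x cols array of random doubles from a seeded generator.
    static double[][] randomMatrix(int rows, int cols, long seed) {
        Random rnd = new Random(seed);
        double[][] m = new double[rows][cols];
        for (double[] row : m) {
            for (int c = 0; c < cols; c++) {
                row[c] = rnd.nextDouble();
            }
        }
        return m;
    }

    // Visit every (large row, small row) pair and return the number of pairs.
    // The squared distance is an assumed stand-in for the real per-pair work.
    static long crossCount(double[][] large, double[][] small) {
        long pairs = 0;
        double sink = 0.0;
        for (double[] a : large) {
            for (double[] b : small) {
                double d = 0.0;
                for (int c = 0; c < a.length; c++) {
                    double diff = a[c] - b[c];
                    d += diff * diff;
                }
                sink += d;
                pairs++;
            }
        }
        // Consume the accumulated value so the loop body cannot be optimized away.
        if (Double.isNaN(sink)) {
            throw new IllegalStateException("unexpected NaN");
        }
        return pairs;
    }

    public static void main(String[] args) {
        double[][] large = randomMatrix(1_000_000, 3, 1L);
        double[][] small = randomMatrix(100, 3, 2L);

        long start = System.nanoTime();
        long pairs = crossCount(large, small);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println(pairs + " pairs in " + elapsedMs + " ms");
    }
}
```

Both arrays stay in one JVM and rows are passed by reference, so this loop pays none of the serialization cost mentioned in the reply. A single timed run like this also includes JIT warm-up, so it is only a rough baseline.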