Re: Data Locality in Flink

Flavio Pompermaier Mon, 29 Apr 2019 05:48:57 -0700

Hi Fabian, I wasn't aware that "race-conditions may happen if your splits
are very small as the first data source task might rapidly request and
process all splits before the other source tasks do their first request".
What exactly happens when such a race condition arises? Is this exception
handled internally by Flink or not?


On Mon, Apr 29, 2019 at 11:51 AM Fabian Hueske <fhue...@gmail.com> wrote:

> Hi,
>
> The method that I described in the SO answer is still implemented in Flink.
> Flink tries to assign splits to tasks that run on local TMs.
> However, files are not split per line (this would be horribly inefficient)
> but into larger chunks, depending on the number of subtasks (and, in the
> case of HDFS, the file block size).
>
> Best, Fabian
>
> On Sun, Apr 28, 2019 at 6:48 PM Soheil Pourbafrani <
> soheil.i...@gmail.com> wrote:
>
>> Hi
>>
>> I want to know exactly how Flink reads data, both for a file on the local
>> filesystem and for a file on a distributed file system.
>>
>> When reading from the local file system, I guess every line of the file
>> will be read by a slot (according to the job parallelism) to apply the
>> map logic.
>>
>> For reading from HDFS, I read this
>> <https://stackoverflow.com/a/39153402/8110607> answer by Fabian Hueske
>> <https://stackoverflow.com/users/3609571/fabian-hueske> and I want to
>> know whether that is still Flink's strategy for reading from a distributed
>> file system.
>>
>> thanks
>>
>
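
To make the split assignment discussed in this thread more concrete, here is a toy Python simulation. It is not Flink code: the class LocalityAwareAssigner, the function simulate, and the host names are all hypothetical. It assumes that splits are handed out lazily, one per request, that the assigner prefers a split stored on the requesting host, and that it falls back to any remaining split otherwise. Under those assumptions, the "race" with very small splits shows up as one task draining the queue and reading some splits remotely, not as an error.

```python
class LocalityAwareAssigner:
    """Toy split assigner (hypothetical, not Flink's implementation)."""

    def __init__(self, splits):
        # splits: iterable of (split_id, set of hosts that store the data)
        self.unassigned = dict(splits)

    def next_split(self, host):
        """Return (split_id, is_local), or None when no splits are left."""
        if not self.unassigned:
            return None
        # First choice: a split whose data lives on the requesting host.
        for split_id, hosts in self.unassigned.items():
            if host in hosts:
                del self.unassigned[split_id]
                return split_id, True
        # Fallback: hand out any remaining split; it is read remotely.
        split_id = next(iter(self.unassigned))
        del self.unassigned[split_id]
        return split_id, False


def simulate(splits, request_order):
    """Replay split requests in the given order and log each assignment."""
    assigner = LocalityAwareAssigner(splits)
    log = []
    for host in request_order:
        result = assigner.next_split(host)
        if result is None:
            break  # all splits are gone; later requesters get nothing
        log.append((host, *result))
    return log


splits = [("s0", {"hostA"}), ("s1", {"hostB"}),
          ("s2", {"hostA"}), ("s3", {"hostB"})]

# Requests interleave: each task gets splits stored on its own host.
balanced = simulate(splits, ["hostA", "hostB", "hostA", "hostB"])

# The task on hostA requests everything before hostB asks even once.
raced = simulate(splits, ["hostA", "hostA", "hostA", "hostA", "hostB"])

for name, log in (("balanced", balanced), ("raced", raced)):
    remote = sum(1 for _, _, is_local in log if not is_local)
    print(name, log, "remote reads:", remote)
```

In the second run the task on hostA ends up with all four splits, two of them non-local, and the task on hostB gets none. In this sketch nothing fails; the cost is lost locality and an uneven distribution of work.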
