First of all, split locality can make a huge difference. A dedicated InputFormat would also enable tighter integration, both API-wise and at execution time, for example by pushing filters or projections directly into the data source and thereby reducing the amount of data read from the file system.
2014-11-11 12:30 GMT+01:00 Flavio Pompermaier <[email protected]>:

> Maybe this is a dumb question, but could you explain to me what the
> benefits of a dedicated Flink IF are vs. the one available by default in
> the Hadoop IF wrapper?
> Is it just because of data locality of task slots?
>
> On Tue, Nov 11, 2014 at 12:16 PM, Fabian Hueske <[email protected]>
> wrote:
>
>> Hi Flavio,
>>
>> I am not aware of a Flink InputFormat for Parquet. However, it should
>> hopefully be covered by the Hadoop IF wrapper.
>> A dedicated Flink IF would be great though, IMO.
>>
>> Best, Fabian
>>
>> 2014-11-11 12:10 GMT+01:00 Flavio Pompermaier <[email protected]>:
>>
>>> Hi to all,
>>>
>>> I'd like to know whether Flink is able to exploit the Parquet format
>>> to read data efficiently from HDFS.
>>> Is there any example available?
>>>
>>> Best,
>>> Flavio
>>>
>>
>>
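Until a dedicated Flink InputFormat for Parquet exists, the Hadoop IF wrapper route mentioned above can be sketched roughly as follows. This is a minimal, unverified sketch, assuming parquet-mr's `ExampleInputFormat` (which produces `Group` records with key type `Void`) and the `HadoopInputFormat` wrapper from Flink's hadoop-compatibility module; exact package names and API details vary between Flink and parquet-mr versions, and the input path is a placeholder:

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.hadoopcompatibility.mapreduce.HadoopInputFormat;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import parquet.example.data.Group;
import parquet.hadoop.example.ExampleInputFormat;

public class ParquetReadSketch {

  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // Hadoop Job object carrying the configuration for the Parquet input.
    Job job = Job.getInstance();
    // Placeholder path; point this at your actual Parquet data on HDFS.
    FileInputFormat.addInputPath(job, new Path("hdfs:///tmp/input.parquet"));

    // Wrap parquet-mr's Hadoop ExampleInputFormat in Flink's
    // Hadoop-compatibility InputFormat. The wrapper exposes the
    // Hadoop (key, value) pairs as Tuple2<Void, Group>.
    HadoopInputFormat<Void, Group> parquetInput =
        new HadoopInputFormat<Void, Group>(
            new ExampleInputFormat(), Void.class, Group.class, job);

    DataSet<Tuple2<Void, Group>> records = env.createInput(parquetInput);

    // From here, work with the Group values as with any DataSet,
    // e.g. print them (triggers execution in the DataSet API).
    records.print();
  }
}
```

Note that with this wrapper, filter and projection pushdown would have to be configured through the Hadoop job configuration (e.g. Parquet's projection schema), rather than being derived automatically from the Flink program, which is exactly the gap a dedicated Flink IF could close.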
