Hi, just wanted to let you know that we opened a JIRA issue (FLINK-1236) to support local split assignment for the HadoopInputFormat. At least this performance issue should be easy to solve :-)
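
In case the term is unfamiliar, here is a toy sketch of what "local split assignment" means: when a worker asks for its next input split, prefer a split whose data is stored on that worker's host, and fall back to a remote split only if no local one is left. This is only an illustration under simplifying assumptions, not Flink's implementation and not what FLINK-1236 will contain. The class name, method name, split ids, and host names are all made up for the example.

```python
class LocalityAwareAssigner:
    """Toy model of locality-aware input split assignment (hypothetical, not Flink code)."""

    def __init__(self, splits):
        # splits: mapping of split id -> set of hosts that store a replica of that split.
        # A dict keeps insertion order, so remote fallbacks are handed out in a stable order.
        self._remaining = dict(splits)

    def next_split(self, host):
        """Return (split_id, is_local) for the requesting host, or None if nothing is left."""
        # First choice: a split that has a replica on the requesting host.
        for split_id, hosts in self._remaining.items():
            if host in hosts:
                del self._remaining[split_id]
                return split_id, True
        # Fallback: any remaining split, which must then be read over the network.
        if self._remaining:
            split_id = next(iter(self._remaining))
            del self._remaining[split_id]
            return split_id, False
        return None


if __name__ == "__main__":
    assigner = LocalityAwareAssigner({
        "split-0": {"node-a", "node-b"},
        "split-1": {"node-b"},
        "split-2": {"node-c"},
    })
    for requesting_host in ["node-b", "node-c", "node-a", "node-a"]:
        print(requesting_host, assigner.next_split(requesting_host))
```

Without the locality check, every request would simply take the first remaining split, and most reads would go over the network even when a local replica exists.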

2014-11-11 12:44 GMT+01:00 Fabian Hueske <[email protected]>:

> First of all, split locality can make a huge difference.
> It will also enable a tighter integration, both API-wise and for the
> execution, for example by pushing filters or projections directly into the
> data source and thereby reducing the data to be read from the file system.
>
> 2014-11-11 12:30 GMT+01:00 Flavio Pompermaier <[email protected]>:
>
>> Maybe this is a dumb question, but could you explain to me what the
>> benefits are of a dedicated Flink IF vs. the one available by default in
>> the Hadoop IF wrapper?
>> Is it just because of data locality of task slots?
>>
>> On Tue, Nov 11, 2014 at 12:16 PM, Fabian Hueske <[email protected]>
>> wrote:
>>
>>> Hi Flavio,
>>>
>>> I am not aware of a Flink InputFormat for Parquet. However, it should
>>> hopefully be covered by the Hadoop IF wrapper.
>>> A dedicated Flink IF would be great though, IMO.
>>>
>>> Best, Fabian
>>>
>>> 2014-11-11 12:10 GMT+01:00 Flavio Pompermaier <[email protected]>:
>>>
>>>> Hi to all,
>>>>
>>>> I'd like to know whether Flink is able to exploit the Parquet format
>>>> to read data efficiently from HDFS.
>>>> Is there any example available?
>>>>
>>>> Best,
>>>> Flavio
