Hey Biplob,

Yes, the file source will read the entire input. The first operator adds a combiner to the source for pre-aggregation and then shuffles everything to a single reduce instance, which emits the first N elements. Keep in mind that the records are not emitted in any strict order, so the result is not necessarily the first N lines of the file.
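To make the combine-then-reduce step concrete, here is a rough plain-Java sketch (no Flink involved). The class and method names are made up for illustration, and each inner list stands in for the records of one parallel source instance. It only mimics the data flow described above; it is not how Flink implements the operator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: simulates "take the first n" with a
// per-partition combiner followed by a single reducer.
public class FirstNSketch {

    // Combiner step: keep at most n records of the given list.
    static <T> List<T> takeFirst(List<T> records, int n) {
        return new ArrayList<>(records.subList(0, Math.min(n, records.size())));
    }

    // Every partition is fully read and pre-aggregated locally, then all
    // surviving records are sent to one reducer, which keeps n of them.
    static <T> List<T> firstN(List<List<T>> partitions, int n) {
        List<T> shuffled = new ArrayList<>();
        for (List<T> partition : partitions) {
            shuffled.addAll(takeFirst(partition, n));
        }
        return takeFirst(shuffled, n);
    }

    public static void main(String[] args) {
        List<List<String>> partitions = List.of(
                List.of("a1", "a2", "a3"),
                List.of("b1", "b2", "b3"));
        System.out.println(firstN(partitions, 2));
    }
}
```

In this sketch the partitions reach the reducer in list order. In a real distributed job the arrival order is not fixed, which is why the emitted records need not be the first lines of the file.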
If you need to optimize this, you could write a custom File/TextInputFormat that discards the unneeded lines at the source. Have a look at these classes, and feel free to come back to the mailing list with questions.

– Ufuk

On Sat, Apr 23, 2016 at 6:37 PM, Biplob Biswas <revolutioni...@gmail.com> wrote:
> Hi,
>
> It might be a naive question but I was concerned as I am trying to read from
> a file.
> My question is if I have a file with n lines and i want m lines out of that
> where n << m, would the first operator process only the first m lines or
> would it go through the entire file?
>
> If it does go through the entire file, is there a better way to just get the
> top m lines using readCsvFile function?
>
> Thanks & Regards
> Biplob Biswas
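On the suggestion of discarding lines at the source: the core idea is to stop reading once m lines have been produced, so the rest of the file is never touched. Below is a minimal plain-Java sketch of that idea. The class and method names are hypothetical, and this is not a Flink input format. In a custom Flink format, my assumption is that the equivalent hook would be reporting end-of-input (for example via `reachedEnd()`) after m records, and that with several parallel splits each split would stop independently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: read at most m lines, then stop.
public class HeadLinesSketch {

    // Returns up to m lines; the remainder of the input is left unread.
    static List<String> readFirstLines(BufferedReader reader, int m) {
        List<String> lines = new ArrayList<>();
        try {
            String line;
            // The size check comes first, so no extra line is consumed
            // once m lines have been collected.
            while (lines.size() < m && (line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        BufferedReader reader =
                new BufferedReader(new StringReader("1,a\n2,b\n3,c\n4,d\n"));
        System.out.println(readFirstLines(reader, 2));
    }
}
```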