Hey Biplob,

Yes, the file source will read the entire input. The first(n) operator
attaches a combiner to the source for pre-aggregation and then shuffles
the result to a single reduce instance, which emits the first N
elements it receives. Keep in mind that there is no strict order in
which the records are emitted, so these are not necessarily the first
N lines of the file.

If you need to optimize this, you could write a custom
File/TextInputFormat that discards the unneeded lines at the source.
Have a look at these classes and come back to the mailing list with
any questions.
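
To illustrate the idea without depending on Flink, here is a minimal
sketch in plain java.io: a reader that reports end-of-input once it has
handed out a fixed number of lines, so no further lines are pulled from
the file. LimitedLineReader is a hypothetical name and not a Flink
class; it only loosely mirrors the reachedEnd/nextRecord shape of an
input format. In an actual job each parallel split would presumably
apply the limit on its own, so a first(n) afterwards would still be
needed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, NOT part of Flink: hands out at most `limit` lines.
class LimitedLineReader implements AutoCloseable {
    private final BufferedReader in;
    private final long limit;
    private long emitted = 0;
    private String next; // look-ahead line; null once nothing more will be emitted

    LimitedLineReader(Reader source, long limit) {
        this.in = new BufferedReader(source);
        this.limit = limit;
        this.next = limit > 0 ? readLine() : null;
    }

    // True once the limit is hit or the underlying input is exhausted.
    boolean reachedEnd() {
        return emitted >= limit || next == null;
    }

    // Returns the next line, or null if the end has been reached.
    String nextRecord() {
        if (reachedEnd()) {
            return null;
        }
        String line = next;
        emitted++;
        // Only look ahead while more lines are still wanted.
        next = emitted < limit ? readLine() : null;
        return line;
    }

    private String readLine() {
        try {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            in.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Four input lines, but only the first two are handed out.
        try (LimitedLineReader r =
                 new LimitedLineReader(new StringReader("a\nb\nc\nd\n"), 2)) {
            while (!r.reachedEnd()) {
                System.out.println(r.nextRecord());
            }
        }
    }
}
```

The look-ahead is skipped as soon as the limit is reached, so the
reader never asks the underlying stream for a line it will not emit.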

– Ufuk

On Sat, Apr 23, 2016 at 6:37 PM, Biplob Biswas <revolutioni...@gmail.com> wrote:
> Hi,
>
> It might be a naive question but I was concerned as I am trying to read from
> a file.
> My question is if I have a file with n lines and I want m lines out of that
> where m << n, would the first operator process only the first m lines or
> would it go through the entire file?
>
> If it does go through the entire file, is there a better way to just get the
> top m lines using readCsvFile function?
>
> Thanks & Regards
> Biplob Biswas
