I had the same question some time ago.
The quick fix is to put the line number into the line itself. Go for it.
However, we worked out a solution for another distributed processing
system that did the following:
Read each partition and count its lines, broadcast a map
"partition -> lineCount", then re-read the data and attach the line numbers.
This is basically how the distributed zipWithIndex works, which is also
available in Flink.
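A minimal sketch of that two-pass scheme, in plain Python with in-memory lists standing in for partitions. This is not Flink's API (in the DataSet API the real thing is DataSetUtils.zipWithIndex); the function name zip_with_index is hypothetical. The per-partition counts become prefix-sum offsets, and the second pass numbers each partition's lines starting from its offset.

```python
from itertools import accumulate
from typing import List, Tuple


def zip_with_index(partitions: List[List[str]]) -> List[Tuple[int, str]]:
    """Simulate two-pass zipWithIndex; each inner list stands in for one partition."""
    # Pass 1: every partition counts its own lines (in parallel in a real system).
    counts = [len(p) for p in partitions]
    # The broadcast "partition -> lineCount" map turned into start offsets:
    # partition i starts at the sum of the counts of partitions 0..i-1.
    offsets = [0] + list(accumulate(counts))[:-1]
    # Pass 2: re-read each partition and number its lines from its offset.
    # Only correct if pass 2 sees the same lines, in the same order and
    # with the same partition boundaries, as pass 1.
    result = []
    for pid, part in enumerate(partitions):
        for local_idx, line in enumerate(part):
            result.append((offsets[pid] + local_idx, line))
    return result


parts = [["row a", "row b", "row c"], ["row d", "row e"]]
print(zip_with_index(parts))
# [(0, 'row a'), (1, 'row b'), (2, 'row c'), (3, 'row d'), (4, 'row e')]
```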
But:
That only works if both mapPartition passes read the data in the same
order and with the same partition boundaries.
I don't know if you can get that guarantee in Flink without a
range-partition and a sortPartition on the byte offset.
Doing that would work (I think), but it would add significant overhead
that can be avoided entirely by adding the line numbers to the lines
in the first place.
I think it's just not worth it.
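To illustrate the quick fix: a hypothetical one-time sequential pass that prefixes every line with its row number, plus the parsing each parallel mapper would then do. The tab separator, 0-based numbering, and the names number_lines and parse_numbered are assumptions for this sketch.

```python
def number_lines(src_lines):
    """One sequential pass: prefix each line with its 0-based row number and a tab."""
    return [f"{i}\t{line}" for i, line in enumerate(src_lines)]


def parse_numbered(line):
    """What a parallel mapper would do: split the row number off again."""
    row, _, payload = line.partition("\t")
    return int(row), payload


numbered = number_lines(["1 2 3", "4 5 6"])
print(numbered)                      # ['0\t1 2 3', '1\t4 5 6']
print(parse_numbered(numbered[1]))   # (1, '4 5 6')
```

Because the row number now travels with the line, it no longer matters how the file is split or in which order the splits are processed.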
On February 4, 2016 00:56:43 CET, Fabian Hueske <fhue...@gmail.com> wrote:
Hi Anastasiia,
this is difficult because the input is usually read in parallel,
i.e., an input file is split into several blocks which are
independently read and processed by different threads (possibly on
different machines). That makes it hard to assign sequential row
numbers.
If all rows have the same length (number of bytes), you could
compute the row number from the byte offset. Otherwise, you can
only read the input sequentially.
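For fixed-length rows the arithmetic is just row = byte_offset / row_length (integer division), so any parallel reader that knows where its block starts can number its own lines. A sketch in plain Python, assuming every line is exactly 7 data bytes plus a newline; ROW_BYTES, row_number and read_split are hypothetical names, and a byte range stands in for an input split.

```python
ROW_BYTES = 8  # assumption: each line is 7 data bytes plus "\n"


def row_number(byte_offset: int, row_bytes: int = ROW_BYTES) -> int:
    """0-based row index of the line that starts at byte_offset."""
    return byte_offset // row_bytes


def read_split(data: bytes, start: int, end: int, row_bytes: int = ROW_BYTES):
    """Tag every line that starts inside [start, end) with its row number."""
    # Round start up to the next line boundary; a line belongs to the split it starts in.
    first = -(-start // row_bytes) * row_bytes
    rows = []
    for off in range(first, min(end, len(data)), row_bytes):
        line = data[off:off + row_bytes].rstrip(b"\n").decode("ascii")
        rows.append((row_number(off, row_bytes), line))
    return rows


DATA = b"1,2,3,4\n5,6,7,8\n9,0,1,2\n"  # three fixed-width rows, 24 bytes
print(read_split(DATA, 0, 10))   # [(0, '1,2,3,4'), (1, '5,6,7,8')]
print(read_split(DATA, 10, 24))  # [(2, '9,0,1,2')]
```

Each split computes its row numbers independently, with no second pass and no coordination between readers.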
Flink does not provide InputFormats for this, so you would need to
implement a custom InputFormat.
You can also keep track of the number of elements that you processed
in a Mapper, but this is probably not what you are looking for.
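Why a counter in a mapper is probably not enough: with parallelism greater than one, every mapper instance keeps its own counter, which starts at 0 and only counts the records of the split that instance received. A plain-Python simulation (CountingMapper is a hypothetical stand-in for a mapper holding a counter field, not a Flink class):

```python
class CountingMapper:
    """Stand-in for one parallel mapper instance that counts the records it sees."""

    def __init__(self):
        self.seen = 0  # per-instance state, not shared with other instances

    def map(self, line):
        idx = self.seen
        self.seen += 1
        return idx, line


# Two parallel instances, each handed one split of the same file.
splits = [["r0", "r1", "r2"], ["r3", "r4"]]
mappers = [CountingMapper() for _ in splits]
outputs = [[m.map(line) for line in split] for m, split in zip(mappers, splits)]
print(outputs)
# [[(0, 'r0'), (1, 'r1'), (2, 'r2')], [(0, 'r3'), (1, 'r4')]]
```

Row "r3" gets index 0 instead of 3, so the counter is only a local position within a split, not a global row number.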
Best,
Fabian
On 2016-02-04 0:37 GMT+01:00, Анастасія Баша <nastja.ba...@mail.ru> wrote:
Is there a way to get the current line number (or, more generally, the
index of the element currently being processed) inside a mapper?
For example, take a matrix that you read line by line from a file,
where you need both the row and the column number of each entry. The
column number is easy to get, but how do I get the row number?
Thanks a lot in advance,
Anastasiia