IIUC, you can use the mapPartitions transformation and pass it a function f.
The function maps an input iterator over a partition's records to an output
iterator, so inside it you can pull several records off the input iterator
and process them together.
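For example, here is a minimal sketch in Scala. The HDFS path, the batch
size of 10, and the per-batch computation are placeholder assumptions; with
binaryFiles, each image file arrives as one (path, stream) record:

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder.appName("BatchedRead").getOrCreate()
  val sc = spark.sparkContext

  // Hypothetical path; each image file becomes one record.
  val records = sc.binaryFiles("hdfs:///data/images")

  // mapPartitions hands us an iterator over ALL records in a partition,
  // so we can buffer several records before computing on them.
  val processed = records.mapPartitions { iter =>
    iter.grouped(10).flatMap { batch =>   // take 10 records at a time
      // ... replace with the real batch-level computation ...
      batch.map { case (path, stream) => (path, stream.toArray().length) }
    }
  }

  processed.collect().foreach(println)
  spark.stop()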
> On May 6, 2019, at 2:59 AM, swastik mittal wrote:
>
> From my experience in Spark, when working on data stored in HDFS, Spark
> reads the data as records and computes on each record as soon as it is
> read. I have multiple images as my data on HDFS, where each image is one
> record. I want Spark to read multiple records before doing any computation.