Re: batch processing in spark

2019-05-05 Thread Genmao Yu
IIUC, you can use the mapPartitions transformation and pass it a function f. The function maps an iterator over the partition's input records to an iterator of output records, so inside it you can pull and process multiple records at a time before emitting any output.
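A minimal sketch of what such a function f could look like. The batching logic is shown in plain Python so it runs standalone; in Spark you would pass f to rdd.mapPartitions(f). The batch size of 4 and the process_batch helper are hypothetical placeholders, not from the original message.

```python
from itertools import islice

def process_batch(batch):
    # Placeholder computation on a group of records (e.g. a batch of
    # images); here it just reports the batch size.
    return [("batch_of", len(batch))]

def f(partition_iter):
    # Map an iterator of input records to an iterator of output records,
    # pulling up to 4 records at a time before computing. This is the
    # shape of function you would hand to rdd.mapPartitions(f).
    while True:
        batch = list(islice(partition_iter, 4))
        if not batch:
            break
        for out in process_batch(batch):
            yield out

# Simulate one partition holding 10 records.
records = iter(range(10))
print(list(f(records)))  # → [('batch_of', 4), ('batch_of', 4), ('batch_of', 2)]
```

Because mapPartitions gives f the whole partition's iterator at once, Spark only calls f once per partition, letting you amortize per-batch work instead of paying it per record.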

batch processing in spark

2019-05-05 Thread swastik mittal
From my experience in Spark, when working on data stored in HDFS, Spark reads the data as records and computes on each record as soon as it reads it. I have multiple images as my data on HDFS, where each image is one record. I want Spark to read multiple records before doing any computation.