I have a DataFrame to which I apply a UDF that calls a REST web service. The web service runs on only a few nodes and won't be able to handle a massive load from Spark.
Is it possible to rate-limit this UDF, for example to something like 100 operations per second? If not, what are the options? Is splitting the DataFrame an option?

I've read a similar question on Stack Overflow [1], and the solution there suggests Spark Streaming, but my application does not involve streaming. Do I need to turn the operations into a streaming workflow to achieve something like this?

Current workflow: Hive -> Spark -> Service

Thank you.

[1] https://stackoverflow.com/questions/43953882/how-to-rate-limit-a-spark-map-operation
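One batch-only direction to consider for the 100 op/s target, sketched under assumptions: instead of a per-row UDF, process each partition with `mapPartitions` and throttle the calls inside it. If at most `concurrency` tasks run at the same time, capping each task at `total_rate / concurrency` calls per second keeps the aggregate at or below `total_rate`. The `Throttle` class below is a simple fixed-interval limiter written for illustration (not a Spark API), and `call_service` and `concurrency` are hypothetical names.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class Throttle:
    """Fixed-interval throttle: permits at most `rate` calls per second.

    Each wait() blocks until at least 1/rate seconds have passed since the
    previously permitted call. The clock and sleep functions are injectable
    so the timing logic can be tested without real sleeping.
    """

    def __init__(self, rate: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if rate <= 0:
            raise ValueError("rate must be positive")
        self.interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None  # earliest time of the next call

    def wait(self) -> None:
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            # Never schedule earlier than the slot we just waited for.
            now = max(self._clock(), self._next_allowed)
        self._next_allowed = now + self.interval


def throttled_partition(rows: Iterable[T], call: Callable[[T], R],
                        total_rate: float, concurrency: int) -> Iterator[R]:
    """Apply `call` to every row of one partition at <= total_rate/concurrency calls/s.

    Assumption: at most `concurrency` partitions are processed at once, so the
    combined rate across the cluster stays at or below `total_rate`.
    """
    throttle = Throttle(total_rate / concurrency)
    for row in rows:
        throttle.wait()
        yield call(row)
```

Hypothetical usage, assuming a PySpark DataFrame `df` and a function `call_service(row)` that performs one REST request: `df.repartition(8).rdd.mapPartitions(lambda rows: throttled_partition(rows, call_service, 100, 8))`. Repartitioning to 8 partitions means at most 8 tasks can run concurrently; task retries or speculative execution could still push the real rate above the cap.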