I have a DataFrame to which I apply a UDF that calls a REST web service.
This web service is deployed on only a few nodes and won't be able to
handle a massive load from Spark.

 

Is it possible to rate limit this UDF? For example, something like 100
op/s.

 

If not, what are the options? Is splitting the df an option?
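
To make the splitting idea concrete, here is a rough sketch of the kind of throttling I have in mind, in plain Python so it runs without Spark. The names RateLimiter, throttled_partition and call_service are made up for illustration, and it assumes each partition is processed by one task that paces its own calls.

```python
import time


class RateLimiter:
    """Paces calls so they are at least 1/rate_per_sec seconds apart (no bursts)."""

    def __init__(self, rate_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate_per_sec
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = clock()

    def acquire(self):
        # Block until the next call is allowed, then reserve the following slot.
        wait = self.next_allowed - self.clock()
        if wait > 0:
            self.sleep(wait)
        self.next_allowed = max(self.clock(), self.next_allowed) + self.interval


def throttled_partition(rows, call_service, rate_per_sec,
                        clock=time.monotonic, sleep=time.sleep):
    """Call call_service on every row of one partition at a bounded rate.

    call_service is a hypothetical stand-in for the function that hits
    the REST web service.
    """
    limiter = RateLimiter(rate_per_sec, clock, sleep)
    for row in rows:
        limiter.acquire()
        yield call_service(row)
```

In Spark this would presumably be used through something like df.repartition(n).rdd.mapPartitions(lambda rows: throttled_partition(rows, call_service, 100 / n)), so that n partitions at 100 / n op/s each add up to at most roughly 100 op/s. That bound is only my assumption: it relies on no more than n tasks running at once, and the limit is per task rather than global.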

 

I've read a similar question on Stack Overflow [1] and the solution suggests
Spark Streaming, but my application does not involve streaming. Do I need
to turn the operations into a streaming workflow to achieve something like
that?

 

Current workflow: Hive -> Spark -> Service

 

Thank you

 

[1]
https://stackoverflow.com/questions/43953882/how-to-rate-limit-a-spark-map-operation
