Not really, unless you’re doing something wrong (e.g. calling collect or similar).
In the foreachRDD loop you’re typically converting an RDD to a DataFrame and registering it as a temp table. All the subsequent queries are executed in parallel on the workers.

I haven’t built production apps with this pattern, but I have successfully built a prototype where I execute dynamic SQL on top of a 15-minute window (obtained with .window on the DStream), and it works as expected.

Check this out for a code example:
https://github.com/databricks/reference-apps/blob/master/logs_analyzer/chapter1/scala/src/main/scala/com/databricks/apps/logs/chapter1/LogAnalyzerStreamingSQL.scala

-adrian

From: Daniel Haviv
Date: Monday, October 12, 2015 at 12:52 PM
To: user
Subject: SQLContext within foreachRDD

Hi,
As things that run inside foreachRDD run at the driver, does that mean that if we use SQLContext inside foreachRDD the data is sent back to the driver and only then the query is executed, or is it executed at the executors?

Thank you.
Daniel
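
For reference, the pattern described in the reply (register a temp table inside foreachRDD, then query it over a window) might look roughly like the sketch below. It is not taken from the linked example. It assumes the Spark 1.5-era API (SQLContext.getOrCreate, registerTempTable), and the Event case class, the socket source on localhost:9999, the "ip,bytes" line format, and the 10-second batch interval are all made up for illustration.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

// Hypothetical record type, used only for this illustration.
case class Event(ip: String, bytes: Long)

object WindowedSqlSketch {
  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit.
    val conf = new SparkConf().setAppName("WindowedSqlSketch")
    val ssc = new StreamingContext(conf, Seconds(10)) // assumed batch interval

    // Assumed input: text lines of the form "ip,bytes" on a local socket.
    val events = ssc.socketTextStream("localhost", 9999)
      .map(_.split(","))
      .filter(_.length == 2)
      .map(p => Event(p(0).trim, p(1).trim.toLong)) // no handling of malformed numbers

    // 15-minute window, re-evaluated on every 10-second batch.
    val windowed = events.window(Minutes(15), Seconds(10))

    windowed.foreachRDD { rdd =>
      // This block runs on the driver, but it only defines and triggers jobs;
      // the rows of the RDD stay on the executors.
      val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
      import sqlContext.implicits._

      rdd.toDF().registerTempTable("events")

      // The aggregation itself is executed as a distributed job on the workers.
      val totals = sqlContext.sql(
        "SELECT ip, SUM(bytes) AS total FROM events GROUP BY ip")

      // show() brings back only the first rows of the result for display;
      // calling collect() here would pull the whole result to the driver.
      totals.show()
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

When trying this locally with a socket receiver, the master needs at least two threads (for example local[2]), since the receiver occupies one of them.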