Re: RESTful Operations

2020-01-19 Thread Chris Teoh
Maybe something like Livy; otherwise, roll your own REST API and have it start a Spark job.
On Mon, 20 Jan 2020 at 06:55, wrote:
> I am new to Spark. The task I want to accomplish is let client send http requests, then spark process that request for further operations. However searching
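As a rough illustration of the "roll your own REST API" option above, here is a minimal sketch using only the Python standard library. It is not taken from the thread. The `/jobs` route, the `APP_PATH` and `SPARK_MASTER` values, and the `args` field in the request body are all hypothetical, and it assumes `spark-submit` is on the `PATH` of the machine running the server.

```python
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical values: adjust for your own cluster and application.
SPARK_MASTER = "local[*]"
APP_PATH = "/path/to/job.py"


def build_spark_submit_command(app_path, app_args, master=SPARK_MASTER):
    """Build the argument list for one spark-submit invocation."""
    return ["spark-submit", "--master", master, app_path, *map(str, app_args)]


class JobHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/jobs":  # hypothetical route
            self.send_error(404, "unknown route")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            app_args = payload.get("args", [])
            if not isinstance(app_args, list):
                raise ValueError("args must be a list")
        except (ValueError, AttributeError):
            self.send_error(400, "body must be a JSON object with an args list")
            return
        try:
            # Start the job without waiting for it; no shell is involved,
            # so request values are passed as plain arguments.
            proc = subprocess.Popen(build_spark_submit_command(APP_PATH, app_args))
        except OSError:
            self.send_error(500, "could not start spark-submit")
            return
        body = json.dumps({"pid": proc.pid}).encode()
        self.send_response(202)  # accepted: the job runs in the background
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Serve on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), JobHandler).serve_forever()
```

Calling `serve()` and then sending `POST /jobs` with a body such as `{"args": ["2020-01-19"]}` would launch one `spark-submit` process per request. A service like Livy adds what this sketch leaves out, such as session management and job status tracking.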

RESTful Operations

2020-01-19 Thread hamishberridge
I am new to Spark. The task I want to accomplish is to let a client send HTTP requests and then have Spark process each request for further operations. However, searching Spark's website docs https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.package

Re: Record count query parallel processing in databricks spark delta lake

2020-01-19 Thread Farhan Misarwala
Hi Anbutech, If I am not mistaken, I believe you are trying to read multiple dataframes from around 150 different paths (in your case the Kafka topics) to count their records. You have all these paths stored in a CSV with columns year, month, day and hour. Here is what I came up with; I have
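The message above is cut off before its solution, so the following is only one possible way to count records across many paths, not necessarily what the author came up with. It is a sketch that builds one path per CSV row and runs the counts concurrently from a thread pool, relying on the fact that Spark accepts jobs submitted from multiple threads of the same application. The path layout (`base/year/month/day/hour`) and the function names are assumptions. The counting function is passed in so the sketch runs without a cluster.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def build_paths(csv_text, base_path):
    """Turn CSV rows of year, month, day, hour into one path per row.

    The base/year/month/day/hour layout is an assumption for illustration.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        f"{base_path}/{row['year']}/{row['month']}/{row['day']}/{row['hour']}"
        for row in reader
    ]


def count_all(paths, count_records, max_workers=8):
    """Call count_records(path) for every path concurrently.

    With PySpark, count_records could be something like
        lambda p: spark.read.format("delta").load(p).count()
    so that each thread submits its own Spark job.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        counts = list(pool.map(count_records, paths))  # keeps input order
    return dict(zip(paths, counts))
```

Threads are enough here because each call mostly waits on the cluster. The `max_workers` value caps how many count jobs are in flight at once.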

Re: Does explode lead to more usage of memory

2020-01-19 Thread Chris Teoh
It depends on the use case. If you would otherwise have to join, keeping the data in an array saves you a join and a shuffle. If you explode, at least sort within partitions so that you benefit from predicate pushdown when you read the data next time.
On Sun, 19 Jan 2020, 1:19 pm Jörn Franke, wrote:
> Why not two
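To illustrate why sorting within partitions helps after an explode, here is a small pure-Python model. It is not Spark code. It treats one partition as a list, splits it into fixed-size "row groups", and keeps a min/max per group, similar in spirit to the statistics columnar formats such as Parquet store. A filter can then skip any group whose range cannot contain the value. In Spark itself, the sorting step corresponds to `DataFrame.sortWithinPartitions` before the write. The data and group size below are made up.

```python
def explode(rows):
    """Flatten (key, [values]) rows into one (key, value) row per element."""
    return [(key, value) for key, values in rows for value in values]


def row_group_stats(rows, group_size):
    """Split rows into fixed-size groups and record each group's (min, max)."""
    groups = [rows[i:i + group_size] for i in range(0, len(rows), group_size)]
    return [(min(v for _, v in g), max(v for _, v in g)) for g in groups]


def groups_to_scan(stats, target):
    """Count groups whose [min, max] range could contain the target value."""
    return sum(1 for lo, hi in stats if lo <= target <= hi)


# Made-up nested data: each key holds an array of values.
nested = [("a", [5, 1, 9]), ("b", [3, 7, 2]), ("c", [8, 4, 6])]
exploded = explode(nested)

unsorted_stats = row_group_stats(exploded, group_size=3)
sorted_stats = row_group_stats(sorted(exploded, key=lambda r: r[1]), group_size=3)

print(groups_to_scan(unsorted_stats, target=5))  # 3: every group's range covers 5
print(groups_to_scan(sorted_stats, target=5))    # 1: only the 4..6 group qualifies
```

In the unsorted case every group spans a wide range, so nothing can be skipped. After sorting, the ranges are disjoint and the filter touches a single group.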