date:20200119

Re: RESTful Operations

2020-01-19 Thread Chris Teoh

Maybe something like Livy; otherwise, roll your own REST API and have it start a Spark job.
On Mon, 20 Jan 2020 at 06:55, wrote:
> I am new to Spark. The task I want to accomplish is to let a client send HTTP requests and then have Spark process each request for further operations. However, searching Spark
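To make the Livy suggestion a bit more concrete, here is a minimal sketch of how a client could prepare a batch submission against Livy's REST endpoint (`POST /batches`). It only builds the HTTP request and does not send it. The helper name `build_livy_batch_request`, the `localhost:8998` address, the jar path and the class name are all assumptions for illustration; none of them come from the thread.

```python
import json
import urllib.request


def build_livy_batch_request(livy_url, app_file, class_name=None, args=None):
    """Build (but do not send) a POST /batches request for a Livy server.

    livy_url   -- base URL of the Livy server (assumed, e.g. http://localhost:8998)
    app_file   -- path to the application jar or .py file, as seen by the cluster
    class_name -- main class for JVM applications (optional)
    args       -- command-line arguments passed to the application (optional)
    """
    payload = {"file": app_file}
    if class_name is not None:
        payload["className"] = class_name
    if args:
        payload["args"] = list(args)

    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical values; replace them with your own server and application.
    request = build_livy_batch_request(
        "http://localhost:8998",
        "local:/tmp/request-handler.jar",
        class_name="com.example.RequestHandler",
        args=["--request-id", "42"],
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # urllib.request.urlopen(request) would actually submit the batch.
```

A hand-rolled REST service would follow the same shape: accept the client's HTTP request, translate it into job parameters, and hand those to whatever launches the Spark job.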

RESTful Operations

2020-01-19 Thread hamishberridge

I am new to Spark. The task I want to accomplish is to let a client send HTTP requests and then have Spark process each request for further operations. However, searching Spark's website docs https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.package https://spark.apache.or

Re: Record count query parallel processing in databricks spark delta lake

2020-01-19 Thread Farhan Misarwala

Hi Anbutech, If I am not mistaken, I believe you are trying to read multiple dataframes from around 150 different paths (in your case the Kafka topics) to count their records. You have all these paths stored in a CSV with columns year, month, day and hour. Here is what I came up with; I have been
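The rest of this message is cut off in the archive, so the following is not the author's solution. It is only a rough sketch of the task as described: build one path per CSV row (year, month, day, hour) and count the records of each path concurrently. The `year=/month=/day=/hour=` directory layout, the helper names and the `count_fn` parameter are assumptions. In a real job, `count_fn` could be something like `lambda p: spark.read.format("delta").load(p).count()`, since Spark accepts jobs submitted from separate threads of one application.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def paths_from_csv(csv_text, base_path):
    """Turn CSV rows with year, month, day, hour columns into paths.

    The partition-style layout below is an assumption, not taken from the thread.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        f"{base_path}/year={row['year']}/month={row['month']}"
        f"/day={row['day']}/hour={row['hour']}"
        for row in reader
    ]


def count_records(paths, count_fn, max_workers=8):
    """Call count_fn on every path from a thread pool and return {path: count}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so counts line up with paths.
        counts = list(pool.map(count_fn, paths))
    return dict(zip(paths, counts))


if __name__ == "__main__":
    sample = "year,month,day,hour\n2020,01,19,05\n2020,01,19,06\n"
    paths = paths_from_csv(sample, "/mnt/data")
    # A stand-in counter; replace it with a function that reads the path and counts rows.
    print(count_records(paths, count_fn=len))
```

Passing the counting function in as a parameter keeps the sketch runnable without a cluster and makes it easy to swap in a real reader later.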

Re: Does explode lead to more usage of memory

2020-01-19 Thread Chris Teoh

It depends on the use case. If you have to join, keeping the data in an array already saves you a join and a shuffle. If you explode, at least sort within partitions so that you get predicate pushdown when you read the data next time.
On Sun, 19 Jan 2020, 1:19 pm Jörn Franke, wrote:
> Why not two tab