Finally got a toy Structured Streaming DataSource V2 implementation working
with Apache Spark 2.3. Tested locally and on Databricks Community
Edition.
Source code is here: https://github.com/hienluu/wikiedit-streaming
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
-
Hi Matt,
Unfortunately, I have no code pointer at hand.
I will sketch how to accomplish this via the API; it should at least
help you get started.
1) ETL + vectorization (I assume your feature vector to be named "features")
2) You run a clustering algorithm (say KMeans:
https://spark.a
I recommend running it as part of your unit tests, executed by your build tool.
There is no need to have it running in the background in the IDE.
> On 3. Mar 2018, at 17:57, sujeet jog wrote:
>
> Is there a way to run Spark-JobServer in eclipse ?.. any pointers in this
> regard ?
--