Hi,
We have reference data that a Sqoop job pulls from an RDBMS into the
Analytics platform once a day.
We also have a Spark Streaming job that reads the reference data at job
startup and then joins it with continuously flowing event data. When t
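
The join described above can be sketched in plain Python, without Spark, to show the shape of the computation: a reference lookup loaded once at startup and applied to each micro-batch of events. This is only an illustration under stated assumptions. The names `load_reference` and `join_batch`, the `customer_id` and `tier` fields, and the sample rows are all hypothetical; in an actual Spark Streaming job the lookup would more likely be an RDD or a broadcast variable than a dict.

```python
from typing import Dict, Iterable, List


def load_reference() -> Dict[int, str]:
    """Hypothetical reference rows, standing in for the daily Sqoop import."""
    return {1: "gold", 2: "silver"}


def join_batch(events: Iterable[dict], reference: Dict[int, str]) -> List[dict]:
    """Inner-join one micro-batch of events with the reference lookup."""
    joined = []
    for event in events:
        tier = reference.get(event["customer_id"])
        if tier is not None:  # events with no matching reference row are dropped
            joined.append({**event, "tier": tier})
    return joined


reference = load_reference()  # read once, at job startup
batch = [{"customer_id": 1, "amount": 10}, {"customer_id": 3, "amount": 5}]
result = join_batch(batch, reference)
```

Because `reference` is read only once, a job structured this way keeps using the startup snapshot until the lookup is reloaded.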
Hi,
The aim here is as follows:
- read data from a socket using Spark Streaming every N seconds
- register the received data as a SQL table
- more data will be read from HDFS etc. as reference data and will
also be registered as SQL tables
- the idea is to perform arbitrary SQL queries on the combi
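
The steps listed above can be illustrated without Spark, using Python's built-in `sqlite3` module as a stand-in SQL engine. This is a swapped-in technique, not Spark SQL: each incoming batch replaces an `events` table, a `reference` table is loaded once, and a SQL query runs over both. The table names, columns, helper functions, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reference data, loaded once (stands in for data read from HDFS).
conn.execute("CREATE TABLE reference (code TEXT PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO reference VALUES (?, ?)",
    [("a", "alpha"), ("b", "beta")],
)


def register_batch(conn, rows):
    """Replace the events table with the rows received in the latest interval."""
    conn.execute("DROP TABLE IF EXISTS events")
    conn.execute("CREATE TABLE events (code TEXT, value INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)


def run_query(conn, sql):
    """Run an arbitrary SQL query over the registered tables."""
    return conn.execute(sql).fetchall()


register_batch(conn, [("a", 1), ("b", 2), ("a", 3)])
totals = run_query(
    conn,
    """
    SELECT r.label, SUM(e.value)
    FROM events e JOIN reference r ON e.code = r.code
    GROUP BY r.label
    ORDER BY r.label
    """,
)
```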
Hi,
I'm on Spark 1.3.0 and my data is in DataFrames.
I need operations like sampleByKey() and sampleByKeyExact().
I saw the JIRA "Add approximate stratified sampling to DataFrame" (
https://issues.apache.org/jira/browse/SPARK-7157).
That's targeted for Spark 1.5. Until that comes through, what's the eas
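
For context, the semantics of approximate stratified sampling can be sketched in plain Python: each record is kept independently with the probability assigned to its key, so per-key sample sizes are only approximately `fraction * count`. This illustrates the idea and is not Spark's implementation; the function name `sample_by_key` and the sample data are hypothetical. Since `sampleByKey` exists at the RDD level, one possible route on Spark 1.3 is to go from the DataFrame to its underlying RDD, key the rows, and sample there.

```python
import random
from typing import Dict, Hashable, Iterable, List, Tuple


def sample_by_key(
    pairs: Iterable[Tuple[Hashable, object]],
    fractions: Dict[Hashable, float],
    seed: int,
) -> List[Tuple[Hashable, object]]:
    """Keep each (key, value) pair with probability fractions[key].

    Keys missing from `fractions` are treated as fraction 0.0 (an assumption
    made for this sketch).
    """
    rng = random.Random(seed)
    return [(k, v) for k, v in pairs if rng.random() < fractions.get(k, 0.0)]


data = [("a", i) for i in range(100)] + [("b", i) for i in range(100)]

# Fraction 1.0 keeps every record for a key; fraction 0.0 keeps none.
sample = sample_by_key(data, {"a": 1.0, "b": 0.0}, seed=42)
```

An exact variant, in the spirit of `sampleByKeyExact()`, would instead need to guarantee a fixed sample size per key, which a per-record coin flip like this one does not.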
Hi,
We are building a machine learning platform based on MLlib in Spark. We
will be using Scala for the development.
We need a thin workflow layer where we can easily configure the different
actions to be done, the configuration for the actions (like load-data,
clean-data, split-data etc.), and the or
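
A minimal sketch of such a workflow layer, written in Python for brevity even though the post mentions Scala: the workflow is a list of steps, each naming an action and carrying that action's configuration, and a small runner applies the steps one after another. The action names mirror the ones in the post; the `ACTIONS` registry, the config keys, and the toy implementations are all hypothetical.

```python
from typing import Any, Callable, Dict, List

# Registry mapping an action name to the function that implements it.
ACTIONS: Dict[str, Callable[[Any, dict], Any]] = {}


def action(name: str):
    """Decorator that registers a function under an action name."""
    def register(fn):
        ACTIONS[name] = fn
        return fn
    return register


@action("load-data")
def load_data(_, cfg):
    return list(cfg["rows"])


@action("clean-data")
def clean_data(rows, cfg):
    return [r for r in rows if r is not None]  # drop missing values


@action("split-data")
def split_data(rows, cfg):
    cut = int(len(rows) * cfg["train_fraction"])
    return {"train": rows[:cut], "test": rows[cut:]}


def run_workflow(steps: List[dict]) -> Any:
    """Apply each configured step to the output of the previous one."""
    data = None
    for step in steps:
        data = ACTIONS[step["action"]](data, step.get("config", {}))
    return data


workflow = [
    {"action": "load-data", "config": {"rows": [1, None, 2, 3, None, 4]}},
    {"action": "clean-data"},
    {"action": "split-data", "config": {"train_fraction": 0.75}},
]
result = run_workflow(workflow)
```

Keeping the workflow as plain data means it could be loaded from a JSON or YAML file without changing the runner.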