Hi Conner,
What is preventing you from using a cluster model?
I wonder if Docker containers could help you here?
A quick internet search yielded Mist: https://github.com/Hydrospheredata/mist
Could be useful?
Taylor
-----Original Message-----
From: conner
Sent: Saturday, November 3, 2018 2:04
At first glance, I wonder if your tables are partitioned? There may not be
enough parallelism happening. You can also pass in the number of partitions
and/or a custom partitioner to help Spark “guess” how to organize the shuffle.
Have you seen any of these docs?
https://people.eecs.berkeley.edu/
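To make the partitioner idea concrete, here's a pure-Python sketch of how Spark's HashPartitioner assigns keys to partitions (the key values and partition count below are made up for illustration). Running something like this over a sample of your keys is a cheap way to spot skew before you hand Spark a custom partitioner:

```python
from collections import Counter

def hash_partition(key, num_partitions):
    # Mimics Spark's HashPartitioner: partition = nonNegativeMod(hash(key), numPartitions).
    # In Python, `%` with a positive divisor already returns a non-negative result.
    return hash(key) % num_partitions

# Quick skew check: see how evenly 1000 integer keys spread across 8 partitions.
keys = range(1000)
counts = Counter(hash_partition(k, 8) for k in keys)
print(sorted(counts.items()))  # with these keys, each of the 8 partitions gets 125
```

If the counts come out lopsided for your real keys, that's your missing parallelism. In PySpark you can pass a hash function directly, e.g. `rdd.partitionBy(8, partitionFunc=hash)` (Spark applies the modulo internally).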
Hey Nirav,
Here’s an idea:
Suppose your file.csv has N records, one per line. Read the CSV line by line
(without Spark) and attempt to parse each line. If a record is malformed, catch
the exception and rethrow it with the line number. That should show you where
the problematic record(s) come from.
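Something along these lines (the three-numeric-field schema here is just a stand-in for whatever your records actually look like):

```python
def parse_record(line):
    # Hypothetical schema: three comma-separated numeric fields.
    fields = line.rstrip("\n").split(",")
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}")
    return [float(f) for f in fields]

def parse_file(lines):
    records = []
    for lineno, line in enumerate(lines, start=1):
        try:
            records.append(parse_record(line))
        except ValueError as e:
            # Rethrow with the line number attached, per the suggestion above.
            raise ValueError(f"malformed record at line {lineno}: {e}") from e
    return records

sample = ["1,2,3\n", "4,oops,6\n"]
try:
    parse_file(sample)
except ValueError as e:
    print(e)  # -> malformed record at line 2: could not convert string to float: 'oops'
```

Once you know which lines are bad, you can fix them (or filter them) before handing the file to Spark.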
Have a look at this guide here:
https://databricks.com/blog/2017/04/26/processing-data-in-apache-kafka-with-structured-streaming-in-apache-spark-2-2.html
You should be able to send your sensor data to a Kafka topic, which Spark will
subscribe to. You may need to use an Input DStream to connect to Kafka.
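With Structured Streaming (per that blog post) the subscription side looks roughly like this. This is only a sketch: it won't run without a live Kafka broker and a Spark installation, and the broker address and topic name ("sensors") are placeholders for your setup:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sensor-ingest").getOrCreate()

# Subscribe to the (hypothetical) "sensors" topic; Kafka delivers raw bytes,
# so cast the value column to a string before parsing further.
readings = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder address
    .option("subscribe", "sensors")
    .load()
    .selectExpr("CAST(value AS STRING) AS reading")
)

# Echo to the console just to verify data is flowing.
query = readings.writeStream.format("console").start()
query.awaitTermination()
```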
Hey Zeinab,
We may have to take a small step back here. The sliding window approach (i.e.,
the window operation) is specific to data stream mining, so it makes sense that
window() is restricted to DStreams.
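In case it helps, here's a tiny pure-Python stand-in for what window() does conceptually: keep the last few elements of a stream and aggregate them each time the window slides. (Real Spark windows are defined in units of time, not element counts; the sum aggregate and the sizes below are arbitrary choices for illustration.)

```python
from collections import deque

def sliding_sums(stream, window_size=3, slide=1):
    # Conceptual stand-in for DStream.window(): retain the last `window_size`
    # elements and emit an aggregate every `slide` elements.
    window = deque(maxlen=window_size)
    out = []
    for i, x in enumerate(stream, start=1):
        window.append(x)
        if i % slide == 0:
            out.append(sum(window))
    return out

print(sliding_sums([1, 2, 3, 4, 5]))  # -> [1, 3, 6, 9, 12]
```

If your computation doesn't look like this, i.e. you're operating on a whole dataset at once rather than a moving slice of it, then a plain DataFrame/RDD job is probably the better fit than DStreams.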
It looks like you're not using a stream mining approach. From what I can see in
your code,