Have a look at this guide here:
https://databricks.com/blog/2017/04/26/processing-data-in-apache-kafka-with-structured-streaming-in-apache-spark-2-2.html
You should be able to send your sensor data to a Kafka topic, which Spark will
subscribe to. You may need to use an Input DStream to connect.
Do your schema inference and then apply the JSON schema using withColumn,
overwriting the String representation.
From: Nirav Patel
Date: Tuesday, October 2, 2018 at 5:00 PM
To:
Cc: spark users
Subject: Re: CSV parser - how to parse column containing json data
Hi Mahendar,
Which versions of Spark and Hadoop are you using?
I tried it on Spark 2.3.1 with Hadoop 2.7.3 and it works for a folder containing
multiple .gz files.
From: Mahender Sarangam
Sent: Monday, October 1, 2018 2:00 AM
To: user@spark.apache.org
Subject: Unable to read multiple JSON.Gz
I need to inferSchema from CSV as well. As per your solution, I am creating a
StructType only for the JSON field. So how am I going to mix and match here?
i.e., do type inference for all fields except the JSON field, and use a custom
json_schema for the JSON field.
On Thu, Aug 30, 2018 at 5:29 PM Brandon Geise
Thank you, Taylor, for your reply. The second solution doesn't work for my
case since my text files are getting updated every second. Actually, my
input data is live: I'm getting 2 streams of data from 2 seismic
sensors and then I write them into 2 text files for simplicity, and this is
Hey Zeinab,
We may have to take a small step back here. The sliding window approach (i.e.
the window operation) is unique to data stream mining, so it makes sense that
window() is restricted to DStream.
It looks like you're not using a stream mining approach. From what I can see in
your code,
Hello,
I have 2 text files in the following form, and my goal is to calculate the
Pearson correlation between them using a sliding window in PySpark:
123.00
-12.00
334.00
...
First I read these 2 text files and store them in RDD format, and then I apply
the window operation on each RDD, but I keep
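Setting the DStream question aside, the computation itself is simple enough to sketch in plain Python: Pearson correlation between two equal-length series over each sliding window. The window size of 3 and the second series are arbitrary choices for illustration; the first series reuses the sample values above:

```python
# Sliding-window Pearson correlation between two aligned series.
# Window size and the data in `b` are illustrative assumptions.
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def sliding_pearson(a, b, size):
    """Correlation over each length-`size` window sliding by one step."""
    return [pearson(a[i:i + size], b[i:i + size])
            for i in range(len(a) - size + 1)]


a = [123.0, -12.0, 334.0, 120.0]
b = [246.0, -24.0, 668.0, 240.0]  # exactly 2*a, so each window correlates at 1.0
r = sliding_pearson(a, b, 3)
```

In Spark, the same per-window arithmetic could be driven by zipping the two RDDs and grouping indices into windows, but the windowing itself has to be built by hand once you leave DStreams.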