Hi Daniela,

This is straightforward with Structured Streaming. If your Kafka cluster is on version 0.10.0 or above, you can use Spark 2.0.2 to create a streaming DataFrame from Kafka, create a second DataFrame over your database table via JDBC, and join the two. Spark 2.1 also adds a function called "from_json", which makes it easy to parse the JSON messages coming in from Kafka.
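A minimal sketch of that pipeline in Scala (the topic name "events", the JDBC URL, and the table name "minute_values" are placeholders; since "from_json" only lands in 2.1, get_json_object stands in for it here, and the minute offset is added via unix_timestamp arithmetic):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("KafkaJdbcJoin").getOrCreate()
import spark.implicits._

// 1. Streaming DataFrame from Kafka (Kafka 0.10+, Spark 2.0.2+),
//    parsing timestamp and ID out of the JSON payload.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:9092")
  .option("subscribe", "events")
  .load()
  .select(
    get_json_object($"value".cast("string"), "$.timestamp")
      .cast("timestamp").as("timestamp"),
    get_json_object($"value".cast("string"), "$.ID")
      .cast("int").as("ID"))

// 2. Static DataFrame over the database table via JDBC
//    (columns: ID, minute, value).
val lookup = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://dbhost/db")
  .option("dbtable", "minute_values")
  .load()

// 3. Stream-static inner join on ID; each message fans out to one row
//    per minute value. Then shift the timestamp by `minute` minutes
//    and keep only the computed timestamp and the value.
val result = stream.join(lookup, "ID")
  .withColumn("timestamp",
    (unix_timestamp($"timestamp") + $"minute" * 60).cast("timestamp"))
  .select("timestamp", "value")
```

You would then attach a sink with result.writeStream and start() it to run the query continuously.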
Best,
Burak

On Tue, Dec 6, 2016 at 2:16 AM, Daniela S <daniela_4...@gmx.at> wrote:
> Hi
>
> I have some questions regarding Spark Streaming.
>
> I receive a stream of JSON messages from Kafka. The messages consist of a
> timestamp and an ID.
>
> timestamp          ID
> 2016-12-06 13:00   1
> 2016-12-06 13:40   5
> ...
>
> In a database I have values for each ID:
>
> ID  minute  value
> 1   0       3
> 1   1       5
> 1   2       7
> 1   3       8
> 5   0       6
> 5   1       6
> 5   2       8
> 5   3       5
> 5   4       6
>
> So I would like to join each incoming JSON message with the corresponding
> values. It should look as follows:
>
> timestamp          ID  minute  value
> 2016-12-06 13:00   1   0       3
> 2016-12-06 13:00   1   1       5
> 2016-12-06 13:00   1   2       7
> 2016-12-06 13:00   1   3       8
> 2016-12-06 13:40   5   0       6
> 2016-12-06 13:40   5   1       6
> 2016-12-06 13:40   5   2       8
> 2016-12-06 13:40   5   3       5
> 2016-12-06 13:40   5   4       6
> ...
>
> Then I would like to add the minute values to the timestamp. I only need
> the computed timestamp and the values, so the result should look as follows:
>
> timestamp          value
> 2016-12-06 13:00   3
> 2016-12-06 13:01   5
> 2016-12-06 13:02   7
> 2016-12-06 13:03   8
> 2016-12-06 13:40   6
> 2016-12-06 13:41   6
> 2016-12-06 13:42   8
> 2016-12-06 13:43   5
> 2016-12-06 13:44   6
> ...
>
> Is this a possible use case for Spark Streaming? I thought I could join
> the streaming data with the static data, but I am not sure how to add the
> minute values to the timestamp. Is this possible with Spark Streaming?
>
> Thank you in advance.
>
> Best regards,
> Daniela

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org