zhaP524 created SPARK-21948: ------------------------------- Summary: How to use spark streaming for deal two table from one topic of topic?? Key: SPARK-21948 URL: https://issues.apache.org/jira/browse/SPARK-21948 Project: Spark Issue Type: Bug Components: Spark Core, Spark Submit, Structured Streaming Affects Versions: 2.1.1 Environment: kafka:0.10.0 Spark:2.1.1 Reporter: zhaP524
Now, I have such A requirement, I want from A topic of kafka、 A group of receiving data, receive to DirectStream contains two tables of data (A<Master>, B<Slave>), my ultimate goal is the two tables of data according to some fields join operation, will produce the results into the Database; My previous operations are as follows: 1. Use batch processing directly at the DirectStream layer, filter out the data of A and B, produce their respective DF, and then use Spark SQL to perform the join operation, and the results are then entered into the library;But this kind of situation will exist problems, A and B table 1: N relationship, when selected A batch, may lead to A full load, and part B table loaded only, lead to the calculation results is only part of the next batch came in already consumed A data, table B subsequent data has no associated data, leading to loss of data;I don't know if you can store the A table data using the queue and so on, until the total load of B table data is loaded and processed to generate the complete result. In this case, the essence is similar to Spark Streaming Window Operation 2、 2, the use of Spark Streaming Window Operation for processing, I can isolate A and B table when DirectStream flow stream, but the join Operation shall be carried out in the Window, the Window Operation did not support the transform Operation such as map, lead to process cannot go through(java.io.NotSerializableException: org.apache.kafka.clients.consumer.ConsumerRecord Serialization stack: - object not serializable (class: org.apache.kafka.clients.consumer.ConsumerRecord, value: ConsumerRecord(topic = doc, partition = 0,...), I don't know what should I do?? So, I was wondering if there was a problem with my scene?Or do I have a technical problem?Is there a solution to my business scenario without introducing other components?Trouble to give directions.TKS -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org