many questions (kafka table KafkaDeserilizationSchema support, recommended enrichment approach, prevent JDBC temporal dimension table from N +1 queries, etc.

Marco Villalobos Sun, 06 Dec 2020 22:04:44 -0800

1. How can I create a kafka table that can use headers and map them to columns? 
Currently, I am using KafkaDeserilizationSchema to create a DataStream, and 
then I convert that DataStream into a Table. I would like to use a more direct 
approach.


2. What is the recommended way to enrich a kafka table or data-stream with 
data-from postgres?
        a) kafka table and JDBC temporal dimension table with temporal join and 
lookup cache setup
        b) data-stream with async io which connects via JDBC.  (note that 
asycio does not support Keyed State cache)
        c) data-stream rich function or process function that uses Keyed State.

3. When using a kafka told and JDBC temporal dimension table how do I prevent N 
+ 1 queries per join row?

        When I issued a query such as this:

        SELECT k.name, t1.id, t2.metadata, SUM(k.cost)
        FROM kafka_table AS k
        JOIN jdbc_table_one AS t1 ON k.t1_id = t1.ID
        LEFT JOIN jdbc_table_two FOR SYSTEM_TIME AS OF k.proc_time AS t2 ON 
t1.t2_id = t2.id AND t2.name = k.name
        GROUP BY TUMBLE (k.proc_time, INTERVAL '3' MINUTE), k.name, t1.id, 
t2.metadata

        My PostgreSQL sql logs show that jdbc_table_two has a query per each 
distinct t2.name.

        In a real production system, that would be 200,000 queries!

4. When using a JDBC temporal dimension table does Flink retrieve the from the 
database asynchronously , or is it possible for Flink to multiple join rows at 
time with a IN (subquery) syntax?

many questions (kafka table KafkaDeserilizationSchema support, recommended enrichment approach, prevent JDBC temporal dimension table from N +1 queries, etc.

Reply via email to