Hello,

For one of our legacy workloads we use the Spark Thrift Server to retrieve data from Kafka. The pipeline is: Oracle -- ODBC --> Spark Thrift Server --> Kafka.
Spark performs some transformations (Avro deserialization, array explode, etc.), but no aggregation. The main issue we face is that the Thrift Server sends data back to Oracle only after all records have been processed. This results in high memory pressure when large amounts of data are requested, with possible OOMs. Is there a way to configure the Thrift Server to work in a "streaming" fashion, processing and sending intermediate chunks? I've tried Hive 4.0.0-SNAPSHOT and it does work in a streaming fashion, receiving and transmitting at the same time, but it's a single server with no clustering available.

BR,
Tomas
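To make the behavior I'm asking about concrete, here is a minimal sketch in plain Python (not Spark or Thrift API; all names are illustrative) contrasting what we see today with what we'd like:

```python
def transform(row):
    # Stand-in for the real per-row work (Avro deserialize, explode, ...).
    return row * 2

def buffered_fetch(rows):
    """Current behavior: every row is materialized before anything is sent.
    Peak memory grows with the full result set, hence the OOM risk."""
    return [transform(r) for r in rows]  # whole result held in memory at once

def streaming_fetch(rows, chunk_size=1000):
    """Desired behavior: process and ship fixed-size chunks as they become
    ready, so peak memory is bounded by chunk_size, not the result size."""
    chunk = []
    for r in rows:
        chunk.append(transform(r))
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # flush the final partial chunk
        yield chunk
```

With `streaming_fetch`, the consumer (Oracle, in our case) could start receiving the first chunk while later rows are still being processed, instead of waiting for the full result.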