Hello,

For one of our legacy workloads we use the Spark Thrift Server to
retrieve data from Kafka. The pipeline is:
Oracle -- ODBC --> Spark Thrift Server --> Kafka

Spark performs some transformations, such as Avro deserialization and
array explode, but no aggregation.
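To make the shape of the workload concrete, here is a purely
illustrative sketch of such a statement (kafka_events and the
deserialize_avro UDF are placeholder names, not our real identifiers):

    -- deserialize the Avro payload, then explode its nested array
    SELECT e.event_id, item
    FROM (
      SELECT event_id, deserialize_avro(payload) AS rec
      FROM kafka_events
    ) e
    LATERAL VIEW explode(e.rec.items) t AS item;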
The main issue we face is that the Thrift Server only sends data back
to Oracle after all records have been processed. This results in high
memory pressure (and possible OOMs) when large result sets are
requested.
Is there a way to configure the Thrift Server to work in a "streaming"
fashion, processing and sending intermediate chunks?
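For what it's worth, the closest setting I've found so far is
spark.sql.thriftServer.incrementalCollect, which as I understand it
makes the server fetch results partition by partition instead of
collecting the whole result set on the driver, e.g.:

    # assumption: stock Spark distribution; incremental collect should
    # lower driver memory pressure at some cost in throughput
    sbin/start-thriftserver.sh \
      --conf spark.sql.thriftServer.incrementalCollect=true

I haven't confirmed whether this actually streams rows to the ODBC
client before the query finishes, though.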

I've tried Hive 4.0.0-SNAPSHOT and it works in a streaming fashion,
receiving and transmitting at the same time, but it's a single server
with no clustering available.

BR,
Tomas
