Hi All,

Here is my code.

Dataset<Row> df = ds.select(
    functions.from_json(new Column("value").cast("string"), getSchema()).as("payload"));

Dataset<Row> df1 = df.selectExpr("payload.data.*");

StreamingQuery query = df1.writeStream()
    .outputMode("append")
    .option("truncate", "false")
    .format("console")
    .start();

query.awaitTermination();


payload.data is a struct with 75 fields. The code above runs without any
errors, but nothing gets printed to the console, even after waiting for
5 minutes. If I select only a few columns, say payload.data.col1,
payload.data.col2 and payload.data.col3, things work and I can see the
data read from Kafka. Even then the output is not instantaneous: it takes
about a minute or two when I select fewer than 10 fields.

Why does nothing get printed to the console when I select payload.data.*?
The overall payload size is 2 KB. I am using Spark 2.1.1, I have enough
free memory and disk space on my machines, and I am running in client
mode.
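
To narrow down how many fields it takes before the output stops appearing, the explicit column list could be generated instead of typed by hand. The following is only a minimal sketch in plain Java: the class name FieldSelector, the method expressions, and the field names col1..col4 are hypothetical placeholders, not part of the code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FieldSelector {

    // Builds "<prefix>.<field>" expressions for the first `limit` field names.
    // A limit below 0 is treated as 0; a limit above the list size is capped.
    static String[] expressions(String prefix, List<String> fieldNames, int limit) {
        int n = Math.max(0, Math.min(limit, fieldNames.size()));
        List<String> out = new ArrayList<>();
        for (String name : fieldNames.subList(0, n)) {
            out.add(prefix + "." + name);
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Hypothetical field names; the real ones would come from the schema.
        List<String> fields = Arrays.asList("col1", "col2", "col3", "col4");
        System.out.println(Arrays.toString(expressions("payload.data", fields, 3)));
    }
}
```

The resulting array can be passed straight to selectExpr, which accepts a variable number of strings, for example df.selectExpr(FieldSelector.expressions("payload.data", fields, 10)). Raising the limit step by step (10, 20, 40, 75) should show roughly where the delay starts to grow.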

Thanks!
