How to use groupByKey() in spark structured streaming without aggregates

act_coder Tue, 27 Oct 2020 21:58:23 -0700

*Is there a way to use the groupByKey() function in Spark Structured
Streaming without aggregates?*


I have a scenario like below, where we would like to group the items based
on a key without applying any aggregates.

*Sample incoming data:*



I would like to apply groupByKey on the field "device_id", so that I get
an output like the one below.



I have also tried using collect_list() in the aggregate expression of
groupByKey, but that takes longer to process the datasets.

Also, since we are aggregating, we can only use the 'Complete' or 'Update'
output modes, but 'Append' mode looks more suitable for our use case.
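
For reference, the collect_list() approach described above might look roughly like the sketch below. Since the sample data is not shown here, it assumes Spark's built-in "rate" test source and derives a hypothetical device_id column from it; the "readings" column name, the object name and the 10-second timeout are likewise placeholders. It runs in 'Update' mode; without a watermark, switching this aggregation to 'Append' is rejected with an AnalysisException.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, concat, lit}

object CollectListSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("collect-list-sketch")
      .getOrCreate()

    // Assumption: the "rate" source stands in for the real input stream.
    // It emits (timestamp, value) rows; device_id is derived for illustration only.
    val events = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()
      .withColumn("device_id", concat(lit("device-"), (col("value") % 3).cast("string")))

    // Group by device_id and gather the values into a list.
    // collect_list is still an aggregate, so the query counts as a streaming aggregation.
    val grouped = events
      .groupBy(col("device_id"))
      .agg(collect_list(col("value")).as("readings"))

    // 'update' (or 'complete') is accepted here; 'append' is not without a watermark.
    val query = grouped.writeStream
      .outputMode("update")
      .format("console")
      .start()

    // Let the sketch run for about 10 seconds, then shut down.
    query.awaitTermination(10000)
    query.stop()
    spark.stop()
  }
}
```

Note that the list for each device_id keeps growing across micro-batches, because the aggregation state is retained for every key.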

I have also looked at the groupByKey(numPartitions) and reduceByKey()
functions available on direct DStreams, which give results of the form
(String, Iterable[(String, Int)]) without doing any aggregation.
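
To make the DStream behaviour mentioned above concrete, here is a small sketch of groupByKey(numPartitions) on a pair DStream. The (device_id, (metric, value)) records, the queue-backed input and the object name are made up for illustration, and the spark-streaming module is assumed to be on the classpath.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

import scala.collection.mutable

object DStreamGroupByKeySketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("dstream-groupByKey-sketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Hypothetical records keyed by device_id: (device_id, (metric, value)).
    val batch: RDD[(String, (String, Int))] = ssc.sparkContext.parallelize(Seq(
      ("device-1", ("temp", 20)),
      ("device-2", ("temp", 25)),
      ("device-1", ("humidity", 40))
    ))

    // A queue-backed stream is used only so the sketch needs no external source.
    val queue = mutable.Queue(batch)
    val events: DStream[(String, (String, Int))] = ssc.queueStream(queue)

    // groupByKey(numPartitions) collects all values per key within each batch.
    val grouped: DStream[(String, Iterable[(String, Int)])] = events.groupByKey(2)
    grouped.print()

    ssc.start()
    // Run for about 5 seconds, then stop the streaming and Spark contexts.
    ssc.awaitTerminationOrTimeout(5000)
    ssc.stop(stopSparkContext = true, stopGracefully = false)
  }
}
```

Each non-empty batch printed by grouped.print() holds one entry per device_id, with all of that key's values collected into an Iterable and no aggregate function applied.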

Is there something similar available in Structured Streaming?



