Say, *trainTimesDataset* is the streaming Dataset of schema *[train: Int, dest: String, time: Timestamp] *
*Scala*: *trainTimesDataset.groupBy("train", "dest").max("time")* *SQL*: *"select train, dest, max(time) from trainTimesView group by train, dest"* // after calling *trainTimesData.createOrReplaceTempView(trainTimesView)* On Tue, Aug 29, 2017 at 12:59 PM, kant kodali <kanth...@gmail.com> wrote: > Hi All, > > I am wondering what is the easiest and concise way to express the > computation below in Spark Structured streaming given that it supports both > imperative and declarative styles? > I am just trying to select rows that has max timestamp for each train? > Instead of doing some sort of nested queries like we normally do in any > relational database I am trying to see if I can leverage both imperative > and declarative at the same time. If nested queries or join are not > required then I would like to see how this can be possible? I am using > spark 2.1.1. > > Dataset > > Train Dest Time1 HK 10:001 SH 12:001 > SZ 14:002 HK 13:002 SH 09:002 SZ > 07:00 > > The desired result should be: > > Train Dest Time1 SZ 14:002 HK 13:00 > >