Hi all
This is the sample set of data that I used for this task
[image: image.png]
My need is to pass count as the offset of the lag() function: *[ lag(col(),
count).over(windowSpec) ]*. But since the lag function expects lag(Column,
Int), that is, a literal integer offset, the above code does not work.
So can you guys suggest a workaround?
Given you are already stating the above can be imagined as a partition, I
can think of a mapPartitions iterator:
val inputSchema = inputDf.schema
val outputRdd = inputDf.rdd.mapPartitions(rows => new SomeClass(rows))
val outputDf = sparkSession.createDataFrame(outputRdd, inputSchema)
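A rough sketch of that mapPartitions pattern (the class name, the tuple shape, and the reset/increment rule are my assumptions; plain (String, Int) tuples stand in for Spark Rows so it runs without a Spark session):

```scala
// Sketch: a stateful iterator wrapping a partition's rows, carrying a
// running counter across rows. Here the counter resets to 0 on "M01"
// and increments on anything else, mirroring the scenario in this thread.
class CounterIterator(rows: Iterator[String]) extends Iterator[(String, Int)] {
  private var count = 0
  override def hasNext: Boolean = rows.hasNext
  override def next(): (String, Int) = {
    val typ = rows.next()
    count = if (typ == "M01") 0 else count + 1 // reset on M01, else increment
    (typ, count)
  }
}

val out = new CounterIterator(Iterator("M01", "M02", "M02", "M01", "M02")).toList
// out == List(("M01",0), ("M02",1), ("M02",2), ("M01",0), ("M02",1))
```

Because the iterator is lazy, the whole partition never has to be materialized; only the single running count is held in memory.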
My expected output is as below:
[image: image.png]
My scenario is: if Type is M01 the count should be 0, and if Type is M02 it
should be incremented from 1 (or 0) until the sequence of M02 rows is finished.
Imagine this as one partition.
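The increment/reset rule above can also be pinned down with a plain Scala fold over the Type column (the Seq literal is an invented sample, not the poster's data):

```scala
// Running counter: reset to 0 at each "M01", increment through each "M02" run.
val types = Seq("M01", "M02", "M02", "M02", "M01", "M02")
val counts = types.scanLeft(0)((prev, t) => if (t == "M01") 0 else prev + 1).tail
// counts == Seq(0, 1, 2, 3, 0, 1)
```

scanLeft carries the previous count forward, which is exactly the state a fixed-offset lag() cannot express when the offset itself varies per row.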
Thanks, that's great, Rauf.
Regards
On Tuesday, 23 May 2023 at 13:18:55 BST, Rauf Khan wrote:
Hi,
PartitionBy() is analogous to GROUP BY: all rows that have the same value
in the specified column will form one window.
The data will be shuffled to form the groups.
Regards
Raouf
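Raouf's analogy can be illustrated with plain Scala collections (no Spark needed; the sample rows are invented), where groupBy plays the role of Window.partitionBy:

```scala
// Grouping rows by a key column: each group corresponds to one window,
// just as partitionBy("Type") would form windows per Type value.
val rows = Seq(("M01", 10), ("M02", 20), ("M01", 30))
val windows = rows.groupBy(_._1)
// windows("M01") == Seq(("M01", 10), ("M01", 30))
```

The important difference in Spark is the shuffle: rows sharing a key may start on different executors and must be moved so each window's rows are co-located.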
On Fri, May 12, 2023, 18:48 ashok34...@yahoo.com.INVALID wrote:
> Hello,
>
> In Spark windowing does call
spark.sparkContext
  .textFile("s3a://a_bucket/models/random_forest_zepp/bestModel/metadata", 1)
  .getNumPartitions()

When I run the above code, I get the error below. Can you advise how to
troubleshoot? I'm using Spark 3.3.0, and the above file path exists.
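Since the error text did not come through in the digest, here is one common first check, offered as an assumption rather than a diagnosis: the S3A connector (hadoop-aws and the AWS SDK it pulls in) is not bundled with Spark, so s3a:// paths fail unless those jars are on the classpath. A launch sketch (the 3.3.2 version is a guess and should match your Hadoop build):

```shell
# Start spark-shell with the S3A connector on the classpath.
# --packages resolves hadoop-aws and its aws-java-sdk-bundle dependency.
spark-shell --packages org.apache.hadoop:hadoop-aws:3.3.2
```

If the classpath is fine, the usual next suspects are AWS credentials and the fs.s3a.* configuration keys.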