I want to maintain the order of the rows in a DataFrame in PySpark. Is
there any way to achieve this for the function below, where the row ID
gives a sequential number to each row? Currently, the function
rearranges the rows of the DataFrame.

from pyspark.sql import Window
from pyspark.sql.functions import lit, row_number

def createRowIdColumn(df, new_column, position, start_value):
    row_count = df.count()
    # One-column DataFrame holding the sequential IDs.
    row_ids = spark.range(int(start_value), int(start_value) + row_count, 1).toDF(new_column)
    # Window over a constant, so there is no real sort key.
    window = Window.orderBy(lit(1))
    df_row_ids = row_ids.withColumn("row_num", row_number().over(window) - 1)
    df_with_row_num = df.withColumn("row_num", row_number().over(window) - 1)

    if position == "Last Column":
        result = df_with_row_num.join(df_row_ids, on="row_num").drop("row_num")
    else:
        result = df_row_ids.join(df_with_row_num, on="row_num").drop("row_num")

    return result.orderBy(new_column)

Please let me know if there is a way to achieve this requirement.
