Hi @all, I am using `monotonically_increasing_id()` in PySpark to remove one key from a JSON field stored in a column of a Delta table. Please refer to the code below:
```
from pyspark.sql.functions import monotonically_increasing_id, row_number, to_json, struct
from pyspark.sql.window import Window

df = spark.sql(f"SELECT * from {database}.{table}")

# Parse the JSON column into its own DataFrame and drop the unwanted key
df1 = spark.read.json(df.rdd.map(lambda x: x.data), multiLine=True)
df1 = df1.drop('fw_createdts')
df1 = df1.selectExpr('to_json(struct(*)) as data')

# Generate a row index on each DataFrame and join them back together
df = df.withColumn('row_index', row_number().over(Window.orderBy(monotonically_increasing_id()))) \
       .withColumnRenamed('data', 'data1')
df1 = df1.withColumn('row_index', row_number().over(Window.orderBy(monotonically_increasing_id())))
df = df.join(df1, on=['row_index']).drop('row_index', 'data1')
df.createOrReplaceTempView('tempdf')
```

*Business Requirement:* Remove one key/value pair from the JSON field in the Delta table.

Steps done:
1. Read the data.
2. Read only the JSON column into a separate DataFrame and parse it with `spark.read.json`.
3. Drop the unwanted column from that DataFrame.
4. Convert it back into a JSON string field.
5. Join the two DataFrames on `monotonically_increasing_id()`. (There is no unique column shared between the JSON column and the other primary columns, so the rows cannot be matched by a unique id. Also, some of the fields inside the JSON field also exist at the top level, so the JSON field cannot be expanded within the same DataFrame.)

*Issue Faced:*
*1. For a small database and dataset it works as expected.*
*2. For a large database and dataset it does not work as expected: rows get mapped to different records in the same table.*

*When I referred to the documentation, I saw that the ids are not consecutive. Is there any limit?*
https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.monotonically_increasing_id.html#pyspark-sql-functions-monotonically-increasing-id

*Could you explain to us whether there are any constraints on it?*
*How can we achieve this requirement using any alternate method?*

Thanks in advance 🙂
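For reference, here is a minimal sketch of one row-aligned alternative I am considering: instead of parsing the JSON column into a second DataFrame and joining on a generated index, transform each JSON string in place, so no join (and no `monotonically_increasing_id()`) is needed. The pure-Python transformation is shown below; the Spark wiring (wrapping it as a UDF) is sketched in comments, since it assumes a running SparkSession. The key name `fw_createdts` and column name `data` are from the code above; `drop_json_key` is a hypothetical helper name.

```python
import json

def drop_json_key(doc: str, key: str = "fw_createdts") -> str:
    """Remove one key from a JSON document string, leaving all other
    fields (including ones duplicated at the top level) untouched."""
    obj = json.loads(doc)
    obj.pop(key, None)  # no error if the key is absent
    return json.dumps(obj)

# Sketch of the Spark side (assumes an active SparkSession; not run here):
#
# from pyspark.sql import functions as F
# drop_key_udf = F.udf(drop_json_key, "string")
# df = df.withColumn("data", drop_key_udf(F.col("data")))
#
# Because withColumn operates row by row on the same DataFrame, the output
# stays aligned with the original rows by construction.
```

This avoids the documented caveat that `monotonically_increasing_id()` is only guaranteed to be increasing and unique, not consecutive or stable across separately evaluated DataFrames.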