Hi All, I have two dataframes with the structures below that I need to join. The scenario: the join column is a string in one dataframe, while in the other dataframe it is an array of strings. I need an inner join that keeps a row when the string value is present in any of the array-of-strings values of the other dataframe.
df1 = spark.sql("""
    SELECT mr.id AS mr_id,
           pv.id AS pv_id,
           array(mr.id, pv.id) AS combined_id
    FROM table1 mr
    INNER JOIN table2 pv
        ON pv.id = mr.recordid
    WHERE pv.id = '35122806-4cd2-4916-a149-24ea55c2dc36'
       OR pv.id = 'a5f03625-6cc5-49df-95eb-df741fe9139b'
""")
# df1.display()

# Your second query
df2 = spark.sql("""
    SELECT id
    FROM table2
    WHERE id = '35122806-4cd2-4916-a149-24ea55c2dc36'
""")

Expected result: 35122806-4cd2-4916-a149-24ea55c2dc36 only, because that record alone is common between the string value and the array-of-strings values.

Can you share a sample snippet showing how to do the join across these two different datatypes? If any clarification is needed, please feel free to ask. Thanks