Hi All,

I have two dataframes with the structure below, and I need to inner join
them. The twist is that the join column is a string in one dataframe and an
array of strings in the other, so a row should match when the string value
is present anywhere in the array-of-strings value of the other dataframe.


df1 = spark.sql("""
    SELECT
        mr.id AS mr_id,
        pv.id AS pv_id,
        array(mr.id, pv.id) AS combined_id
    FROM
        table1 mr
        INNER JOIN table2 pv ON pv.id = mr.recordid
    WHERE
        pv.id = '35122806-4cd2-4916-a149-24ea55c2dc36'
        OR pv.id = 'a5f03625-6cc5-49df-95eb-df741fe9139b'
""")

# df1.display()

# The second query
df2 = spark.sql("""
    SELECT
        id
    FROM
        table2
    WHERE
        id = '35122806-4cd2-4916-a149-24ea55c2dc36'
""")



Expected result:
35122806-4cd2-4916-a149-24ea55c2dc36 only, because this record alone is
common between the string column and the array-of-strings column.

Can you share a sample snippet showing how to join these two different
datatypes across the two dataframes?
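
To make the intent concrete, here is a rough sketch of what I imagine the
join could look like. array_contains is my guess at the right function, and
the temp view names v1/v2 are just placeholders I made up; I have not
verified that this works on my data:

# Rough sketch (unverified): keep rows where v2.id is contained in
# the array column v1.combined_id
df1.createOrReplaceTempView("v1")
df2.createOrReplaceTempView("v2")

result = spark.sql("""
    SELECT
        v1.mr_id,
        v1.pv_id,
        v1.combined_id,
        v2.id
    FROM
        v1
        INNER JOIN v2 ON array_contains(v1.combined_id, v2.id)
""")

# result.display()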

If any clarification is needed, please feel free to ask.

Thanks
