Re: Support Required: Issue with PySpark Code Execution Order

2025-08-15 Thread Mich Talebzadeh
There are two separate things going on: you did materialize, but your unionByName is failing because the two DataFrames don't actually have the same column set/schema at the moment of the union, even if the printSchema() output you looked at earlier seemed to match. Why unionByName is complaining (common …
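The mismatch Mich describes is easiest to pin down by diffing the two column lists directly. Below is a hypothetical helper (`column_diff` is not part of PySpark) that works on plain name lists, so it applies to any two DataFrames via `df1.columns` / `df2.columns`:

```python
# Hypothetical helper: report the column differences that make
# unionByName fail. The example column lists are illustrative.

def column_diff(cols_a, cols_b):
    """Return (only_in_a, only_in_b) as sorted lists of column names."""
    a, b = set(cols_a), set(cols_b)
    return sorted(a - b), sorted(b - a)

only_a, only_b = column_diff(
    ["id", "name", "amount"],      # e.g. df1.columns
    ["id", "name", "amount_usd"],  # e.g. df2.columns
)
print(only_a)  # ['amount']
print(only_b)  # ['amount_usd']
```

In PySpark itself, `df1.unionByName(df2, allowMissingColumns=True)` (Spark 3.1+) fills columns missing on either side with nulls, but a genuine rename like the one above still needs an explicit `withColumnRenamed` first.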

Re: Support Required: Issue with PySpark Code Execution Order

2025-08-13 Thread Karthick N
Thanks for the support. I tried using an action (count()) after persist to materialize the DataFrame results. However, I'm still facing a column-count mismatch when performing unionByName. I have verified the column count and names in both DataFrames using printSchema, and it shows that both DataFra…
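One way two schemas can look identical in printSchema() output yet still fail to union is a difference that printing hides: column-name case, stray whitespace, or duplicates. A small sketch (assuming the names are compared as plain strings, as PySpark does with case sensitivity off by default for resolution but exact matching for counts):

```python
# Normalise column names before comparing: a trailing space or a case
# difference is invisible in printed schema output but still a mismatch.

def schemas_effectively_equal(cols_a, cols_b):
    """True if the two column-name lists match after trimming and lowercasing."""
    norm_a = sorted(c.strip().lower() for c in cols_a)
    norm_b = sorted(c.strip().lower() for c in cols_b)
    return norm_a == norm_b

# A trailing space looks identical when printed, but differs as a string:
print(schemas_effectively_equal(["id", "amount"], ["id", "amount "]))  # True once normalised
print(["id", "amount"] == ["id", "amount "])                           # False as raw lists
```

If the normalised comparison passes but the raw one fails, the fix is to clean the names (e.g. `df.toDF(*[c.strip() for c in df.columns])`) before the union.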

Re: Support Required: Issue with PySpark Code Execution Order

2025-08-11 Thread Mich Talebzadeh
Hi Karthick, the problem seems to be that you were performing the transformation/recipe on three DataFrames without materialisation, then writing back to that target table. Each MERGE re-evaluated its "recipe" at a different time, so they saw different snapshots → flaky/empty results. Fix (short + …
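The fix Mich outlines is: cache the DataFrame and force evaluation with an action before any MERGE reads it, so every MERGE sees the same snapshot. A sketch of that call pattern follows; `materialize` is a hypothetical helper name, and the `FakeDF` stand-in only exists so the pattern can run without a Spark cluster:

```python
# Sketch of the "materialise before MERGE" fix. Works on anything that
# exposes the PySpark DataFrame methods .persist() and .count().

def materialize(df):
    """Pin the DataFrame's current result by caching it and forcing
    evaluation with an action, so later MERGEs all see one snapshot."""
    df.persist()       # mark the result for caching
    return df.count()  # action: triggers the actual computation now

# Minimal stand-in to demonstrate the call pattern locally:
class FakeDF:
    def __init__(self, rows):
        self.rows, self.persisted = rows, False
    def persist(self):
        self.persisted = True
        return self
    def count(self):
        return len(self.rows)

df = FakeDF([1, 2, 3])
print(materialize(df), df.persisted)  # 3 True
```

With real PySpark, call `df.unpersist()` once the MERGEs are done; `df.checkpoint()` is an alternative that also truncates the lineage, at the cost of writing to the checkpoint directory.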

Re: Support Required: Issue with PySpark Code Execution Order

2025-08-11 Thread Karthick N
Hi *Ángel*, Thank you for checking on this. I'll review the points you mentioned and get back to you with an update. Hi *Mich*, looping you in here: could you please assist in reviewing this issue and share your inputs or suggestions? Your expertise would be really helpful in resolving it. Than…

Re: Support Required: Issue with PySpark Code Execution Order

2025-08-10 Thread Ángel Álvarez Pascua
Have you tried disabling AQE? On Sun, 10 Aug 2025, 20:48, Karthick N wrote: > Hi Team, > > I'm facing an issue with the execution order in the PySpark code snippet > below. I'm not certain whether it's caused by lazy evaluation, Spark plan > optimization, or something else. > > *Issue:* > Fo…
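Ángel's suggestion can be tested with one configuration flag. Adaptive Query Execution is enabled by default since Spark 3.2, so turning it off is a quick way to rule it out (a diagnostic step, not a recommended permanent setting):

```properties
# Disable Adaptive Query Execution to rule it out (default: true since Spark 3.2)
spark.sql.adaptive.enabled=false
```

The same flag can be set per job with `spark-submit --conf spark.sql.adaptive.enabled=false` or at runtime with `spark.conf.set("spark.sql.adaptive.enabled", "false")`.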

Re: Support Required: Issue with PySpark Code Execution Order

2025-08-10 Thread Bjørn Jørgensen
Short answer: Spark uses lazy evaluation. The long answer below is taken entirely from Google Gemini (https://aistudio.google.com/): Of course. This is a classic and often subtle issue in Spark …

Support Required: Issue with PySpark Code Execution Order

2025-08-10 Thread Karthick N
Hi Team, I’m facing an issue with the execution order in the PySpark code snippet below. I’m not certain whether it’s caused by lazy evaluation, Spark plan optimization, or something else. *Issue:* For the same data and scenario, during some runs, one of the final views is not returning any data. …