Cool thanks. Will give that a try...

--Ron

On Friday, July 21, 2017 8:09 PM, Keith Chapman <keithgchap...@gmail.com> wrote:

You could also enable it with --conf spark.logLineage=true if you do not want to change any code.

Regards,
Keith.
http://keith-chapman.com

On Fri, Jul 21, 2017 at 7:57 PM, Keith Chapman <keithgchap...@gmail.com> wrote:

Hi Ron,

You can try using the toDebugString method on the RDD; this will print the RDD lineage.

Regards,
Keith.
http://keith-chapman.com

On Fri, Jul 21, 2017 at 11:24 AM, Ron Gonzalez <zlgonza...@yahoo.com.invalid> wrote:

Hi,

Can someone point me to a test case or share sample code that is able to extract the RDD graph from a Spark job anywhere during its lifecycle? I understand that Spark has a UI that can show the graph of the execution, so I'm hoping that it is using some API somewhere that I could use.

I know the RDD is the actual execution graph, so if there is also a more logical abstraction API closer to calls like map, filter, aggregate, etc., that would be even better.

Appreciate any help...

Thanks,
Ron
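
For anyone finding this thread later, here is a minimal Scala sketch of the toDebugString suggestion above. The object name LineageSketch, the helper names, the sample pipeline, and the local[2] master are all made up for illustration and were not part of the original messages. The ancestors helper is an extra assumption on my part: it walks the public RDD.dependencies field in case the graph is wanted as data rather than as a printed string.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object LineageSketch {
  // Hypothetical pipeline, only here so there is a lineage to inspect.
  def buildRdd(spark: SparkSession): RDD[Int] =
    spark.sparkContext
      .parallelize(1 to 100, numSlices = 2)
      .map(_ * 2)
      .filter(_ % 3 == 0)

  // toDebugString returns the lineage as text; no job has to run first.
  def lineageText(rdd: RDD[_]): String = rdd.toDebugString

  // Assumption: walk RDD.dependencies to get the graph as data.
  // Returns class names starting from the given RDD back to its sources.
  def ancestors(rdd: RDD[_]): Seq[String] =
    rdd.getClass.getSimpleName +: rdd.dependencies.flatMap(d => ancestors(d.rdd))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("lineage-sketch")
      .master("local[2]") // assumption: local run for experimentation
      .getOrCreate()
    try {
      val rdd = buildRdd(spark)
      println(lineageText(rdd))
      println(ancestors(rdd).mkString(" <- "))
    } finally {
      spark.stop()
    }
  }
}
```

With the no-code route, passing --conf spark.logLineage=true to spark-submit makes Spark log the same toDebugString text at INFO level each time a job is submitted, so the code above is only needed if you want the lineage inside your own program.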