I am interested in modeling Airflow DAGs, along with their runs, using a dynamic graph neural network. Why? Mostly as an example: I think it would be cool to inspect the DAGs, pick a task such as predicting runtimes, and train node embeddings as part of a blog post. It would be great if the network were trained on a real-world workload, so that it could actually be useful as a starting point for people doing ML around orchestration with Airflow.
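To make the idea concrete, here is a rough sketch of the kind of paired structure I have in mind: a DAG's edges plus per-task runtimes from its runs, turned into an edge list and one regression target per node. Everything in it is an assumption for illustration. The task names, the durations, and the `build_graph_example` helper are made up, it uses no Airflow API, and it treats the graph as static rather than dynamic.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical DAG: task_id -> downstream task_ids (made-up task names).
dag_edges = {
    "extract": ["transform"],
    "transform": ["load", "report"],
    "load": [],
    "report": [],
}

# Hypothetical run-log rows: (run_id, task_id, duration_seconds). Made-up values.
task_runs = [
    ("run_1", "extract", 12.0),
    ("run_1", "transform", 30.0),
    ("run_1", "load", 8.0),
    ("run_1", "report", 4.0),
    ("run_2", "extract", 14.0),
    ("run_2", "transform", 34.0),
    ("run_2", "load", 10.0),
    ("run_2", "report", 6.0),
]


def build_graph_example(dag_edges, task_runs):
    """Pair a DAG with its run logs: node ids, directed edges, mean runtimes."""
    task_ids = sorted(dag_edges)
    # Stable integer id per task, in alphabetical order.
    node_index = {task_id: i for i, task_id in enumerate(task_ids)}
    # Directed edges (upstream, downstream) expressed as integer node ids.
    edge_list = [
        (node_index[upstream], node_index[downstream])
        for upstream in task_ids
        for downstream in dag_edges[upstream]
    ]
    # Collect every observed duration for each task across all runs.
    durations = defaultdict(list)
    for _run_id, task_id, seconds in task_runs:
        durations[task_id].append(seconds)
    # One target per node: the mean observed runtime of that task.
    targets = [mean(durations[task_id]) for task_id in task_ids]
    return node_index, edge_list, targets


node_index, edge_list, targets = build_graph_example(dag_edges, task_runs)
```

An edge list plus per-node targets is roughly the input shape most graph learning libraries expect, so a real dataset in this form would only need a thin loader on top.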
To do this, I need a paired repo/log dataset that includes both Airflow DAGs and their associated run logs. Does anyone know of an open source of this information? Is this something I could easily generate by executing the examples or unit tests?

Thanks,
Russell Jurney
@rjurney <http://twitter.com/rjurney>
[email protected]
LI <http://linkedin.com/in/russelljurney>
FB <http://facebook.com/jurney>
datasyndrome.com
