Hey all,

I am wondering if there should be an "official" way to propagate a 
skipped state of the last downstream tasks in a subdag to the parent dag.

Simple Use Case:
I have a subdag with the following tasks "transfer data from an api to 
s3", "convert json to csv", "transform csv".
Sometimes the api returns a json with an empty data object. The 
"transfer" task succeeds, which I think is correct. The "convert" task, 
which actually does json to pandas.DataFrame to csv, could check if the 
data frame is empty before proceeding. At this point I am raising an 
AirflowSkipException to skip this task (and all downstream tasks) which 
need the data to work. I don't want the task to fail with a simple 
AirflowException since I think no data does not necessarily mean it is an 
error. It just means there is no data available for the time period 
requested - I want to skip it. The problem comes when the subdag run has 
finished because its state is set to success like all other runs where 
data was available. That means you won't be able to see the difference 
from outside of the subdag (the parent dag).
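
To make the "convert" step concrete, here is a minimal sketch of the 
check, not the actual task code. The function name convert_json_to_csv 
and the payload shape {"data": [...]} are assumptions for illustration. 
It uses the stdlib json/csv modules instead of pandas so that it runs 
without extra dependencies (with pandas the check would be df.empty), 
and it falls back to a stand-in exception class when Airflow is not 
installed.

```python
import csv
import io
import json

try:
    from airflow.exceptions import AirflowSkipException
except ImportError:
    # Stand-in so the sketch runs without Airflow installed.
    class AirflowSkipException(Exception):
        pass


def convert_json_to_csv(raw_json):
    """Hypothetical callable for the "convert" task.

    Assumes the api payload looks like {"data": [{...}, {...}]}.
    """
    payload = json.loads(raw_json)
    rows = payload.get("data") or []  # treats {}, [] and null as "no data"
    if not rows:
        # No data is not an error: skip this task instead of failing it.
        raise AirflowSkipException("no data for the requested period")

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Inside the subdag this marks the task as skipped, but the subdag run as 
a whole still ends up as success, which is the problem described above.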

I would be very interested in what you guys think about this. Should we 
add a feature to "propagate skipped state from inside subdag to parent 
dag" or could my problem just be solved easier / better? Please let me 
know :)

(P.S. I made it work but I am just thinking of an official way of doing 
it if you guys agree with the idea)

Kind regards,
Felix

