Joel Croteau created AIRFLOW-5046:
-------------------------------------

             Summary: Allow GoogleCloudStorageToBigQueryOperator to accept 
source_objects as a string or otherwise take input from XCom
                 Key: AIRFLOW-5046
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5046
             Project: Apache Airflow
          Issue Type: Improvement
          Components: contrib, gcp
    Affects Versions: 1.10.2
            Reporter: Joel Croteau


`GoogleCloudStorageToBigQueryOperator` should be able to have its 
`source_objects` determined dynamically by the results of an earlier task in 
the workflow. This is hard to do today because the operator expects a list, 
while any template expansion renders as a string. This could be implemented 
either as a check for whether `source_objects` is a string, parsing it as a 
list if so, or as a separate argument that accepts a list encoded as a string.
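
Roughly, the string-check option could look something like the helper below. 
This is only a sketch of the idea (parsing the rendered value with 
`ast.literal_eval`), not a patch against the actual operator:

```python
import ast


def coerce_source_objects(source_objects):
    """Turn a rendered template string like "['a.csv', 'b.csv']" back into a
    list; anything that is already a list passes through unchanged."""
    if not isinstance(source_objects, str):
        return source_objects
    try:
        parsed = ast.literal_eval(source_objects)
    except (ValueError, SyntaxError):
        # Not a Python literal; treat the whole string as one object name.
        return [source_objects]
    if isinstance(parsed, (list, tuple)):
        return list(parsed)
    # A literal but not a list (e.g. a bare quoted string): wrap it.
    return [parsed]
```

The same check could just as easily sit behind a separate argument if changing 
the behaviour of `source_objects` itself is a concern.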

My particular use case for this is as follows (a rough DAG sketch follows the 
list):
 # A daily DAG scans a GCS bucket for all objects created in the last day and 
loads them into BigQuery.
 # To find these objects, a `PythonOperator` scans the bucket and returns a 
list of object names.
 # A `GoogleCloudStorageToBigQueryOperator` is used to load these objects into 
BigQuery.
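
For reference, a rough sketch of this DAG (bucket, dataset, table and task 
names are all placeholders, and the date filtering in the scan step is 
omitted):

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator
from airflow.operators.python_operator import PythonOperator


def list_new_objects(**context):
    # Placeholder scan: list everything under a prefix; the real callable
    # filters down to objects created in the last day.
    hook = GoogleCloudStorageHook()
    return hook.list('my-bucket', prefix='daily/')  # return value goes to XCom


with DAG('gcs_daily_load',
         start_date=datetime(2019, 7, 1),
         schedule_interval='@daily') as dag:

    scan = PythonOperator(
        task_id='scan_bucket',
        python_callable=list_new_objects,
        provide_context=True,
    )

    load = GoogleCloudStorageToBigQueryOperator(
        task_id='load_to_bq',
        bucket='my-bucket',
        # Rendered as a *string*, which is exactly the problem described below.
        source_objects="{{ task_instance.xcom_pull(task_ids='scan_bucket') }}",
        destination_project_dataset_table='my_project.my_dataset.my_table',
    )

    scan >> load
```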

The operator should be able to take its list of objects from XCom, but there 
is no functionality to do this, and trying a template expansion along the 
lines of `source_objects='\{{ task_instance.xcom_pull(key="KEY") }}'` doesn't 
work either: the template is rendered as a string, which 
`GoogleCloudStorageToBigQueryOperator` then iterates over as though it were a 
list, so each character becomes a separate object name.
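
Concretely, the operator builds its source URIs by iterating over 
`source_objects`, so after rendering it ends up with something along these 
lines (object names made up):

```python
# What the template actually hands the operator after rendering:
rendered = "['events/part-0.json', 'events/part-1.json']"

# Iterating a string yields its characters, so "one URI per object" becomes
# one URI per character.
uris = ['gs://my-bucket/{}'.format(obj) for obj in rendered]
print(uris[:3])  # URIs for '[', "'" and 'e' rather than for the two objects
```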



