Kengo Seki created AIRFLOW-2382:
-----------------------------------

Summary: Fix wrong description for delimiter
Key: AIRFLOW-2382
URL: https://issues.apache.org/jira/browse/AIRFLOW-2382
Project: Apache Airflow
Issue Type: Bug
Components: aws, operators
Reporter: Kengo Seki
The documentation for S3ListOperator says:

{code}
:param delimiter: The delimiter by which you want to filter the objects.
    For e.g to lists the CSV files from in a directory in S3 you would use
    delimiter='.csv'.
{code}

{code}
**Example**:
    The following operator would list all the CSV files from the S3
    ``customers/2018/04/`` key in the ``data`` bucket. ::

        s3_file = S3ListOperator(
            task_id='list_3s_files',
            bucket='data',
            prefix='customers/2018/04/',
            delimiter='.csv',
            aws_conn_id='aws_customers_conn'
        )
{code}

but it actually does the opposite: passing delimiter='.csv' excludes the CSV file from the result instead of selecting it.

{code}
In [1]: from airflow.contrib.operators.s3_list_operator import S3ListOperator

In [2]: S3ListOperator(task_id='t', bucket='bkt0', prefix='', aws_conn_id='s3').execute(None)
[2018-04-26 10:34:27,001] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): bkt0.s3.amazonaws.com
[2018-04-26 10:34:27,711] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): bkt0.s3-ap-northeast-1.amazonaws.com
[2018-04-26 10:34:27,801] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): bkt0.s3.ap-northeast-1.amazonaws.com
Out[2]: ['0.csv', '1.txt', '2.jpg', '3.exe']

In [3]: S3ListOperator(task_id='t', bucket='bkt0', prefix='', aws_conn_id='s3', delimiter='.csv').execute(None)
[2018-04-26 10:34:39,722] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): bkt0.s3.amazonaws.com
[2018-04-26 10:34:40,483] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): bkt0.s3-ap-northeast-1.amazonaws.com
[2018-04-26 10:34:40,569] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): bkt0.s3.ap-northeast-1.amazonaws.com
Out[3]: ['1.txt', '2.jpg', '3.exe']
{code}

This happens because the 'delimiter' parameter represents the path hierarchy (so '/' is typically used), not a file extension. In an S3 listing, keys that contain the delimiter after the prefix are rolled up into common prefixes rather than returned as individual keys, which is why '0.csv' disappears from the result.

S3ToGoogleCloudStorageOperator has the same problem.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
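To illustrate why delimiter='.csv' drops '0.csv' in the session above, here is a minimal sketch of S3's delimiter grouping in plain Python, so it runs without a bucket. The functions `simulate_list_objects` and `list_keys_with_suffix` are hypothetical helpers, not Airflow or boto3 APIs, and the model is simplified (no pagination, no MaxKeys, keys sorted by Python string order).

```python
from typing import List, Optional, Tuple


def simulate_list_objects(keys: List[str], prefix: str = "",
                          delimiter: Optional[str] = None) -> Tuple[List[str], List[str]]:
    """Hypothetical, simplified model of S3 ListObjects delimiter handling.

    Returns (contents, common_prefixes). A key that contains the delimiter
    somewhere after the prefix is rolled up into a common prefix and is NOT
    returned in contents.
    """
    contents: List[str] = []
    common_prefixes: List[str] = []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue  # outside the requested prefix
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Group everything up to and including the first delimiter occurrence.
            end = rest.index(delimiter) + len(delimiter)
            common = prefix + rest[:end]
            if common not in common_prefixes:
                common_prefixes.append(common)
        else:
            contents.append(key)
    return contents, common_prefixes


def list_keys_with_suffix(keys: List[str], prefix: str, suffix: str) -> List[str]:
    """Hypothetical helper: filter by file extension on the client side."""
    contents, _ = simulate_list_objects(keys, prefix=prefix)
    return [k for k in contents if k.endswith(suffix)]


bucket = ['0.csv', '1.txt', '2.jpg', '3.exe']

# No delimiter: every key is listed.
print(simulate_list_objects(bucket))
# delimiter='.csv': '0.csv' becomes a common prefix, so it vanishes from contents.
print(simulate_list_objects(bucket, delimiter='.csv'))
# Selecting CSV files by extension has to happen after listing.
print(list_keys_with_suffix(bucket, prefix='', suffix='.csv'))

# Intended use of the delimiter: one level of a '/'-separated hierarchy.
tree = ['customers/2018/04/a.csv', 'customers/2018/04/b.txt',
        'customers/2018/04/sub/c.csv']
print(simulate_list_objects(tree, prefix='customers/2018/04/', delimiter='/'))
```

The contents list in the second call matches Out[3] above, which is consistent with the operator returning only the ungrouped keys. Under this model, listing "all CSV files" would mean listing without a delimiter (or with '/') and filtering by suffix afterwards, rather than passing the extension as the delimiter.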