[ https://issues.apache.org/jira/browse/AIRFLOW-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on AIRFLOW-2771 started by Micheal Ascah.
----------------------------------------------
> S3Hook Broad Exception Silent Failure
> -------------------------------------
>
> Key: AIRFLOW-2771
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2771
> Project: Apache Airflow
> Issue Type: Bug
> Components: hooks
> Affects Versions: 1.9.0
> Reporter: Micheal Ascah
> Assignee: Micheal Ascah
> Priority: Minor
> Labels: S3Hook, S3Sensor
>
> h2. Scenario
> S3KeySensor is passed an invalid S3/AWS connection id name (doesn't exist or
> bad permissions). There are also no credentials found under
> ~/.aws/credentials for boto to fall back on.
>
> When poking for the key, it creates an S3Hook and calls `check_for_key` on
> the hook. If the call to HeadObject fails, the exception is caught by a bare
> except clause that catches all exceptions, rather than only the
> botocore.exceptions.ClientError that is expected when an object is not found.
> h2. Problem
> This causes the sensor to return False and report no issue with the task
> instance until it times out, rather than failing immediately when the
> connection is incorrectly configured. The current logging output gives no
> insight as to why the key is not being found.
> h4. Current code
> {code:python}
> try:
>     self.get_conn().head_object(Bucket=bucket_name, Key=key)
>     return True
> except:  # <- This catches credential and connection exceptions that should be raised
>     return False
> {code}
> {code:python}
> from airflow.hooks.S3_hook import S3Hook
> hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
> hook.check_for_key(key="test", bucket_name="test")
> False
> {code}
> {code:python}
> [2018-07-18 18:57:26,652] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 18:57:26,651] {sensors.py:537} INFO - Poking for key : s3://bucket/key.txt
> [2018-07-18 18:57:26,681] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 18:57:26,680] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): bucket.s3.amazonaws.com
> [2018-07-18 18:58:26,767] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 18:58:26,767] {sensors.py:537} INFO - Poking for key : s3://bucket/key.txt
> [2018-07-18 18:58:26,809] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 18:58:26,808] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): bucket.s3.amazonaws.com
> {code}
> h4. Expected
> h5. No credentials
> {code:python}
> from airflow.hooks.S3_hook import S3Hook
> hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
> hook.check_for_key(key="test", bucket_name="test")
> Traceback (most recent call last):
> ...
> botocore.exceptions.NoCredentialsError: Unable to locate credentials
> {code}
> h5. Good credentials
> {code:python}
> from airflow.hooks.S3_hook import S3Hook
> hook = S3Hook(aws_conn_id="conn_that_does_exist")
> hook.check_for_key(key="test", bucket_name="test")
> False
> {code}
> h4. Proposed Change
> Add a type to the except clause for botocore.exceptions.ClientError and log
> the message for both check_for_key and check_for_bucket on S3Hook.
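> {code:python}
> # At module level in S3_hook.py:
> from botocore.exceptions import ClientError
>
> # Inside check_for_key:
> try:
>     self.get_conn().head_object(Bucket=bucket_name, Key=key)
>     return True
> except ClientError as e:
>     # Log why S3 rejected the request, then report the key as absent.
>     self.log.info(e.response["Error"]["Message"])
>     return False
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)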
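
The proposed change can be tried without AWS access using the minimal sketch below. It is not the Airflow implementation: `FakeS3Client` is a hypothetical stand-in for the boto3 client, and the two exception classes are local stand-ins for `botocore.exceptions.ClientError` and `botocore.exceptions.NoCredentialsError` (which, like these, are unrelated `Exception` subclasses). Only the narrowed `except ClientError` clause mirrors the proposal in the issue.

```python
import logging

log = logging.getLogger(__name__)


# Local stand-ins (assumptions) for the botocore exception classes, so the
# sketch runs without botocore installed. Neither inherits from the other.
class ClientError(Exception):
    def __init__(self, error_response, operation_name):
        super().__init__(error_response["Error"]["Message"])
        self.response = error_response
        self.operation_name = operation_name


class NoCredentialsError(Exception):
    pass


class FakeS3Client:
    """Hypothetical client that knows a fixed set of (bucket, key) pairs."""

    def __init__(self, existing_keys, has_credentials=True):
        self.existing_keys = set(existing_keys)
        self.has_credentials = has_credentials

    def head_object(self, Bucket, Key):
        if not self.has_credentials:
            # Misconfiguration: fails before any request reaches S3.
            raise NoCredentialsError("Unable to locate credentials")
        if (Bucket, Key) not in self.existing_keys:
            # Error reported by the service itself.
            raise ClientError(
                {"Error": {"Code": "404", "Message": "Not Found"}},
                "HeadObject",
            )
        return {}


def check_for_key(client, key, bucket_name):
    """Return False only for service errors; let everything else propagate."""
    try:
        client.head_object(Bucket=bucket_name, Key=key)
        return True
    except ClientError as e:
        log.info(e.response["Error"]["Message"])
        return False
```

With a client built as `FakeS3Client({("test", "present")})`, a lookup of `"missing"` returns False and logs the message, while a client built with `has_credentials=False` raises `NoCredentialsError` on the first call instead of returning False until the sensor times out. A permissions failure that the service reports as a `ClientError` would still return False under this approach, although its message would now appear in the log.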