[ 
https://issues.apache.org/jira/browse/AIRFLOW-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-2771 started by Micheal Ascah.
----------------------------------------------
> S3Hook Broad Exception Silent Failure
> -------------------------------------
>
>                 Key: AIRFLOW-2771
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2771
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: hooks
>    Affects Versions: 1.9.0
>            Reporter: Micheal Ascah
>            Assignee: Micheal Ascah
>            Priority: Minor
>              Labels: S3Hook, S3Sensor
>
> h2. Scenario
> S3KeySensor is passed an invalid S3/AWS connection id (it doesn't exist or
> has bad permissions), and there are no credentials under ~/.aws/credentials
> for boto to fall back on.
>  
> When poking for the key, the sensor creates an S3Hook and calls
> `check_for_key` on it. If the call to HeadObject fails, the exception is
> swallowed by a bare except clause that catches all exceptions, rather than
> only the botocore.exceptions.ClientError expected when an object is not found.
> h2. Problem
> This causes the sensor to return False on every poke and report no issue
> with the task instance until it times out, rather than failing immediately
> when the connection is incorrectly configured. The current logging output
> gives no insight into why the key is not being found.
> h4. Current code
> {code:python}
> try:
>     self.get_conn().head_object(Bucket=bucket_name, Key=key)
>     return True
> except:  # <- This catches credential and connection exceptions that should be raised
>     return False
> {code}
> {code:python}
> from airflow.hooks.S3_hook import S3Hook
> hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
> hook.check_for_key(key="test", bucket_name="test")
> # Returns False
> {code}
> {code}
> [2018-07-18 18:57:26,652] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 18:57:26,651] {sensors.py:537} INFO - Poking for key : s3://bucket/key.txt
> [2018-07-18 18:57:26,681] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 18:57:26,680] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): bucket.s3.amazonaws.com
> [2018-07-18 18:58:26,767] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 18:58:26,767] {sensors.py:537} INFO - Poking for key : s3://bucket/key.txt
> [2018-07-18 18:58:26,809] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 18:58:26,808] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): bucket.s3.amazonaws.com
> {code}
> h4. Expected
> h5. No credentials
> {code:python}
> from airflow.hooks.S3_hook import S3Hook
> hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
> hook.check_for_key(key="test", bucket_name="test")
> Traceback (most recent call last):
> ...
> botocore.exceptions.NoCredentialsError: Unable to locate credentials
> {code}
> h5. Good credentials
> {code:python}
> from airflow.hooks.S3_hook import S3Hook
> hook = S3Hook(aws_conn_id="conn_that_does_exist")
> hook.check_for_key(key="test", bucket_name="test")
> # Returns False
> {code}
> h4. Proposed Change
> Narrow the except clause to botocore.exceptions.ClientError and log the
> error message, in both check_for_key and check_for_bucket on S3Hook.
> {code:python}
> # Assumes at module level: from botocore.exceptions import ClientError
> try:
>     self.get_conn().head_object(Bucket=bucket_name, Key=key)
>     return True
> except ClientError as e:
>     self.log.info(e.response["Error"]["Message"]) 
>     return False
> {code}
>   
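
The behavioural difference described in the issue can be sketched without AWS access. This is a minimal illustration and not the actual S3Hook code: `ClientError` and `NoCredentialsError` here are local stand-ins named after the botocore exceptions (so the sketch runs without botocore installed), and `FakeS3Client`, `check_for_key_broad`, and `check_for_key_narrow` are hypothetical names introduced only for this example.

```python
import logging

log = logging.getLogger("s3_hook_sketch")


class ClientError(Exception):
    """Stand-in for botocore.exceptions.ClientError (carries a response dict)."""

    def __init__(self, error_response, operation_name):
        super().__init__(error_response["Error"]["Message"])
        self.response = error_response
        self.operation_name = operation_name


class NoCredentialsError(Exception):
    """Stand-in for botocore.exceptions.NoCredentialsError."""


class FakeS3Client:
    """Hypothetical client whose head_object raises a preset exception."""

    def __init__(self, error=None):
        self.error = error

    def head_object(self, Bucket, Key):
        if self.error is not None:
            raise self.error
        return {}


def check_for_key_broad(client, key, bucket_name):
    # Mirrors the current code: the bare except hides every failure.
    try:
        client.head_object(Bucket=bucket_name, Key=key)
        return True
    except:  # noqa: E722 (deliberately broad, to show the problem)
        return False


def check_for_key_narrow(client, key, bucket_name):
    # Mirrors the proposed change: only ClientError maps to False.
    try:
        client.head_object(Bucket=bucket_name, Key=key)
        return True
    except ClientError as e:
        log.info(e.response["Error"]["Message"])
        return False


not_found = ClientError({"Error": {"Code": "404", "Message": "Not Found"}}, "HeadObject")
no_creds = NoCredentialsError("Unable to locate credentials")
```

With the broad version, a missing key and missing credentials are indistinguishable because both return False. With the narrow version, the missing key still returns False while the credentials error propagates to the caller, which is what would let the sensor fail on the first poke instead of timing out.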



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
