Re: [PR] Handle downloading github raw files gracefully for 403 status code [airflow]
amoghrajesh merged PR #44469: URL: https://github.com/apache/airflow/pull/44469 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Handle downloading github raw files gracefully for 403 status code [airflow]
potiuk commented on PR #44469: URL: https://github.com/apache/airflow/pull/44469#issuecomment-2509472199 Rebased to account for the workaround in `main`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Handle downloading github raw files gracefully for 403 status code [airflow]
potiuk commented on code in PR #44469: URL: https://github.com/apache/airflow/pull/44469#discussion_r1864308815 ## dev/breeze/src/airflow_breeze/utils/github.py: ## @@ -54,6 +54,11 @@ def download_file_from_github(tag: str, path: str, output_file: Path, timeout: i if not get_dry_run(): try: response = requests.get(url, timeout=timeout) +if response.status_code == 403: +get_console().print( +f"[error]The {url} is not accessible.This may be caused by network issues or VPN settings" Review Comment: > @amoghrajesh can we use the github token as optional parameter for this (similar to what we do with issue generation command). In case of throttling we can recommand using it Not a good idea. This whole part of the installation process and retrieving constraints is supposed to work **without** `GITHUB_TOKEN`. Basically you are not supposed any headers when using `--constraints https://` And it's not needed. This is a ONE request when you install Airflow - i.e. run `pip install`. There is no scenario where throttling woudl be a limiting factor here, you would have to run 10s of pip installs at the same time to hit it - which even if you are behind proxy / NAT, is not going to realistically happen. Airlfow repo is under "Enterprise" account which has 15.000 requests per hour for primary limits and 900 points per minute (15 GET requests per second): https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api?apiVersion=2022-11-28#about-secondary-rate-limits So I think why 403 on throttling is theorethically possible, it's not a concern for constraint retrieval. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Handle downloading github raw files gracefully for 403 status code [airflow]
vincbeck commented on code in PR #44469: URL: https://github.com/apache/airflow/pull/44469#discussion_r1863810134 ## dev/breeze/src/airflow_breeze/utils/github.py: ## @@ -54,6 +54,13 @@ def download_file_from_github(tag: str, path: str, output_file: Path, timeout: i if not get_dry_run(): try: response = requests.get(url, timeout=timeout) +if response.status_code == 403: +get_console().print( +f"[error]The {url} is not accessible.This may be caused by either of:\n" +f" 1. network issues or VPN settings\n" +f" 2. Github rate limiting limit" Review Comment: The double limit looks weird (nota. native english speaker here so I might be wrong) ```suggestion f" 2. Github rate limit" ``` ## dev/breeze/src/airflow_breeze/utils/github.py: ## @@ -54,6 +54,13 @@ def download_file_from_github(tag: str, path: str, output_file: Path, timeout: i if not get_dry_run(): try: response = requests.get(url, timeout=timeout) +if response.status_code == 403: +get_console().print( +f"[error]The {url} is not accessible.This may be caused by either of:\n" +f" 1. network issues or VPN settings\n" +f" 2. Github rate limiting limit" Review Comment: The double limit looks weird (not a native english speaker here so I might be wrong) ```suggestion f" 2. Github rate limit" ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Handle downloading github raw files gracefully for 403 status code [airflow]
eladkal commented on code in PR #44469: URL: https://github.com/apache/airflow/pull/44469#discussion_r1863739968 ## dev/breeze/src/airflow_breeze/utils/github.py: ## @@ -54,6 +54,11 @@ def download_file_from_github(tag: str, path: str, output_file: Path, timeout: i if not get_dry_run(): try: response = requests.get(url, timeout=timeout) +if response.status_code == 403: +get_console().print( +f"[error]The {url} is not accessible.This may be caused by network issues or VPN settings" Review Comment: @amoghrajesh can we use the github token as optional parameter for this (similar to what we do with issue generation command). In case of throttling we can recommand using it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Handle downloading github raw files gracefully for 403 status code [airflow]
amoghrajesh commented on code in PR #44469: URL: https://github.com/apache/airflow/pull/44469#discussion_r1863704801 ## dev/breeze/src/airflow_breeze/utils/github.py: ## @@ -54,6 +54,11 @@ def download_file_from_github(tag: str, path: str, output_file: Path, timeout: i if not get_dry_run(): try: response = requests.get(url, timeout=timeout) +if response.status_code == 403: +get_console().print( +f"[error]The {url} is not accessible.This may be caused by network issues or VPN settings" Review Comment: Its actually pretty useless to print `response.content` - it deviates entirely from the actual error ## dev/breeze/src/airflow_breeze/utils/github.py: ## @@ -54,6 +54,11 @@ def download_file_from_github(tag: str, path: str, output_file: Path, timeout: i if not get_dry_run(): try: response = requests.get(url, timeout=timeout) +if response.status_code == 403: +get_console().print( +f"[error]The {url} is not accessible.This may be caused by network issues or VPN settings" Review Comment: Throttle limit, yes. Good idea -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Handle downloading github raw files gracefully for 403 status code [airflow]
gopidesupavan commented on code in PR #44469: URL: https://github.com/apache/airflow/pull/44469#discussion_r1863695916 ## dev/breeze/src/airflow_breeze/utils/github.py: ## @@ -54,6 +54,11 @@ def download_file_from_github(tag: str, path: str, output_file: Path, timeout: i if not get_dry_run(): try: response = requests.get(url, timeout=timeout) +if response.status_code == 403: +get_console().print( +f"[error]The {url} is not accessible.This may be caused by network issues or VPN settings" Review Comment: May be worth adding this to print `response.content` so that we get to know what actually the error? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Handle downloading github raw files gracefully for 403 status code [airflow]
vincbeck commented on code in PR #44469: URL: https://github.com/apache/airflow/pull/44469#discussion_r1863614387 ## dev/breeze/src/airflow_breeze/utils/github.py: ## @@ -54,6 +54,11 @@ def download_file_from_github(tag: str, path: str, output_file: Path, timeout: i if not get_dry_run(): try: response = requests.get(url, timeout=timeout) +if response.status_code == 403: +get_console().print( +f"[error]The {url} is not accessible.This may be caused by network issues or VPN settings" Review Comment: nit: We recently got some 403 from Github because we hit their throttle limit. Maybe update the message to also include this as possible cause? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Handle downloading github raw files gracefully for 403 status code [airflow]
vincbeck commented on code in PR #44469: URL: https://github.com/apache/airflow/pull/44469#discussion_r1863614387 ## dev/breeze/src/airflow_breeze/utils/github.py: ## @@ -54,6 +54,11 @@ def download_file_from_github(tag: str, path: str, output_file: Path, timeout: i if not get_dry_run(): try: response = requests.get(url, timeout=timeout) +if response.status_code == 403: +get_console().print( +f"[error]The {url} is not accessible.This may be caused by network issues or VPN settings" Review Comment: nit: We recently got some 403 from Github because we hit their throttle limit. Maybe update the message to also this as possible cause? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org