[PR] Change default `parquet_row_group_size` in `BaseSQLToGCSOperator` [airflow]

2024-01-16 Thread via GitHub
renzepost opened a new pull request, #36817: URL: https://github.com/apache/airflow/pull/36817 closes: #36793 As mentioned in #36793, a default setting of 1 for `parquet_row_group_size` leads to quite a few problems. For example the output Parquet files become huge, the
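For illustration only (not part of the PR): a minimal sketch of why a row group size of 1 is costly. It assumes a Parquet writer starts a new row group every `parquet_row_group_size` rows, so a file ends up with ceil(total_rows / row_group_size) row groups, each carrying its own metadata. The helper name `row_group_count` and the sample sizes are hypothetical and chosen only for the example.

```python
def row_group_count(total_rows: int, row_group_size: int) -> int:
    """Return how many row groups a writer would emit for total_rows rows.

    Assumes the writer closes a row group after every row_group_size rows
    (the last group may be smaller).
    """
    if row_group_size < 1:
        raise ValueError("row_group_size must be at least 1")
    if total_rows < 0:
        raise ValueError("total_rows must not be negative")
    # Ceiling division using integers only, to avoid float rounding.
    return -(-total_rows // row_group_size)


if __name__ == "__main__":
    total_rows = 1_000_000  # hypothetical export size
    for size in (1, 100, 1_000, 100_000):
        groups = row_group_count(total_rows, size)
        print(f"row_group_size={size:>7}: {groups:>9} row groups")
```

With a size of 1, every exported row becomes its own row group, so per-group metadata is repeated once per row, which is consistent with the oversized files described in the pull request.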

Re: [PR] Change default `parquet_row_group_size` in `BaseSQLToGCSOperator` [airflow]

2024-01-16 Thread via GitHub
Taragolis commented on PR #36817: URL: https://github.com/apache/airflow/pull/36817#issuecomment-1894336872 > @Taragolis suggested a much lower value between 100 and 1000. This was a suggestion from the pessimist inside of me. 🤣 -- This is an automated message from the Apache Git

Re: [PR] Change default `parquet_row_group_size` in `BaseSQLToGCSOperator` [airflow]

2024-01-16 Thread via GitHub
potiuk commented on PR #36817: URL: https://github.com/apache/airflow/pull/36817#issuecomment-1894630369 It's a borderline breaking change, but I'd hate to bump the MAJOR version of the google provider because of it - I think however it would be enough if there is a STRONG mention in the Changelog

Re: [PR] Change default `parquet_row_group_size` in `BaseSQLToGCSOperator` [airflow]

2024-01-17 Thread via GitHub
renzepost commented on PR #36817: URL: https://github.com/apache/airflow/pull/36817#issuecomment-1896177270 Ah, got it! I've added a more verbose description in the changelog. Let me know if I missed anything or need to change the wording.

Re: [PR] Change default `parquet_row_group_size` in `BaseSQLToGCSOperator` [airflow]

2024-01-17 Thread via GitHub
potiuk commented on PR #36817: URL: https://github.com/apache/airflow/pull/36817#issuecomment-1896225993 Nice

Re: [PR] Change default `parquet_row_group_size` in `BaseSQLToGCSOperator` [airflow]

2024-01-17 Thread via GitHub
Taragolis commented on code in PR #36817: URL: https://github.com/apache/airflow/pull/36817#discussion_r1456304434 ## airflow/providers/google/CHANGELOG.rst: ## @@ -26,6 +26,14 @@ Changelog - +The default value of ``parquet_row_group_size`` in ``BaseSQLToGCSOperator`

Re: [PR] Change default `parquet_row_group_size` in `BaseSQLToGCSOperator` [airflow]

2024-01-18 Thread via GitHub
eladkal commented on code in PR #36817: URL: https://github.com/apache/airflow/pull/36817#discussion_r1457350670 ## airflow/providers/google/CHANGELOG.rst: ## @@ -27,6 +27,16 @@ Changelog - +.. note:: + The default value of ``parquet_row_group_size`` in ``BaseSQLToG

Re: [PR] Change default `parquet_row_group_size` in `BaseSQLToGCSOperator` [airflow]

2024-01-18 Thread via GitHub
eladkal merged PR #36817: URL: https://github.com/apache/airflow/pull/36817