Re: [PR] Clarify documentation about gathering statistics for parquet files [datafusion]
xudong963 commented on code in PR #16157:
URL: https://github.com/apache/datafusion/pull/16157#discussion_r2112300128
##
docs/source/user-guide/sql/ddl.md:
##
@@ -91,6 +93,23 @@ STORED AS PARQUET
LOCATION '/mnt/nyctaxi/tripdata.parquet';
```
+:::{note}
Review Comment:
> Here is an example of what this looks like rendered
TIL
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Clarify documentation about gathering statistics for parquet files [datafusion]
xudong963 merged PR #16157:
URL: https://github.com/apache/datafusion/pull/16157
Re: [PR] Clarify documentation about gathering statistics for parquet files [datafusion]
comphead commented on code in PR #16157:
URL: https://github.com/apache/datafusion/pull/16157#discussion_r2110078298
##
docs/source/user-guide/sql/ddl.md:
##
@@ -91,6 +93,23 @@ STORED AS PARQUET
LOCATION '/mnt/nyctaxi/tripdata.parquet';
```
+:::{note}
+Statistics
+: By default, when a table is created, DataFusion will _NOT_ read the files
+to gather statistics, which can be expensive but can accelerate subsequent
+queries substantially. If you want to gather statistics
+when creating a table, set the `datafusion.explain.show_statistics`
+configuration option to `true` before creating the table. For example:
+
+```sql
+SET datafusion.explain.show_statistics = true;
Review Comment:
```suggestion
SET datafusion.execution.collect_statistics = true;
```
?
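
For context, the option named in this suggestion, `datafusion.execution.collect_statistics`, is the setting that controls whether statistics are gathered when a table is created. A minimal sketch of the corrected sequence, assuming the `LOCATION` from the diff above and a hypothetical table name `taxi`:

```sql
-- Enable statistics collection BEFORE creating the table;
-- per the note under review, the setting applies when the table is created.
SET datafusion.execution.collect_statistics = true;

-- `taxi` is a hypothetical table name used only for illustration.
CREATE EXTERNAL TABLE taxi
STORED AS PARQUET
LOCATION '/mnt/nyctaxi/tripdata.parquet';
```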
Re: [PR] Clarify documentation about gathering statistics for parquet files [datafusion]
alamb commented on code in PR #16157:
URL: https://github.com/apache/datafusion/pull/16157#discussion_r2103274433
##
docs/source/user-guide/sql/ddl.md:
##
@@ -91,6 +93,23 @@ STORED AS PARQUET
LOCATION '/mnt/nyctaxi/tripdata.parquet';
```
+:::{note}
Review Comment:
Here is an example of what this looks like rendered

Re: [PR] Clarify documentation about gathering statistics for parquet files [datafusion]
comphead commented on code in PR #16157:
URL: https://github.com/apache/datafusion/pull/16157#discussion_r2110073876
##
docs/source/user-guide/sql/ddl.md:
##
@@ -91,6 +93,23 @@ STORED AS PARQUET
LOCATION '/mnt/nyctaxi/tripdata.parquet';
```
+:::{note}
+Statistics
+: By default, when a table is created, DataFusion will _NOT_ read the files
+to gather statistics, which can be expensive but can accelerate subsequent
+queries substantially. If you want to gather statistics
+when creating a table, set the `datafusion.explain.show_statistics`
Review Comment:
```suggestion
when creating a table, set the `datafusion.execution.collect_statistics`
```
?
Re: [PR] Clarify documentation about gathering statistics for parquet files [datafusion]
comphead commented on code in PR #16157:
URL: https://github.com/apache/datafusion/pull/16157#discussion_r2110073876
##
docs/source/user-guide/sql/ddl.md:
##
@@ -91,6 +93,23 @@ STORED AS PARQUET
LOCATION '/mnt/nyctaxi/tripdata.parquet';
```
+:::{note}
+Statistics
+: By default, when a table is created, DataFusion will _NOT_ read the files
+to gather statistics, which can be expensive but can accelerate subsequent
+queries substantially. If you want to gather statistics
+when creating a table, set the `datafusion.explain.show_statistics`
Review Comment:
```suggestion
when creating a table, set the `datafusion.explain.collect_statistics`
```
Re: [PR] Clarify documentation about gathering statistics for parquet files [datafusion]
xudong963 commented on code in PR #16157:
URL: https://github.com/apache/datafusion/pull/16157#discussion_r2107141726
##
docs/source/user-guide/sql/ddl.md:
##
@@ -91,6 +93,23 @@ STORED AS PARQUET
LOCATION '/mnt/nyctaxi/tripdata.parquet';
```
+:::{note}
+Statistics
+: By default, when a table is created, DataFusion will _NOT_ read the files
+to gather statistics, which can be expensive but can accelerate subsequent
+queries substantially. If you want to gather statistics
+when creating a table, set the `datafusion.explain.show_statistics`
Review Comment:
`datafusion.explain.collect_statistics`?
