Re: [PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]
alamb commented on PR #1860: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1860#issuecomment-2930569699 🚀 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]
iffyio merged PR #1860: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1860 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]
alamb commented on PR #1860: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1860#issuecomment-2920524140 I restarted the checks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]
hendrikmakait commented on PR #1860: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1860#issuecomment-2919634086 @iffyio: CI should be fixed now -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]
iffyio commented on PR #1860: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1860#issuecomment-2918865945 @hendrikmakait could you take a look at the CI failures? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]
hendrikmakait commented on code in PR #1860:
URL:
https://github.com/apache/datafusion-sqlparser-rs/pull/1860#discussion_r2111252622
##
src/ast/query.rs:
##
@@ -2680,28 +2680,32 @@ pub enum PipeOperator {
full_table_exprs: Vec,
group_by_expr: Vec,
},
+/// Selects a random sample of rows from the input table.
+/// Syntax: `|> TABLESAMPLE ( {ROWS | PERCENT})`
+/// See more at
+TableSample { sample: Box },
}
impl fmt::Display for PipeOperator {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
PipeOperator::Select { exprs } => {
-write!(f, "SELECT {}",
display_comma_separated(exprs.as_slice()))
+write!(f, " SELECT {}",
display_comma_separated(exprs.as_slice()))
Review Comment:
Good point, I found a way to fix the formatting for `TableSample` in a
different way that requires fewer changes and follows this general principle.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]
hendrikmakait commented on code in PR #1860:
URL:
https://github.com/apache/datafusion-sqlparser-rs/pull/1860#discussion_r2111257608
##
src/ast/query.rs:
##
@@ -1559,7 +1559,7 @@ impl fmt::Display for TableSampleBucket {
}
impl fmt::Display for TableSample {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
-write!(f, " {}", self.modifier)?;
+write!(f, "{}", self.modifier)?;
Review Comment:
For better composition of AST building blocks, I remove the whitespace here
and introduce it again for the structs that contain a TableSample.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]
iffyio commented on code in PR #1860:
URL:
https://github.com/apache/datafusion-sqlparser-rs/pull/1860#discussion_r2110927693
##
src/ast/query.rs:
##
@@ -2680,28 +2680,32 @@ pub enum PipeOperator {
full_table_exprs: Vec,
group_by_expr: Vec,
},
+/// Selects a random sample of rows from the input table.
+/// Syntax: `|> TABLESAMPLE ( {ROWS | PERCENT})`
+/// See more at
+TableSample { sample: Box },
}
impl fmt::Display for PipeOperator {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
PipeOperator::Select { exprs } => {
-write!(f, "SELECT {}",
display_comma_separated(exprs.as_slice()))
+write!(f, " SELECT {}",
display_comma_separated(exprs.as_slice()))
Review Comment:
Oh I think we'd ideally undo these changes, its better for the operator
display to be standalone where possible (i.e. they shouldn't assume surrounding
space formatting, rather it should be left up to the caller), it makes it
easier to compose nodes in the AST when displaying
##
src/ast/query.rs:
##
@@ -2680,28 +2680,32 @@ pub enum PipeOperator {
full_table_exprs: Vec,
group_by_expr: Vec,
},
+/// Selects a random sample of rows from the input table.
+/// Syntax: `|> TABLESAMPLE ( {ROWS | PERCENT})`
Review Comment:
I think the doc comment doesn't necessarily need to spell out the full spec.
usually its enough to give a rough example of what the statement looks like e.g.
```sql
Syntax: `|> TABLESAMPLE BERNOULLI(50)
```
##
tests/sqlparser_common.rs:
##
@@ -15155,6 +15155,12 @@ fn parse_pipeline_operator() {
dialects.verified_stmt("SELECT * FROM users |> ORDER BY id DESC");
dialects.verified_stmt("SELECT * FROM users |> ORDER BY id DESC, name
ASC");
+// tablesample pipe operator
+dialects.verified_stmt("SELECT * FROM tbl |> TABLESAMPLE BERNOULLI (50)");
+dialects.verified_stmt("SELECT * FROM tbl |> TABLESAMPLE SYSTEM (50)");
+// TODO: Technically, REPEATABLE is not available in BigQuery, but it is
used with TABLESAMPLE in other dialects
Review Comment:
```suggestion
```
Yeah I think this behavior would be fine
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]
hendrikmakait commented on code in PR #1860:
URL:
https://github.com/apache/datafusion-sqlparser-rs/pull/1860#discussion_r2106680846
##
src/ast/query.rs:
##
@@ -2680,28 +2680,32 @@ pub enum PipeOperator {
full_table_exprs: Vec,
group_by_expr: Vec,
},
+/// Selects a random sample of rows from the input table.
+/// Syntax: `|> TABLESAMPLE ( {ROWS | PERCENT})`
Review Comment:
This is the officially supported spec from the paper but doesn't cover
everything that is technically supported in this PR. I'm wondering how to best
deal with that (see the open question in the PR description).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
[PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]
hendrikmakait opened a new pull request, #1860: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1860 Part of #1758 This PR adds support for the `|> TABLESAMPLE ...` pipe operator. Open question: [BigQuery's `TABLESAMPLE` operator](https://cloud.google.com/bigquery/docs/reference/standard-sql/pipe-syntax#tablesample_pipe_operator) does not support the full [SQL standard spec](https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html#sample-clause). This PR adds support for a superset of the BigQuery syntax which, e.g., includes the `REPEATABLE` clause and reuses the existing code for parsing `TABLESAMPLE` operators. Is this desired from the perspective of the project or is it preferred to stick closely to the BigQuery syntax? If so, I'm happy to change this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
