Re: [PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]

2025-06-02 Thread via GitHub


alamb commented on PR #1860:
URL: 
https://github.com/apache/datafusion-sqlparser-rs/pull/1860#issuecomment-2930569699

   🚀 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]

2025-05-30 Thread via GitHub


iffyio merged PR #1860:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1860


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]

2025-05-29 Thread via GitHub


alamb commented on PR #1860:
URL: 
https://github.com/apache/datafusion-sqlparser-rs/pull/1860#issuecomment-2920524140

   I restarted the checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]

2025-05-29 Thread via GitHub


hendrikmakait commented on PR #1860:
URL: 
https://github.com/apache/datafusion-sqlparser-rs/pull/1860#issuecomment-2919634086

   @iffyio: CI should be fixed now


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]

2025-05-29 Thread via GitHub


iffyio commented on PR #1860:
URL: 
https://github.com/apache/datafusion-sqlparser-rs/pull/1860#issuecomment-2918865945

   @hendrikmakait could you take a look at the CI failures?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]

2025-05-28 Thread via GitHub


hendrikmakait commented on code in PR #1860:
URL: 
https://github.com/apache/datafusion-sqlparser-rs/pull/1860#discussion_r2111252622


##
src/ast/query.rs:
##
@@ -2680,28 +2680,32 @@ pub enum PipeOperator {
 full_table_exprs: Vec,
 group_by_expr: Vec,
 },
+/// Selects a random sample of rows from the input table.
+/// Syntax: `|> TABLESAMPLE  ( {ROWS | PERCENT})`
+/// See more at 

+TableSample { sample: Box  },
 }
 
 impl fmt::Display for PipeOperator {
 fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
 match self {
 PipeOperator::Select { exprs } => {
-write!(f, "SELECT {}", 
display_comma_separated(exprs.as_slice()))
+write!(f, " SELECT {}", 
display_comma_separated(exprs.as_slice()))

Review Comment:
   Good point, I found a way to fix the formatting for `TableSample` in a 
different way that requires fewer changes and follows this general principle.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]

2025-05-28 Thread via GitHub


hendrikmakait commented on code in PR #1860:
URL: 
https://github.com/apache/datafusion-sqlparser-rs/pull/1860#discussion_r2111257608


##
src/ast/query.rs:
##
@@ -1559,7 +1559,7 @@ impl fmt::Display for TableSampleBucket {
 }
 impl fmt::Display for TableSample {
 fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
-write!(f, " {}", self.modifier)?;
+write!(f, "{}", self.modifier)?;

Review Comment:
   For better composition of AST building blocks, I remove the whitespace here 
and introduce it again for the structs that contain a TableSample.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]

2025-05-27 Thread via GitHub


iffyio commented on code in PR #1860:
URL: 
https://github.com/apache/datafusion-sqlparser-rs/pull/1860#discussion_r2110927693


##
src/ast/query.rs:
##
@@ -2680,28 +2680,32 @@ pub enum PipeOperator {
 full_table_exprs: Vec,
 group_by_expr: Vec,
 },
+/// Selects a random sample of rows from the input table.
+/// Syntax: `|> TABLESAMPLE  ( {ROWS | PERCENT})`
+/// See more at 

+TableSample { sample: Box  },
 }
 
 impl fmt::Display for PipeOperator {
 fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
 match self {
 PipeOperator::Select { exprs } => {
-write!(f, "SELECT {}", 
display_comma_separated(exprs.as_slice()))
+write!(f, " SELECT {}", 
display_comma_separated(exprs.as_slice()))

Review Comment:
   Oh I think we'd ideally undo these changes, its better for the operator 
display to be standalone where possible (i.e. they shouldn't assume surrounding 
space formatting, rather it should be left up to the caller), it makes it 
easier to compose nodes in the AST when displaying



##
src/ast/query.rs:
##
@@ -2680,28 +2680,32 @@ pub enum PipeOperator {
 full_table_exprs: Vec,
 group_by_expr: Vec,
 },
+/// Selects a random sample of rows from the input table.
+/// Syntax: `|> TABLESAMPLE  ( {ROWS | PERCENT})`

Review Comment:
   I think the doc comment doesn't necessarily need to spell out the full spec. 
usually its enough to give a rough example of what the statement looks like e.g.
   ```sql
   Syntax: `|> TABLESAMPLE BERNOULLI(50)
   ```
   



##
tests/sqlparser_common.rs:
##
@@ -15155,6 +15155,12 @@ fn parse_pipeline_operator() {
 dialects.verified_stmt("SELECT * FROM users |> ORDER BY id DESC");
 dialects.verified_stmt("SELECT * FROM users |> ORDER BY id DESC, name 
ASC");
 
+// tablesample pipe operator
+dialects.verified_stmt("SELECT * FROM tbl |> TABLESAMPLE BERNOULLI (50)");
+dialects.verified_stmt("SELECT * FROM tbl |> TABLESAMPLE SYSTEM (50)");
+// TODO: Technically, REPEATABLE is not available in BigQuery, but it is 
used with TABLESAMPLE in other dialects

Review Comment:
   ```suggestion
   ```
   Yeah I think this behavior would be fine



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]

2025-05-26 Thread via GitHub


hendrikmakait commented on code in PR #1860:
URL: 
https://github.com/apache/datafusion-sqlparser-rs/pull/1860#discussion_r2106680846


##
src/ast/query.rs:
##
@@ -2680,28 +2680,32 @@ pub enum PipeOperator {
 full_table_exprs: Vec,
 group_by_expr: Vec,
 },
+/// Selects a random sample of rows from the input table.
+/// Syntax: `|> TABLESAMPLE  ( {ROWS | PERCENT})`

Review Comment:
   This is the officially supported spec from the paper but doesn't cover 
everything that is technically supported in this PR. I'm wondering how to best 
deal with that (see the open question in the PR description).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



[PR] Add support for `TABLESAMPLE` pipe operator [datafusion-sqlparser-rs]

2025-05-25 Thread via GitHub


hendrikmakait opened a new pull request, #1860:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1860

   Part of #1758
   
   This PR adds support for the `|> TABLESAMPLE ...` pipe operator.
   
   Open question:
   
   [BigQuery's `TABLESAMPLE` 
operator](https://cloud.google.com/bigquery/docs/reference/standard-sql/pipe-syntax#tablesample_pipe_operator)
 does not support the full [SQL standard 
spec](https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html#sample-clause).
 This PR adds support for a superset of the BigQuery syntax which, e.g., 
includes the `REPEATABLE` clause and reuses the existing code for parsing 
`TABLESAMPLE` operators. Is this desired from the perspective of the project or 
is it preferred to stick closely to the BigQuery syntax? If so, I'm happy to 
change this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]