Re: [PR] feat: support table sample [datafusion]

2025-09-26 Thread via GitHub


github-actions[bot] closed pull request #16505: feat: support table sample
URL: https://github.com/apache/datafusion/pull/16505


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: support table sample [datafusion]

2025-09-18 Thread via GitHub


github-actions[bot] commented on PR #16505:
URL: https://github.com/apache/datafusion/pull/16505#issuecomment-3310246462

   Thank you for your contribution. Unfortunately, this pull request is stale 
because it has been open 60 days with no activity. Please remove the stale 
label or comment or this will be closed in 7 days.





Re: [PR] feat: support table sample [datafusion]

2025-06-25 Thread via GitHub


chenkovsky commented on PR #16505:
URL: https://github.com/apache/datafusion/pull/16505#issuecomment-3004038741

   Some comments were added to the Cargo file today: 
https://github.com/apache/datafusion/blob/20a723b7b6d91da57fe6abea8ecac08ea5267a89/datafusion/sql/Cargo.toml#L49
   They make sense to me. Dependency changes in datafusion-sql should be made 
carefully.





Re: [PR] feat: support table sample [datafusion]

2025-06-25 Thread via GitHub


chenkovsky commented on PR #16505:
URL: https://github.com/apache/datafusion/pull/16505#issuecomment-3003924958

   > @2010YOUY01 thank you for pointing this out.
   > 
   > @chenkovsky, it looks like both our PRs solve the same sampling problem 
with different approaches. The direction of my PR is to continue improving 
random filtering (as in #13268) by enhancing predicate-based sampling, as 
previously discussed with @alamb 
[here](https://github.com/apache/datafusion/issues/13563#issuecomment-2498989436).
   > 
   > The sampling logic differs between databases, and in my PR implementation 
and review process, we have already begun addressing some subtle semantic 
differences between Postgres, DuckDB, Hive, etc.
   
   I considered random filtering before, but found it hard to implement 
Poisson sampling and a seed that way, so I brought Spark's design here.
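
   As an illustration of the distinction drawn here (this is not code from 
either PR), the sketch below contrasts a seeded predicate-style filter, which 
keeps each row at most once, with Poisson sampling, which emits each row a 
Poisson-distributed number of times and so samples with replacement. The 
function names and the use of Python's `random.Random` as the seeded generator 
are assumptions made for the example only.

```python
import math
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def bernoulli_sample(rows: Iterable[T], fraction: float, seed: int) -> Iterator[T]:
    """Keep each row independently with probability `fraction` (no replacement).

    This is the shape of a random-filter predicate: one uniform draw per row.
    """
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    for row in rows:
        if rng.random() < fraction:
            yield row


def poisson_draw(rng: random.Random, lam: float) -> int:
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def poisson_sample(rows: Iterable[T], fraction: float, seed: int) -> Iterator[T]:
    """Emit each row k times, k ~ Poisson(fraction) (sampling with replacement).

    A row may appear zero, one, or several times, which a boolean predicate
    cannot express on its own.
    """
    rng = random.Random(seed)
    for row in rows:
        for _ in range(poisson_draw(rng, fraction)):
            yield row


if __name__ == "__main__":
    data = list(range(20))
    print(list(bernoulli_sample(data, 0.3, seed=42)))
    print(list(poisson_sample(data, 0.3, seed=42)))
```

   The per-row repeat count is what makes the second form awkward to express 
as a filter predicate: a filter can only keep or drop a row.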





Re: [PR] feat: support table sample [datafusion]

2025-06-25 Thread via GitHub


theirix commented on PR #16505:
URL: https://github.com/apache/datafusion/pull/16505#issuecomment-3003701483

   @2010YOUY01 thank you for pointing this out.
   
   @chenkovsky, it looks like both our PRs solve the same sampling problem with 
different approaches. The direction of my PR is to continue improving random 
filtering (as in #13268) by enhancing predicate-based sampling, as previously 
discussed with @alamb 
[here](https://github.com/apache/datafusion/issues/13563#issuecomment-2498989436).
   
   The sampling logic differs between databases, and in my PR implementation 
and review process, we have already begun addressing some subtle semantic 
differences between Postgres, DuckDB, Hive, etc.





Re: [PR] feat: support table sample [datafusion]

2025-06-24 Thread via GitHub


chenkovsky commented on PR #16505:
URL: https://github.com/apache/datafusion/pull/16505#issuecomment-2999488349

   > I suggest first opening an issue that describes the full syntax and 
semantics of this table sample feature, including the reference system (like 
Postgres). Once we have reached some agreement, we can start implementing.
   > 
   > There is another implementation that seems to have several syntax 
differences from this PR: #16325 @theirix
   > 
   > We had a previous discussion that DF can include features for Postgres 
syntax. However, if it's referencing other systems, it might need more 
discussion and wider approval.
   
   Updated. This PR implements Spark-style sampling.
   





Re: [PR] feat: support table sample [datafusion]

2025-06-24 Thread via GitHub


2010YOUY01 commented on PR #16505:
URL: https://github.com/apache/datafusion/pull/16505#issuecomment-2999386422

   I suggest first opening an issue that describes the full syntax and 
semantics of this table sample feature, including the reference system (like 
Postgres). Once we have reached some agreement, we can start implementing.
   
   There is another implementation that seems to have several syntax 
differences from this PR: https://github.com/apache/datafusion/pull/16325 @theirix 
   
   We had a previous discussion that DF can include features for Postgres 
syntax. However, if it's referencing other systems, it might need more 
discussion and wider approval.
   





Re: [PR] feat: support table sample [datafusion]

2025-06-23 Thread via GitHub


chenkovsky commented on PR #16505:
URL: https://github.com/apache/datafusion/pull/16505#issuecomment-2996411604

   > It would be better to add more details about the PR, such as:
   > sampling level: block level or row level?
   > sampling method: fixed row count or percentage?
   
   @xudong963 Updated.
   





Re: [PR] feat: support table sample [datafusion]

2025-06-23 Thread via GitHub


xudong963 commented on PR #16505:
URL: https://github.com/apache/datafusion/pull/16505#issuecomment-2996237484

   It would be better to add more details about the PR, such as:
   sampling level: block level or row level?
   sampling method: fixed row count or percentage?
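
   To make the two questions concrete (an illustrative sketch only, not taken 
from the PR), block-level sampling keeps or drops whole batches of rows, while 
a fixed row count can be obtained in one pass with reservoir sampling. The 
names `block_sample` and `reservoir_sample` are hypothetical, and blocks are 
modelled as plain Python lists.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def block_sample(blocks: Iterable[List[T]], fraction: float, seed: int) -> Iterator[T]:
    """Block-level, percentage-based: one random draw per block.

    Every row of a selected block is emitted; cheaper than row-level sampling
    because unselected blocks need not be read, but rows are not independent.
    """
    rng = random.Random(seed)
    for block in blocks:
        if rng.random() < fraction:
            yield from block


def reservoir_sample(rows: Iterable[T], n: int, seed: int) -> List[T]:
    """Row-level, fixed row count: Algorithm R reservoir sampling.

    Returns at most `n` rows, each input row being equally likely to be kept,
    without knowing the input length in advance.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < n:
            reservoir.append(row)
        else:
            # randint is inclusive on both ends, so row i survives
            # with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = row
    return reservoir


if __name__ == "__main__":
    batches = [list(range(start, start + 4)) for start in range(0, 20, 4)]
    print(list(block_sample(batches, 0.5, seed=1)))
    print(reservoir_sample(range(20), 5, seed=1))
```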

