Re: [PR] feat: support table sample [datafusion]
github-actions[bot] closed pull request #16505: feat: support table sample URL: https://github.com/apache/datafusion/pull/16505 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: support table sample [datafusion]
github-actions[bot] commented on PR #16505: URL: https://github.com/apache/datafusion/pull/16505#issuecomment-3310246462 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.
Re: [PR] feat: support table sample [datafusion]
chenkovsky commented on PR #16505: URL: https://github.com/apache/datafusion/pull/16505#issuecomment-3004038741 Some comments were added to the Cargo file today: https://github.com/apache/datafusion/blob/20a723b7b6d91da57fe6abea8ecac08ea5267a89/datafusion/sql/Cargo.toml#L49 . They make sense to me. Dependency changes in datafusion-sql should be made carefully.
Re: [PR] feat: support table sample [datafusion]
chenkovsky commented on PR #16505: URL: https://github.com/apache/datafusion/pull/16505#issuecomment-3003924958

> @2010YOUY01 thank you for pointing this out.
>
> @chenkovsky, it looks like both our PRs solve the same sampling problem from different approaches. The direction of my PR is to continue improving random filtering (as in #13268) by enhancing a predicate-based sampling, as previously discussed with @alamb [here](https://github.com/apache/datafusion/issues/13563#issuecomment-2498989436).
>
> The sampling logic differs between databases, and in my PR implementation and review process, we have already begun addressing some subtle semantic differences for Postgres, DuckDB, Hive etc.

I considered random filtering before, but found it hard to implement Poisson sampling and a seed on top of it, so I brought Spark's design here.
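To make the distinction in this comment concrete, here is a minimal Python sketch of seeded sampling in the two modes usually associated with a Spark-style design: Bernoulli sampling (without replacement) and Poisson sampling (with replacement). It is an illustration only and not the code in this PR. The function names `bernoulli_sample`, `poisson_draw`, and `poisson_sample` are hypothetical, and the use of Knuth's method for the Poisson draw is an assumption made to keep the example short.

```python
import math
import random


def bernoulli_sample(rows, fraction, seed):
    """Without replacement: keep each row independently with probability `fraction`."""
    rng = random.Random(seed)  # a private, seeded generator makes the result repeatable
    return [row for row in rows if rng.random() < fraction]


def poisson_draw(rng, lam):
    """Draw one value from Poisson(lam) using Knuth's method (suitable for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def poisson_sample(rows, fraction, seed):
    """With replacement: emit each row Poisson(fraction) times, so rows may repeat."""
    rng = random.Random(seed)
    out = []
    for row in rows:
        out.extend([row] * poisson_draw(rng, fraction))
    return out
```

With a plain `WHERE random() < fraction` predicate, a row is either kept once or dropped, so it cannot express the repeated rows that `poisson_sample` produces, and the seed has to be threaded into the random function itself. That is one way to read the difficulty described above.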
Re: [PR] feat: support table sample [datafusion]
theirix commented on PR #16505: URL: https://github.com/apache/datafusion/pull/16505#issuecomment-3003701483 @2010YOUY01 thank you for pointing this out. @chenkovsky, it looks like both our PRs solve the same sampling problem from different approaches. The direction of my PR is to continue improving random filtering (as in #13268) by enhancing a predicate-based sampling, as previously discussed with @alamb [here](https://github.com/apache/datafusion/issues/13563#issuecomment-2498989436). The sampling logic differs between databases, and in my PR implementation and review process, we have already begun addressing some subtle semantic differences for Postgres, DuckDB, Hive etc.
Re: [PR] feat: support table sample [datafusion]
chenkovsky commented on PR #16505: URL: https://github.com/apache/datafusion/pull/16505#issuecomment-2999488349

> I suggest first opening an issue to describe the full syntax and semantics of this table sample feature, and also include the reference system (like postgres). After we have reached some agreement, then we can start implementing.
>
> There is another implementation that seems to have several syntax differences from this PR #16325 @theirix
>
> We had a previous discussion that DF can include features for postgres syntax. However if it's referencing other systems, then it might need more discussion and wider approval.

Updated. This PR implements Spark-style sampling.
Re: [PR] feat: support table sample [datafusion]
2010YOUY01 commented on PR #16505: URL: https://github.com/apache/datafusion/pull/16505#issuecomment-2999386422 I suggest first opening an issue to describe the full syntax and semantics of this table sample feature, and also include the reference system (like postgres). After we have reached some agreement, then we can start implementing. There is another implementation that seems to have several syntax differences from this PR https://github.com/apache/datafusion/pull/16325 @theirix We had a previous discussion that DF can include features for postgres syntax. However if it's referencing other systems, then it might need more discussion and wider approval.
Re: [PR] feat: support table sample [datafusion]
chenkovsky commented on PR #16505: URL: https://github.com/apache/datafusion/pull/16505#issuecomment-2996411604

> It would be better to add more details about the PR, such as:
> - sample levels: block level or row level
> - sample ways: fixed row counts or percent?

@xudong963 updated
Re: [PR] feat: support table sample [datafusion]
xudong963 commented on PR #16505: URL: https://github.com/apache/datafusion/pull/16505#issuecomment-2996237484 It would be better to add more details about the PR, such as:
- sample levels: block level or row level
- sample ways: fixed row counts or percent?
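For readers unfamiliar with the distinctions raised in this question, the following Python sketch illustrates them under simplifying assumptions: a table is modeled as a list of blocks, and each block is a list of rows. The function names are hypothetical, and nothing here reflects how this PR or DataFusion implements sampling. The fixed-count variant uses reservoir sampling (Algorithm R), chosen here only because it is a common single-pass technique.

```python
import random


def sample_blocks(blocks, fraction, seed):
    """Block level: keep or drop whole blocks (cheap, but rows in a block stay together)."""
    rng = random.Random(seed)
    return [block for block in blocks if rng.random() < fraction]


def sample_rows_percent(blocks, percent, seed):
    """Row level, percentage: each row is kept independently, so the output size varies."""
    rng = random.Random(seed)
    return [row for block in blocks for row in block if rng.random() * 100.0 < percent]


def sample_rows_fixed(blocks, n, seed):
    """Row level, fixed count: reservoir sampling returns exactly min(n, total rows) rows."""
    rng = random.Random(seed)
    reservoir = []
    seen = 0
    for block in blocks:
        for row in block:
            seen += 1
            if len(reservoir) < n:
                reservoir.append(row)  # fill the reservoir first
            else:
                j = rng.randrange(seen)  # uniform index in [0, seen)
                if j < n:
                    reservoir[j] = row  # replace with probability n / seen
    return reservoir
```

The trade-off the question points at is visible in the signatures: block-level sampling can skip reading unselected blocks entirely, percentage sampling is stateless per row, and a fixed row count needs state that spans the whole input.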
