Re: [PR] feat(c/include/arrow-adbc): Add AdbcStatementRequestSchema [arrow-adbc]
paleolimbot commented on PR #3623: URL: https://github.com/apache/arrow-adbc/pull/3623#issuecomment-3617230265 Agreed (that's what I have always liked about schema negotiation). Should I put that bit back into this PR or is that a separate documentation update? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] feat(c/include/arrow-adbc): Add AdbcStatementRequestSchema [arrow-adbc]
lidavidm commented on PR #3623: URL: https://github.com/apache/arrow-adbc/pull/3623#issuecomment-3615500567 My concern with options is that I'd want them to be reasonably consistent across drivers. Maybe it doesn't have to be part of this function, but I also don't want every driver to have its own slightly different way of choosing between string/string_view/large_string. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] feat(c/include/arrow-adbc): Add AdbcStatementRequestSchema [arrow-adbc]
paleolimbot commented on PR #3623: URL: https://github.com/apache/arrow-adbc/pull/3623#issuecomment-3555491528 I can add that bit back in too...whatever is less confusing! We can also experiment with schema negotiation afterward (I don't think it affects the behaviour of the function but I could be missing something). I think options are orthogonal...options affect the guessing of the schema in the absence of a request and we could have those, too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] feat(c/include/arrow-adbc): Add AdbcStatementRequestSchema [arrow-adbc]
lidavidm commented on PR #3623: URL: https://github.com/apache/arrow-adbc/pull/3623#issuecomment-3555270014 Well, maybe we want to cover cases like explicitly choosing between string/large_string/string_view (in principle the driver would have picked one, but maybe the user wants one of the other types and this can avoid a later conversion). Or is the thought that we need to add driver-specific flags for all of that? @CurtHagenlocher had commented before that with JDBC/ODBC, you get to pick the type you want when you get the data - that's obviously too much flexibility for an Arrow stream, but I don't see why we should limit this to only cases where it is ambiguous to the driver. (In the first place, the underlying data isn't Arrow much of the time; even if it is, ideally the request would be pushed back to the backend.) The driver could certainly ignore requests to cast, of course, especially if it's actually just going to be a cast internally. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] feat(c/include/arrow-adbc): Add AdbcStatementRequestSchema [arrow-adbc]
paleolimbot commented on PR #3623: URL: https://github.com/apache/arrow-adbc/pull/3623#issuecomment-3521890311 > Is the expectation that the driver itself would cast/convert the types as they come back from the source? Yes, it's intended as guidance where a driver would otherwise have to guess. I wrote it up here as best-effort (same as PyCapsule `requested_schema`)...the other choice would be that the driver is required to produce that schema. I don't happen think that's particularly useful (whatever schema is produced by the output stream is always the stream that should be used to interpret the data), but for the immediate use cases (SQLite and Postgres drivers) it would also not be a problem. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] feat(c/include/arrow-adbc): Add AdbcStatementRequestSchema [arrow-adbc]
zeroshade commented on PR #3623: URL: https://github.com/apache/arrow-adbc/pull/3623#issuecomment-3519905815 Mostly I'm just trying to cut down the number of extra functions to create. If the idea is that we wouldn't need to create a new function to handle multiple result sets, then I'm okay with that. And that just leaves my first question -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] feat(c/include/arrow-adbc): Add AdbcStatementRequestSchema [arrow-adbc]
lidavidm commented on PR #3623: URL: https://github.com/apache/arrow-adbc/pull/3623#issuecomment-3519320223 Wouldn't you just call this in between NextResultSetSchema and NextResultSet? We'd just generalize NextResultSetSchema. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] feat(c/include/arrow-adbc): Add AdbcStatementRequestSchema [arrow-adbc]
zeroshade commented on PR #3623: URL: https://github.com/apache/arrow-adbc/pull/3623#issuecomment-3518831086 two questions: 1. Is the expectation that the driver itself would cast/convert the types as they come back from the source? 2. Since we're also working on and discussing handling multiple result sets, how would this work/handle the case of multiple result sets? Would we need an entirely new function once we add support for multiple result sets to handle this for that case? Is there any way we can adapt this function so that it can work in both cases somehow instead of needing to create an entirely new function afterwards? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] feat(c/include/arrow-adbc): Add AdbcStatementRequestSchema [arrow-adbc]
paleolimbot commented on PR #3623: URL: https://github.com/apache/arrow-adbc/pull/3623#issuecomment-3518246126 I think the reason for adding this would be for a user who executed and collected a query but got an error along the lines of "can't append string to column of type int64", which is what would happen today if someone issued a query with the sqllite driver and first few thousand values were null. The second reason is for people who already know the arrow type they want, perhaps because they're using a tool like ibis that already knows or because they executed the query and the driver guessed something they didn't like or that was inconsistent with a previous query against the same data. I should probably take out the negotiation bit since it seems to be distracting from the intent...it's orthogonal to options, which would only affect the default guessing (but still have no way to specifically specify type parameters or specific problematic columns). (These are all roughly the same reason why a requested schema is provided for a csv reader.) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] feat(c/include/arrow-adbc): Add AdbcStatementRequestSchema [arrow-adbc]
amoeba commented on PR #3623: URL: https://github.com/apache/arrow-adbc/pull/3623#issuecomment-3518006425 This looks pretty good. On one hand, it's not the most ergonomic way to handle this but it's the most regular with the existing API. Am I right that the most convenient way to use this API may be to first call `AdbcStatementExecuteSchema` (and hope it's implemented), then modify the returned schema to my desire, call `AdbcStatementRequestSchema` with the modified schema, and then execute my statement? Another question, should it be spelled out that the schema can be partial or not? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] feat(c/include/arrow-adbc): Add AdbcStatementRequestSchema [arrow-adbc]
paleolimbot commented on PR #3623: URL: https://github.com/apache/arrow-adbc/pull/3623#issuecomment-3454380931 > I wonder if we also want to somehow allow things like "read 'field_name' as string view, but leave other fields alone" or "read all 'database_type' as string view, but leave other fields alone" Sure, or some version of an option get/set with of a list of supported or preferred types. Happy to take out the negotiation bit if that's confusing (the main purpose here is to help the driver make better choices for loose or row-based typing). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] feat(c/include/arrow-adbc): Add AdbcStatementRequestSchema [arrow-adbc]
lidavidm commented on PR #3623: URL: https://github.com/apache/arrow-adbc/pull/3623#issuecomment-3453816347 Thanks for this! I wonder if we also want to somehow allow things like "read 'field_name' as string view, but leave other fields alone" or "read all 'database_type' as string view, but leave other fields alone" -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
