[PR] feat: support parsing for parquet writer option [arrow-rs]

2023-10-15 Thread via GitHub
fansehep opened a new pull request, #4938: URL: https://github.com/apache/arrow-rs/pull/4938 # Which issue does this PR close? part https://github.com/apache/arrow-rs/issues/4693 # Rationale for this change # What changes are included in this PR?

Re: [PR] Minor: Assert `streaming_merge` has non empty sort exprs [arrow-datafusion]

2023-10-15 Thread via GitHub
viirya commented on code in PR #7795: URL: https://github.com/apache/arrow-datafusion/pull/7795#discussion_r1360173168 ## datafusion/physical-plan/src/sorts/streaming_merge.rs: ## @@ -60,6 +60,11 @@ pub fn streaming_merge( fetch: Option, reservation: MemoryReservation,

Re: [PR] Minor: Assert `streaming_merge` has non empty sort exprs [arrow-datafusion]

2023-10-15 Thread via GitHub
viirya commented on code in PR #7795: URL: https://github.com/apache/arrow-datafusion/pull/7795#discussion_r1360172294 ## datafusion/physical-plan/src/sorts/cursor.rs: ## @@ -207,8 +211,12 @@ pub struct FieldCursor { } impl FieldCursor { -/// Create a new [`FieldCursor`]

Re: [PR] Minor: Assert `streaming_merge` has non empty sort exprs [arrow-datafusion]

2023-10-15 Thread via GitHub
jackwener commented on code in PR #7795: URL: https://github.com/apache/arrow-datafusion/pull/7795#discussion_r1360151197 ## datafusion/physical-plan/src/sorts/cursor.rs: ## @@ -48,16 +48,17 @@ impl std::fmt::Debug for RowCursor { impl RowCursor { /// Create a new SortKe

Re: [PR] Add documentation and usability for prepared parameters [arrow-datafusion]

2023-10-15 Thread via GitHub
jackwener commented on code in PR #7785: URL: https://github.com/apache/arrow-datafusion/pull/7785#discussion_r1360149190 ## datafusion/expr/src/expr.rs: ## @@ -599,10 +600,13 @@ impl InSubquery { } } -/// Placeholder +/// Placeholder, representing bind parameter values

Re: [I] How to parallelize RecordBatch reading? [arrow]

2023-10-15 Thread via GitHub
mapleFU commented on issue #38275: URL: https://github.com/apache/arrow/issues/38275#issuecomment-1763767160 https://stackoverflow.com/questions/18883414/evaluation-of-list-comprehensions-in-python From the link above: I haven't written Python in a long time, but I remember only compre
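
The Stack Overflow question linked in this comment concerns when list comprehensions are evaluated. The comment itself is cut off, so the following is only a minimal sketch of the general Python behaviour, not a reconstruction of what was said: a list comprehension consumes its input immediately, while a generator expression produces elements on demand. The helper `traced` and the log lists are hypothetical names introduced for illustration.

```python
def traced(values, log):
    """Yield each value, recording in `log` when it is actually produced."""
    for v in values:
        log.append(v)
        yield v


# List comprehension: every element is produced as soon as the line runs.
eager_log = []
eager = [v * 2 for v in traced([1, 2, 3], eager_log)]

# Generator expression: nothing is produced until an element is requested.
lazy_log = []
lazy = (v * 2 for v in traced([1, 2, 3], lazy_log))
first = next(lazy)  # produces exactly one element
```

After this runs, `eager_log` holds all three inputs, while `lazy_log` holds only the first one.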

Re: [PR] GH-37753: [C++][Gandiva] Add external function registry support [arrow]

2023-10-15 Thread via GitHub
kou commented on PR #38116: URL: https://github.com/apache/arrow/pull/38116#issuecomment-1763761792 I like the API set. We may need to improve the signatures but the core concept (users can register function metadata and function implementation at once) will not be changed. -- This is an

Re: [PR] GH-38228: [R] Fence examples that need dataset with `examplesIf` [arrow]

2023-10-15 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38229: URL: https://github.com/apache/arrow/pull/38229#issuecomment-176375 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 84a4ab18209ae3677a1227cc42e5a52d91fc6f86. There was 1 b

Re: [I] [R] Rename read_ipc_file to read_arrow_file & highlight arrow over feather [arrow]

2023-10-15 Thread via GitHub
Divyansh200102 commented on issue #20472: URL: https://github.com/apache/arrow/issues/20472#issuecomment-1763707160 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] GH-20472:[Docs] Renamed read_ipc_file to read_arrow_file & highlighted arrow over feather [arrow]

2023-10-15 Thread via GitHub
github-actions[bot] commented on PR #38278: URL: https://github.com/apache/arrow/pull/38278#issuecomment-1763685566 :warning: GitHub issue #20472 **has been automatically assigned in GitHub** to PR creator.

[PR] GH-20472:[Docs] Renamed read_ipc_file to read_arrow_file & highlighted arrow over feather [arrow]

2023-10-15 Thread via GitHub
Divyansh200102 opened a new pull request, #38278: URL: https://github.com/apache/arrow/pull/38278 * closes: #20472

Re: [PR] GH-37753: [C++][Gandiva] Add external function registry support [arrow]

2023-10-15 Thread via GitHub
js8544 commented on PR #38116: URL: https://github.com/apache/arrow/pull/38116#issuecomment-1763652509 This looks very reasonable. Having a new class that orchestrates these registration processes ensures both convenience and modularity.

Re: [PR] [GH-37751] [C++][Gandiva] Avoid registering exported functions multiple times in gandiva [arrow]

2023-10-15 Thread via GitHub
js8544 commented on PR #37752: URL: https://github.com/apache/arrow/pull/37752#issuecomment-1763643078 Thanks! LGTM now. Will need @pitrou's approval to merge.

Re: [PR] GH-38271: [C++][Parquet] Support reading parquet files with multiple gzip members [arrow]

2023-10-15 Thread via GitHub
mapleFU commented on PR #38272: URL: https://github.com/apache/arrow/pull/38272#issuecomment-1763641141 @amassalha How is the gzip streaming parquet file generated? This patch looks generally OK, but I want to know how other systems support files like this.

Re: [PR] feat(7181): add cursor slicing [arrow-datafusion]

2023-10-15 Thread via GitHub
wiedld commented on code in PR #7798: URL: https://github.com/apache/arrow-datafusion/pull/7798#discussion_r1360039880 ## datafusion/physical-plan/src/sorts/cursor.rs: ## @@ -432,4 +501,119 @@ mod tests { b.advance(); assert_eq!(a.cmp(&b), Ordering::Less);

Re: [PR] feat(7181): add cursor slicing [arrow-datafusion]

2023-10-15 Thread via GitHub
wiedld commented on code in PR #7798: URL: https://github.com/apache/arrow-datafusion/pull/7798#discussion_r1360039803 ## datafusion/physical-plan/src/sorts/cursor.rs: ## @@ -275,6 +324,26 @@ impl Cursor for FieldCursor { self.offset += 1; t } + +fn sl

Re: [PR] feat(7181): add cursor slicing [arrow-datafusion]

2023-10-15 Thread via GitHub
wiedld commented on code in PR #7798: URL: https://github.com/apache/arrow-datafusion/pull/7798#discussion_r1360039344 ## datafusion/physical-plan/src/sorts/cursor.rs: ## @@ -432,4 +501,119 @@ mod tests { b.advance(); assert_eq!(a.cmp(&b), Ordering::Less);

Re: [PR] GH-37753: [C++][Gandiva] Add external function registry support [arrow]

2023-10-15 Thread via GitHub
niyue commented on PR #38116: URL: https://github.com/apache/arrow/pull/38116#issuecomment-1763638136 > as you mentioned there is also the dynamic library approach which also doesn't require IRs, but IMO it won't be as convenient as using function stubs I see. I am aware of the functi

Re: [PR] GH-37979: [C++] Add support for specifying custom Array opening and closing delimiters to `arrow::PrettyPrintDelimiters` [arrow]

2023-10-15 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38187: URL: https://github.com/apache/arrow/pull/38187#issuecomment-1763633326 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit cc1dc6a63f4f4128d110d86ca8a21693c894cb9f. There were no

Re: [PR] GH-37753: [C++][Gandiva] Add external function registry support [arrow]

2023-10-15 Thread via GitHub
niyue commented on code in PR #38116: URL: https://github.com/apache/arrow/pull/38116#discussion_r1360007697 ## cpp/src/gandiva/cmake/GenerateBitcode.cmake: ## Review Comment: > Can I push a change to this branch directly? Sure. I believe the change is minor and it i

Re: [PR] [GH-37751] [C++][Gandiva] Avoid registering exported functions multiple times in gandiva [arrow]

2023-10-15 Thread via GitHub
niyue commented on code in PR #37752: URL: https://github.com/apache/arrow/pull/37752#discussion_r1360006637 ## cpp/src/gandiva/exported_funcs_registry.h: ## @@ -21,34 +21,31 @@ #include #include +#include namespace gandiva { class ExportedFuncsBase; /// Registry

Re: [PR] GH-38271: [C++,Parquet] Support reading parquet files with multiple gzip members [arrow]

2023-10-15 Thread via GitHub
wgtmac commented on code in PR #38272: URL: https://github.com/apache/arrow/pull/38272#discussion_r1360005410 ## cpp/src/arrow/util/compression_zlib.cc: ## @@ -392,37 +395,44 @@ class GZipCodec : public Codec { return 0; } -// Reset the stream for this block -
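
For readers unfamiliar with the "multiple gzip members" case this PR addresses: a gzip stream may consist of several independently compressed members concatenated together, and a decoder has to restart at each member boundary. The following is a minimal sketch of that idea using Python's standard `zlib` module. It is not the C++ code under review, and the function name `decompress_gzip_members` is a hypothetical one chosen for illustration.

```python
import gzip
import zlib


def decompress_gzip_members(data: bytes) -> bytes:
    """Decompress a buffer holding one or more concatenated gzip members."""
    out = []
    while data:
        # 16 + MAX_WBITS selects gzip framing with the maximum window size.
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)
        out.append(d.decompress(data))
        out.append(d.flush())
        if not d.eof:
            raise ValueError("truncated gzip member")
        # Bytes left over after this member's trailer start the next member.
        data = d.unused_data
    return b"".join(out)


# Two separately compressed members, concatenated into one buffer.
two_members = gzip.compress(b"hello ") + gzip.compress(b"world")
result = decompress_gzip_members(two_members)
```

Creating a fresh decompressor per member plays the same role as resetting the stream state between members; Python's own `gzip.decompress` also accepts multi-member input directly.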

[I] Reading Avro files supports other types [arrow-datafusion]

2023-10-15 Thread via GitHub
Asura7969 opened a new issue, #7828: URL: https://github.com/apache/arrow-datafusion/issues/7828 ### Is your feature request related to a problem or challenge? I am now integrating [incubator-paimon](https://github.com/apache/incubator-paimon)(it is a streaming data lake platform), w

Re: [PR] GH-37655: [C++] Allow joins of large tables in Acero [arrow]

2023-10-15 Thread via GitHub
amoeba commented on PR #37709: URL: https://github.com/apache/arrow/pull/37709#issuecomment-1763573360 Hey @oliviermeslin, I ran your example from https://github.com/apache/arrow/issues/37655 w/o this patch and got the expected error after the script printed "Doing the join with 9 variables

Re: [PR] fix reStructuredText link markup [arrow]

2023-10-15 Thread via GitHub
kou commented on PR #38276: URL: https://github.com/apache/arrow/pull/38276#issuecomment-1763569640 Could you read the auto-generated comment? https://github.com/apache/arrow/pull/38276#issuecomment-1763484822 And could you read our pull request template and follow it instead of rem

Re: [PR] GH-38243: [CI][Python] Add missing dataset marker for dataset encryption tests [arrow]

2023-10-15 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38244: URL: https://github.com/apache/arrow/pull/38244#issuecomment-1763539773 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 8d9bab3234f4d7c747b32b82de03011663479b57. There were no

Re: [I] Use https://arrow.apache.org/julia/ as the official Website URL [arrow-julia]

2023-10-15 Thread via GitHub
Moelf commented on issue #470: URL: https://github.com/apache/arrow-julia/issues/470#issuecomment-1763509272 do you not have commit right to do that?

Re: [I] Use https://arrow.apache.org/julia/ as the official Website URL [arrow-julia]

2023-10-15 Thread via GitHub
kou commented on issue #470: URL: https://github.com/apache/arrow-julia/issues/470#issuecomment-1763505241 No. We need to update URLs at least in the following, as I wrote in the description: > * `.asf.yaml` > > * `README.md` > > * `docs/make.jl`

Re: [PR] CompatHelper: bump compat for CodecZstd to 0.8, (keep existing compat) [arrow-julia]

2023-10-15 Thread via GitHub
baumgold closed pull request #483: CompatHelper: bump compat for CodecZstd to 0.8, (keep existing compat) URL: https://github.com/apache/arrow-julia/pull/483

Re: [PR] CompatHelper: bump compat for CodecZstd to 0.8, (keep existing compat) [arrow-julia]

2023-10-15 Thread via GitHub
baumgold commented on PR #483: URL: https://github.com/apache/arrow-julia/pull/483#issuecomment-1763503444 Superseded by #488

Re: [PR] bump [arrow-julia]

2023-10-15 Thread via GitHub
baumgold merged PR #488: URL: https://github.com/apache/arrow-julia/pull/488

Re: [PR] ARROW-2034: [C++] Filesystem implementation for Azure Blob Storage [arrow]

2023-10-15 Thread via GitHub
Tom-Newton commented on PR #12914: URL: https://github.com/apache/arrow/pull/12914#issuecomment-1763501839 > So where are things? The last I recall someone was planning on contributing the skeleton iteration. Did that come to pass? Thanks Skeleton: https://github.com/apache/arrow/pull

Re: [PR] GH-38226: [R] Remove R 3.5 from test-r-versions [arrow]

2023-10-15 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38230: URL: https://github.com/apache/arrow/pull/38230#issuecomment-1763501342 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit ec08625717c209ff3dfa88f107f9675ba39ce97e. There were no

Re: [PR] CompatHelper: bump compat for CodecZstd to 0.8, (keep existing compat) [arrow-julia]

2023-10-15 Thread via GitHub
Moelf commented on PR #483: URL: https://github.com/apache/arrow-julia/pull/483#issuecomment-1763501036 can we merge this?

Re: [PR] ARROW-2034: [C++] Filesystem implementation for Azure Blob Storage [arrow]

2023-10-15 Thread via GitHub
av8or1 commented on PR #12914: URL: https://github.com/apache/arrow/pull/12914#issuecomment-1763498370 After a considerable amount of working through the legal aspect of contributing back to open source that is in place at the company where I am presently employed, I am cleared to work on t

Re: [PR] ARROW-2034: [C++] Filesystem implementation for Azure Blob Storage [arrow]

2023-10-15 Thread via GitHub
kou commented on PR #12914: URL: https://github.com/apache/arrow/pull/12914#issuecomment-1763497567 15.0.0 or 16.0.0? If you join developing this, you may be able to control it.

Re: [PR] Update dataset.rst [arrow]

2023-10-15 Thread via GitHub
github-actions[bot] commented on PR #38277: URL: https://github.com/apache/arrow/pull/38277#issuecomment-1763489273 Thanks for opening a pull request! If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue f

[PR] Update dataset.rst [arrow]

2023-10-15 Thread via GitHub
Divyansh200102 opened a new pull request, #38277: URL: https://github.com/apache/arrow/pull/38277 closes #36044

Re: [I] [Python][Docs] Add ParquetFileFragment to the API reference docs [arrow]

2023-10-15 Thread via GitHub
Divyansh200102 commented on issue #36044: URL: https://github.com/apache/arrow/issues/36044#issuecomment-1763488947 take

Re: [I] [Python][Docs] Add ParquetFileFragment to the API reference docs [arrow]

2023-10-15 Thread via GitHub
Divyansh200102 commented on issue #36044: URL: https://github.com/apache/arrow/issues/36044#issuecomment-1763486694 take

Re: [PR] fix reStructuredText link markup [arrow]

2023-10-15 Thread via GitHub
github-actions[bot] commented on PR #38276: URL: https://github.com/apache/arrow/pull/38276#issuecomment-1763484822 Thanks for opening a pull request! If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue f

[PR] fix reStructuredText link markup [arrow]

2023-10-15 Thread via GitHub
Divyansh200102 opened a new pull request, #38276: URL: https://github.com/apache/arrow/pull/38276 fixes #35369

Re: [I] [Docs] reStructuredText link markup is broken [arrow]

2023-10-15 Thread via GitHub
Divyansh200102 commented on issue #35369: URL: https://github.com/apache/arrow/issues/35369#issuecomment-1763478578 Hi, can you please tell me how to proceed with this issue? @kou

Re: [I] [Docs] reStructuredText link markup is broken [arrow]

2023-10-15 Thread via GitHub
Divyansh200102 commented on issue #35369: URL: https://github.com/apache/arrow/issues/35369#issuecomment-1763478277 take

Re: [PR] GH-37655: [C++] Allow joins of large tables in Acero [arrow]

2023-10-15 Thread via GitHub
oliviermeslin commented on PR #37709: URL: https://github.com/apache/arrow/pull/37709#issuecomment-1763474237 Thanks @westonpace ! I already prepared some tests (one using artificial data, the other one using real data). Unfortunately, I could not install `arrow` as an `R` _package_ from my

Re: [I] [R][Docs] Version list is not updated [arrow]

2023-10-15 Thread via GitHub
Divyansh200102 commented on issue #38273: URL: https://github.com/apache/arrow/issues/38273#issuecomment-1763473620 i think the website you just mentioned mentions the older version of the docs for the users. @eitsupi r\_pkgdown.yml: 352components: 353: older_versi

Re: [I] [R][Docs] Version list is not updated [arrow]

2023-10-15 Thread via GitHub
Divyansh200102 commented on issue #38273: URL: https://github.com/apache/arrow/issues/38273#issuecomment-1763473088 take

Re: [PR] GH-35260: [C++][Python][R] Allow users to adjust S3 log level by environment variable [arrow]

2023-10-15 Thread via GitHub
thisisnic commented on PR #38267: URL: https://github.com/apache/arrow/pull/38267#issuecomment-1763472342 > Failed checks look unrelated: > > * https://github.com/apache/arrow/actions/runs/6522175664/job/17711741947?pr=38267 looks like bioconductor was down > > * http

Re: [PR] Implement GetIndexedField for map-typed columns [arrow-datafusion]

2023-10-15 Thread via GitHub
swgillespie commented on PR #7825: URL: https://github.com/apache/arrow-datafusion/pull/7825#issuecomment-1763468069 @alamb no problem - done!

Re: [I] [R][Docs] Version list is not updated [arrow]

2023-10-15 Thread via GitHub
Divyansh200102 commented on issue #38273: URL: https://github.com/apache/arrow/issues/38273#issuecomment-1763466991 In which file exactly do you want me to make the changes? @eitsupi

Re: [I] How to parallelize RecordBatch reading? [arrow]

2023-10-15 Thread via GitHub
Luosuu commented on issue #38275: URL: https://github.com/apache/arrow/issues/38275#issuecomment-1763465190 Hi, I am a bit confused here. I think the actual data reading only happens when I execute the take operation for each RecordBatch and `pa.ipc.open_stream(memory_mapped_stream).re

[I] Add sql support for `DISTINCT ON` [arrow-datafusion]

2023-10-15 Thread via GitHub
universalmind303 opened a new issue, #7827: URL: https://github.com/apache/arrow-datafusion/issues/7827 ### Is your feature request related to a problem or challenge? It'd be nice to be able to use `DISTINCT ON` expressions ### Describe the solution you'd like sql support
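
The issue text is cut off before it describes the requested behaviour, so the following is only a sketch under the assumption that PostgreSQL-style semantics are meant: `DISTINCT ON (expr)` keeps the first row of each group of rows for which `expr` evaluates equal, with "first" determined by the input order. The function `distinct_on` and the sample rows are hypothetical and are not part of DataFusion.

```python
def distinct_on(rows, key):
    """Keep the first row seen for each distinct value of key(row)."""
    seen = set()
    kept = []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            kept.append(row)
    return kept


# Hypothetical sample data, already in the order that decides the "first" row.
rows = [
    {"city": "Oslo", "temp": 3},
    {"city": "Rome", "temp": 18},
    {"city": "Oslo", "temp": 5},
]
result = distinct_on(rows, key=lambda r: r["city"])
```

Here the second Oslo row is dropped because an Oslo row was already kept; in SQL the input order would normally be fixed by an `ORDER BY` clause.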

Re: [PR] GH-38200: [CI][Release][Go] Ensure removing all module caches [arrow]

2023-10-15 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38222: URL: https://github.com/apache/arrow/pull/38222#issuecomment-1763460368 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit bb5ebbf0debd5851aa11c97ba432ecc3fa7424cb. There were no

Re: [I] How to parallelize RecordBatch reading? [arrow]

2023-10-15 Thread via GitHub
mapleFU commented on issue #38275: URL: https://github.com/apache/arrow/issues/38275#issuecomment-1763453732 Which version of Arrow are you using? 1. When reading a single IPC file, there is a `use_thread` argument, which enables using the user-passed executor or system default exe

Re: [I] [Python] Read table stuck and hangs forever [arrow]

2023-10-15 Thread via GitHub
chriss1245 commented on issue #37139: URL: https://github.com/apache/arrow/issues/37139#issuecomment-1763451260 Something similar happens in my case. I am using iter_batches from a ParquetFile in order to create a generator for tensorflow. The loop is quite simple. ```python for

Re: [I] How to parallelize RecordBatch reading? [arrow]

2023-10-15 Thread via GitHub
Luosuu commented on issue #38275: URL: https://github.com/apache/arrow/issues/38275#issuecomment-1763450564 @mapleFU Thank you for the reply! Actually, in my scenario I have a very long list of index lists, so I will need to repeat the I/O operation many times. The major performance conce

Re: [I] How to parallelize RecordBatch reading? [arrow]

2023-10-15 Thread via GitHub
mapleFU commented on issue #38275: URL: https://github.com/apache/arrow/issues/38275#issuecomment-1763448513 How long does this actually take: ``` mmap_files = [pa.memory_map(os.path.join(dir_path, file_name), 'r') for file_name in file_names] mmap_tables = [pa.ipc.open_stream(

Re: [PR] GH-35260: [C++][Python][R] Allow users to adjust S3 log level by environment variable [arrow]

2023-10-15 Thread via GitHub
amoeba commented on PR #38267: URL: https://github.com/apache/arrow/pull/38267#issuecomment-1763444036 Failed checks look unrelated: - https://github.com/apache/arrow/actions/runs/6522175664/job/17711741947?pr=38267 looks like bioconductor was down - https://github.com/apache/arrow/

Re: [PR] add regr_* functions [arrow-datafusion-python]

2023-10-15 Thread via GitHub
andygrove merged PR #499: URL: https://github.com/apache/arrow-datafusion-python/pull/499

Re: [PR] Refactor Statistics [arrow-datafusion]

2023-10-15 Thread via GitHub
ozankabak commented on code in PR #7793: URL: https://github.com/apache/arrow-datafusion/pull/7793#discussion_r1359346687 ## datafusion/physical-expr/src/intervals/interval_aritmetic.rs: ## @@ -1730,20 +1757,23 @@ mod tests { ), ]; for interval in

Re: [PR] GH-37510: [C++] Don't install bundled Azure SDK for C++ [arrow]

2023-10-15 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38176: URL: https://github.com/apache/arrow/pull/38176#issuecomment-1763417089 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 7047e63f6f5fca43f6f5f58cf0f711b4590f92b4. There were no

Re: [I] [R][Docs] Version list is not updated [arrow]

2023-10-15 Thread via GitHub
Divyansh200102 commented on issue #38273: URL: https://github.com/apache/arrow/issues/38273#issuecomment-1763388972 take

Re: [I] [R][Docs] Version list is not updated [arrow]

2023-10-15 Thread via GitHub
eitsupi commented on issue #38273: URL: https://github.com/apache/arrow/issues/38273#issuecomment-1763388486 @Divyansh200102 Glad to hear that. I think you can assign this issue for you via commenting "take" on this issue. https://github.com/apache/arrow/blob/c5bce96ba626ad255

Re: [I] [R][Docs] Version list is not updated [arrow]

2023-10-15 Thread via GitHub
Divyansh200102 commented on issue #38273: URL: https://github.com/apache/arrow/issues/38273#issuecomment-1763386932 i want to work on this @eitsupi

Re: [I] Run DataFusion benchmarks regularly and track performance history over time [arrow-datafusion]

2023-10-15 Thread via GitHub
epompeii commented on issue #5504: URL: https://github.com/apache/arrow-datafusion/issues/5504#issuecomment-1763385917 @alamb the port of InfluxDB over to Rust is super cool. Congrats! I'm considering using it long term, if/when Bencher needs a supplemental backend for results storage.

Re: [I] [Python] pyarrow.compute.Expression is in datasets API instead of compute API [arrow]

2023-10-15 Thread via GitHub
Divyansh200102 commented on issue #36428: URL: https://github.com/apache/arrow/issues/36428#issuecomment-1763384250 i want to work on this @westonpace

Re: [PR] GH:38211 [MATLAB] Add support for creating an empty arrow.tabular.RecordBatch by calling arrow.recordBatch with no input arguments [arrow]

2023-10-15 Thread via GitHub
github-actions[bot] commented on PR #38274: URL: https://github.com/apache/arrow/pull/38274#issuecomment-1763370733 Thanks for opening a pull request! If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue f

[PR] GH:38211 [MATLAB] Add support for creating an empty arrow.tabular.RecordBatch by calling arrow.recordBatch with no input arguments [arrow]

2023-10-15 Thread via GitHub
Divyansh200102 opened a new pull request, #38274: URL: https://github.com/apache/arrow/pull/38274 closes #38211

Re: [PR] GH-37592: [MATLAB] Add `NumRows` property to `arrow.tabular.RecordBatch` [arrow]

2023-10-15 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38215: URL: https://github.com/apache/arrow/pull/38215#issuecomment-1763370626 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit ef02417d40af7b970c5eafe809f256380cda2fab. There were no

Re: [I] [MATLAB] Add support for creating an empty `arrow.tabular.RecordBatch` by calling `arrow.recordBatch` with no input arguments [arrow]

2023-10-15 Thread via GitHub
Divyansh200102 commented on issue #38211: URL: https://github.com/apache/arrow/issues/38211#issuecomment-1763368574 i want to work on this @kevingurney

Re: [PR] GH-38216: [R] open_dataset(format = "json") not documented [arrow]

2023-10-15 Thread via GitHub
Divyansh200102 commented on PR #38258: URL: https://github.com/apache/arrow/pull/38258#issuecomment-1763363192 Is it fine now? Since I am editing in the GitHub editor, can you please run devtools::document() in R and push the changes to open_dataset.Rd yourself?

Re: [PR] bump [arrow-julia]

2023-10-15 Thread via GitHub
ericphanson commented on code in PR #488: URL: https://github.com/apache/arrow-julia/pull/488#discussion_r1359859971 ## Project.toml: ## @@ -50,5 +50,5 @@ PooledArrays = "0.5, 1.0" SentinelArrays = "1" Tables = "1.1" TimeZones = "1" -TranscodingStreams = "0.9.12" +Transcoding

Re: [PR] bump [arrow-julia]

2023-10-15 Thread via GitHub
ericphanson commented on code in PR #488: URL: https://github.com/apache/arrow-julia/pull/488#discussion_r1359859837 ## Project.toml: ## @@ -50,5 +50,5 @@ PooledArrays = "0.5, 1.0" SentinelArrays = "1" Tables = "1.1" TimeZones = "1" -TranscodingStreams = "0.9.12" +Transcoding

Re: [I] Empty strings not interpreted as null when reading CSV files [arrow-datafusion]

2023-10-15 Thread via GitHub
haohuaijin commented on issue #7797: URL: https://github.com/apache/arrow-datafusion/issues/7797#issuecomment-1763358333 I find that arrow-csv also has the above problem; it seems that arrow-csv never sets a string to null. See the link below: https://github.com/apache/arrow-rs/blob/bb8e42f6392284f4
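
To make the behaviour under discussion concrete, here is a minimal sketch of the "empty field becomes null" interpretation, written with Python's standard `csv` module. It does not reflect how arrow-csv or DataFusion parse CSV; the function name `read_csv_empty_as_null` is hypothetical, and `None` stands in for a null value.

```python
import csv
import io


def read_csv_empty_as_null(text):
    """Parse CSV text, mapping every empty field to None."""
    reader = csv.reader(io.StringIO(text))
    return [[None if field == "" else field for field in row] for row in reader]


rows = read_csv_empty_as_null("a,b\n1,\n,2\n")
```

One design caveat: with the default `csv.reader` settings, a quoted empty string (`""`) and a missing value are both read as an empty field, so this sketch cannot tell them apart.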

Re: [PR] Extract ReceiverStreamBuilder [arrow-datafusion]

2023-10-15 Thread via GitHub
alamb commented on PR #7817: URL: https://github.com/apache/arrow-datafusion/pull/7817#issuecomment-1763344062 I plan to merge this PR when the CI passes

Re: [PR] Updated sort.rs to show `TopK` [arrow-datafusion]

2023-10-15 Thread via GitHub
alamb closed pull request #7751: Updated sort.rs to show `TopK` URL: https://github.com/apache/arrow-datafusion/pull/7751

Re: [I] Update explain plan to show when topk operator is used [arrow-datafusion]

2023-10-15 Thread via GitHub
alamb closed issue #7750: Update explain plan to show when topk operator is used URL: https://github.com/apache/arrow-datafusion/issues/7750

Re: [PR] Update explain plan to show `TopK` operator [arrow-datafusion]

2023-10-15 Thread via GitHub
alamb merged PR #7826: URL: https://github.com/apache/arrow-datafusion/pull/7826

Re: [PR] Implement GetIndexedField for map-typed columns [arrow-datafusion]

2023-10-15 Thread via GitHub
alamb commented on PR #7825: URL: https://github.com/apache/arrow-datafusion/pull/7825#issuecomment-1763343451 @swgillespie I think you could avoid adding the arrow-ord dependency, but I don't think that is critical (as arrow-ord is already a transitive dependency via arrow anyways)

Re: [PR] Implement GetIndexedField for map-typed columns [arrow-datafusion]

2023-10-15 Thread via GitHub
alamb commented on code in PR #7825: URL: https://github.com/apache/arrow-datafusion/pull/7825#discussion_r1359846961 ## datafusion/physical-expr/src/expressions/get_indexed_field.rs: ## @@ -183,6 +186,14 @@ impl PhysicalExpr for GetIndexedFieldExpr { let array = self.a

Re: [PR] GH-36831: [C++] DictionaryArray support for MinMax Function [arrow]

2023-10-15 Thread via GitHub
js8544 commented on code in PR #37100: URL: https://github.com/apache/arrow/pull/37100#discussion_r1359847075 ## cpp/src/arrow/compute/kernels/aggregate_basic.cc: ## @@ -516,7 +537,10 @@ void AddMinOrMaxAggKernel(ScalarAggregateFunction* func, // Note SIMD level is always N

Re: [PR] Add GetOptions::head [arrow-rs]

2023-10-15 Thread via GitHub
tustvold merged PR #4931: URL: https://github.com/apache/arrow-rs/pull/4931

Re: [PR] Use code block for better formatting of rustdoc for PhysicalGroupBy [arrow-datafusion]

2023-10-15 Thread via GitHub
alamb merged PR #7823: URL: https://github.com/apache/arrow-datafusion/pull/7823

Re: [PR] WIP: Simplify Substrait join logic [arrow-datafusion]

2023-10-15 Thread via GitHub
alamb closed pull request #7819: WIP: Simplify Substrait join logic URL: https://github.com/apache/arrow-datafusion/pull/7819

Re: [I] Run DataFusion benchmarks regularly and track performance history over time [arrow-datafusion]

2023-10-15 Thread via GitHub
alamb commented on issue #5504: URL: https://github.com/apache/arrow-datafusion/issues/5504#issuecomment-1763339527 Hi @Smurphy000 that would be amazing 🙏 . This is one of the issues I think is critical to the long term success of DataFusion but has been hard to attract attention for.

Re: [PR] Minor: Move `Monotonicity` to `expr` crate [arrow-datafusion]

2023-10-15 Thread via GitHub
alamb merged PR #7820: URL: https://github.com/apache/arrow-datafusion/pull/7820

Re: [PR] GH-38263 [C++]: Prefer to call string_view::data() instead of begin() where a char pointer is expected [arrow]

2023-10-15 Thread via GitHub
raulcd commented on PR #38265: URL: https://github.com/apache/arrow/pull/38265#issuecomment-1763332622 > This should IMO not miss the 14.0 release I agree, I'll add it to 14.0.0

Re: [PR] GH-37378: [C++] Add A Dictionary Compaction Function For DictionaryArray [arrow]

2023-10-15 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #37418: URL: https://github.com/apache/arrow/pull/37418#issuecomment-1763330417 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 73454b7040fbea3a187c1bfabd7ea02d46ca3c41. There were 4

Re: [PR] GH-38263 [C++]: Prefer to call string_view::data() instead of begin() where a char pointer is expected [arrow]

2023-10-15 Thread via GitHub
h-vetinari commented on PR #38265: URL: https://github.com/apache/arrow/pull/38265#issuecomment-1763314780 FYI @jorisvandenbossche @raulcd This should IMO not miss the 14.0 release; risk should be minimal (for reference, I backported it all the way back to 10.x on the conda-forge feedst