[GitHub] [arrow] cyb70289 commented on pull request #7603: ARROW-9206: [C++][Flight] Add latency benchmark

2020-07-01 Thread GitBox
cyb70289 commented on pull request #7603: URL: https://github.com/apache/arrow/pull/7603#issuecomment-652237692 CI failure reproduces [[Python][C++] Non-deterministic segfault in "AMD64 MacOS 10.15 Python 3.7" build](https://issues.apache.org/jira/browse/ARROW-8999)

[GitHub] [arrow] liyafan82 commented on a change in pull request #7544: ARROW-7285: [C++] ensure C++ implementation meets clarified dictionary spec

2020-07-01 Thread GitBox
liyafan82 commented on a change in pull request #7544: URL: https://github.com/apache/arrow/pull/7544#discussion_r448166294 ## File path: cpp/src/arrow/ipc/reader.cc ## @@ -684,7 +685,19 @@ Status ReadDictionary(const Buffer& metadata, DictionaryMemo* dictionary_memo, ret

[GitHub] [arrow] liyafan82 commented on a change in pull request #7544: ARROW-7285: [C++] ensure C++ implementation meets clarified dictionary spec

2020-07-01 Thread GitBox
liyafan82 commented on a change in pull request #7544: URL: https://github.com/apache/arrow/pull/7544#discussion_r448166650 ## File path: cpp/src/arrow/ipc/reader.cc ## @@ -684,7 +685,19 @@ Status ReadDictionary(const Buffer& metadata, DictionaryMemo* dictionary_memo, ret

[GitHub] [arrow] liyafan82 commented on a change in pull request #7544: ARROW-7285: [C++] ensure C++ implementation meets clarified dictionary spec

2020-07-01 Thread GitBox
liyafan82 commented on a change in pull request #7544: URL: https://github.com/apache/arrow/pull/7544#discussion_r448167615 ## File path: cpp/src/arrow/ipc/read_write_test.cc ## @@ -1228,6 +1228,152 @@ TEST_P(TestFileFormat, RoundTrip) { TestZeroLengthRoundTrip(*GetParam(),

[GitHub] [arrow] liyafan82 commented on a change in pull request #7544: ARROW-7285: [C++] ensure C++ implementation meets clarified dictionary spec

2020-07-01 Thread GitBox
liyafan82 commented on a change in pull request #7544: URL: https://github.com/apache/arrow/pull/7544#discussion_r448168519 ## File path: cpp/src/arrow/ipc/metadata_internal.h ## @@ -198,7 +198,7 @@ Status WriteDictionaryMessage( const int64_t id, const int64_t length, con

[GitHub] [arrow] liyafan82 commented on a change in pull request #7544: ARROW-7285: [C++] ensure C++ implementation meets clarified dictionary spec

2020-07-01 Thread GitBox
liyafan82 commented on a change in pull request #7544: URL: https://github.com/apache/arrow/pull/7544#discussion_r448168067 ## File path: cpp/src/arrow/ipc/read_write_test.cc ## @@ -1228,6 +1228,152 @@ TEST_P(TestFileFormat, RoundTrip) { TestZeroLengthRoundTrip(*GetParam(),

[GitHub] [arrow] liyafan82 commented on a change in pull request #7544: ARROW-7285: [C++] ensure C++ implementation meets clarified dictionary spec

2020-07-01 Thread GitBox
liyafan82 commented on a change in pull request #7544: URL: https://github.com/apache/arrow/pull/7544#discussion_r448168805 ## File path: cpp/src/arrow/ipc/writer.h ## @@ -341,6 +343,29 @@ class ARROW_EXPORT IpcPayloadWriter { virtual Status Close() = 0; }; +/// Create a

[GitHub] [arrow] liyafan82 commented on a change in pull request #7544: ARROW-7285: [C++] ensure C++ implementation meets clarified dictionary spec

2020-07-01 Thread GitBox
liyafan82 commented on a change in pull request #7544: URL: https://github.com/apache/arrow/pull/7544#discussion_r448168706 ## File path: cpp/src/arrow/ipc/writer.h ## @@ -341,6 +343,29 @@ class ARROW_EXPORT IpcPayloadWriter { virtual Status Close() = 0; }; +/// Create a

[GitHub] [arrow] xhochy commented on pull request #7593: ARROW-9160: [C++] Implement contains for exact matches

2020-07-01 Thread GitBox
xhochy commented on pull request #7593: URL: https://github.com/apache/arrow/pull/7593#issuecomment-652244376 > You match a regex, you don't contain it. That is one of the name clashes; we already have a match kernel.

[GitHub] [arrow] pitrou commented on pull request #7593: ARROW-9160: [C++] Implement contains for exact matches

2020-07-01 Thread GitBox
pitrou commented on pull request #7593: URL: https://github.com/apache/arrow/pull/7593#issuecomment-652244689 `match_regex` then? :-) This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [arrow] jorisvandenbossche commented on pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-01 Thread GitBox
jorisvandenbossche commented on pull request #7519: URL: https://github.com/apache/arrow/pull/7519#issuecomment-652245303 > the null equality tests look like a nuisance for regular Python usage (`__eq__` should return a boolean) The reason that those scalars return null on equality c

[GitHub] [arrow] pitrou commented on pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-01 Thread GitBox
pitrou commented on pull request #7519: URL: https://github.com/apache/arrow/pull/7519#issuecomment-652247947 The problem is that it breaks Python semantics in potentially annoying places: ```python >>> import pyarrow as pa

[GitHub] [arrow] jorisvandenbossche commented on pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-01 Thread GitBox
jorisvandenbossche commented on pull request #7519: URL: https://github.com/apache/arrow/pull/7519#issuecomment-652249133 We could "fix" that one by raising in `__bool__` (meaning: it will at least give an error instead of silently returning a wrong answer).

[GitHub] [arrow] maartenbreddels commented on pull request #7593: ARROW-9160: [C++] Implement contains for exact matches

2020-07-01 Thread GitBox
maartenbreddels commented on pull request #7593: URL: https://github.com/apache/arrow/pull/7593#issuecomment-652254096 I like the prefixing by `string`. I'm a big fan of ordering 'words' in snake or camel casing for good tab completion and alphabetic ordering, so I agree with @wesm 's prop

[GitHub] [arrow] maartenbreddels edited a comment on pull request #7593: ARROW-9160: [C++] Implement contains for exact matches

2020-07-01 Thread GitBox
maartenbreddels edited a comment on pull request #7593: URL: https://github.com/apache/arrow/pull/7593#issuecomment-652254096 I like the prefixing by `string`. I'm a big fan of ordering 'words' in snake or camel casing for good tab completion and alphabetic ordering, so I agree with @wesm
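
The semantics being named in #7593, an exact (non-regex) substring test, can be sketched in Python as below. The name `string_contains_exact` is hypothetical (the thread had not settled on a name), and the null handling assumes the usual Arrow convention that a null input yields a null output.

```python
from typing import List, Optional


def string_contains_exact(values: List[Optional[str]], pattern: str) -> List[Optional[bool]]:
    """Return, per element, whether `pattern` occurs as a literal substring.

    Hypothetical kernel semantics: the pattern is not interpreted as a regex,
    and null inputs (None) propagate to null outputs.
    """
    return [None if v is None else pattern in v for v in values]


print(string_contains_exact(["arrow", "a.b", None, "abc"], "a.b"))
# [False, True, None, False]
```

Note that `"abc"` does not match: a regex `a.b` would match it, which is exactly the distinction between "contains" and "match" raised above.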

[GitHub] [arrow] pitrou commented on pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-01 Thread GitBox
pitrou commented on pull request #7519: URL: https://github.com/apache/arrow/pull/7519#issuecomment-652256345 Yes, we could. That may have other annoying implications, though (such as `__contains__` not working anymore). I've started a mailing list discussion.
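
A minimal Python sketch of the trade-off discussed above, using a hypothetical `Scalar` class (not pyarrow's scalar API) whose `__eq__` returns a null sentinel when either side is null, and whose sentinel's `__bool__` raises as suggested. It shows why membership tests such as `x in some_list` stop working.

```python
class _Null:
    """Sentinel returned when a comparison involves a null value."""

    def __bool__(self):
        # The option raised in the thread: refuse to coerce null to True/False
        # instead of silently treating it as truthy.
        raise TypeError("truth value of a null comparison is ambiguous")

    def __repr__(self):
        return "NULL"


NULL = _Null()


class Scalar:
    """Hypothetical nullable scalar (not pyarrow's class); None means null."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Scalar):
            return NotImplemented
        if self.value is None or other.value is None:
            return NULL  # null-propagating equality
        return self.value == other.value


a, b, null = Scalar(1), Scalar(2), Scalar(None)
print(a == b)     # False
print(a == null)  # NULL

# list.__contains__ evaluates `==` for each element and calls bool() on the
# result, so the raising __bool__ surfaces here.
try:
    null in [a, b]
except TypeError as exc:
    print("membership test failed:", exc)
```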

[GitHub] [arrow] pitrou closed pull request #7594: ARROW-7654: [Python] Ability to set column_types to a Schema in csv.ConvertOptions is undocumented

2020-07-01 Thread GitBox
pitrou closed pull request #7594: URL: https://github.com/apache/arrow/pull/7594

[GitHub] [arrow] emkornfield commented on pull request #7604: ARROW-9223: [Python] Propagate Timzone information in pandas conversion

2020-07-01 Thread GitBox
emkornfield commented on pull request #7604: URL: https://github.com/apache/arrow/pull/7604#issuecomment-652259406 CC @wesm @BryanCutler. Not sure whether doing this correctly will break Spark in unintended ways. This is

[GitHub] [arrow] emkornfield opened a new pull request #7604: ARROW-9223: [Python] Propagate Timzone information in pandas conversion

2020-07-01 Thread GitBox
emkornfield opened a new pull request #7604: URL: https://github.com/apache/arrow/pull/7604 - Ports string_to_timezone to C++ - Causes nested timestamp columns within structs to use conversion to object path. - Copy timezone on to_object path. Open to other suggestions on h
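
The ported `string_to_timezone` itself is not shown in the thread. As a rough Python sketch of one part of such a conversion, assuming fixed-offset timezone strings of the form `+HH:MM`/`-HH:MM` (one of the forms Arrow's timestamp metadata allows), an offset can be turned into a `datetime.timezone`. Named zones such as `America/New_York` would need a tz database lookup and are deliberately not handled. `parse_fixed_offset` is a hypothetical name.

```python
import re
from datetime import datetime, timedelta, timezone

# Fixed-offset strings such as "+05:30" or "-08:00".
_OFFSET_RE = re.compile(r"([+-])(\d{2}):(\d{2})")


def parse_fixed_offset(tz: str) -> timezone:
    """Convert a '+HH:MM' / '-HH:MM' string into a datetime.timezone."""
    match = _OFFSET_RE.fullmatch(tz)
    if match is None:
        raise ValueError(f"not a fixed UTC offset: {tz!r}")
    sign = 1 if match.group(1) == "+" else -1
    offset = timedelta(hours=int(match.group(2)), minutes=int(match.group(3)))
    return timezone(sign * offset)


# Attach the parsed zone to a timestamp, as a tz-aware conversion would.
tz = parse_fixed_offset("+05:30")
print(datetime(2020, 7, 1, 12, 0, tzinfo=tz).isoformat())
# 2020-07-01T12:00:00+05:30
```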

[GitHub] [arrow] pitrou commented on pull request #7603: ARROW-9206: [C++][Flight] Add latency benchmark

2020-07-01 Thread GitBox
pitrou commented on pull request #7603: URL: https://github.com/apache/arrow/pull/7603#issuecomment-652260499 You can't compute latency like that if the test is multithreaded.

[GitHub] [arrow] kou commented on a change in pull request #7589: ARROW-9276: [Release] Enforce CUDA device for updating the api documentations

2020-07-01 Thread GitBox
kou commented on a change in pull request #7589: URL: https://github.com/apache/arrow/pull/7589#discussion_r448188490 ## File path: dev/release/post-09-docs.sh ## @@ -42,20 +47,20 @@ popd pushd "${ARROW_DIR}" git checkout "${release_tag}" Review comment: FYI: We don't

[GitHub] [arrow] github-actions[bot] commented on pull request #7604: ARROW-9223: [Python] Propagate Timzone information in pandas conversion

2020-07-01 Thread GitBox
github-actions[bot] commented on pull request #7604: URL: https://github.com/apache/arrow/pull/7604#issuecomment-652262316 https://issues.apache.org/jira/browse/ARROW-9223

[GitHub] [arrow] maartenbreddels commented on pull request #7593: ARROW-9160: [C++] Implement contains for exact matches

2020-07-01 Thread GitBox
maartenbreddels commented on pull request #7593: URL: https://github.com/apache/arrow/pull/7593#issuecomment-652263794 While on the difficult topic of naming, is there a convention (agreed or emerging) for naming the functors/ArrayKernelExec implementations? I see in `scalar_string.cc` we

[GitHub] [arrow] cyb70289 commented on pull request #7603: ARROW-9206: [C++][Flight] Add latency benchmark

2020-07-01 Thread GitBox
cyb70289 commented on pull request #7603: URL: https://github.com/apache/arrow/pull/7603#issuecomment-652289408 > You can't compute latency like that if the test is multithreaded. Assume two threads, one finishes `n1` IO in `t1` time, another finishes `n2` IO in `t2` time. So the ave

[GitHub] [arrow] rymurr commented on pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-07-01 Thread GitBox
rymurr commented on pull request #7347: URL: https://github.com/apache/arrow/pull/7347#issuecomment-652291140 Thanks a lot @liyafan82 I have addressed your suggestions and rebased

[GitHub] [arrow] rymurr commented on a change in pull request #7275: ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++

2020-07-01 Thread GitBox
rymurr commented on a change in pull request #7275: URL: https://github.com/apache/arrow/pull/7275#discussion_r448222516 ## File path: java/vector/src/main/java/org/apache/arrow/vector/BitVectorHelper.java ## @@ -73,6 +87,28 @@ public static void setBit(ArrowBuf validityBuffer

[GitHub] [arrow] tobim commented on pull request #7315: ARROW-7605: [C++] Bundle jemalloc into static libarrow

2020-07-01 Thread GitBox
tobim commented on pull request #7315: URL: https://github.com/apache/arrow/pull/7315#issuecomment-652292544 > CMake 3.9 is a bit problematic since we've tried to support CMake >= 3.2 and definitely >= 3.7. In that case, the other approach should be pursued.

[GitHub] [arrow] rymurr commented on a change in pull request #7275: ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++

2020-07-01 Thread GitBox
rymurr commented on a change in pull request #7275: URL: https://github.com/apache/arrow/pull/7275#discussion_r448224180 ## File path: java/vector/src/main/java/org/apache/arrow/vector/complex/LargeListVector.java ## @@ -0,0 +1,991 @@ +/* + * Licensed to the Apache Software Fo

[GitHub] [arrow] rymurr commented on a change in pull request #7275: ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++

2020-07-01 Thread GitBox
rymurr commented on a change in pull request #7275: URL: https://github.com/apache/arrow/pull/7275#discussion_r448223616 ## File path: java/vector/src/main/java/org/apache/arrow/vector/BitVectorHelper.java ## @@ -37,6 +37,20 @@ private BitVectorHelper() {} + /** + *

[GitHub] [arrow] rymurr commented on a change in pull request #7275: ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++

2020-07-01 Thread GitBox
rymurr commented on a change in pull request #7275: URL: https://github.com/apache/arrow/pull/7275#discussion_r448225368 ## File path: java/vector/src/main/java/org/apache/arrow/vector/complex/LargeListVector.java ## @@ -0,0 +1,991 @@ +/* + * Licensed to the Apache Software Fo

[GitHub] [arrow] rymurr commented on a change in pull request #7275: ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++

2020-07-01 Thread GitBox
rymurr commented on a change in pull request #7275: URL: https://github.com/apache/arrow/pull/7275#discussion_r448225742 ## File path: java/vector/src/main/java/org/apache/arrow/vector/complex/LargeListVector.java ## @@ -0,0 +1,991 @@ +/* + * Licensed to the Apache Software Fo

[GitHub] [arrow] rymurr commented on a change in pull request #7275: ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++

2020-07-01 Thread GitBox
rymurr commented on a change in pull request #7275: URL: https://github.com/apache/arrow/pull/7275#discussion_r448226183 ## File path: java/vector/src/main/java/org/apache/arrow/vector/complex/LargeListVector.java ## @@ -0,0 +1,991 @@ +/* + * Licensed to the Apache Software Fo

[GitHub] [arrow] rymurr commented on a change in pull request #7275: ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++

2020-07-01 Thread GitBox
rymurr commented on a change in pull request #7275: URL: https://github.com/apache/arrow/pull/7275#discussion_r448226339 ## File path: java/vector/src/main/java/org/apache/arrow/vector/complex/LargeListVector.java ## @@ -0,0 +1,991 @@ +/* + * Licensed to the Apache Software Fo

[GitHub] [arrow] rymurr commented on a change in pull request #7275: ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++

2020-07-01 Thread GitBox
rymurr commented on a change in pull request #7275: URL: https://github.com/apache/arrow/pull/7275#discussion_r448226970 ## File path: java/vector/src/main/java/org/apache/arrow/vector/complex/LargeListVector.java ## @@ -0,0 +1,991 @@ +/* + * Licensed to the Apache Software Fo

[GitHub] [arrow] rymurr commented on a change in pull request #7275: ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++

2020-07-01 Thread GitBox
rymurr commented on a change in pull request #7275: URL: https://github.com/apache/arrow/pull/7275#discussion_r448226505 ## File path: java/vector/src/main/java/org/apache/arrow/vector/complex/LargeListVector.java ## @@ -0,0 +1,991 @@ +/* + * Licensed to the Apache Software Fo

[GitHub] [arrow] rymurr commented on a change in pull request #7275: ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++

2020-07-01 Thread GitBox
rymurr commented on a change in pull request #7275: URL: https://github.com/apache/arrow/pull/7275#discussion_r448227452 ## File path: java/vector/src/main/java/org/apache/arrow/vector/complex/LargeListVector.java ## @@ -0,0 +1,991 @@ +/* + * Licensed to the Apache Software Fo

[GitHub] [arrow] rymurr commented on a change in pull request #7275: ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++

2020-07-01 Thread GitBox
rymurr commented on a change in pull request #7275: URL: https://github.com/apache/arrow/pull/7275#discussion_r448227944 ## File path: java/vector/src/main/java/org/apache/arrow/vector/complex/LargeListVector.java ## @@ -0,0 +1,991 @@ +/* + * Licensed to the Apache Software Fo

[GitHub] [arrow] rymurr commented on a change in pull request #7275: ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++

2020-07-01 Thread GitBox
rymurr commented on a change in pull request #7275: URL: https://github.com/apache/arrow/pull/7275#discussion_r448227771 ## File path: java/vector/src/main/java/org/apache/arrow/vector/complex/LargeListVector.java ## @@ -0,0 +1,991 @@ +/* + * Licensed to the Apache Software Fo

[GitHub] [arrow] mrkn commented on pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-07-01 Thread GitBox
mrkn commented on pull request #7539: URL: https://github.com/apache/arrow/pull/7539#issuecomment-652298958 I wrote benchmark code that measures the performance of converting a Tensor to a SparseTensor, ran it with `--repetitions=10`, and got the following result.

[GitHub] [arrow] rymurr commented on a change in pull request #7275: ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++

2020-07-01 Thread GitBox
rymurr commented on a change in pull request #7275: URL: https://github.com/apache/arrow/pull/7275#discussion_r448232201 ## File path: java/vector/src/main/java/org/apache/arrow/vector/complex/LargeListVector.java ## @@ -0,0 +1,991 @@ +/* + * Licensed to the Apache Software Fo

[GitHub] [arrow] rymurr commented on a change in pull request #7275: ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++

2020-07-01 Thread GitBox
rymurr commented on a change in pull request #7275: URL: https://github.com/apache/arrow/pull/7275#discussion_r448233329 ## File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/JsonFileReader.java ## @@ -564,8 +564,10 @@ private ArrowBuf readIntoBuffer(BufferAlloca

[GitHub] [arrow] cyb70289 edited a comment on pull request #7603: ARROW-9206: [C++][Flight] Add latency benchmark

2020-07-01 Thread GitBox
cyb70289 edited a comment on pull request #7603: URL: https://github.com/apache/arrow/pull/7603#issuecomment-652289408 > You can't compute latency like that if the test is multithreaded. Assume two threads, one finishes `n1` IO in `t1` time, another finishes `n2` IO in `t2` time. So

[GitHub] [arrow] pitrou commented on pull request #7603: ARROW-9206: [C++][Flight] Add latency benchmark

2020-07-01 Thread GitBox
pitrou commented on pull request #7603: URL: https://github.com/apache/arrow/pull/7603#issuecomment-652307944 IMHO it should be `(t1+t2) / (n1+n2)`. Is it what the code is doing?

[GitHub] [arrow] cyb70289 commented on pull request #7603: ARROW-9206: [C++][Flight] Add latency benchmark

2020-07-01 Thread GitBox
cyb70289 commented on pull request #7603: URL: https://github.com/apache/arrow/pull/7603#issuecomment-652311392 > IMHO it should be `(t1+t2) / (n1+n2)`. Is it what the code is doing? Yes, this is what the code is doing.

[GitHub] [arrow] pitrou commented on pull request #7603: ARROW-9206: [C++][Flight] Add latency benchmark

2020-07-01 Thread GitBox
pitrou commented on pull request #7603: URL: https://github.com/apache/arrow/pull/7603#issuecomment-652316126 Ok, then I was mistaken. Sorry :-)
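
The formula agreed on above, total per-thread time divided by total completed operations, can be sketched in Python. This illustrates only the arithmetic, assuming each thread records its own request count and busy time; `ThreadStats` and both helpers are hypothetical names, not the benchmark's C++ code. For contrast, it also shows a common pitfall in multithreaded benchmarks: dividing wall-clock time by total operations measures inverse throughput, which shrinks as threads are added.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThreadStats:
    """Per-thread totals from a hypothetical benchmark run."""
    ops: int          # number of completed requests (n_i)
    busy_time: float  # seconds spent on those requests (t_i)


def average_latency(stats: List[ThreadStats]) -> float:
    """Pooled mean latency: (t1 + t2 + ...) / (n1 + n2 + ...)."""
    total_ops = sum(s.ops for s in stats)
    if total_ops == 0:
        raise ValueError("no operations recorded")
    return sum(s.busy_time for s in stats) / total_ops


def inverse_throughput(stats: List[ThreadStats], wall_time: float) -> float:
    """Wall-clock time per operation. With N fully busy threads this is
    roughly latency / N, so it is not a latency measure."""
    return wall_time / sum(s.ops for s in stats)


# Two threads running concurrently for the same 1-second window.
stats = [ThreadStats(ops=100, busy_time=1.0), ThreadStats(ops=300, busy_time=1.0)]
print(average_latency(stats))                    # 0.005  (2.0 s / 400 ops)
print(inverse_throughput(stats, wall_time=1.0))  # 0.0025 (1.0 s / 400 ops)
```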

[GitHub] [arrow] liyafan82 commented on pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-07-01 Thread GitBox
liyafan82 commented on pull request #7347: URL: https://github.com/apache/arrow/pull/7347#issuecomment-652319908 > Thanks a lot @liyafan82 I have addressed your suggestions and rebased @rymurr Thanks for your work. Will merge when it turns green.

[GitHub] [arrow] rymurr commented on a change in pull request #7275: ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++

2020-07-01 Thread GitBox
rymurr commented on a change in pull request #7275: URL: https://github.com/apache/arrow/pull/7275#discussion_r448255547 ## File path: java/vector/src/main/codegen/templates/UnionLargeListWriter.java ## @@ -0,0 +1,232 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [arrow] rymurr commented on pull request #7275: ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++

2020-07-01 Thread GitBox
rymurr commented on pull request #7275: URL: https://github.com/apache/arrow/pull/7275#issuecomment-652323175 > Thanks for working on this @rymurr ! Apologies for taking so long to review.. It looks pretty good, but I saw what looked like inconsistencies in the `LargeListVector` APIs using

[GitHub] [arrow] pitrou opened a new pull request #7605: ARROW-9283: [Python] Expose build info

2020-07-01 Thread GitBox
pitrou opened a new pull request #7605: URL: https://github.com/apache/arrow/pull/7605

[GitHub] [arrow] pitrou commented on pull request #7605: ARROW-9283: [Python] Expose build info

2020-07-01 Thread GitBox
pitrou commented on pull request #7605: URL: https://github.com/apache/arrow/pull/7605#issuecomment-652330730 There's a problem where we already generate `__version__` and it ends up different, for example: ```python >>> import pyarrow as pa

[GitHub] [arrow] github-actions[bot] commented on pull request #7605: ARROW-9283: [Python] Expose build info

2020-07-01 Thread GitBox
github-actions[bot] commented on pull request #7605: URL: https://github.com/apache/arrow/pull/7605#issuecomment-652332966 https://issues.apache.org/jira/browse/ARROW-9283

[GitHub] [arrow] lidavidm commented on pull request #7543: ARROW-9221: [Java] account for big-endian buffers in ArrowBuf.setBytes

2020-07-01 Thread GitBox
lidavidm commented on pull request #7543: URL: https://github.com/apache/arrow/pull/7543#issuecomment-652379935 @liyafan82 The problem actually isn't with big-endian platforms! It's because Java's ByteBuffer [defaults to big-endian](https://docs.oracle.com/en/java/javase/11/docs/api/java.b

[GitHub] [arrow] pitrou commented on pull request #7544: ARROW-7285: [C++] ensure C++ implementation meets clarified dictionary spec

2020-07-01 Thread GitBox
pitrou commented on pull request #7544: URL: https://github.com/apache/arrow/pull/7544#issuecomment-652383485 Thanks for the update. I will merge this PR once CI is green.

[GitHub] [arrow] pitrou commented on a change in pull request #7593: ARROW-9160: [C++] Implement contains for exact matches

2020-07-01 Thread GitBox
pitrou commented on a change in pull request #7593: URL: https://github.com/apache/arrow/pull/7593#discussion_r448326108 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -297,6 +297,116 @@ void AddAsciiLength(FunctionRegistry* registry) { DCHECK_OK(registry

[GitHub] [arrow] liyafan82 commented on pull request #7544: ARROW-7285: [C++] ensure C++ implementation meets clarified dictionary spec

2020-07-01 Thread GitBox
liyafan82 commented on pull request #7544: URL: https://github.com/apache/arrow/pull/7544#issuecomment-652395253 @pitrou Thanks a lot for your effort.

[GitHub] [arrow] liyafan82 commented on pull request #7543: ARROW-9221: [Java] account for big-endian buffers in ArrowBuf.setBytes

2020-07-01 Thread GitBox
liyafan82 commented on pull request #7543: URL: https://github.com/apache/arrow/pull/7543#issuecomment-652395634 > @liyafan82 The problem actually isn't with big-endian platforms! It's because Java's ByteBuffer [defaults to big-endian](https://docs.oracle.com/en/java/javase/11/docs/api/jav

[GitHub] [arrow] liyafan82 commented on a change in pull request #7543: ARROW-9221: [Java] account for big-endian buffers in ArrowBuf.setBytes

2020-07-01 Thread GitBox
liyafan82 commented on a change in pull request #7543: URL: https://github.com/apache/arrow/pull/7543#discussion_r448338460 ## File path: java/memory/src/main/java/org/apache/arrow/memory/ArrowBuf.java ## @@ -872,6 +874,7 @@ public void setBytes(long index, ByteBuffer src) {

[GitHub] [arrow] lidavidm commented on a change in pull request #7543: ARROW-9221: [Java] account for big-endian buffers in ArrowBuf.setBytes

2020-07-01 Thread GitBox
lidavidm commented on a change in pull request #7543: URL: https://github.com/apache/arrow/pull/7543#discussion_r448345297 ## File path: java/memory/src/main/java/org/apache/arrow/memory/ArrowBuf.java ## @@ -872,6 +874,7 @@ public void setBytes(long index, ByteBuffer src) {
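
The byte-order pitfall discussed in #7543 can be illustrated with Python's `struct` module. The sketch assumes a buffer written in big-endian order (the initial order of a Java `ByteBuffer`, per the linked documentation) being read by code that expects little-endian data, the native order of typical x86 and ARM hosts. `decode_int32s` is a hypothetical helper, not part of `ArrowBuf`.

```python
import struct

# The same 32-bit integer in both byte orders.
big = struct.pack(">i", 1)     # b'\x00\x00\x00\x01' (Java ByteBuffer's initial order)
little = struct.pack("<i", 1)  # b'\x01\x00\x00\x00' (native order on x86 and most ARM)

# Copying the big-endian bytes verbatim and reading them as little-endian
# silently yields a different value.
print(struct.unpack("<i", big)[0])  # 16777216


def decode_int32s(buf: bytes, byteorder: str) -> list:
    """Decode packed int32 values using the buffer's declared byte order."""
    if len(buf) % 4:
        raise ValueError("buffer length is not a multiple of 4")
    prefix = ">" if byteorder == "big" else "<"
    return list(struct.unpack(f"{prefix}{len(buf) // 4}i", buf))


print(decode_int32s(big, "big"))        # [1]
print(decode_int32s(little, "little"))  # [1]
```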

[GitHub] [arrow] pitrou opened a new pull request #7606: ARROW-8434: [C++] Avoid multiple schema deserializations in RecordBatchFileReader

2020-07-01 Thread GitBox
pitrou opened a new pull request #7606: URL: https://github.com/apache/arrow/pull/7606 This doesn't seem to make a difference in the included benchmark.

[GitHub] [arrow] jianxind opened a new pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean

2020-07-01 Thread GitBox
jianxind opened a new pull request #7607: URL: https://github.com/apache/arrow/pull/7607 1. Add AVX2/AVX512 builds of the aggregate sum/mean functions. Use `set_source_files_properties` to append the SIMD build option. Register the SIMD path at runtime based on CPU features.
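
The runtime registration "by CPU feature" amounts to picking the most capable build the running CPU supports. A hypothetical Python sketch of that selection logic: the function names are stand-ins (all three compute the same plain sum here, since Python cannot express the SIMD builds), and the detected feature set is passed in explicitly rather than probed from the CPU, as the C++ code would do.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

Kernel = Callable[[Sequence[float]], float]


# Stand-ins for separately compiled builds of the same kernel. In C++ each
# would come from a source file compiled with different SIMD flags.
def sum_scalar(values: Sequence[float]) -> float:
    return sum(values)


def sum_avx2(values: Sequence[float]) -> float:
    return sum(values)


def sum_avx512(values: Sequence[float]) -> float:
    return sum(values)


# Ordered from most to least preferred; None means "always available".
KERNELS: List[Tuple[Optional[str], Kernel]] = [
    ("avx512", sum_avx512),
    ("avx2", sum_avx2),
    (None, sum_scalar),
]


def select_kernel(cpu_features: Set[str]) -> Kernel:
    """Pick the most preferred kernel the running CPU supports."""
    for feature, kernel in KERNELS:
        if feature is None or feature in cpu_features:
            return kernel
    raise RuntimeError("no fallback kernel registered")


print(select_kernel({"sse4_2", "avx2"}).__name__)  # sum_avx2
print(select_kernel(set()).__name__)               # sum_scalar
```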

[GitHub] [arrow] jianxind commented on pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean

2020-07-01 Thread GitBox
jianxind commented on pull request #7607: URL: https://github.com/apache/arrow/pull/7607#issuecomment-652425725 @ursabot benchmark --suite-filter=arrow-compute-aggregate-benchmark

[GitHub] [arrow] jianxind commented on pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean

2020-07-01 Thread GitBox
jianxind commented on pull request #7607: URL: https://github.com/apache/arrow/pull/7607#issuecomment-652431931 @emkornfield This is the new version of the sum aggregate without intrinsics; could you help review it? The dense part gets nearly the same scores as the intrinsic version for AVX2 on cla

[GitHub] [arrow] github-actions[bot] commented on pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean

2020-07-01 Thread GitBox
github-actions[bot] commented on pull request #7607: URL: https://github.com/apache/arrow/pull/7607#issuecomment-652433109 https://issues.apache.org/jira/browse/ARROW-8996

[GitHub] [arrow] github-actions[bot] commented on pull request #7606: ARROW-8434: [C++] Avoid multiple schema deserializations in RecordBatchFileReader

2020-07-01 Thread GitBox
github-actions[bot] commented on pull request #7606: URL: https://github.com/apache/arrow/pull/7606#issuecomment-652433108 https://issues.apache.org/jira/browse/ARROW-8434

[GitHub] [arrow] wesm commented on pull request #7593: ARROW-9160: [C++] Implement contains for exact matches

2020-07-01 Thread GitBox
wesm commented on pull request #7593: URL: https://github.com/apache/arrow/pull/7593#issuecomment-652450464 > Could you elaborate? Why is this not a problem with the lower/upper kernels? The data preallocation is only for fixed-size outputs (e.g. boolean, integers, floating point, etc
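
A rough illustration of why preallocation is limited to fixed-size outputs: for fixed-width types the output data buffer size follows from the array length alone (booleans are bit-packed), whereas for string outputs it depends on the values themselves. The helper name and type table below are hypothetical, and Arrow's buffer padding/alignment is ignored.

```python
# Bit width per value for a few fixed-width Arrow types (illustrative subset).
FIXED_BIT_WIDTHS = {"bool": 1, "int32": 32, "int64": 64, "float64": 64}


def preallocated_data_bytes(type_name: str, length: int) -> int:
    """Size of the output data buffer, computable before running the kernel
    (ignoring padding/alignment)."""
    if type_name not in FIXED_BIT_WIDTHS:
        # e.g. "string": total bytes depend on the values, so the kernel
        # must allocate its own output as it goes.
        raise ValueError(f"{type_name} is variable-width; size unknown up front")
    return (length * FIXED_BIT_WIDTHS[type_name] + 7) // 8  # round up to whole bytes


print(preallocated_data_bytes("bool", 10))   # 2
print(preallocated_data_bytes("int32", 10))  # 40
```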

[GitHub] [arrow] emkornfield commented on a change in pull request #7604: ARROW-9223: [Python] Propagate Timzone information in pandas conversion

2020-07-01 Thread GitBox
emkornfield commented on a change in pull request #7604: URL: https://github.com/apache/arrow/pull/7604#discussion_r448413419 ## File path: cpp/src/arrow/python/datetime.cc ## @@ -262,6 +265,42 @@ int64_t PyDate_to_days(PyDateTime_Date* pydate) { Py

[GitHub] [arrow] jorisvandenbossche opened a new pull request #7608: ARROW-9288: [C++][Dataset] Fix PartitioningFactory with dictionary encoding for HivePartioning

2020-07-01 Thread GitBox
jorisvandenbossche opened a new pull request #7608: URL: https://github.com/apache/arrow/pull/7608

[GitHub] [arrow] github-actions[bot] commented on pull request #7608: ARROW-9288: [C++][Dataset] Fix PartitioningFactory with dictionary encoding for HivePartioning

2020-07-01 Thread GitBox
github-actions[bot] commented on pull request #7608: URL: https://github.com/apache/arrow/pull/7608#issuecomment-652480774 https://issues.apache.org/jira/browse/ARROW-9288

[GitHub] [arrow] kiszk commented on pull request #7507: ARROW-8797: [C++] Create test to receive RecordBatch for different endian

2020-07-01 Thread GitBox
kiszk commented on pull request #7507: URL: https://github.com/apache/arrow/pull/7507#issuecomment-652482871 @wesm Thank you for your suggestion. I will pursue the approach that you suggested. I will check the integration test command line tool and the integration test with the JSON_TO_ARR

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7608: ARROW-9288: [C++][Dataset] Fix PartitioningFactory with dictionary encoding for HivePartioning

2020-07-01 Thread GitBox
jorisvandenbossche commented on a change in pull request #7608: URL: https://github.com/apache/arrow/pull/7608#discussion_r448438477 ## File path: cpp/src/arrow/dataset/partition.cc ## @@ -646,15 +657,26 @@ class HivePartitioningFactory : public PartitioningFactory { }

[GitHub] [arrow] emkornfield commented on a change in pull request #7604: ARROW-9223: [Python] Propagate Timzone information in pandas conversion

2020-07-01 Thread GitBox
emkornfield commented on a change in pull request #7604: URL: https://github.com/apache/arrow/pull/7604#discussion_r448445229 ## File path: cpp/src/arrow/python/arrow_to_pandas.cc ## @@ -951,8 +951,21 @@ struct ObjectWriterVisitor { template enable_if_timestamp Visit(con

[GitHub] [arrow] jorisvandenbossche commented on pull request #7536: ARROW-8647: [C++][Python][Dataset] Allow partitioning fields to be inferred with dictionary type

2020-07-01 Thread GitBox
jorisvandenbossche commented on pull request #7536: URL: https://github.com/apache/arrow/pull/7536#issuecomment-652493993 @bkietz thanks for the update ensuring all uniques as dictionary values! Testing this out, I ran into an issue with HivePartitioning -> ARROW-9288 / #7608

[GitHub] [arrow] jorisvandenbossche commented on pull request #7546: ARROW-8733: [C++][Dataset][Python] Expose RowGroupInfo statistics values

2020-07-01 Thread GitBox
jorisvandenbossche commented on pull request #7546: URL: https://github.com/apache/arrow/pull/7546#issuecomment-652497706 @rjzamora `num_rows` is already available on the RowGroupInfo object (https://github.com/apache/arrow/blob/cd3ed605857994575326c072bbfcf995541fa80e/python/pyarrow/_datas

[GitHub] [arrow] nealrichardson opened a new pull request #7609: ARROW-9289: [R] Remove deprecated functions

2020-07-01 Thread GitBox
nealrichardson opened a new pull request #7609: URL: https://github.com/apache/arrow/pull/7609

[GitHub] [arrow] nealrichardson closed pull request #7602: ARROW-9083: [R] collect int64, uint32, uint64 as R integer type if not out of bounds

2020-07-01 Thread GitBox
nealrichardson closed pull request #7602: URL: https://github.com/apache/arrow/pull/7602
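
A sketch of the bounds check implied by the title of #7602 (the PR's actual code is not shown here): an R integer is a 32-bit signed value whose minimum, -2^31, is reserved for `NA_integer_`, so only values in [-(2^31 - 1), 2^31 - 1] can be collected as R integers. `fits_r_integer` is a hypothetical helper; nulls (None) are skipped because they map to `NA`.

```python
from typing import Iterable, Optional

R_INT_MAX = 2**31 - 1     # .Machine$integer.max in R
R_INT_MIN = -(2**31 - 1)  # -2^31 itself encodes NA_integer_


def fits_r_integer(values: Iterable[Optional[int]]) -> bool:
    """True if every non-null value is representable as an R integer."""
    return all(R_INT_MIN <= v <= R_INT_MAX for v in values if v is not None)


print(fits_r_integer([0, None, 2**31 - 1]))  # True
print(fits_r_integer([2**31]))               # False: above integer.max
print(fits_r_integer([-(2**31)]))            # False: collides with NA
```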

[GitHub] [arrow] jorisvandenbossche commented on pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type

2020-07-01 Thread GitBox
jorisvandenbossche commented on pull request #7478: URL: https://github.com/apache/arrow/pull/7478#issuecomment-652504934 > We will have to resolve the sum([]) -> null/0 by introducing a "minimum valid values" option. Do we already have a JIRA to track this?

[GitHub] [arrow] jorisvandenbossche edited a comment on pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type

2020-07-01 Thread GitBox
jorisvandenbossche edited a comment on pull request #7478: URL: https://github.com/apache/arrow/pull/7478#issuecomment-652504934 > We will have to resolve the sum([]) -> null/0 by introducing a "minimum valid values" option. Do we already have a JIRA to track this? EDIT -> it
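The `sum([]) -> null/0` question above can be illustrated with a small sketch. Assuming a hypothetical `min_count` parameter (the thread does not settle the option's name or default), a reduction returns null unless at least `min_count` non-null values were seen; with `min_count=0`, the empty sum is 0. This is plain Python, not Arrow's kernel code:

```python
from typing import Optional, Sequence, Union

Number = Union[int, float, bool]


def sum_with_min_count(values: Sequence[Optional[Number]],
                       min_count: int = 1) -> Optional[Number]:
    """Sum the non-null entries (hypothetical semantics).

    Returns None (i.e. null) if fewer than `min_count` non-null
    entries are present; Booleans count True as 1.
    """
    valid = [v for v in values if v is not None]
    if len(valid) < min_count:
        return None
    return sum(valid)
```

With the default `min_count=1`, `sum_with_min_count([])` is `None`, while `sum_with_min_count([], min_count=0)` is `0`; for Boolean input, `[True, False, None, True]` sums to 2.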

[GitHub] [arrow] kiszk commented on pull request #7596: ARROW-9163: [C++] Validate UTF8 contents of a StringArray

2020-07-01 Thread GitBox
kiszk commented on pull request #7596: URL: https://github.com/apache/arrow/pull/7596#issuecomment-652505301 Looks good
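As a rough illustration of what validating the UTF-8 contents of a StringArray involves, the sketch below walks an Arrow-style offsets buffer and checks each slot's bytes. It is plain Python with hypothetical names, ignores the validity bitmap, and is not the C++ implementation in the PR:

```python
from typing import Optional, Sequence


def find_invalid_utf8(offsets: Sequence[int], data: bytes) -> Optional[int]:
    """Return the index of the first slot whose bytes are not valid UTF-8,
    or None if every slot decodes. Slot i spans data[offsets[i]:offsets[i+1]].
    """
    for i in range(len(offsets) - 1):
        start, end = offsets[i], offsets[i + 1]
        # Offsets must be non-negative, non-decreasing, and inside the buffer.
        if start < 0 or start > end or end > len(data):
            raise ValueError(f"invalid offsets for slot {i}: {start}..{end}")
        try:
            # Strict decoding rejects malformed sequences and encoded surrogates.
            data[start:end].decode("utf-8")
        except UnicodeDecodeError:
            return i
    return None
```

Each value is checked on its own, so a multi-byte character split across two slots is reported as invalid.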

[GitHub] [arrow] github-actions[bot] commented on pull request #7609: ARROW-9289: [R] Remove deprecated functions

2020-07-01 Thread GitBox
github-actions[bot] commented on pull request #7609: URL: https://github.com/apache/arrow/pull/7609#issuecomment-652507951 https://issues.apache.org/jira/browse/ARROW-9289

[GitHub] [arrow] rymurr commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-07-01 Thread GitBox
rymurr commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-652516529 This has been modified to incorporate the changes to Unions as proposed on the mailing list.

[GitHub] [arrow] nealrichardson commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-07-01 Thread GitBox
nealrichardson commented on a change in pull request #7514: URL: https://github.com/apache/arrow/pull/7514#discussion_r448483516 ## File path: r/src/array_from_vector.cpp ## @@ -918,6 +923,97 @@ class Time64Converter : public TimeConverter { } }; +template +class BinaryV

[GitHub] [arrow] nealrichardson commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-07-01 Thread GitBox
nealrichardson commented on a change in pull request #7514: URL: https://github.com/apache/arrow/pull/7514#discussion_r448490350 ## File path: r/src/array_to_vector.cpp ## @@ -693,6 +741,9 @@ std::shared_ptr Converter::Make(const std::shared_ptr& type case Type::BOOL:

[GitHub] [arrow] nealrichardson commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-07-01 Thread GitBox
nealrichardson commented on a change in pull request #7514: URL: https://github.com/apache/arrow/pull/7514#discussion_r448491760 ## File path: r/src/array_from_vector.cpp ## @@ -918,6 +923,97 @@ class Time64Converter : public TimeConverter { } }; +template +class BinaryV

[GitHub] [arrow] BryanCutler commented on pull request #6316: ARROW-7717: [CI] Have nightly integration test for Spark's latest release

2020-07-01 Thread GitBox
BryanCutler commented on pull request #6316: URL: https://github.com/apache/arrow/pull/6316#issuecomment-652541712 @kszucs the nightly builds against Spark master have been passing. Do you think you could update this to just add the test against branch-3.0 and remove branch-2.4 for now? I'm not s

[GitHub] [arrow] nealrichardson closed pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-07-01 Thread GitBox
nealrichardson closed pull request #7514: URL: https://github.com/apache/arrow/pull/7514

[GitHub] [arrow] BryanCutler commented on pull request #6316: ARROW-7717: [CI] Have nightly integration test for Spark's latest release

2020-07-01 Thread GitBox
BryanCutler commented on pull request #6316: URL: https://github.com/apache/arrow/pull/6316#issuecomment-652542479 @github-actions crossbow submit test-conda-python-3.7-spark-branch-3.0

[GitHub] [arrow] BryanCutler commented on a change in pull request #6316: ARROW-7717: [CI] Have nightly integration test for Spark's latest release

2020-07-01 Thread GitBox
BryanCutler commented on a change in pull request #6316: URL: https://github.com/apache/arrow/pull/6316#discussion_r448503228 ## File path: dev/tasks/tasks.yml ## @@ -1833,12 +1833,32 @@ tasks: HDFS: 2.9.2 run: conda-python-hdfs - test-conda-python-3.7-spark-

[GitHub] [arrow] nealrichardson closed pull request #7609: ARROW-9289: [R] Remove deprecated functions

2020-07-01 Thread GitBox
nealrichardson closed pull request #7609: URL: https://github.com/apache/arrow/pull/7609

[GitHub] [arrow] github-actions[bot] commented on pull request #6316: ARROW-7717: [CI] Have nightly integration test for Spark's latest release

2020-07-01 Thread GitBox
github-actions[bot] commented on pull request #6316: URL: https://github.com/apache/arrow/pull/6316#issuecomment-652544533 Revision: a914eea4f3ab16e359adee2f37a4fb30a1eba86c Submitted crossbow builds: [ursa-labs/crossbow @ actions-371](https://github.com/ursa-labs/crossbow/branches/a

[GitHub] [arrow] kiszk commented on a change in pull request #7593: ARROW-9160: [C++] Implement contains for exact matches

2020-07-01 Thread GitBox
kiszk commented on a change in pull request #7593: URL: https://github.com/apache/arrow/pull/7593#discussion_r448506425 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -297,6 +297,116 @@ void AddAsciiLength(FunctionRegistry* registry) { DCHECK_OK(registry-

[GitHub] [arrow] saethlin opened a new pull request #7610: ARROW-9290: [Rust] [Parquet] Add features to allow opting out of dependencies

2020-07-01 Thread GitBox
saethlin opened a new pull request #7610: URL: https://github.com/apache/arrow/pull/7610

[GitHub] [arrow] github-actions[bot] commented on pull request #7610: ARROW-9290: [Rust] [Parquet] Add features to allow opting out of dependencies

2020-07-01 Thread GitBox
github-actions[bot] commented on pull request #7610: URL: https://github.com/apache/arrow/pull/7610#issuecomment-652553010 https://issues.apache.org/jira/browse/ARROW-9290

[GitHub] [arrow] nealrichardson opened a new pull request #7611: ARROW-3308: [R] Convert R character vector with data exceeding 2GB to Large type

2020-07-01 Thread GitBox
nealrichardson opened a new pull request #7611: URL: https://github.com/apache/arrow/pull/7611

[GitHub] [arrow] github-actions[bot] commented on pull request #7611: ARROW-3308: [R] Convert R character vector with data exceeding 2GB to Large type

2020-07-01 Thread GitBox
github-actions[bot] commented on pull request #7611: URL: https://github.com/apache/arrow/pull/7611#issuecomment-652580102 https://issues.apache.org/jira/browse/ARROW-3308
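The 2GB threshold in ARROW-3308 follows from Arrow's `string` type using 32-bit signed offsets, so a single array's character data can hold at most 2^31 - 1 bytes, whereas `large_string` uses 64-bit offsets. A minimal sketch of the type choice, assuming the decision is made from the total UTF-8 byte length of the non-missing values (the R package's actual logic may differ; the function name is hypothetical):

```python
from typing import Iterable, Optional

# Largest offset representable by Arrow's 32-bit string offsets.
INT32_MAX = 2**31 - 1


def choose_string_type(values: Iterable[Optional[str]],
                       limit: int = INT32_MAX) -> str:
    """Return "large_string" if the total UTF-8 payload exceeds `limit` bytes."""
    total = 0
    for v in values:
        if v is not None:  # missing values contribute no character data
            total += len(v.encode("utf-8"))
            if total > limit:
                return "large_string"
    return "string"
```

The `limit` parameter exists only so the logic can be exercised without allocating 2GB of strings.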

[GitHub] [arrow] pitrou closed pull request #7544: ARROW-7285: [C++] ensure C++ implementation meets clarified dictionary spec

2020-07-01 Thread GitBox
pitrou closed pull request #7544: URL: https://github.com/apache/arrow/pull/7544

[GitHub] [arrow] pitrou opened a new pull request #7612: ARROW-7011: [C++] Implement casts from float/double to decimal

2020-07-01 Thread GitBox
pitrou opened a new pull request #7612: URL: https://github.com/apache/arrow/pull/7612 Also naturally available in Python using the Array.cast() method.
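For intuition about float/double to decimal casts, the sketch below uses Python's `decimal` module to convert a float to a value with a given precision and scale, raising on overflow. The rounding mode (`ROUND_HALF_EVEN`), the error types, and the function name are assumptions for illustration, not necessarily what the C++ cast in the PR does:

```python
import math
from decimal import Decimal, InvalidOperation, ROUND_HALF_EVEN, localcontext


def float_to_decimal(x: float, precision: int, scale: int) -> Decimal:
    """Convert x to a decimal with `scale` fractional digits and at most
    `precision` total digits; raise OverflowError if it does not fit."""
    if not math.isfinite(x):
        raise ValueError(f"cannot represent {x!r} as a decimal")
    quantum = Decimal(1).scaleb(-scale)  # e.g. scale=2 -> Decimal('0.01')
    with localcontext() as ctx:
        # quantize signals InvalidOperation when the resulting coefficient
        # would need more than ctx.prec digits, which is our precision check.
        ctx.prec = precision
        try:
            # Decimal(x) is the exact binary value of the float; quantize rounds it.
            return Decimal(x).quantize(quantum, rounding=ROUND_HALF_EVEN)
        except InvalidOperation:
            raise OverflowError(
                f"{x!r} does not fit in decimal({precision}, {scale})") from None
```

Because `Decimal(x)` captures the float's exact binary value, a value like `0.125` rounds to `0.12` under half-even rounding, while inputs such as `0.1` are rounded from their nearest binary approximation.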

[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-01 Thread GitBox
kszucs commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r448550619 ## File path: python/pyarrow/scalar.pxi ## @@ -16,1198 +16,748 @@ # under the License. -_NULL = NA = None +import collections cdef class Scalar:

[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-01 Thread GitBox
kszucs commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r448550822 ## File path: python/pyarrow/scalar.pxi ## @@ -16,1198 +16,748 @@ # under the License. -_NULL = NA = None +import collections cdef class Scalar:

[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-01 Thread GitBox
kszucs commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r448551033 ## File path: python/pyarrow/tests/test_scalars.py ## @@ -16,427 +16,443 @@ # under the License. import datetime +import decimal import pytest -import u

[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-01 Thread GitBox
kszucs commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r448551482 ## File path: python/pyarrow/scalar.pxi ## @@ -16,1198 +16,745 @@ # under the License. -_NULL = NA = None +import collections cdef class Scalar:
