Re: [Discuss][Java] Check-style rules for methods

2019-05-06 Thread Jacques Nadeau
Per my comments on PR, I'm fine as well. On Tue, May 7, 2019 at 12:29 AM Bryan Cutler wrote: > I'm fine with not requiring param/return tags for now. It will be great to > enforce just having a javadoc and I think a good description is usually > enough. > > Bryan > > On Sun, May 5, 2019 at 3:49

RE: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-05-06 Thread Jed Brown
"Malakhov, Anton" writes: > Jed, > >> From: Jed Brown [mailto:j...@jedbrown.org] >> Sent: Friday, May 3, 2019 12:41 > >> You linked to a NumPy discussion >> (https://github.com/numpy/numpy/issues/11826) that is encountering the same >> issues, but proposing solutions based on the global

[jira] [Created] (ARROW-5274) [Javascript] Wrong array type for countBy

2019-05-06 Thread Yngve Kristiansen (JIRA)
Yngve Kristiansen created ARROW-5274: Summary: [Javascript] Wrong array type for countBy Key: ARROW-5274 URL: https://issues.apache.org/jira/browse/ARROW-5274 Project: Apache Arrow Issue

RE: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-05-06 Thread Malakhov, Anton
Jed, > From: Jed Brown [mailto:j...@jedbrown.org] > Sent: Friday, May 3, 2019 12:41 > You linked to a NumPy discussion > (https://github.com/numpy/numpy/issues/11826) that is encountering the same > issues, but proposing solutions based on the global environment. > That is perhaps acceptable for

[jira] [Created] (ARROW-5273) [C++] Valgrind failures in JSON tests

2019-05-06 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-5273: - Summary: [C++] Valgrind failures in JSON tests Key: ARROW-5273 URL: https://issues.apache.org/jira/browse/ARROW-5273 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-5272) [C++] [Gandiva] JIT code executed over uninitialized values

2019-05-06 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-5272: - Summary: [C++] [Gandiva] JIT code executed over uninitialized values Key: ARROW-5272 URL: https://issues.apache.org/jira/browse/ARROW-5272 Project: Apache Arrow

Re: [Discuss][Java] Check-style rules for methods

2019-05-06 Thread Bryan Cutler
I'm fine with not requiring param/return tags for now. It will be great to enforce just having a javadoc and I think a good description is usually enough. Bryan On Sun, May 5, 2019 at 3:49 PM Micah Kornfield wrote: > I've submitted a pull request [1] that enables the javadoc method check >

[jira] [Created] (ARROW-5271) [Python] Interface for converting pandas ExtensionArray / other custom array objects to pyarrow Array

2019-05-06 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5271: Summary: [Python] Interface for converting pandas ExtensionArray / other custom array objects to pyarrow Array Key: ARROW-5271 URL:

RE: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-05-06 Thread Melik-Adamyan, Areg
> The question is whether you want to spend at least a month or more of > intense development on something else (a basic query engine, as we've been > discussing in [1]) before we are able to develop consensus about the > approach to threading. Personally, I would not make this choice given that >

[jira] [Created] (ARROW-5270) [C++] Reenable Valgrind on Travis-CI

2019-05-06 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-5270: - Summary: [C++] Reenable Valgrind on Travis-CI Key: ARROW-5270 URL: https://issues.apache.org/jira/browse/ARROW-5270 Project: Apache Arrow Issue Type: Bug

Re: ARROW-3191: Making ArrowBuf work with arbitrary memory and setting io.netty.tryReflectionSetAccessible to true for java builds

2019-05-06 Thread Siddharth Teotia
Hi Bryan, AFAIK, there is no other impact. So we should be good. The last few integration issues that I had been chasing are now fixed (got a clean build with my previous commit pushed over the weekend). I just pushed a new commit with some cleanup and the changes are now ready. We should plan

[jira] [Created] (ARROW-5269) [C++] Whitelist benchmarks candidates for regression checks

2019-05-06 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5269: - Summary: [C++] Whitelist benchmarks candidates for regression checks Key: ARROW-5269 URL: https://issues.apache.org/jira/browse/ARROW-5269 Project:

Re: Stored state of incremental writes to fixed size Arrow buffer?

2019-05-06 Thread Wes McKinney
hi John -- again, I would caution you against using Feather files for issues of longevity -- the internal memory layout of those files is a "dead man walking" so to speak. I would advise against forking the project, IMHO that is a dark path that leads nowhere good. We have a large community here

Re: Stored state of incremental writes to fixed size Arrow buffer?

2019-05-06 Thread John Muehlhausen
François, Wes, Thanks for the feedback. I think the most practical thing for me to do is 1- write a Feather file that is structured to pre-allocate the space I need (e.g. initial variable-length strings are of average size) 2- come up with code to monkey around with the values contained in the

[jira] [Created] (ARROW-5268) [GLib] Add GArrowJSONReader

2019-05-06 Thread Yosuke Shiro (JIRA)
Yosuke Shiro created ARROW-5268: --- Summary: [GLib] Add GArrowJSONReader Key: ARROW-5268 URL: https://issues.apache.org/jira/browse/ARROW-5268 Project: Apache Arrow Issue Type: New Feature

Re: Stored state of incremental writes to fixed size Arrow buffer?

2019-05-06 Thread Francois Saint-Jacques
Hello John, Arrow is not yet suited for partial writes. The specification only talks about fully frozen/immutable objects, so you're in implementation-defined territory here. For example, the C++ library assumes the Array object is immutable; it memoizes the null count, and likely more statistics in
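To illustrate why a memoized null count makes in-place writes risky, here is a minimal sketch in plain Python. It is not Arrow's C++ code: `FrozenColumn` and `unsafe_set` are hypothetical names invented for this example, and a Python list with `None` entries stands in for the value and validity buffers.

```python
class FrozenColumn:
    """Toy column that caches its null count, assuming it never changes."""

    def __init__(self, values):
        self._values = list(values)
        self._null_count = None  # computed lazily on first request, then cached

    def null_count(self):
        # The first call scans the values; later calls return the cached result.
        if self._null_count is None:
            self._null_count = sum(v is None for v in self._values)
        return self._null_count

    def unsafe_set(self, index, value):
        # Hypothetical in-place write that does NOT invalidate the cache.
        self._values[index] = value


col = FrozenColumn([1.0, None, None])
first = col.null_count()          # scans the values and caches the count
col.unsafe_set(1, 5.0)            # mutate after the statistic was cached
stale = col.null_count()          # cached value, no longer matches the data
actual = sum(v is None for v in col._values)
print(first, stale, actual)
```

After the write, the cached count and the real count disagree, which is the kind of inconsistency a reader could hit if a buffer is modified behind an object that was assumed to be immutable.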

Re: Stored state of incremental writes to fixed size Arrow buffer?

2019-05-06 Thread Wes McKinney
hi John, Feel free to open some JIRA issues to make a specific proposal about what you want to see in the libraries. I would recommend not coupling yourself to the Feather format as it stands now, as I would like to change it as soon as > 90% of R users can successfully install the Arrow

Re: Stored state of incremental writes to fixed size Arrow buffer?

2019-05-06 Thread John Muehlhausen
Wes, I’m not afraid of writing my own C++ code to deal with all of this on the writer side. I just need a way to “append” (incrementally populate) e.g. feather files so that a person using e.g. pyarrow doesn’t suffer some catastrophic failure... and “on the side” I tell them which rows are junk

Re: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-05-06 Thread Wes McKinney
Anton, per your comment: > Sounds like a good way to go! We'll create a demo, as you suggested, > implementing a parallel execution model for a simple analytics pipeline that > reads and processes the files. My only concern is about adding more pipeline > breaker nodes and compute intensive

Re: Stored state of incremental writes to fixed size Arrow buffer?

2019-05-06 Thread John Muehlhausen
Thanks Jacques, Not what I had hoped, but assuming that I have some other mechanism for telling the reader which rows are junk, it seems like there is a follow-up question regarding adherence to specification for variable-width strings: Suppose I have 100 bytes for string storage and a vector of

Re: RecordBatch.length vs. Buffer.length?

2019-05-06 Thread Wes McKinney
hi Jeffrey, The sizing of each Buffer can vary significantly depending on what the schema is. For example, Binary or List have variable element sizes and so their buffers will also. I'm not sure about the exact details in the Java library but there should be some integrity verification whether
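The point about buffer sizes depending on the schema can be sketched in plain Python. This is an illustration under simplifying assumptions, not code from any Arrow library: it uses 32-bit offsets for the variable-width column, and it ignores validity bitmaps and buffer padding. The helper names `fixed_width_buffer` and `variable_width_buffers` are made up for this example.

```python
import struct


def fixed_width_buffer(values):
    """Pack float64 values: always 8 bytes per element."""
    return struct.pack(f"<{len(values)}d", *values)


def variable_width_buffers(strings):
    """Return (offsets, data) for a variable-width string column.

    offsets holds len(strings) + 1 little-endian int32 values;
    data is the concatenated UTF-8 bytes of all strings.
    """
    offsets = [0]
    data = bytearray()
    for s in strings:
        data.extend(s.encode("utf-8"))
        offsets.append(len(data))  # end of this string, start of the next
    return struct.pack(f"<{len(offsets)}i", *offsets), bytes(data)


# Both columns have length 3, yet their buffers differ in number and size.
floats = fixed_width_buffer([1.0, 2.0, 3.0])
offsets, data = variable_width_buffers(["a", "arrow", ""])
print(len(floats), len(offsets), len(data))
```

The record batch length (3 rows) is the same for both columns, but the fixed-width column has one value buffer whose size follows directly from the row count, while the string column has an offsets buffer plus a data buffer whose size depends on the contents.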

Re: Stored state of incremental writes to fixed size Arrow buffer?

2019-05-06 Thread Wes McKinney
hi John, In C++ the builder classes don't yet support writing into preallocated memory. It would be tricky for applications to determine a priori which segments of memory to pass to the builder. It seems only feasible for primitive / fixed-size types so my guess would be that a separate set of

Re: Stored state of incremental writes to fixed size Arrow buffer?

2019-05-06 Thread Jacques Nadeau
This is more of a question of implementation versus specification. An arrow buffer is generally built and then sealed. In different languages, this building process works differently (a concern of the language rather than the memory specification). We don't currently allow a half built vector to

Stored state of incremental writes to fixed size Arrow buffer?

2019-05-06 Thread John Muehlhausen
Hello, Glad to learn of this project— good work! If I allocate a single chunk of memory and start building Arrow format within it, does this chunk save any state regarding my progress? For example, suppose I allocate a column for floating point (fixed width) and a column for string (variable

[jira] [Created] (ARROW-5267) [Go] implement read/write IPC for dictionaries

2019-05-06 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5267: -- Summary: [Go] implement read/write IPC for dictionaries Key: ARROW-5267 URL: https://issues.apache.org/jira/browse/ARROW-5267 Project: Apache Arrow

[jira] [Created] (ARROW-5266) [Go] implement read/write IPC for Float16

2019-05-06 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5266: -- Summary: [Go] implement read/write IPC for Float16 Key: ARROW-5266 URL: https://issues.apache.org/jira/browse/ARROW-5266 Project: Apache Arrow Issue

[jira] [Created] (ARROW-5265) [Python/CI] Add integration test with kartothek

2019-05-06 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-5265: -- Summary: [Python/CI] Add integration test with kartothek Key: ARROW-5265 URL: https://issues.apache.org/jira/browse/ARROW-5265 Project: Apache Arrow Issue Type:

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

2019-05-06 Thread Jacques Nadeau
I am still asking the same question: can you please analyze the assembly the JIT is producing and look to identify why the disabled bounds checking is at 30% and what types of things we can do to address it. For example, we have talked before about a bytecode transformer that simply removes the

[jira] [Created] (ARROW-5264) Allow enabling/disabling boundary checking dynamically in the code

2019-05-06 Thread Liya Fan (JIRA)
Liya Fan created ARROW-5264: --- Summary: Allow enabling/disabling boundary checking dynamically in the code Key: ARROW-5264 URL: https://issues.apache.org/jira/browse/ARROW-5264 Project: Apache Arrow

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

2019-05-06 Thread Fan Liya
Hi Jacques, Thank you so much for your kind reminder. To come up with some performance data, I have set up an environment and run some micro-benchmarks. The server runs Linux, has 64 cores and 256 GB of memory. The benchmarks are simple iterations over some double vectors (the source file is