Per my comments on the PR, I'm fine with this as well.
On Tue, May 7, 2019 at 12:29 AM Bryan Cutler wrote:
> I'm fine with not requiring param/return tags for now. It will be great to
> enforce just having a javadoc and I think a good description is usually
> enough.
>
> Bryan
>
> On Sun, May 5, 2019 at 3:49
"Malakhov, Anton" writes:
> Jed,
>
>> From: Jed Brown [mailto:j...@jedbrown.org]
>> Sent: Friday, May 3, 2019 12:41
>
>> You linked to a NumPy discussion
>> (https://github.com/numpy/numpy/issues/11826) that is encountering the same
>> issues, but proposing solutions based on the global
Yngve Kristiansen created ARROW-5274:
Summary: [Javascript] Wrong array type for countBy
Key: ARROW-5274
URL: https://issues.apache.org/jira/browse/ARROW-5274
Project: Apache Arrow
Issue
Jed,
> From: Jed Brown [mailto:j...@jedbrown.org]
> Sent: Friday, May 3, 2019 12:41
> You linked to a NumPy discussion
> (https://github.com/numpy/numpy/issues/11826) that is encountering the same
> issues, but proposing solutions based on the global environment.
> That is perhaps acceptable for
Antoine Pitrou created ARROW-5273:
Summary: [C++] Valgrind failures in JSON tests
Key: ARROW-5273
URL: https://issues.apache.org/jira/browse/ARROW-5273
Project: Apache Arrow
Issue Type: Bug
Antoine Pitrou created ARROW-5272:
Summary: [C++] [Gandiva] JIT code executed over uninitialized
values
Key: ARROW-5272
URL: https://issues.apache.org/jira/browse/ARROW-5272
Project: Apache Arrow
I'm fine with not requiring param/return tags for now. It will be great to
enforce just having a javadoc and I think a good description is usually
enough.
Bryan
On Sun, May 5, 2019 at 3:49 PM Micah Kornfield
wrote:
> I've submitted a pull request [1] that enables the javadoc method check
>
Joris Van den Bossche created ARROW-5271:
Summary: [Python] Interface for converting pandas ExtensionArray /
other custom array objects to pyarrow Array
Key: ARROW-5271
URL:
> The question is whether you want to spend at least a month or more of
> intense development on something else (a basic query engine, as we've been
> discussing in [1]) before we are able to develop consensus about the
> approach to threading. Personally, I would not make this choice given that
>
Antoine Pitrou created ARROW-5270:
Summary: [C++] Reenable Valgrind on Travis-CI
Key: ARROW-5270
URL: https://issues.apache.org/jira/browse/ARROW-5270
Project: Apache Arrow
Issue Type: Bug
Hi Bryan,
AFAIK, there is no other impact. So we should be good.
The last few integration issues that I had been chasing are now fixed (got
a clean build with my previous commit pushed over the weekend). I just
pushed a new commit with some cleanup and the changes are now ready. We
should plan
Francois Saint-Jacques created ARROW-5269:
Summary: [C++] Whitelist benchmarks candidates for regression
checks
Key: ARROW-5269
URL: https://issues.apache.org/jira/browse/ARROW-5269
Project:
hi John -- again, I would caution you against using Feather files for
issues of longevity -- the internal memory layout of those files is a
"dead man walking" so to speak.
I would advise against forking the project; IMHO that is a dark path
that leads nowhere good. We have a large community here
François, Wes,
Thanks for the feedback. I think the most practical thing for me to do is
1- write a Feather file that is structured to pre-allocate the space I need
(e.g. initial variable-length strings are of average size)
2- come up with code to monkey around with the values contained in the
Yosuke Shiro created ARROW-5268:
Summary: [GLib] Add GArrowJSONReader
Key: ARROW-5268
URL: https://issues.apache.org/jira/browse/ARROW-5268
Project: Apache Arrow
Issue Type: New Feature
Hello John,
Arrow is not yet suited for partial writes. The specification only
talks about fully frozen/immutable objects, so you're in
implementation-defined territory here. For example, the C++ library
assumes the Array object is immutable; it memoizes the null count, and likely more
statistics in
hi John,
Feel free to open some JIRA issues to make a specific proposal about
what you want to see in the libraries
I would recommend not coupling yourself to the Feather format as it
stands now, as I would like to change it as soon as > 90% of R users
can successfully install the Arrow
Wes,
I’m not afraid of writing my own C++ code to deal with all of this on the
writer side. I just need a way to “append” (incrementally populate) e.g.
feather files so that a person using e.g. pyarrow doesn’t suffer some
catastrophic failure... and “on the side” I tell them which rows are junk
Anton, per your comment:
> Sounds like a good way to go! We'll create a demo, as you suggested,
> implementing a parallel execution model for a simple analytics pipeline that
> reads and processes the files. My only concern is about adding more pipeline
> breaker nodes and compute intensive
Thanks Jacques,
Not what I had hoped, but assuming that I have some other mechanism for
telling the reader which rows are junk, it seems like there is a follow-up
question regarding adherence to specification for variable-width strings:
Suppose I have 100 bytes for string storage and a vector of
hi Jeffrey,
The sizing of each Buffer can vary significantly depending on what the
schema is. For example, Binary or List have variable element sizes, and
so their buffer sizes will vary as well.
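
As a rough illustration of the point above, here is a minimal sketch in plain Python (no Arrow library involved) that estimates buffer sizes for a fixed-width column and a variable-width string column. The function names are hypothetical, and the sketch assumes one validity bit per value, 32-bit offsets for strings, and no buffer padding or alignment, so real buffers may be larger.

```python
def fixed_width_buffer_sizes(values, byte_width=8):
    """Return (validity, data) buffer sizes in bytes for a fixed-width column.

    Simplified assumption: no padding or alignment is added.
    """
    n = len(values)
    validity = (n + 7) // 8      # one bit per value, rounded up to whole bytes
    data = n * byte_width        # every value occupies the same number of bytes
    return validity, data


def utf8_buffer_sizes(strings):
    """Return (validity, offsets, data) buffer sizes in bytes for a string column.

    Simplified assumption: 32-bit offsets, no padding or alignment.
    """
    n = len(strings)
    validity = (n + 7) // 8
    offsets = (n + 1) * 4        # one more offset than there are values
    # The data buffer depends on the actual content, not just the row count.
    data = sum(len(s.encode("utf-8")) for s in strings if s is not None)
    return validity, offsets, data


if __name__ == "__main__":
    print(fixed_width_buffer_sizes([1.0, 2.0, 3.0]))
    print(utf8_buffer_sizes(["a", "bcd", ""]))
    print(utf8_buffer_sizes(["a" * 100, "b"]))
```

With the same number of rows, the fixed-width sizes are fully determined by the row count, while the string data buffer grows with the content.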
I'm not sure about the exact details in the Java library but there
should be some integrity verification whether
hi John,
In C++ the builder classes don't yet support writing into preallocated
memory. It would be tricky for applications to determine a priori
which segments of memory to pass to the builder. It seems only
feasible for primitive / fixed-size types so my guess would be that a
separate set of
This is more of a question of implementation versus specification. An arrow
buffer is generally built and then sealed. In different languages, this
building process works differently (a concern of the language rather than
the memory specification). We don't currently allow a half built vector to
Hello,
Glad to learn of this project. Good work!
If I allocate a single chunk of memory and start building Arrow format
within it, does this chunk save any state regarding my progress?
For example, suppose I allocate a column for floating point (fixed width)
and a column for string (variable
Sebastien Binet created ARROW-5267:
Summary: [Go] implement read/write IPC for dictionaries
Key: ARROW-5267
URL: https://issues.apache.org/jira/browse/ARROW-5267
Project: Apache Arrow
Sebastien Binet created ARROW-5266:
Summary: [Go] implement read/write IPC for Float16
Key: ARROW-5266
URL: https://issues.apache.org/jira/browse/ARROW-5266
Project: Apache Arrow
Issue
Uwe L. Korn created ARROW-5265:
Summary: [Python/CI] Add integration test with kartothek
Key: ARROW-5265
URL: https://issues.apache.org/jira/browse/ARROW-5265
Project: Apache Arrow
Issue Type:
I am still asking the same question: can you please analyze the assembly
the JIT is producing and look to identify why the disabled bounds checking
is at 30% and what types of things we can do to address it? For example, we
have talked before about a bytecode transformer that simply removes the
Liya Fan created ARROW-5264:
Summary: Allow enabling/disabling boundary checking dynamically in
the code
Key: ARROW-5264
URL: https://issues.apache.org/jira/browse/ARROW-5264
Project: Apache Arrow
Hi Jacques,
Thank you so much for your kind reminder.
To come up with some performance data, I have set up an environment and run
some micro-benchmarks.
The server runs Linux and has 64 cores and 256 GB of memory.
The benchmarks are simple iterations over some double vectors (the source
file is