I will restructure the existing PR and create new ones (with JIRA) for
JS/Java. Just haven't gotten around to it yet.
On Wed, May 22, 2019 at 9:10 PM Wes McKinney wrote:
> Yes, I think separate JIRA issues for Java and JS would be best. I'd
> recommend having one patch for each, so maybe we can
Yes, I think separate JIRA issues for Java and JS would be best. I'd
recommend having one patch for each, so maybe we can just sort out C++
for now
On Wed, May 22, 2019 at 3:03 PM John Muehlhausen wrote:
>
> I added this to https://github.com/apache/arrow/pull/4372 and am hoping CI
> will test it
I added this to https://github.com/apache/arrow/pull/4372 and am hoping CI
will test it for me. Do Java/JS require separate JIRA entries?
On Wed, May 22, 2019 at 2:53 PM Wes McKinney wrote:
> It doesn't appear that Java writes an EOS (4-byte 0) after the last
> record batch in ArrowFileWriter.j
It doesn't appear that Java writes an EOS (4-byte 0) after the last
record batch in ArrowFileWriter.java [1] so may want to open a JIRA to
make this simple change.
Note that this addition (not even a change) is backward compatible,
old readers will be unaffected
[1]:
https://github.com/apache/ar
https://github.com/apache/arrow/pull/4372
First contribution attempt... sorry in advance if I'm not coloring inside
the lines!
On Wed, May 22, 2019 at 9:06 AM John Muehlhausen wrote:
> I will submit a patch once I get set up for that. My crystal ball says
> that some people will rely on sequen
I will submit a patch once I get set up for that. My crystal ball says
that some people will rely on sequential rather than seeking access to File
data. E.g. a utility that can either write the same data to a File or to
stdout, piped to another process that can read a File from stdin and
therefor
The file format isn't intended to be used in a streaming setting. Use
RecordBatchStreamWriter if you need to be able to read a dataset as a
stream
That being said I don't have a problem with writing the EOS in the
file, but the current implementation is not "wrong".
On Wed, May 22, 2019 at 8:37 A
I believe the change involves updating the File format notes as above, as
well as something like the following. The format also mentions "there is
no requirement that dictionary keys should be defined in a DictionaryBatch
before they are used in a RecordBatch, as long as the keys are defined
somew
Hi John,
It the IPC code segfaults on invalid input then it's worth opening an
issue on JIRA.
Regards
Antoine.
Le 21/05/2019 à 23:52, John Muehlhausen a écrit :
> Wes,
>
> Check out reader.cpp. It seg faults when it gets to the next
> message-that-is-not-a-message... it is a footer. But I
This seems like a reasonable change. Is there any reason that we shouldnt
always append EOS?
On Tuesday, May 21, 2019, John Muehlhausen wrote:
> Wes,
>
> Check out reader.cpp. It seg faults when it gets to the next
> message-that-is-not-a-message... it is a footer. But I have no way to
> know
Wes,
Check out reader.cpp. It seg faults when it gets to the next
message-that-is-not-a-message... it is a footer. But I have no way to know
this in reader.cpp because I'm piping the File in via stdin.
In seeker.cpp I seek to the end and figure out where the footer is (this is
a py-arrow-writte
hi John,
I'm not sure I follow. The EOS you're referring to is part of the
streaming format. It's designed to be readable using an InputStream
interface that does not support seeking at all. You can see the core
logic where messages are popped off the InputStream here
https://github.com/apache/ar
https://arrow.apache.org/docs/format/IPC.html#file-format
If this stream marker is optional in the file format, doesn't this prevent
someone from reading the file without being able to seek() it, e.g. if it
is "piped in" to a program? Or otherwise they'll have to stream in the
entire thing befo
13 matches
Mail list logo