Hi All,
Sorry to send a request to everyone, but I would like to ask if anyone would
be able to help finish the review for PR#7030[1].
As of now, the PR contains the following parts:
1. Base dataset API for the Java language (which follows the shape of the C++ API)
2. A JNI-based implementation of FileSystem
We should be extending the archery IPC integration tests for this (ideally
with no files checked in).
On Thursday, January 28, 2021, Fan Liya wrote:
> Hi Joris,
>
> The Java support for lz4 compression is on-going (
> https://github.com/apache/arrow/pull/8949).
> Integration with C++/Python is not finished yet.
Hi Joris,
The Java support for lz4 compression is on-going (
https://github.com/apache/arrow/pull/8949).
Integration with C++/Python is not finished yet.
We would appreciate it if you could share the file to help us with the
integration test.
Best,
Liya Fan
On Fri, Jan 29, 2021 at 2:41 AM Antoi
Hi All,
I've been working on a library for fast explanation of tabular data:
https://github.com/jjthomas/fast_data_explanation
I've implemented acceleration on GPU and FPGA (using the Amazon F1
platform).
I think this is an example of a pretty simple but useful workload going
from Pandas to acce
On 28/01/2021 at 19:38, Wes McKinney wrote:
> It still seems notable that our generic LZ4-compressed output stream
> cannot be read by Java (independent of Arrow and the Arrow IPC
> format).
That and the custom LZ4 framing used by Parquet-Java... Apparently the
Java ecosystem can't implement p
It still seems notable that our generic LZ4-compressed output stream
cannot be read by Java (independent of Arrow and the Arrow IPC
format).
On Thu, Jan 28, 2021 at 12:30 PM Antoine Pitrou wrote:
>
> On Thu, 28 Jan 2021 18:19:00 +
> Joris Peeters wrote:
>
> > To be fair, I'm happy to apply i
On Thu, 28 Jan 2021 18:19:00 +
Joris Peeters wrote:
> To be fair, I'm happy to apply it at IPC level. Just didn't realise that
> was a thing. IIUC what Antoine suggests, though, then just (leaving Python
> as-is and) changing my Java to
>
> var is = new FileInputStream(path.toFile());
>
Aha, OK!
Thanks for the help, all. I'll keep an eye on the Java side for the IPC
compression, but for my current purpose doing full stream compression is
totally fine.
On Thu, Jan 28, 2021 at 6:22 PM Micah Kornfield
wrote:
> Application-level compression support in Java is being
Application-level compression support in Java is being worked on (I would
need to double-check whether the PR has been merged), and I don't think it
has been integration-tested with C++/Python. I would imagine it would run
into a similar issue with not being able to decode linked blocks.
To be fair, I'm happy to apply it at IPC level. Just didn't realise that
was a thing. IIUC what Antoine suggests, though, then just (leaving Python
as-is and) changing my Java to
var is = new FileInputStream(path.toFile());
var reader = new ArrowStreamReader(is, allocator);
var schema
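For completeness, a minimal sketch of what that reading path could look like end to end (class name, path handling, and the printouts are placeholders, not Joris's actual code):

    import java.io.FileInputStream;
    import java.nio.file.Path;

    import org.apache.arrow.memory.RootAllocator;
    import org.apache.arrow.vector.VectorSchemaRoot;
    import org.apache.arrow.vector.ipc.ArrowStreamReader;
    import org.apache.arrow.vector.types.pojo.Schema;

    public class ReadStream {
        public static void main(String[] args) throws Exception {
            Path path = Path.of(args[0]);  // placeholder: path to the Arrow stream file
            try (var allocator = new RootAllocator();
                 var is = new FileInputStream(path.toFile());
                 var reader = new ArrowStreamReader(is, allocator)) {
                // The schema is available up front; batches are loaded one at a time
                // into the reader's VectorSchemaRoot.
                VectorSchemaRoot root = reader.getVectorSchemaRoot();
                Schema schema = root.getSchema();
                System.out.println("schema: " + schema);
                while (reader.loadNextBatch()) {
                    System.out.println("read batch with " + root.getRowCount() + " rows");
                }
            }
        }
    }

Note the reader just consumes whatever bytes the InputStream yields, so this only works directly if the file is not wrapped in an extra LZ4 frame at the file level.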
It might be worth opening an issue with the lz4-java library. It seems
like the Java implementation doesn't fully support the LZ4 stream
protocol?
Antoine, in this case it looks like Joris is applying the compression and
decompression at the file level, NOT the IPC level.
On Thu, Jan 28, 2021
On 28/01/2021 at 17:59, Joris Peeters wrote:
> From Python, I'm dumping an LZ4-compressed arrow stream to a file, using
>
> with pa.output_stream(path, compression = 'lz4') as fh:
>     writer = pa.RecordBatchStreamWriter(fh, table.schema)
>     writer.write_table(table)
>
My position on this is that we should work with the pandas community
to work toward elimination of the BlockManager data structure as this
will solve a multitude of problems and also make things better for
Arrow. I am not supportive of the IPC format changes in the PR.
On Wed, Jan 27, 2021 at 6:27
hi Joris -- this isn't a use case that we intend for most users (we
intend for users to instead use the LZ4 compression option that is
part of the IPC format itself, rather than something that is layered
on externally), but it would be good to make sure that our LZ4 streams
are interoperable across
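For reference, a minimal sketch of what the IPC-level compression option looks like on the writing side in pyarrow (assuming a version where IpcWriteOptions supports compression; the table and path are placeholders):

    import pyarrow as pa

    table = pa.table({"x": [1, 2, 3], "y": ["a", "b", "c"]})  # placeholder table
    path = "/tmp/data.arrows"                                 # placeholder path

    # Compression is applied to the record batch buffers inside the IPC stream
    # itself, rather than wrapping the whole file in an LZ4 frame.
    options = pa.ipc.IpcWriteOptions(compression="lz4")
    with pa.OSFile(path, "wb") as sink:
        with pa.ipc.new_stream(sink, table.schema, options=options) as writer:
            writer.write_table(table)

A reader that understands IPC-level compression can then open the stream directly, without any external decompression wrapper.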
I would, for one, enjoy such a presentation
On Thu, Jan 28, 2021 at 11:15 AM Rémi Dettai wrote:
> Thank you for the support! I might do a quick (5 min) presentation during
> the next Rust sync call if you are interested!
>
> Remi
>
> On Wed, Jan 27, 2021 at 19:40, Daniël Heres wrote:
>
> >
Hi,
Thanks a lot, Deepak!
I really want to edit the ORC reader to read ORC MAPs as Arrow MAPs now, and
it's not a serious hassle to do so. Is there anyone who needs the
read-ORC-maps-as-lists-of-structs functionality? If not, I will likely do it in
my current PR.
Ying
> On Jan 19, 2021, at 8:4
From Python, I'm dumping an LZ4-compressed arrow stream to a file, using
with pa.output_stream(path, compression = 'lz4') as fh:
    writer = pa.RecordBatchStreamWriter(fh, table.schema)
    writer.write_table(table)
    writer.close()
I then try reading this file from Java, star
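For comparison, reading such an externally compressed file back in Python means undoing the same whole-file LZ4 framing before the IPC reader sees any bytes; a minimal sketch (the path is a placeholder):

    import pyarrow as pa

    path = "/tmp/data.arrows.lz4"  # placeholder for the file written above

    # The LZ4 frame wraps the entire file, so decompression happens at the
    # stream level, outside the Arrow IPC format itself.
    with pa.input_stream(path, compression="lz4") as fh:
        reader = pa.ipc.open_stream(fh)
        table = reader.read_all()

    print(table.num_rows)

Any other reader (such as the Java one here) would need an equivalent stream-level LZ4 decompression layer in front of its Arrow stream reader.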
Thank you for the support! I might do a quick (5 min) presentation during
the next Rust sync call if you are interested!
Remi
On Wed, Jan 27, 2021 at 19:40, Daniël Heres wrote:
> This is really interesting Rémi!
>
> I like the interesting take on using "serverless" cloud components to build
Thanks Andrew and Jorge for the help.
I think the use of the ScalarValue enum is precisely what I want. I was
worried that downcasting the column every time you need to get a value
would be slow but I can see that you are doing that with the ScalarValue
enum (
https://github.com/apache/arrow/blob/
Hi Ying,
On 28/01/2021 at 08:15, Ying Zhou wrote:
>
>
> By the way I haven’t found any function that can directly generate an Arrow
> Table using a schema, size and null_probability. Is there any need for such
> functionality? If this is useful for purposes beyond ORC/Parquet/CSV/etc IO
>
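As far as this thread shows, no such helper exists yet; below is only a rough sketch of what one could look like in Python (the function name, handled types, and value ranges are all made up for illustration):

    import numpy as np
    import pyarrow as pa

    def random_table(schema, size, null_probability=0.1):
        """Generate a table of random data matching `schema` (only a few types handled)."""
        rng = np.random.default_rng()
        columns = []
        for field in schema:
            mask = rng.random(size) < null_probability  # True marks a null slot
            if pa.types.is_integer(field.type):
                values = rng.integers(0, 1000, size=size)
            elif pa.types.is_floating(field.type):
                values = rng.random(size)
            elif pa.types.is_string(field.type):
                values = np.array([f"s{i}" for i in range(size)], dtype=object)
            else:
                raise NotImplementedError(f"type not handled in this sketch: {field.type}")
            columns.append(pa.array(values, type=field.type, mask=mask))
        return pa.Table.from_arrays(columns, schema=schema)

    table = random_table(pa.schema([("a", pa.int64()), ("b", pa.float64())]), size=100)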
In the application I'm working on, I'm reading a Parquet file and creating a
table to keep the records in memory.
This gist has the idea of it:
https://gist.github.com/elferherrera/a2a796ae83a7203f58de704c178c44ef
I would like to keep it as pure Arrow because I have found that it is super
fast to c
I agree with Andrew (as usual) :)
Irrespectively, maybe it is easier if you could describe what you are
trying to accomplish, Fernando. There are possibly other ways of going
about this,
and maybe someone can help by knowing more context.
Best,
Jorge
On Thu, Jan 28, 2021 at 1:06 PM Andrew Lamb
Yes, I'm running my code with the --release flag.
I've been looking everywhere, but I can't find a way to make the writing
faster. I don't know if it is a mistake I'm making with the structs or whether
the Parquet crate needs optimizations.
Fernando,
On Thu, Jan 28, 2021 at 12:02 PM Andrew Lamb wrote:
>
I think this approach would work (and we have something similar in
DataFusion (ScalarValue)
https://github.com/apache/arrow/blob/4b7cdcb9220b6d94b251aef32c21ef9b4097ecfa/rust/datafusion/src/scalar.rs#L46
-- though it is an enum rather than a Trait, I think the idea is basically
the same)
I think t
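As a rough illustration of the enum approach (this is not DataFusion's actual ScalarValue definition, just a stripped-down sketch using the arrow crate's downcasting; type coverage is deliberately minimal):

    use std::sync::Arc;

    use arrow::array::{Array, ArrayRef, Float64Array, Int64Array, StringArray};
    use arrow::datatypes::DataType;

    /// A stripped-down stand-in for an enum like DataFusion's ScalarValue.
    #[derive(Debug, Clone, PartialEq)]
    enum Scalar {
        Null,
        Int64(i64),
        Float64(f64),
        Utf8(String),
    }

    /// Pull row `i` out of a dynamically typed array by downcasting once per call.
    fn value_at(array: &ArrayRef, i: usize) -> Scalar {
        if array.is_null(i) {
            return Scalar::Null;
        }
        match array.data_type() {
            DataType::Int64 => {
                let a = array.as_any().downcast_ref::<Int64Array>().unwrap();
                Scalar::Int64(a.value(i))
            }
            DataType::Float64 => {
                let a = array.as_any().downcast_ref::<Float64Array>().unwrap();
                Scalar::Float64(a.value(i))
            }
            DataType::Utf8 => {
                let a = array.as_any().downcast_ref::<StringArray>().unwrap();
                Scalar::Utf8(a.value(i).to_string())
            }
            other => unimplemented!("type not covered in this sketch: {:?}", other),
        }
    }

    fn main() {
        let col: ArrayRef = Arc::new(Int64Array::from(vec![Some(1), None, Some(3)]));
        assert_eq!(value_at(&col, 0), Scalar::Int64(1));
        assert_eq!(value_at(&col, 1), Scalar::Null);
    }

The downcast is only a TypeId check, so it is cheap relative to constructing the enum value per row.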
The first thing I would check is that you are using a release build (`cargo
build --release`).
If you are, there may be additional optimizations needed in the Rust
implementations.
Andrew
On Thu, Jan 28, 2021 at 6:19 AM Fernando Herrera <
fernando.j.herr...@gmail.com> wrote:
> Hi,
>
> What is the
Hi,
What is the writing speed that we should expect from the Arrow Parquet
writer?
I'm writing a RecordBatch with two columns and 1,000,000 records and it
takes a lot of time to write the batch to the file (close to 2 secs).
This is what I'm doing:
let schema = Schema::new(vec![
    Field::new
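For reference, a minimal end-to-end sketch of the kind of write being described (column types, names, and the output path are assumptions, not Fernando's exact code):

    use std::fs::File;
    use std::sync::Arc;

    use arrow::array::{Float64Array, Int64Array};
    use arrow::datatypes::{DataType, Field, Schema};
    use arrow::record_batch::RecordBatch;
    use parquet::arrow::ArrowWriter;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // Two columns and 1,000,000 rows, as in the question (the types are guesses).
        let schema = Arc::new(Schema::new(vec![
            Field::new("id", DataType::Int64, false),
            Field::new("value", DataType::Float64, false),
        ]));
        let n: i64 = 1_000_000;
        let ids = Int64Array::from_iter_values(0..n);
        let values = Float64Array::from_iter_values((0..n).map(|i| i as f64 * 0.5));
        let batch = RecordBatch::try_new(schema.clone(), vec![Arc::new(ids), Arc::new(values)])?;

        let file = File::create("/tmp/out.parquet")?;
        // None uses default writer properties; compression and encoding settings
        // can change write timings considerably.
        let mut writer = ArrowWriter::try_new(file, schema, None)?;
        writer.write(&batch)?;
        writer.close()?;
        Ok(())
    }

Timing this in a release build gives a baseline to compare against; if it is still slow, the writer properties (encoding, compression) and the writer itself are the next places to look.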
Hi Jorge,
What about making the Array::value return a &dyn ValueTrait. This new
ValueTrait would have to be implemented for all the possible values that
can be returned from the arrays
Fernando
On Thu, 28 Jan 2021, 08:42 Jorge Cardoso Leitão,
wrote:
> Hi Fernando,
>
> I tried that some time ag
Arrow Build Report for Job nightly-2021-01-28-0
All tasks:
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-01-28-0
Failed Tasks:
- centos-8-aarch64:
URL:
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-01-28-0-travis-centos-8-aarch64
- con
Hi Fernando,
I tried that some time ago, but I was unable to do so. The reason is that
Array is a trait that also needs to be usable as a trait object (i.e. to
support `&dyn Array`).
Let's try here: what type should `Array::value` return? One option is to
make Array a generic. But if Array is a gen
I see what you mean. I was thinking that the function signature would have
to be something like this:
trait Array {
    fn value(&self) -> T
}
Where T would have to implement another trait, call it ValueTrait, in order
to define how to extract the different value types, e.g. &str, u32, etc.
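To make the trade-off concrete, here is a small self-contained sketch (no arrow types; all names are made up) of the generic signature versus the `&dyn ValueTrait` alternative:

    use std::any::Any;

    // Option A: a generic method. A trait with a method like
    //
    //     fn value<T>(&self, i: usize) -> T;
    //
    // is not object safe, so the trait could no longer be used as `&dyn Array`,
    // which is the constraint Jorge mentions above.

    // Option B: return a trait object, as suggested with `&dyn ValueTrait`.
    // The caller then has to downcast to recover the concrete type.
    trait ValueLike: Any {
        fn as_any(&self) -> &dyn Any;
    }

    impl ValueLike for i64 {
        fn as_any(&self) -> &dyn Any {
            self
        }
    }

    trait DynArray {
        fn value_dyn(&self, i: usize) -> &dyn ValueLike;
    }

    struct MyInt64Array(Vec<i64>);

    impl DynArray for MyInt64Array {
        fn value_dyn(&self, i: usize) -> &dyn ValueLike {
            &self.0[i]
        }
    }

    fn main() {
        let concrete = MyInt64Array(vec![10, 20, 30]);
        let arr: &dyn DynArray = &concrete;
        // The concrete type is erased behind &dyn ValueLike, so using the value
        // as an i64 again requires a downcast.
        let v = arr.value_dyn(1);
        let as_i64 = v.as_any().downcast_ref::<i64>().unwrap();
        assert_eq!(*as_i64, 20);
    }

This works, but every consumer ends up downcasting anyway, which is why an enum such as ScalarValue (discussed earlier in the thread) is often the more ergonomic way to hand back dynamically typed values.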