Sorry I didn't get to this, will try again tomorrow.
On Thu, Apr 30, 2020 at 11:09 AM Wes McKinney wrote:
> I'd be fine with a patch release addressing this so long as it's
> binary-only (to save us all time).
>
> On Thu, Apr 30, 2020, 12:30 PM Micah Kornfield
> wrote:
>
>> This sounds like som
Hi all,
I am trying to write a Julia parquet writer by leveraging the C++ arrow
library. I can build arrow and arrow/parquet and can write out a parquet
file successfully. The next part I need to do is to use the [CxxWrap.jl](
https://github.com/JuliaInterop/CxxWrap.jl) Julia package to call the C
Hi,
I am trying to integrate Arrow with an application that I am developing.
Here I build Arrow from source (C++) and use the API to develop some
custom functions to do a scientific calculation after the data is loaded with
the Arrow Table API. On top of this, I develop a Cython API to design a Python
AP
Wes McKinney created ARROW-8661:
---
Summary: [C++][Gandiva] Reduce number of files and headers
Key: ARROW-8661
URL: https://issues.apache.org/jira/browse/ARROW-8661
Project: Apache Arrow
Issue Ty
Wes McKinney created ARROW-8660:
---
Summary: [C++][Gandiva] Reduce dependence on Boost
Key: ARROW-8660
URL: https://issues.apache.org/jira/browse/ARROW-8660
Project: Apache Arrow
Issue Type: Impr
Raphael Taylor-Davies created ARROW-8659:
Summary: ListBuilder and FixedSizeListBuilder capacity
Key: ARROW-8659
URL: https://issues.apache.org/jira/browse/ARROW-8659
Project: Apache Arrow
Hi folks,
In https://github.com/apache/arrow/pull/7060 I proposed an
(unavoidable) C++ API change related to the two types of intervals
that are in the Arrow columnar format.
As context, in the C++ library in almost all cases we use different
Type enum values for each "subtype" that has a differe
The vote carries with 7 binding +1 votes and 1 non-binding +1
On Fri, Apr 24, 2020 at 7:40 AM Francois Saint-Jacques
wrote:
>
> +1 (binding)
>
> On Fri, Apr 24, 2020 at 5:41 AM Krisztián Szűcs
> wrote:
> >
> > +1 (binding)
> >
> > On 2020. Apr 24., Fri at 1:51, Micah Kornfield
> > wrote:
> >
>
Francois,
Thanks for the pointers. I'll see if I can put together a
proof-of-concept, might that help discussion? I agree it would be good
to make it format-agnostic. I'm also curious what thoughts you'd have
on how to manage cross-file parallelism (coalescing only helps within
a file). If we just
Ben Kietzman created ARROW-8658:
---
Summary: [C++][Dataset] Implement subtree pruning for
FileSystemDataset::GetFragments
Key: ARROW-8658
URL: https://issues.apache.org/jira/browse/ARROW-8658
Project: Apa
I'd be fine with a patch release addressing this so long as it's
binary-only (to save us all time).
On Thu, Apr 30, 2020, 12:30 PM Micah Kornfield
wrote:
> This sounds like something we might want to do and issue a patch release.
> It seems bad to default to a non-production version?
>
> I can t
This sounds like something we might want to do and issue a patch release.
It seems bad to default to a non-production version?
I can try to take a look tonight at a patch if no one gets to it before.
Thanks,
Micah
On Wednesday, April 29, 2020, Wes McKinney wrote:
> On Wed, Apr 29, 2020 at 6:15 PM
Pierre Belzile created ARROW-8657:
-
Summary: Distinguish parquet version 2 logical type vs DataPageV2
Key: ARROW-8657
URL: https://issues.apache.org/jira/browse/ARROW-8657
Project: Apache Arrow
Krisztian Szucs created ARROW-8656:
--
Summary: [Python] Switch to VS2017 in the windows wheel builds
Key: ARROW-8656
URL: https://issues.apache.org/jira/browse/ARROW-8656
Project: Apache Arrow
Joris Van den Bossche created ARROW-8655:
Summary: [C++][Dataset][Python][R] Preserve partitioning
information for a discovered Dataset
Key: ARROW-8655
URL: https://issues.apache.org/jira/browse/ARROW-8655
Mike Macpherson created ARROW-8654:
--
Summary: [Python] pyarrow 0.17.0 fails reading "wide" parquet files
Key: ARROW-8654
URL: https://issues.apache.org/jira/browse/ARROW-8654
Project: Apache Arrow
Krisztian Szucs created ARROW-8653:
--
Summary: [C++] Add support for gflags version detection
Key: ARROW-8653
URL: https://issues.apache.org/jira/browse/ARROW-8653
Project: Apache Arrow
Issue
Joris Van den Bossche created ARROW-8652:
Summary: [Python] Test error message when discovering dataset with
invalid files
Key: ARROW-8652
URL: https://issues.apache.org/jira/browse/ARROW-8652
Joris Van den Bossche created ARROW-8651:
Summary: [Python][Dataset] Support pickling of Dataset objects
Key: ARROW-8651
URL: https://issues.apache.org/jira/browse/ARROW-8651
Project: Apache Ar
Andy Grove created ARROW-8650:
-
Summary: [Rust] [Website] Add documentation to Arrow website
Key: ARROW-8650
URL: https://issues.apache.org/jira/browse/ARROW-8650
Project: Apache Arrow
Issue Type
Andy Grove created ARROW-8649:
-
Summary: [Java] [Website] Java documentation on website is hidden
Key: ARROW-8649
URL: https://issues.apache.org/jira/browse/ARROW-8649
Project: Apache Arrow
Issue
Mark Hildreth created ARROW-8648:
Summary: [Rust] Optimize Rust CI Build Times
Key: ARROW-8648
URL: https://issues.apache.org/jira/browse/ARROW-8648
Project: Apache Arrow
Issue Type: Improvem
The proposal is for any BUNDLED dependency to be merged into
libarrow.a (or another one of the static libraries if the dependency
is only used in e.g. one subcomponent), so this applies to the AWS SDK
also
On Thu, Apr 30, 2020 at 3:02 AM Rémi Dettai wrote:
>
> Hi!
>
> Does your point 1 also apply
Joris Van den Bossche created ARROW-8647:
Summary: [C++][Dataset] Optionally encode partition field values
as dictionary type
Key: ARROW-8647
URL: https://issues.apache.org/jira/browse/ARROW-8647
If we want to discuss IO APIs we should do that comprehensively.
There are various ways of expressing what we want to do (explicit
readahead, fadvise-like APIs, async APIs, etc.).
Regards
Antoine.
Le 30/04/2020 à 15:08, Francois Saint-Jacques a écrit :
> One more point,
>
> It would seem ben
One more point,
It would seem beneficial if we could express this in
`RandomAccessFile::ReadAhead(vector)` method: no async
buffering/coalescing would be needed. In the case of Parquet, we'd get
the _exact_ ranges computed from the metadata. This method would also
possibly benefit other filesystems s
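
As a rough illustration of the coalescing step this message refers to, here is a minimal sketch in Python. It is not Arrow's implementation: the `(offset, length)` tuple representation, the function name `coalesce_ranges`, and the `hole_size_limit` parameter are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical representation of a byte range: (offset, length).
Range = Tuple[int, int]


def coalesce_ranges(ranges: List[Range], hole_size_limit: int = 0) -> List[Range]:
    """Merge byte ranges whose gaps are at most hole_size_limit bytes.

    Overlapping and adjacent ranges are always merged. A larger
    hole_size_limit trades some wasted bytes for fewer read calls.
    """
    merged: List[Range] = []
    for offset, length in sorted(ranges):
        if length <= 0:
            continue  # skip empty ranges
        if merged:
            prev_offset, prev_length = merged[-1]
            prev_end = prev_offset + prev_length
            if offset - prev_end <= hole_size_limit:
                # Extend the previous range to cover this one.
                new_end = max(prev_end, offset + length)
                merged[-1] = (prev_offset, new_end - prev_offset)
                continue
        merged.append((offset, length))
    return merged


if __name__ == "__main__":
    column_chunks = [(0, 10), (10, 5), (100, 20)]
    print(coalesce_ranges(column_chunks))
    print(coalesce_ranges(column_chunks, hole_size_limit=90))
```

If a file exposed a method that accepted the exact ranges up front, as suggested above, a caller would not need this step at all; the sketch only shows what the reader would otherwise have to do itself.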
Hello David,
I think that what you ask is achievable with the dataset API without
much effort. You'd have to insert the pre-buffering at
ParquetFileFormat::ScanFile [1]. The top-level Scanner::Scan method is
essentially a generator that looks like
flatmap(Iterator>). It consumes the
fragment in-or
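
The general "flatmap over an iterator" shape mentioned here can be sketched in Python as below. This is only an illustration of the pattern, not the dataset API: `flatmap`, `scan_fragment`, and the string fragments are hypothetical stand-ins.

```python
from itertools import chain
from typing import Callable, Iterable, Iterator, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def flatmap(fn: Callable[[A], Iterable[B]], items: Iterable[A]) -> Iterator[B]:
    """Lazily apply fn to each item and yield the results in order."""
    return chain.from_iterable(map(fn, items))


def scan_fragment(fragment: str) -> Iterator[str]:
    """Hypothetical stand-in: yield two 'scan tasks' per fragment."""
    for i in range(2):
        yield f"{fragment}-task{i}"


if __name__ == "__main__":
    fragments = ["part=a", "part=b"]
    for task in flatmap(scan_fragment, fragments):
        print(task)
```

Because both `map` and `chain.from_iterable` are lazy, each fragment is only opened when the consumer reaches it, which is where a pre-buffering hook would have to run in such a design.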
Sure, and we are still interested in collaborating. The main use case
we have is scanning datasets in order of the partition key; it seems
ordering is the only missing thing from Antoine's comments. However,
from briefly playing around with the Python API, an application could
manually order the fr
On Thu, 30 Apr 2020 at 04:06, Wes McKinney wrote:
> On Wed, Apr 29, 2020 at 6:54 PM David Li wrote:
> >
> > Ah, sorry, so I am being somewhat unclear here. Yes, you aren't
> > guaranteed to download all the files in order, but with more control,
> > you can make this more likely. You can also pr
I suggest creating a GitHub Actions workflow to trigger these integration
tests on pull requests when the relevant modules have changed:
parquet.py, dataset.pyx, etc.
We have plenty of build failures; I'm trying to go through them.
Given the regularly occurring nightly errors we should move some
o
Arrow Build Report for Job nightly-2020-04-30-0
All tasks:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-30-0
Failed Tasks:
- centos-6-amd64:
URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-30-0-github-centos-6-amd64
- centos-7-amd64:
Thippana Vamsi Kalyan created ARROW-8646:
Summary: Allow UnionListWriter to write null values
Key: ARROW-8646
URL: https://issues.apache.org/jira/browse/ARROW-8646
Project: Apache Arrow
Krisztian Szucs created ARROW-8645:
--
Summary: [C++] Missing gflags dependency for plasma
Key: ARROW-8645
URL: https://issues.apache.org/jira/browse/ARROW-8645
Project: Apache Arrow
Issue Typ
I opened issues to track the failing dask and pandas-master integration
tests:
https://issues.apache.org/jira/browse/ARROW-8643
https://issues.apache.org/jira/browse/ARROW-8644
On Wed, 29 Apr 2020 at 12:09, Crossbow wrote:
>
> Arrow Build Report for Job nightly-2020-04-29-0
>
> All tasks:
> ht
Joris Van den Bossche created ARROW-8644:
Summary: [Python] Dask integration tests failing due to change in
not including partition columns
Key: ARROW-8644
URL: https://issues.apache.org/jira/browse/ARROW-
Joris Van den Bossche created ARROW-8643:
Summary: [Python] Tests with pandas master failing due to freq
assertion
Key: ARROW-8643
URL: https://issues.apache.org/jira/browse/ARROW-8643
Projec
Hi!
Does your point 1 also apply to the AWS SDK dependency? Currently it seems
that it cannot be built in BUNDLED mode. As stated in
https://issues.apache.org/jira/browse/ARROW-8565, I struggled a lot to make
a static build with the S3 dependency activated! I would really like to
help on this bec
Anish Biswas created ARROW-8642:
---
Summary: Is there a good way to convert data types from numpy
types to pyarrow DataType?
Key: ARROW-8642
URL: https://issues.apache.org/jira/browse/ARROW-8642
Project: