[jira] [Created] (ARROW-1503) [Python] Add serialization callbacks for pandas objects in pyarrow.serialize

2017-09-08 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1503: --- Summary: [Python] Add serialization callbacks for pandas objects in pyarrow.serialize Key: ARROW-1503 URL: https://issues.apache.org/jira/browse/ARROW-1503 Project: Apa

[jira] [Created] (ARROW-1502) [JS] Define code entry point for integration testing

2017-09-08 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1502: --- Summary: [JS] Define code entry point for integration testing Key: ARROW-1502 URL: https://issues.apache.org/jira/browse/ARROW-1502 Project: Apache Arrow Issue

[jira] [Created] (ARROW-1501) [JS] JavaScript integration tests

2017-09-08 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1501: --- Summary: [JS] JavaScript integration tests Key: ARROW-1501 URL: https://issues.apache.org/jira/browse/ARROW-1501 Project: Apache Arrow Issue Type: Improvement

Re: spark error with reading parquet file created via pandas/pyarrow

2017-09-08 Thread Julien Le Dem
The int96 deprecation is slowly bubbling up the stack. There are still discussions in spark on how to make the change. So for now even though it's deprecated it is still used in some places. This should get resolved in the near future. Julien > On Sep 8, 2017, at 14:12, Wes McKinney wrote: >

[jira] [Created] (ARROW-1500) [C++] Result of ftruncate ignored in MemoryMappedFile::Create

2017-09-08 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1500: --- Summary: [C++] Result of ftruncate ignored in MemoryMappedFile::Create Key: ARROW-1500 URL: https://issues.apache.org/jira/browse/ARROW-1500 Project: Apache Arrow

Re: [DISCUSS] Publishing Arrow development artifacts more frequently for alpha stage components

2017-09-08 Thread Julian Hyde
See policy: http://www.apache.org/legal/release-policy.html#release-approval 3 vote majority is MUST. Individuals verifying by building is REQUIRED. 72 hours is SHOULD. So you see there is room to maneuver on the latter. Julia

Re: spark error with reading parquet file created via pandas/pyarrow

2017-09-08 Thread Wes McKinney
Turning on int96 timestamps is the solution right now. To save yourself some typing, you could declare parquet_options = { 'compression': ..., 'use_deprecated_int96_timestamps': True } pq.write_table(..., **parquet_options) On Fri, Sep 8, 2017 at 5:08 PM, Brian Wylie wrote: > So, this i
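The workaround in the message above can be sketched as follows. This is only an illustration: the helper names `spark_compatible_options` and `write_table_for_spark` are hypothetical, and `'snappy'` is an assumed stand-in for the compression value the original message elides. Only `pq.write_table` and its `compression` and `use_deprecated_int96_timestamps` arguments come from the thread.

```python
def spark_compatible_options(compression="snappy", **overrides):
    """Build keyword arguments for pq.write_table (hypothetical helper).

    'snappy' is an assumed default; the original message leaves the
    compression value unspecified.
    """
    options = {
        "compression": compression,
        # Write timestamps as INT96 so that Spark 2.2 can read them.
        "use_deprecated_int96_timestamps": True,
    }
    # Explicit overrides win over the defaults above.
    options.update(overrides)
    return options


def write_table_for_spark(table, path, **overrides):
    """Write an Arrow table with the options above (hypothetical helper)."""
    # Imported lazily so the options helper can be used without PyArrow.
    import pyarrow.parquet as pq

    pq.write_table(table, path, **spark_compatible_options(**overrides))
```

Keeping the options in one place means the INT96 flag can be dropped in a single edit once Spark no longer needs it.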

Re: [DISCUSS] Publishing Arrow development artifacts more frequently for alpha stage components

2017-09-08 Thread Wes McKinney
Understood. For projects that are in alpha stage, would it be reasonable to relax the voting requirement to a 1 day vote period, where at least 1 PMC member must vote to approve (rather than the 3 vote requirement)? On Fri, Sep 8, 2017 at 4:58 PM, Julian Hyde wrote: > >> On Sep 7, 2017, at 7:06

Re: spark error with reading parquet file created via pandas/pyarrow

2017-09-08 Thread Brian Wylie
So, this is certainly good for future versions of Arrow. Do you have any specific recommendations for a workaround currently? Saving a parquet file with datetimes will obviously be a common use case and if I'm understanding it correctly, right now saving a Parquet file with PyArrow that file will

Re: spark error with reading parquet file created via pandas/pyarrow

2017-09-08 Thread Wes McKinney
Indeed, INT96 is deprecated in the Parquet format. There are other issues with Spark (it places restrictions on table field names, for example), so it may be worth adding an option like pq.write_table(table, where, flavor='spark') or maybe better pq.write_table(table, where, flavor='spark-2.2')
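One way to picture the proposed `flavor` option is as a lookup from a flavor name to a bundle of write options, plus a field-name cleanup step. The sketch below is hypothetical throughout: `FLAVOR_OPTIONS`, `options_for_flavor`, and `sanitize_field_name` are illustrative names, not PyArrow API, and the set of characters treated as invalid for Spark is an assumption.

```python
import re

# Hypothetical mapping from a flavor name to write_table keyword arguments.
FLAVOR_OPTIONS = {
    "spark": {"use_deprecated_int96_timestamps": True},
}

# Assumed set of characters that Spark rejects in Parquet field names.
_SPARK_INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def sanitize_field_name(name):
    """Replace each character assumed invalid for Spark with an underscore."""
    return _SPARK_INVALID_CHARS.sub("_", name)


def options_for_flavor(flavor=None, **explicit):
    """Return write options for a flavor; explicit arguments take precedence."""
    if flavor is None:
        options = {}
    elif flavor in FLAVOR_OPTIONS:
        options = dict(FLAVOR_OPTIONS[flavor])
    else:
        raise ValueError("unknown flavor: {!r}".format(flavor))
    options.update(explicit)
    return options
```

A versioned flavor such as `'spark-2.2'` would simply be another key in the mapping, which lets the defaults change between Spark releases without affecting existing callers.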

[jira] [Created] (ARROW-1499) [Python] Consider adding option to parquet.write_table that sets options for maximum Spark compatibility

2017-09-08 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1499: --- Summary: [Python] Consider adding option to parquet.write_table that sets options for maximum Spark compatibility Key: ARROW-1499 URL: https://issues.apache.org/jira/browse/ARROW-14

Re: [DISCUSS] Publishing Arrow development artifacts more frequently for alpha stage components

2017-09-08 Thread Julian Hyde
> On Sep 7, 2017, at 7:06 PM, Wes McKinney wrote: > > I personally don't have a problem with subcomponents publishing > artifacts to package managers outside of the primary Apache project > votes and releases, so long as they clearly signal that these package > builds are for development and not

[jira] [Created] (ARROW-1498) [GitHub] Add CONTRIBUTING.md and ISSUE_TEMPLATE.md

2017-09-08 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1498: --- Summary: [GitHub] Add CONTRIBUTING.md and ISSUE_TEMPLATE.md Key: ARROW-1498 URL: https://issues.apache.org/jira/browse/ARROW-1498 Project: Apache Arrow Issue T

Re: spark error with reading parquet file created via pandas/pyarrow

2017-09-08 Thread Brian Wylie
Okay, so after some additional debugging, I can get around this if I set use_deprecated_int96_timestamps=True on the pq.write_table(arrow_table, filename, compression=compression, use_deprecated_int96_timestamps=True) call. But that just feels SO wrong, as I'm sure it's deprecated for a reaso

spark error with reading parquet file created via pandas/pyarrow

2017-09-08 Thread Brian Wylie
Apologies if this isn't quite the right place to ask this question, but I figured Wes/others might know right off the bat :) Context: - Mac OSX Laptop - PySpark: 2.2.0 - PyArrow: 0.6.0 - Pandas: 0.19.2 Issue Explanation: - I'm converting my Pandas dataframe to a Parquet file with code very simil

[jira] [Created] (ARROW-1497) [Java] JsonFileReader doesn't set value count for some vectors

2017-09-08 Thread Li Jin (JIRA)
Li Jin created ARROW-1497: - Summary: [Java] JsonFileReader doesn't set value count for some vectors Key: ARROW-1497 URL: https://issues.apache.org/jira/browse/ARROW-1497 Project: Apache Arrow Issue

Re: Arrow 0.7.0 release timeline

2017-09-08 Thread Wes McKinney
From the look of the JIRA milestone, we should be in position to cut the 0.7.0 release candidate this coming Monday 11 September. What's remaining are mostly stretch goals, though there are a couple bugs that need investigating. Thanks Wes On Mon, Sep 4, 2017 at 10:08 PM, Li Jin wrote: > Update

[jira] [Created] (ARROW-1496) [JS] Upload coverage data to coveralls

2017-09-08 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1496: --- Summary: [JS] Upload coverage data to coveralls Key: ARROW-1496 URL: https://issues.apache.org/jira/browse/ARROW-1496 Project: Apache Arrow Issue Type: Improve

Re: Time for bi-weekly Arrow sync

2017-09-08 Thread Heimir Sverrisson
Wednesdays at 16:00 UTC is fine for me too. Thanks for organizing! On Fri, Sep 8, 2017 at 7:24 AM Li Jin wrote: > Wednesdays at 16:00 UTC works for me too. Thanks Wes! > > On Fri, Sep 8, 2017 at 2:55 AM, Uwe L. Korn wrote: > > > Wednesdays at 16:00 UTC is fine for me. Thanks for organising! > >

[jira] [Created] (ARROW-1495) [C++] Store shared_ptr to boxed arrays in RecordBatch

2017-09-08 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-1495: --- Summary: [C++] Store shared_ptr to boxed arrays in RecordBatch Key: ARROW-1495 URL: https://issues.apache.org/jira/browse/ARROW-1495 Project: Apache Arrow Issu

Re: Time for bi-weekly Arrow sync

2017-09-08 Thread Li Jin
Wednesdays at 16:00 UTC works for me too. Thanks Wes! On Fri, Sep 8, 2017 at 2:55 AM, Uwe L. Korn wrote: > Wednesdays at 16:00 UTC is fine for me. Thanks for organising! > > On Fri, Sep 8, 2017, at 04:19 AM, Jacques Nadeau wrote: > > Great, thanks for coordinating! > > > > On Thu, Sep 7, 2017 at