[jira] [Commented] (ARROW-15474) [Python] Possibility of a table.drop_duplicates() function?

2022-11-09 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17631279#comment-17631279 ] Lance Dacey commented on ARROW-15474: - Nice, I was able to test it out and it seemed to

[jira] [Commented] (ARROW-15716) [Dataset][Python] Parse a list of fragment paths to gather filters

2022-11-09 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17631108#comment-17631108 ] Lance Dacey commented on ARROW-15716: - Yes, ultimate goal is to create a single expr

[jira] [Commented] (ARROW-15716) [Dataset][Python] Parse a list of fragment paths to gather filters

2022-11-07 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630175#comment-17630175 ] Lance Dacey commented on ARROW-15716: - Yes, if I could easily retrieve a list of the

[jira] [Commented] (ARROW-15716) [Dataset][Python] Parse a list of fragment paths to gather filters

2022-11-07 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630046#comment-17630046 ] Lance Dacey commented on ARROW-15716: - I wanted to check if this is something which

[jira] [Updated] (ARROW-15716) [Dataset][Python] Parse a list of fragment paths to gather filters

2022-11-07 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Dacey updated ARROW-15716: Description: Is it possible for partitioning.parse() to be updated to parse a list of paths inste

[jira] [Commented] (ARROW-15474) [Python] Possibility of a table.drop_duplicates() function?

2022-10-24 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17623362#comment-17623362 ] Lance Dacey commented on ARROW-15474: - Nice - I will give that a shot, thanks. I hav

[jira] [Commented] (ARROW-12358) [C++][Python][R][Dataset] Control overwriting vs appending when writing to existing dataset

2022-04-22 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526487#comment-17526487 ] Lance Dacey commented on ARROW-12358: - Nice, thanks. I can try to test with a nightl

[jira] [Commented] (ARROW-15474) [Python] Possibility of a table.drop_duplicates() function?

2022-04-20 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524991#comment-17524991 ] Lance Dacey commented on ARROW-15474: - I'll keep this open since this is a major wis

[jira] [Comment Edited] (ARROW-16077) [Python] ArrowInvalid error on reading partitioned parquet files with fsspec.adlfs (pyarrow-7.0.0) due to removed '/' in the ls of path

2022-04-05 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517478#comment-17517478 ] Lance Dacey edited comment on ARROW-16077 at 4/5/22 2:26 PM: -

[jira] [Commented] (ARROW-16077) [Python] ArrowInvalid error on reading partitioned parquet files with fsspec.adlfs (pyarrow-7.0.0) due to removed '/' in the ls of path

2022-04-05 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517478#comment-17517478 ] Lance Dacey commented on ARROW-16077: - I am not sure about any public datasets. Loca

[jira] [Commented] (ARROW-12358) [C++][Python][R][Dataset] Control overwriting vs appending when writing to existing dataset

2022-04-05 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17517431#comment-17517431 ] Lance Dacey commented on ARROW-12358: - Is this on the radar to be fixed for the next

[jira] [Commented] (ARROW-12358) [C++][Python][R][Dataset] Control overwriting vs appending when writing to existing dataset

2022-03-04 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17501328#comment-17501328 ] Lance Dacey commented on ARROW-12358: - Is this issue sufficient to track this? In th

[jira] [Closed] (ARROW-12365) [Python] [Dataset] Add partition_filename_cb to ds.write_dataset()

2022-03-04 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-12365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Dacey closed ARROW-12365. --- Fix Version/s: 6.0.0 Resolution: Resolved delete_matching option solves this issue > [Python

[jira] [Created] (ARROW-15716) [Dataset][Python] Parse a list of fragment paths to gather filters

2022-02-17 Thread Lance Dacey (Jira)
Lance Dacey created ARROW-15716: --- Summary: [Dataset][Python] Parse a list of fragment paths to gather filters Key: ARROW-15716 URL: https://issues.apache.org/jira/browse/ARROW-15716 Project: Apache Arro

[jira] [Commented] (ARROW-12358) [C++][Python][R][Dataset] Control overwriting vs appending when writing to existing dataset

2022-02-02 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17485722#comment-17485722 ] Lance Dacey commented on ARROW-12358: - Is this slated for a fix in 7.0.0? I am writi

[jira] [Commented] (ARROW-15474) [Python] Possibility of a table.drop_duplicates() function?

2022-01-27 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17483481#comment-17483481 ] Lance Dacey commented on ARROW-15474: - Ahh, that would be great. Random is a bit ris

[jira] [Commented] (ARROW-15474) [Python] Possibility of a table.drop_duplicates() function?

2022-01-27 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17483114#comment-17483114 ] Lance Dacey commented on ARROW-15474: - I would personally be okay with only having t

[jira] [Created] (ARROW-15474) [Python] Possibility of a table.drop_duplicates() function?

2022-01-26 Thread Lance Dacey (Jira)
Lance Dacey created ARROW-15474: --- Summary: [Python] Possibility of a table.drop_duplicates() function? Key: ARROW-15474 URL: https://issues.apache.org/jira/browse/ARROW-15474 Project: Apache Arrow

[jira] [Commented] (ARROW-12358) [C++][Python][R][Dataset] Control overwriting vs appending when writing to existing dataset

2022-01-14 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476120#comment-17476120 ] Lance Dacey commented on ARROW-12358: - Ah, so it must be related to the filesystem.

[jira] [Commented] (ARROW-12358) [C++][Python][R][Dataset] Control overwriting vs appending when writing to existing dataset

2022-01-13 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475363#comment-17475363 ] Lance Dacey commented on ARROW-12358: - [~westonpace] Just wanted to check if this is

[jira] [Comment Edited] (ARROW-12358) [C++][Python][R][Dataset] Control overwriting vs appending when writing to existing dataset

2021-12-02 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450796#comment-17450796 ] Lance Dacey edited comment on ARROW-12358 at 12/3/21, 3:04 AM: ---

[jira] [Commented] (ARROW-12358) [C++][Python][R][Dataset] Control overwriting vs appending when writing to existing dataset

2021-12-02 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452649#comment-17452649 ] Lance Dacey commented on ARROW-12358: - Any thoughts on "delete_matching" creating th

[jira] [Commented] (ARROW-14938) Partition column dissappear when reading dataset

2021-12-01 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-14938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451813#comment-17451813 ] Lance Dacey commented on ARROW-14938: - Sure - refer to this section: https://arrow.

[jira] [Commented] (ARROW-14938) Partition column dissappear when reading dataset

2021-12-01 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-14938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451752#comment-17451752 ] Lance Dacey commented on ARROW-14938: - If you add the partitioning argument to ds.da

[jira] [Commented] (ARROW-12358) [C++][Python][R][Dataset] Control overwriting vs appending when writing to existing dataset

2021-11-29 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450796#comment-17450796 ] Lance Dacey commented on ARROW-12358: - I was not able to install 6.0.1 until the lat

[jira] [Commented] (ARROW-14608) [Python] Provide access to hash_aggregate functions through a group_by method

2021-11-29 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-14608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450770#comment-17450770 ] Lance Dacey commented on ARROW-14608: - If we can do group_by using the pyarrow table

[jira] [Commented] (ARROW-12358) [C++][Python][R][Dataset] Control overwriting vs appending when writing to existing dataset

2021-08-24 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403772#comment-17403772 ] Lance Dacey commented on ARROW-12358: - kDeleteMatchingPartitions - So this only dele

[jira] [Commented] (ARROW-12365) [Python] [Dataset] Add partition_filename_cb to ds.write_dataset()

2021-08-24 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-12365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403767#comment-17403767 ] Lance Dacey commented on ARROW-12365: - The metadata collector works great, but this

[jira] [Commented] (ARROW-12358) [C++][Python][R][Dataset] Control overwriting vs appending when writing to existing dataset

2021-08-13 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398635#comment-17398635 ] Lance Dacey commented on ARROW-12358: - I do not clear my append dataset, but I need

[jira] [Commented] (ARROW-12358) [C++][Python][R][Dataset] Control overwriting vs appending when writing to existing dataset

2021-08-12 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398007#comment-17398007 ] Lance Dacey commented on ARROW-12358: - What is the common workflow pattern for folks

[jira] [Commented] (ARROW-13074) [Python] Start with deprecating ParquetDataset custom attributes

2021-07-07 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-13074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376448#comment-17376448 ] Lance Dacey commented on ARROW-13074: - Sure Joris, I posted it and then I read that

[jira] [Issue Comment Deleted] (ARROW-13074) [Python] Start with deprecating ParquetDataset custom attributes

2021-07-06 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-13074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Dacey updated ARROW-13074: Comment: was deleted (was: I have run into a few issues with basename_template:   1) If I run ta

[jira] [Commented] (ARROW-13074) [Python] Start with deprecating ParquetDataset custom attributes

2021-07-06 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-13074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375901#comment-17375901 ] Lance Dacey commented on ARROW-13074: - I have run into a few issues with basename_te

[jira] [Commented] (ARROW-13074) [Python] Start with deprecating ParquetDataset custom attributes

2021-07-06 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-13074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375518#comment-17375518 ] Lance Dacey commented on ARROW-13074: - Any idea if this includes the partition_file

[jira] [Closed] (ARROW-12364) [Python] [Dataset] Add metadata_collector option to ds.write_dataset()

2021-06-22 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-12364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Dacey closed ARROW-12364. --- Fix Version/s: 5.0.0 Resolution: Fixed > [Python] [Dataset] Add metadata_collector option to

[jira] [Commented] (ARROW-12364) [Python] [Dataset] Add metadata_collector option to ds.write_dataset()

2021-06-22 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-12364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17367552#comment-17367552 ] Lance Dacey commented on ARROW-12364: - I think this is taken care of by ARROW-10440

[jira] [Commented] (ARROW-12364) [Python] [Dataset] Add metadata_collector option to ds.write_dataset()

2021-06-22 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-12364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17367199#comment-17367199 ] Lance Dacey commented on ARROW-12364: - Hi @jorisvandenbossche, you asked me to creat

[jira] [Commented] (ARROW-12358) [C++][Python][R][Dataset] Control overwriting vs appending when writing to existing dataset

2021-05-17 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346110#comment-17346110 ] Lance Dacey commented on ARROW-12358: - Being able to update and replace specific row

[jira] [Closed] (ARROW-12365) [Python] [Dataset] Add partition_filename_cb to ds.write_dataset()

2021-04-29 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-12365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Dacey closed ARROW-12365. --- Fix Version/s: 5.0.0 Resolution: Not A Problem > [Python] [Dataset] Add partition_filename_cb

[jira] [Commented] (ARROW-12365) [Python] [Dataset] Add partition_filename_cb to ds.write_dataset()

2021-04-29 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-12365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17335292#comment-17335292 ] Lance Dacey commented on ARROW-12365: - @jorisvandenbossche I will close this issue i

[jira] [Closed] (ARROW-11250) [Python] Inconsistent behavior calling ds.dataset()

2021-04-16 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Dacey closed ARROW-11250. --- Fix Version/s: (was: 5.0.0) 3.0.0 Resolution: Fixed This was fixed wit

[jira] [Closed] (ARROW-9682) [Python] Unable to specify the partition style with pq.write_to_dataset

2021-04-16 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-9682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Dacey closed ARROW-9682. -- Resolution: Not A Problem This works using ds.write_dataset() > [Python] Unable to specify the partiti

[jira] [Commented] (ARROW-12358) [C++][Python][R][Dataset] Control overwriting vs appending when writing to existing dataset

2021-04-13 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320221#comment-17320221 ] Lance Dacey commented on ARROW-12358: - I think that having an "overwrite" option wou

[jira] [Commented] (ARROW-10695) [C++][Dataset] Allow to use a UUID in the basename_template when writing a dataset

2021-04-13 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320192#comment-17320192 ] Lance Dacey commented on ARROW-10695: - [~jorisvandenbossche] partition_filename_cb:

[jira] [Created] (ARROW-12365) [Python] [Dataset] Add partition_filename_cb to ds.write_dataset()

2021-04-13 Thread Lance Dacey (Jira)
Lance Dacey created ARROW-12365: --- Summary: [Python] [Dataset] Add partition_filename_cb to ds.write_dataset() Key: ARROW-12365 URL: https://issues.apache.org/jira/browse/ARROW-12365 Project: Apache Arro

[jira] [Created] (ARROW-12364) [Python] [Dataset] Add metadata_collector option to ds.write_dataset()

2021-04-13 Thread Lance Dacey (Jira)
Lance Dacey created ARROW-12364: --- Summary: [Python] [Dataset] Add metadata_collector option to ds.write_dataset() Key: ARROW-12364 URL: https://issues.apache.org/jira/browse/ARROW-12364 Project: Apache

[jira] [Commented] (ARROW-10695) [C++][Dataset] Allow to use a UUID in the basename_template when writing a dataset

2021-04-13 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320167#comment-17320167 ] Lance Dacey commented on ARROW-10695: - I have been creating my own basename_template

[jira] [Commented] (ARROW-10695) [C++][Dataset] Allow to use a UUID in the basename_template when writing a dataset

2021-03-23 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306996#comment-17306996 ] Lance Dacey commented on ARROW-10695: - Sorry, did not see a notification for this. H

[jira] [Commented] (ARROW-10440) [C++][Dataset][Python] Add a callback to visit file writers just before Finish()

2021-03-11 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17299754#comment-17299754 ] Lance Dacey commented on ARROW-10440: - Can someone confirm if this issue would cover

[jira] [Closed] (ARROW-10694) [Python] ds.write_dataset() generates empty files for each final partition

2021-03-10 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Dacey closed ARROW-10694. --- Fix Version/s: 3.0.0 Resolution: Fixed https://github.com/dask/adlfs/pull/193 > [Python] ds.

[jira] [Commented] (ARROW-10694) [Python] ds.write_dataset() generates empty files for each final partition

2021-03-10 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298828#comment-17298828 ] Lance Dacey commented on ARROW-10694: - This is being worked on in the adlfs library

[jira] [Commented] (ARROW-10440) [C++][Dataset][Python] Add a callback to visit file writers just before Finish()

2021-03-03 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17294650#comment-17294650 ] Lance Dacey commented on ARROW-10440: - Will this change allow us to get a list of th

[jira] [Commented] (ARROW-10695) [C++][Dataset] Allow to use a UUID in the basename_template when writing a dataset

2021-02-17 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286036#comment-17286036 ] Lance Dacey commented on ARROW-10695: - Perhaps this has changed, but I was running i

[jira] [Created] (ARROW-11453) [Python] [Dataset] Unable to use write_dataset() to Azure Blob with adlfs 0.6.0

2021-02-01 Thread Lance Dacey (Jira)
Lance Dacey created ARROW-11453: --- Summary: [Python] [Dataset] Unable to use write_dataset() to Azure Blob with adlfs 0.6.0 Key: ARROW-11453 URL: https://issues.apache.org/jira/browse/ARROW-11453 Project

[jira] [Closed] (ARROW-11390) [Python] pyarrow 3.0 issues with turbodbc

2021-01-27 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Dacey closed ARROW-11390. --- Fix Version/s: 3.0.0 Resolution: Fixed I reorganized my Dockerfile to ensure that pyarrow 3.0

[jira] [Commented] (ARROW-11390) [Python] pyarrow 3.0 issues with turbodbc

2021-01-27 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17273199#comment-17273199 ] Lance Dacey commented on ARROW-11390: - Everything seems to be all set now, thanks!

[jira] [Commented] (ARROW-11390) [Python] pyarrow 3.0 issues with turbodbc

2021-01-27 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272708#comment-17272708 ] Lance Dacey commented on ARROW-11390: - That makes sense. I checked further and the b

[jira] [Commented] (ARROW-11390) [Python] pyarrow 3.0 issues with turbodbc

2021-01-26 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272271#comment-17272271 ] Lance Dacey commented on ARROW-11390: - Actually, turbodbc would have been installed

[jira] [Created] (ARROW-11390) [Python] pyarrow 3.0 issues with turbodbc

2021-01-26 Thread Lance Dacey (Jira)
Lance Dacey created ARROW-11390: --- Summary: [Python] pyarrow 3.0 issues with turbodbc Key: ARROW-11390 URL: https://issues.apache.org/jira/browse/ARROW-11390 Project: Apache Arrow Issue Type: Bu

[jira] [Commented] (ARROW-11250) [Python] Inconsistent behavior calling ds.dataset()

2021-01-15 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266151#comment-17266151 ] Lance Dacey commented on ARROW-11250: - Good idea - I was able to list all of the f

[jira] [Comment Edited] (ARROW-10247) [C++][Dataset] Cannot write dataset with dictionary column as partition field

2021-01-15 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265928#comment-17265928 ] Lance Dacey edited comment on ARROW-10247 at 1/15/21, 11:08 AM: --

[jira] [Commented] (ARROW-10247) [C++][Dataset] Cannot write dataset with dictionary column as partition field

2021-01-15 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265928#comment-17265928 ] Lance Dacey commented on ARROW-10247: - Nice - how would you generally go about finding

[jira] [Commented] (ARROW-11250) [Python] Inconsistent behavior calling ds.dataset()

2021-01-15 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265922#comment-17265922 ] Lance Dacey commented on ARROW-11250: - Do you have any idea at all what could also b

[jira] [Commented] (ARROW-11250) [Python] Inconsistent behavior calling ds.dataset()

2021-01-15 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265909#comment-17265909 ] Lance Dacey commented on ARROW-11250: - Sure, I can raise an issue there.   {code:ja

[jira] [Comment Edited] (ARROW-11250) [Python] Inconsistent behavior calling ds.dataset()

2021-01-15 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265869#comment-17265869 ] Lance Dacey edited comment on ARROW-11250 at 1/15/21, 10:19 AM: --

[jira] [Commented] (ARROW-11250) [Python] Inconsistent behavior calling ds.dataset()

2021-01-15 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-11250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265869#comment-17265869 ] Lance Dacey commented on ARROW-11250: - {code:java} selected_files1 = fs.find("dev/te

[jira] [Created] (ARROW-11250) [Python] Inconsistent behavior calling ds.dataset()

2021-01-14 Thread Lance Dacey (Jira)
Lance Dacey created ARROW-11250: --- Summary: [Python] Inconsistent behavior calling ds.dataset() Key: ARROW-11250 URL: https://issues.apache.org/jira/browse/ARROW-11250 Project: Apache Arrow Issu

[jira] [Comment Edited] (ARROW-10247) [C++][Dataset] Cannot write dataset with dictionary column as partition field

2021-01-09 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17261996#comment-17261996 ] Lance Dacey edited comment on ARROW-10247 at 1/10/21, 3:27 AM: ---

[jira] [Commented] (ARROW-10247) [C++][Dataset] Cannot write dataset with dictionary column as partition field

2021-01-09 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17261996#comment-17261996 ] Lance Dacey commented on ARROW-10247: - What is the best workaround for this issue ri

[jira] [Commented] (ARROW-10523) [Python] Pandas timestamps are inferred to have only microsecond precision

2021-01-07 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260878#comment-17260878 ] Lance Dacey commented on ARROW-10523: - I noticed that even explicitly using (unit="n

[jira] [Commented] (ARROW-10695) [C++][Dataset] Allow to use a UUID in the basename_template when writing a dataset

2020-12-04 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17244232#comment-17244232 ] Lance Dacey commented on ARROW-10695: - FYI, I think this might be necessary for some

[jira] [Commented] (ARROW-10517) [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob

2020-12-04 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243963#comment-17243963 ] Lance Dacey commented on ARROW-10517: - Yes, I think the uuid specifier would work fi

[jira] [Commented] (ARROW-10517) [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob

2020-12-02 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242425#comment-17242425 ] Lance Dacey commented on ARROW-10517: - FYI, it seems like the "part-\{i}" basename_t

[jira] [Comment Edited] (ARROW-10694) [Python] ds.write_dataset() generates empty files for each final partition

2020-12-02 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242213#comment-17242213 ] Lance Dacey edited comment on ARROW-10694 at 12/2/20, 9:57 AM: ---

[jira] [Commented] (ARROW-10694) [Python] ds.write_dataset() generates empty files for each final partition

2020-12-02 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242213#comment-17242213 ] Lance Dacey commented on ARROW-10694: - I am simply listing and deleting blobs with "

[jira] [Commented] (ARROW-10517) [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob

2020-12-01 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17241833#comment-17241833 ] Lance Dacey commented on ARROW-10517: - Thanks - since the \{i} increments each time

[jira] [Commented] (ARROW-10694) [Python] ds.write_dataset() generates empty files for each final partition

2020-11-23 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237341#comment-17237341 ] Lance Dacey commented on ARROW-10694: - FYI, I tested HivePartitioning as well, but f

[jira] [Commented] (ARROW-10694) [Python] ds.write_dataset() generates empty files for each final partition

2020-11-23 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237316#comment-17237316 ] Lance Dacey commented on ARROW-10694: - {code:java} print(fs.isfile("dev/test-dataset

[jira] [Commented] (ARROW-10694) [Python] ds.write_dataset() generates empty files for each final partition

2020-11-23 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237299#comment-17237299 ] Lance Dacey commented on ARROW-10694: - Sure. https://github.com/dask/adlfs/issues/13

[jira] [Created] (ARROW-10694) [Python] ds.write_dataset() generates empty files for each final partition

2020-11-23 Thread Lance Dacey (Jira)
Lance Dacey created ARROW-10694: --- Summary: [Python] ds.write_dataset() generates empty files for each final partition Key: ARROW-10694 URL: https://issues.apache.org/jira/browse/ARROW-10694 Project: Apa

[jira] [Commented] (ARROW-10517) [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob

2020-11-21 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236702#comment-17236702 ] Lance Dacey commented on ARROW-10517: - Regarding partition_filename_cb, some common

[jira] [Closed] (ARROW-10517) [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob

2020-11-20 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Dacey closed ARROW-10517. --- Fix Version/s: 2.0.0 Resolution: Later My issue is caused by another library (adlfs). Once th

[jira] [Commented] (ARROW-10517) [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob

2020-11-20 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236357#comment-17236357 ] Lance Dacey commented on ARROW-10517: - Thanks for your help. By adding **kwargs to t

[jira] [Comment Edited] (ARROW-10517) [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob

2020-11-20 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236357#comment-17236357 ] Lance Dacey edited comment on ARROW-10517 at 11/20/20, 6:24 PM: --

[jira] [Comment Edited] (ARROW-10517) [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob

2020-11-20 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235477#comment-17235477 ] Lance Dacey edited comment on ARROW-10517 at 11/20/20, 8:26 AM: --

[jira] [Comment Edited] (ARROW-10517) [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob

2020-11-19 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235477#comment-17235477 ] Lance Dacey edited comment on ARROW-10517 at 11/19/20, 1:48 PM: --

[jira] [Commented] (ARROW-10517) [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob

2020-11-19 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235477#comment-17235477 ] Lance Dacey commented on ARROW-10517: - Yeah, I can open an issue there. I hopefully

[jira] [Commented] (ARROW-10517) [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob

2020-11-19 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235388#comment-17235388 ] Lance Dacey commented on ARROW-10517: - Latest adlfs (0.5.5):   This really creates

[jira] [Updated] (ARROW-10517) [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob

2020-11-19 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Dacey updated ARROW-10517: Attachment: ss2.PNG > [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob > -

[jira] [Commented] (ARROW-10517) [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob

2020-11-19 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235306#comment-17235306 ] Lance Dacey commented on ARROW-10517: - !ss.PNG! Added a screenshot of the result

[jira] [Updated] (ARROW-10517) [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob

2020-11-19 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Dacey updated ARROW-10517: Attachment: ss.PNG > [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob > --

[jira] [Commented] (ARROW-10517) [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob

2020-11-19 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235277#comment-17235277 ] Lance Dacey commented on ARROW-10517: - This works on my local conda environment (dep

[jira] [Comment Edited] (ARROW-10517) [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob

2020-11-18 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235084#comment-17235084 ] Lance Dacey edited comment on ARROW-10517 at 11/19/20, 7:44 AM: --

[jira] [Updated] (ARROW-10517) [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob

2020-11-18 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Dacey updated ARROW-10517: Description:   {code:python} # adal==1.2.5 # adlfs==0.2.5 # fsspec==0.7.4 # pandas==1.1.3 # pyarro

[jira] [Commented] (ARROW-10517) [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob

2020-11-18 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235084#comment-17235084 ] Lance Dacey commented on ARROW-10517: - Added an edit with the results of pure fsspec

[jira] [Updated] (ARROW-10517) [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob

2020-11-18 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Dacey updated ARROW-10517: Description:   {code:python} # adal==1.2.5 # adlfs==0.2.5 # fsspec==0.7.4 # pandas==1.1.3 # pyarro

[jira] [Commented] (ARROW-10517) [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob

2020-11-13 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231903#comment-17231903 ] Lance Dacey commented on ARROW-10517: - Hello - let me know if my edit covers it. Pr

[jira] [Updated] (ARROW-10517) [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob

2020-11-13 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Dacey updated ARROW-10517: Description:     If I downgrade adlfs to 0.2.5 and azure-blob-storage to 2.1, and then upgrade

[jira] [Updated] (ARROW-10517) [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob

2020-11-13 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Dacey updated ARROW-10517: Description:   {code:python} # adal==1.2.5 # adlfs==0.2.5 # fsspec==0.7.4 # pandas==1.1.3 # pyarr

[jira] [Commented] (ARROW-10517) [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob

2020-11-08 Thread Lance Dacey (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228257#comment-17228257 ] Lance Dacey commented on ARROW-10517: - + [~mdurant] and [~jorisvandenbossche] You g
