GitHub user jameshowison closed a discussion: how to debug arrow/dplyr to 
consider a bug report?

We are seeing unexpected behavior with arrow when using dplyr `filter`. The issue 
seems to center on a less-than filter that works on in-memory data but 
doesn't work on a dataset opened with `open_dataset`.

We asked about the issue on Stack Overflow here: 
https://stackoverflow.com/questions/79607580/how-to-properly-use-less-than-in-a-dplyr-filter-of-a-sharded-arrow-dataset#comment140408196_79607580

I've also created a test dataset and code at 
https://github.com/softcite/softcite-extractions-parquet-analysis, in the 
analysis/queries_on_parquet.qmd file: 
https://github.com/softcite/softcite-extractions-parquet-analysis/blob/main/analysis/queries_on_parquet.qmd

I have no idea whether this points to a bug, so I don't want to open an issue yet. 
I didn't think the Posit forums would help, since I think the arrow/parquet 
versions of the dplyr verbs are implemented here?

But I also don't know how to debug this further, so any guidance on that would 
be appreciated. If I can debug it further and it does look like an issue, I'll 
try to create a smaller dataset that shows the behavior (the one in the 
GitHub repo above is too large for that).

Thanks!
James

GitHub link: https://github.com/apache/arrow/discussions/46383
