GitHub user thisisnic added a comment to the discussion: how to debug
arrow/dplyr to consider a bug report?
First thing I'm gonna try is writing the dataset to a temporary file - this is
all done at the arrow level without bringing it into R. Then I'll read it in
again and see if the filter works.
```
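library(arrow)
library(dplyr)

# Round-trip the dataset through Arrow only; nothing is pulled into R here
tf <- tempfile()
dir.create(tf)
open_dataset('data/softcite-extractions-oa-data/p01_one_percent_random_subset/papers.parquet') |>
  write_dataset(tf)

# Read the rewritten copy back and retry the filter
open_dataset(tf) |>
  filter(published_year < 1990) |>
  collect() |>
  nrow()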
```
I get `1720` here, so it feels like there's something wrong either with the
file or with how it's being read. The next step is comparing the new file with
the old one and seeing if there are any differences.
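A rough sketch of how that comparison could start, looking only at what the Parquet footers report (schema, row count, number of row groups). `compare_parquet_files` is a hypothetical helper name I made up for illustration, and the usage lines assume `write_dataset()` left a single `.parquet` file somewhere under `tf`.

```
library(arrow)

# Hypothetical helper: compare two Parquet files using footer metadata only
compare_parquet_files <- function(old_file, new_file) {
  old_reader <- ParquetFileReader$create(old_file)
  new_reader <- ParquetFileReader$create(new_file)

  list(
    # TRUE if column names and types match (schema metadata is ignored)
    same_schema = old_reader$GetSchema()$Equals(new_reader$GetSchema()),
    rows = c(old = old_reader$num_rows, new = new_reader$num_rows),
    row_groups = c(old = old_reader$num_row_groups,
                   new = new_reader$num_row_groups)
  )
}

# Usage (assumes `tf` from the snippet above):
# new_file <- list.files(tf, pattern = "\\.parquet$",
#                        full.names = TRUE, recursive = TRUE)[1]
# compare_parquet_files(
#   'data/softcite-extractions-oa-data/p01_one_percent_random_subset/papers.parquet',
#   new_file
# )
```

If the schemas and row counts match but the row-group layout differs, that might point at how the original file was written rather than at the data itself, though that's only a guess at this stage.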
GitHub link:
https://github.com/apache/arrow/discussions/46383#discussioncomment-13119481