Sam Albers created ARROW-14324:
----------------------------------

             Summary: Inconsistent application of type in Datasets via the schema
                 Key: ARROW-14324
                 URL: https://issues.apache.org/jira/browse/ARROW-14324
             Project: Apache Arrow
          Issue Type: Bug
          Components: R
    Affects Versions: 5.0.0
            Reporter: Sam Albers


 

It looks like at least {{filter}} is not handling a column type that is set via 
{{schema}} in {{open_dataset}}. Reprex:
{code:java}
options("max.print" = 5)
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
## Set up the data
tf <- tempfile()
dir.create(tf)
write_dataset(quakes, tf)
## Works as expected
open_dataset(tf) %>%
 filter(stations == 41) %>%
 collect()
#> lat long depth mag stations
#> 1 -20.42 181.62 562 4.8 41
#> [ reached 'max' / getOption("max.print") -- omitted 11 rows ]
## errors as expected
open_dataset(tf) %>%
 filter(stations == "41") %>%
 collect()
#> Error: NotImplemented: Function equal has no kernel matching input types (array[int32], scalar[string])
## Ok let's change a column type
tf_reg <- open_dataset(tf)$schema
tf_reg$stations <- string()
## ok returns a character
open_dataset(tf, schema = tf_reg) %>%
 pull(stations) %>%
 typeof()
#> [1] "character"
## So if `stations` is character I think this should work?
open_dataset(tf, schema = tf_reg) %>%
 filter(stations == as.character("41")) %>%
 collect()
#> Error: Filter expression not supported for Arrow Datasets: stations == as.character("41")
#> Call collect() first to pull data into R.
## previous behaviour no longer works
open_dataset(tf, schema = tf_reg) %>%
 filter(stations == 41) %>%
 collect()
#> Error: NotImplemented: Function equal has no kernel matching input types (array[string], scalar[double])
 
{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
