[ https://issues.apache.org/jira/browse/ARROW-13188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mauricio 'Pachá' Vargas Sepúlveda updated ARROW-13188:
------------------------------------------------------
    Summary: [R] [C++] Implement substr/str_sub for dplyr queries  (was: [R] [C++] Implement SQL-alike distinct() for dplyr queries)

> [R] [C++] Implement substr/str_sub for dplyr queries
> ----------------------------------------------------
>
>                 Key: ARROW-13188
>                 URL: https://issues.apache.org/jira/browse/ARROW-13188
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, R
>    Affects Versions: 4.0.1
>            Reporter: Mauricio 'Pachá' Vargas Sepúlveda
>            Priority: Minor
>
> It would be highly desirable to be able to use (base) substr and/or (stringr)
> str_sub in dplyr queries, like
> {code:r}
> library(arrow)
> library(dplyr)
> library(stringr)
> # get animal products, year 2019
> open_dataset(
>   "../cepii-datasets-arrow/parquet/baci_hs92",
>   partitioning = c("year", "reporter_iso")
> ) %>%
>   filter(
>     year == 2019,
>     str_sub(product_code, 1, 2) == "01"
>   ) %>%
>   collect()
> Error: Filter expression not supported for Arrow Datasets:
>   str_sub(product_code, 1, 2) == "01"
> Call collect() first to pull data into R.
> {code}
> Of course, this needs to be implemented, but, like ARROW-13107, it points
> toward easier integration with dplyr.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)