Mauricio 'Pachá' Vargas Sepúlveda created ARROW-13188:
---------------------------------------------------------

             Summary: [R] [C++] Implement SQL-alike distinct() for dplyr queries
                 Key: ARROW-13188
                 URL: https://issues.apache.org/jira/browse/ARROW-13188
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, R
    Affects Versions: 4.0.1
            Reporter: Mauricio 'Pachá' Vargas Sepúlveda

It would be highly desirable to be able to use (base) substr and/or (stringr) str_sub in dplyr queries, such as:

{code:r}
library(arrow)
library(dplyr)
library(stringr)

# get animal products, year 2019
open_dataset(
  "../cepii-datasets-arrow/parquet/baci_hs92",
  partitioning = c("year", "reporter_iso")
) %>%
  filter(
    year == 2019,
    str_sub(product_code, 1, 2) == "01"
  ) %>%
  collect()

Error: Filter expression not supported for Arrow Datasets: str_sub(product_code, 1, 2) == "01"
Call collect() first to pull data into R.
{code}

Of course, this needs to be implemented, but, like ARROW-13107, it would make integration with dplyr easier.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)