Mauricio 'Pachá' Vargas Sepúlveda created ARROW-13188:
---------------------------------------------------------

             Summary: [R] [C++] Implement SQL-alike distinct() for dplyr queries
                 Key: ARROW-13188
                 URL: https://issues.apache.org/jira/browse/ARROW-13188
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, R
    Affects Versions: 4.0.1
            Reporter: Mauricio 'Pachá' Vargas Sepúlveda


It would be highly desirable to be able to use (base) substr and/or (stringr) 
str_sub in dplyr queries, like

{code:r}
library(arrow)
library(dplyr)
library(stringr)

# get animal products, year 2019
open_dataset(
  "../cepii-datasets-arrow/parquet/baci_hs92",
  partitioning = c("year", "reporter_iso")
) %>% 
  filter(
    year == 2019,
    str_sub(product_code, 1, 2) == "01"
  ) %>% 
  collect()

#> Error: Filter expression not supported for Arrow Datasets: 
#> str_sub(product_code, 1, 2) == "01"
#> Call collect() first to pull data into R.
{code}

Of course, this still needs to be implemented but, like ARROW-13107, it would 
make integration with dplyr easier.
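
In the meantime, the workaround suggested by the error message can be sketched 
roughly as below. This is only an illustration under assumptions: the toy table 
and the temporary "toy_baci" directory are made up to stand in for the real 
BACI data, and only the year filter is pushed down to Arrow, while the str_sub 
filter runs in R after collect().

{code:r}
library(arrow)
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for the real dataset
toy <- data.frame(
  year = c(2018L, 2019L, 2019L, 2019L),
  product_code = c("010110", "010111", "020110", "010290"),
  stringsAsFactors = FALSE
)

# Write a small partitioned dataset to a temporary directory
path <- file.path(tempdir(), "toy_baci")
write_dataset(toy, path, partitioning = "year")

result <- open_dataset(path) %>%
  filter(year == 2019) %>%                        # evaluated by Arrow
  collect() %>%                                   # pull the reduced data into R
  filter(str_sub(product_code, 1, 2) == "01")     # evaluated by dplyr/stringr in R
{code}

This keeps the partition filter cheap, but every row for the selected year still 
has to be loaded into memory before the string filter applies, which is what 
native support would avoid.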




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
