Will Jones created ARROW-16085:
----------------------------------

             Summary: [R] Support unifying schemas for InMemoryDatasets
                 Key: ARROW-16085
                 URL: https://issues.apache.org/jira/browse/ARROW-16085
             Project: Apache Arrow
          Issue Type: Improvement
          Components: R
    Affects Versions: 7.0.0
            Reporter: Will Jones
             Fix For: 8.0.0

Combining two InMemoryDatasets whose schemas differ fails when the result is collected:

{code:R}
library(arrow)
library(dplyr)  # provides %>% and collect()

sub_df1 <- Table$create(
  x = Array$create(c(1, 2, 3)),
  y = Array$create(c("a", "b", "c"))
)
sub_df2 <- Table$create(
  x = Array$create(c(4, 5)),
  z = Array$create(c("d", "e"))
)

ds1 <- InMemoryDataset$create(sub_df1)
ds2 <- InMemoryDataset$create(sub_df2)
ds <- c(ds1, ds2)
actual <- ds %>% collect()
{code}

{code}
Type error: yielded batch had schema x: double
y: string which did not match InMemorySource's: x: double
y: string
z: string
/Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:541  child_.Next()
/Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:152  value_.status()
/Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:180  maybe_element
/Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/dataset/scanner.cc:840  fragments_it.ToVector()
{code}

The union dataset reports the unified schema (x, y, z), but each in-memory source still yields batches in its original schema, so the scan is rejected.

If we fixed this, we could implement a function that does for Tables what {{dplyr::bind_rows}} does for tibbles:

{code:R}
concat_tables <- function(..., schema = NULL) {
  # Collect the tables passed via ... into a list
  tables <- rlang::list2(...)
  # Wrap each table in an InMemoryDataset and open them as one dataset
  dataset <- open_dataset(purrr::map(tables, InMemoryDataset$create), schema = schema)
  # Return an Arrow Table rather than a data frame
  dplyr::collect(dataset, as_data_frame = FALSE)
}
{code}
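
For reference, the {{dplyr::bind_rows}} behavior mentioned above can be sketched on plain tibbles with the same columns as the failing example. This only illustrates the data frame side and is not a statement about how the Arrow implementation would work; the names {{df1}}, {{df2}} and {{combined}} are illustrative.

{code:R}
library(dplyr)

# Same columns as sub_df1 and sub_df2 in the failing example
df1 <- tibble(x = c(1, 2, 3), y = c("a", "b", "c"))
df2 <- tibble(x = c(4, 5), z = c("d", "e"))

# bind_rows() takes the union of the columns (x, y, z) and fills
# the values missing from either input with NA
combined <- bind_rows(df1, df2)
{code}

Here {{combined}} has five rows, with {{y}} set to NA for the rows from {{df2}} and {{z}} set to NA for the rows from {{df1}}. A Table-based equivalent would presumably fill those gaps with nulls.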