[ https://issues.apache.org/jira/browse/ARROW-7740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney resolved ARROW-7740.
---------------------------------
    Resolution: Fixed

Issue resolved by pull request 6792
[https://github.com/apache/arrow/pull/6792]

> [C++] Array internals corruption in StructArray::Flatten
> --------------------------------------------------------
>
>                 Key: ARROW-7740
>                 URL: https://issues.apache.org/jira/browse/ARROW-7740
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: John Sheffield
>            Assignee: Wes McKinney
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.17.0
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Reading a nested ndjson file using arrow::read_json_arrow with the default
> `as_data_frame=TRUE` causes an immediate session crash, but switching to
> `as_data_frame=FALSE` works fine and the resulting arrow object schema is
> correct.
> {code:java}
> library(tidyr)
> library(arrow)
> library(jsonlite)
> # Create two test datasets: long_df and a variant that nests long_df into
> # a dataframe with a list-column 'nest_level1' containing a dataframe
> long_df <- tidyr::expand_grid(ABC = LETTERS[1:3], xyz = letters[24:26], num = 1:3)
> long_df[["ftr1"]] <- runif(nrow(long_df))
> long_df[["ftr2"]] <- rpois(nrow(long_df), 100)
> nested_frame_level1 <- tidyr::nest(long_df, nest_level1 = c(num, ftr1, ftr2))
> # Write and validate nested ndjson
> jsonlite::stream_out(nested_frame_level1, con = file("nested_frame_level1.json"))
> readLines("nested_frame_level1.json", n = 2) # check we have valid ndjson here
> # This does not cause a session crash
> nested_arrow <- arrow::read_json_arrow(file = "nested_frame_level1.json", as_data_frame = FALSE)
> nested_arrow$schema # correctly interprets `nest_level1` as `list<item: struct<num: int64, ftr1: double, ftr2: int64>>`
> # This causes a session crash
> nested_df <- arrow::read_json_arrow(file = "nested_frame_level1.json", as_data_frame = TRUE)
> {code}
> The R package version of Arrow is latest CRAN release (arrow * 0.15.1.1,
> 2019-11-05, CRAN (R 3.5.2)). I'm running this code in a slightly older R
> version (3.5.1), macOS 10.14.6, x86_64, darwin15.6.0, via RStudio 1.2.5001.
> [edit: formatting fix]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)