[ https://issues.apache.org/jira/browse/ARROW-10125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-10125:
-----------------------------------
    Labels: pull-request-available  (was: )

> [R] Int64 downcast check doesn't consider all chunks
> ----------------------------------------------------
>
>                 Key: ARROW-10125
>                 URL: https://issues.apache.org/jira/browse/ARROW-10125
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 1.0.1
>            Reporter: Kyle Kavanagh
>            Assignee: Neal Richardson
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 2.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I've got a proprietary dataset where one of the columns is an integer64 but
> all of the values would fit within 32 bits. As I understand it, arrow/feather
> will downcast that column when the data is read back into R (not ideal IMO,
> but not an issue generally). However, I'm having some trouble with a
> specific dataset.
>
> When I read in the data, the column is set to the class "integer64", however
> the column type (typeof) is 'integer' and not 'double', which is the
> underlying type used by bit64. This mismatch causes R data.table to error
> out
> (https://github.com/Rdatatable/data.table/blob/master/src/rbindlist.c#L325)
>
> I do not have any issue with integer64 columns which have values > 2^32, and
> suspiciously I am also unable to recreate the issue by manually creating a
> data.table with an int64 column with small values (e.g.
> data.table(col=as.integer64(c(1,2,3))) )
>
> I did look through the arrow::r cpp source and couldn't find an obvious case
> where the underlying storage array would be an integer but also have the
> 'integer64' class attr assigned... A fix would either be to remove the
> integer64 class attr, or ensure that the underlying data store is a REALSXP
> instead of INTEGERSXP.
>
> My company's network policies won't let me upload the sample dataset, hoping
> to see if this triggers any immediate thoughts. If not, I can try to figure
> out how to upload the dataset or otherwise provide details from it as
> requested.
>
> {code:java}
> > arrow::write_feather(df[,list(testCol)][1], '~/test.feather')
> > test = arrow::read_feather('~/test.feather')
> > class(test$testCol)
> [1] "integer64" "np.ulong"
> > typeof(test$testCol)
> [1] "integer"
> > str(test)
> Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1 obs. of 1 variable:
>  $ testCol:Error in as.character.integer64(object) :
>   REAL() can only be applied to a 'numeric', not a 'integer'
> #In the larger original dataset, it handles most columns properly, only the 'testCol' breaks things. Note the difference:
> > typeof(df$goodCol)
> [1] "double"
> > class(df$goodCol)
> [1] "integer64" "np.ulong"
> > typeof(df$testCol)
> [1] "integer"
> > class(df$testCol)
> [1] "integer64" "np.ulong"
> > str(df)
> Classes ‘data.table’ and 'data.frame': 214781 obs. of 17 variables:
>  $ goodCol :integer64 1599777000000604025 ...
>  $ testCol :Error in as.character.integer64(object) :
> > sessionInfo()
> R version 3.6.1 (2019-07-05)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Red Hat Enterprise Linux Server 7.7 (Maipo)
>
> Matrix products: default
> BLAS: /usr/lib64/libblas.so.3.4.2
> LAPACK: /usr/lib64/liblapack.so.3.4.2
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] data.table_1.13.0 bit64_4.0.5 bit_4.0.4
>
> loaded via a namespace (and not attached):
>  [1] Rcpp_1.0.5           lattice_0.20-41 arrow_1.0.1
>  [4] assertthat_0.2.1     rappdirs_0.3.1  grid_3.6.1
>  [7] R6_2.4.1             jsonlite_1.7.1  magrittr_1.5
> [10] rlang_0.4.7          Matrix_1.2-18   vctrs_0.3.4
> [13] reticulate_1.14-9001 tools_3.6.1     glue_1.4.2
> [16] purrr_0.3.4          compiler_3.6.1  tidyselect_1.1.0
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)