Thanks for the explanation. I'll keep my eye out for a new binary soon.

On Mon, Nov 1, 2021 at 1:23 PM Neal Richardson <[email protected]> wrote:

> Thanks for the details. I see you're using RStudio Package Manager. There
> was an issue with the binaries that RSPM built for 6.0.0.2, we've been
> discussing with them and they should be fixing it on their side, so this
> should resolve itself soon (if it isn't already resolved).
>
> Neal
>
>
> On Mon, Nov 1, 2021 at 1:36 PM Chris Berthiaume <[email protected]> wrote:
>
>> Hi Neal,
>>
>> Here's a reproducible example using a fresh Docker container for
>> bioconductor/bioconductor_docker:RELEASE_3_13. I start the container, start
>> R, install arrow, attach arrow, then try to read a simple parquet file I
>> just now created separately in Rstudio on MacOS with arrow 5.0.0. This
>> fails. I stop/start R again, install arrow 5.0.0.2 with
>> devtools::install_version(), attach, then verify that I can successfully
>> read the same parquet file.
>>
>> I've changed the R prompt character below from ">" to "$" to prevent any
>> text from being interpreted as an email reply.
>>
>> # Creating the parquet file in Rstudio in MacOS
>> $ x <- data.frame(A=seq(0, 2), B=seq(10,12))
>> $ x
>>   A  B
>> 1 0 10
>> 2 1 11
>> 3 2 12
>> $ arrow::write_parquet(x, "~/Desktop/arrowtest/x.parquet")
>>
>> # Run the test in a docker container
>> docker run -it --rm -v ~/Desktop/arrowtest:/data bioconductor/bioconductor_docker:RELEASE_3_13 bash
>> root@5fa84c3f4a41:/# cd /data
>> root@5fa84c3f4a41:/data# R
>>
>> R version 4.1.1 (2021-08-10) -- "Kick Things"
>> Copyright (C) 2021 The R Foundation for Statistical Computing
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> R is free software and comes with ABSOLUTELY NO WARRANTY.
>> You are welcome to redistribute it under certain conditions.
>> Type 'license()' or 'licence()' for distribution details.
>>
>> R is a collaborative project with many contributors.
>> Type 'contributors()' for more information and
>> 'citation()' on how to cite R or R packages in publications.
>>
>> Type 'demo()' for some demos, 'help()' for on-line help, or
>> 'help.start()' for an HTML browser interface to help.
>> Type 'q()' to quit R.
>>
>> $ install.packages('arrow')
>> Installing package into ‘/usr/local/lib/R/site-library’
>> (as ‘lib’ is unspecified)
>> also installing the dependencies ‘bit’, ‘assertthat’, ‘bit64’
>>
>> trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/bit_4.0.4.tar.gz'
>> Content type 'binary/octet-stream' length 691644 bytes (675 KB)
>> ==================================================
>> downloaded 675 KB
>>
>> trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/assertthat_0.2.1.tar.gz'
>> Content type 'binary/octet-stream' length 52329 bytes (51 KB)
>> ==================================================
>> downloaded 51 KB
>>
>> trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/bit64_4.0.5.tar.gz'
>> Content type 'binary/octet-stream' length 573106 bytes (559 KB)
>> ==================================================
>> downloaded 559 KB
>>
>> trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/arrow_6.0.0.2.tar.gz'
>> Content type 'binary/octet-stream' length 23646684 bytes (22.6 MB)
>> ==================================================
>> downloaded 22.6 MB
>>
>> * installing *binary* package ‘bit’ ...
>> * DONE (bit)
>> * installing *binary* package ‘assertthat’ ...
>> * DONE (assertthat)
>> * installing *binary* package ‘bit64’ ...
>> * DONE (bit64)
>> * installing *binary* package ‘arrow’ ...
>> * DONE (arrow)
>>
>> The downloaded source packages are in
>>     ‘/tmp/Rtmp8HkDvX/downloaded_packages’
>> $ library(arrow)
>> See arrow_info() for available features
>>
>> Attaching package: ‘arrow’
>>
>> The following object is masked from ‘package:utils’:
>>
>>     timestamp
>>
>> $ sessionInfo()
>> R version 4.1.1 (2021-08-10)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Ubuntu 20.04.3 LTS
>>
>> Matrix products: default
>> BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] arrow_6.0.0.2
>>
>> loaded via a namespace (and not attached):
>>  [1] tidyselect_1.1.1 bit_4.0.4        compiler_4.1.1   magrittr_2.0.1
>>  [5] assertthat_0.2.1 R6_2.5.1         tools_4.1.1      glue_1.4.2
>>  [9] bit64_4.0.5      vctrs_0.3.8      rlang_0.4.11     purrr_0.3.4
>> $ read_parquet("x.parquet")
>> Error: NotImplemented: Support for codec 'snappy' not built
>> In order to read this file, you will need to reinstall arrow with
>> additional features enabled.
>> Set one of these environment variables before installing:
>>
>> * LIBARROW_MINIMAL=false (for all optional features, including 'snappy')
>> * ARROW_WITH_SNAPPY=ON (for just 'snappy')
>>
>> See https://arrow.apache.org/docs/r/articles/install.html for details
>>
>> root@5fa84c3f4a41:/data# R
>>
>> R version 4.1.1 (2021-08-10) -- "Kick Things"
>> Copyright (C) 2021 The R Foundation for Statistical Computing
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> R is free software and comes with ABSOLUTELY NO WARRANTY.
>> You are welcome to redistribute it under certain conditions.
>> Type 'license()' or 'licence()' for distribution details.
>>
>> R is a collaborative project with many contributors.
>> Type 'contributors()' for more information and
>> 'citation()' on how to cite R or R packages in publications.
>>
>> Type 'demo()' for some demos, 'help()' for on-line help, or
>> 'help.start()' for an HTML browser interface to help.
>> Type 'q()' to quit R.
>>
>> $ devtools::install_version("arrow", "5.0.0.2")
>> Downloading package from url:
>> https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/Archive/arrow/arrow_5.0.0.2.tar.gz
>> These packages have more recent versions available.
>> It is recommended to update all of them.
>> Which would you like to update?
>>
>> 1: All
>> 2: CRAN packages only
>> 3: None
>> 4: rlang (0.4.11 -> 0.4.12) [CRAN]
>>
>> Enter one or more numbers, or an empty line to skip updates:
>> Installing package into ‘/usr/local/lib/R/site-library’
>> (as ‘lib’ is unspecified)
>> * installing *binary* package ‘arrow’ ...
>> * DONE (arrow)
>> $ library(arrow)
>>
>> Attaching package: ‘arrow’
>>
>> The following object is masked from ‘package:utils’:
>>
>>     timestamp
>>
>> $ sessionInfo()
>> R version 4.1.1 (2021-08-10)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> Running under: Ubuntu 20.04.3 LTS
>>
>> Matrix products: default
>> BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] arrow_5.0.0.2
>>
>> loaded via a namespace (and not attached):
>>  [1] magrittr_2.0.1 usethis_2.0.1     devtools_2.4.2   tidyselect_1.1.1
>>  [5] bit_4.0.4      pkgload_1.2.2     R6_2.5.1         rlang_0.4.11
>>  [9] fastmap_1.1.0  tools_4.1.1       pkgbuild_1.2.0   sessioninfo_1.1.1
>> [13] cli_3.0.1      withr_2.4.2       ellipsis_0.3.2   remotes_2.4.0
>> [17] bit64_4.0.5    rprojroot_2.0.2   assertthat_0.2.1 lifecycle_1.0.1
>> [21] crayon_1.4.1   processx_3.5.2    purrr_0.3.4      callr_3.7.0
>> [25] vctrs_0.3.8    fs_1.5.0          ps_1.6.0         testthat_3.0.4
>> [29] memoise_2.0.0  glue_1.4.2        cachem_1.0.6     compiler_4.1.1
>> [33] desc_1.3.0     prettyunits_1.1.1
>> $ read_parquet("x.parquet")
>>   A  B
>> 1 0 10
>> 2 1 11
>> 3 2 12
>>
>> On Mon, Nov 1, 2021 at 7:05 AM Neal Richardson <[email protected]> wrote:
>>
>>> Hi Chris,
>>> Could you share the output from when you installed the package? Snappy
>>> and the other compression libraries should be on in the binaries (see
>>> https://github.com/ursa-labs/arrow-r-nightly/runs/4052316735?check_suite_focus=true#step:4:625
>>> for example), so I'm curious if there's anything in the install logs that
>>> help us understand what's up.
>>>
>>> Neal
>>>
>>> On Sun, Oct 31, 2021 at 7:06 PM Chris Berthiaume <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> After upgrading Arrow 5.0.0.2 to 6.0.0.2 in a Bioconductor 3.13 Docker
>>>> container, I started to see some new errors when reading Parquet files that
>>>> use snappy compression. I'm using the prebuilt Linux binary by setting
>>>> LIBARROW_BINARY=true during installation. Building arrow using the latest
>>>> nightly source fixes the issue. Is it possible the 6.0.0.2 prebuilt Linux
>>>> binary does not have snappy compression support enabled? The error is
>>>> copied below.
>>>>
>>>> Error: NotImplemented: Support for codec 'snappy' not built
>>>> In order to read this file, you will need to reinstall arrow with
>>>> additional features enabled.
>>>> Set one of these environment variables before installing:
>>>>
>>>> * LIBARROW_MINIMAL=false (for all optional features, including 'snappy')
>>>> * ARROW_WITH_SNAPPY=ON (for just 'snappy')
>>>>
>>>> See https://arrow.apache.org/docs/r/articles/install.html for details
>>>> Backtrace:
>>>> 1. popcycle::get.vct.by.file(db, vct_dir, "2018_176/2018-06-25T20-03-48+00-00") test_files.R:210:2
>>>> 4. arrow::read_parquet(...)
>>>> 5. base::tryCatch(reader$ReadTable(), error = read_compressed_error)
>>>> 6. base:::tryCatchList(expr, classes, parentenv, handlers)
>>>> 7. base:::tryCatchOne(expr, names, parentenv, handlers[[1L]])
>>>> 8. value[[3L]](cond)
>>>>
>>>> Thanks,
>>>> Chris Berthiaume
>>>>
>>>
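
For anyone who hits the same error before a fixed binary is available, here is a minimal sketch of checking for snappy support and, if it is missing, reinstalling from source with the LIBARROW_MINIMAL=false setting named in the error message. The helper names check_snappy() and reinstall_with_snappy() are hypothetical (they are not part of arrow), and the CRAN mirror URL is an assumption, picked only because it serves source packages on Linux rather than the RSPM binaries discussed in this thread. A source build can take a long time, and this has not been tested against the container used here.

```r
# Hypothetical helper (not part of arrow): returns TRUE or FALSE when arrow is
# installed, and NA when it is not installed at all.
check_snappy <- function() {
  if (!requireNamespace("arrow", quietly = TRUE)) {
    return(NA)
  }
  arrow::codec_is_available("snappy")
}

# Hypothetical helper: reinstall arrow from source with optional features on.
# LIBARROW_MINIMAL=false comes from the error message quoted in this thread;
# the default repos URL is an assumption, not something the thread specifies.
reinstall_with_snappy <- function(repos = "https://cloud.r-project.org") {
  Sys.setenv(LIBARROW_MINIMAL = "false")
  install.packages("arrow", repos = repos)
}

# Report the current state without changing anything.
status <- check_snappy()
if (isFALSE(status)) {
  message("snappy is not available; reinstall_with_snappy() may help")
} else if (isTRUE(status)) {
  message("snappy is available")
}
```

The reinstall step is kept in its own function so that running the check never triggers a rebuild by accident; call reinstall_with_snappy() explicitly, then restart R before attaching arrow again.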
