Hi Neal,
Here's a reproducible example using a fresh Docker container from
bioconductor/bioconductor_docker:RELEASE_3_13. I start the container, start
R, install arrow (6.0.0.2 from the RStudio Package Manager), attach it, then
try to read a simple parquet file that I had just created separately in
RStudio on macOS with arrow 5.0.0. The read fails. I then restart R, install
arrow 5.0.0.2 with devtools::install_version(), attach it, and verify that I
can read the same parquet file successfully.
I've changed the R prompt character below from ">" to "$" to prevent any
text from being interpreted as an email reply.
# Creating the parquet file in RStudio on macOS
$ x <- data.frame(A=seq(0, 2), B=seq(10,12))
$ x
  A  B
1 0 10
2 1 11
3 2 12
$ arrow::write_parquet(x, "~/Desktop/arrowtest/x.parquet")
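To separate a codec problem from a general read failure, it may help to write the same data twice, once explicitly with snappy and once uncompressed. A minimal sketch, assuming arrow is installed on the machine that writes the files; the two output file names are hypothetical and are written to the working directory rather than ~/Desktop/arrowtest:

```r
library(arrow)

x <- data.frame(A = seq(0, 2), B = seq(10, 12))

# Same data, two codecs. A reader built without snappy should fail on the
# first file but still read the second one.
# Both file names below are hypothetical.
write_parquet(x, "x_snappy.parquet", compression = "snappy")
write_parquet(x, "x_uncompressed.parquet", compression = "uncompressed")
```

If the uncompressed file reads fine inside the container while the snappy one does not, that points at the codec support in the installed build rather than at the file itself.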
# Run the test in a Docker container
docker run -it --rm -v ~/Desktop/arrowtest:/data \
  bioconductor/bioconductor_docker:RELEASE_3_13 bash
root@5fa84c3f4a41:/# cd /data
root@5fa84c3f4a41:/data# R
R version 4.1.1 (2021-08-10) -- "Kick Things"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
$ install.packages('arrow')
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
also installing the dependencies ‘bit’, ‘assertthat’, ‘bit64’
trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/bit_4.0.4.tar.gz'
Content type 'binary/octet-stream' length 691644 bytes (675 KB)
==================================================
downloaded 675 KB
trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/assertthat_0.2.1.tar.gz'
Content type 'binary/octet-stream' length 52329 bytes (51 KB)
==================================================
downloaded 51 KB
trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/bit64_4.0.5.tar.gz'
Content type 'binary/octet-stream' length 573106 bytes (559 KB)
==================================================
downloaded 559 KB
trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/arrow_6.0.0.2.tar.gz'
Content type 'binary/octet-stream' length 23646684 bytes (22.6 MB)
==================================================
downloaded 22.6 MB
* installing *binary* package ‘bit’ ...
* DONE (bit)
* installing *binary* package ‘assertthat’ ...
* DONE (assertthat)
* installing *binary* package ‘bit64’ ...
* DONE (bit64)
* installing *binary* package ‘arrow’ ...
* DONE (arrow)
The downloaded source packages are in
‘/tmp/Rtmp8HkDvX/downloaded_packages’
$ library(arrow)
See arrow_info() for available features
Attaching package: ‘arrow’
The following object is masked from ‘package:utils’:
timestamp
$ sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS
Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] arrow_6.0.0.2
loaded via a namespace (and not attached):
[1] tidyselect_1.1.1 bit_4.0.4 compiler_4.1.1 magrittr_2.0.1
[5] assertthat_0.2.1 R6_2.5.1 tools_4.1.1 glue_1.4.2
[9] bit64_4.0.5 vctrs_0.3.8 rlang_0.4.11 purrr_0.3.4
$ read_parquet("x.parquet")
Error: NotImplemented: Support for codec 'snappy' not built
In order to read this file, you will need to reinstall arrow with
additional features enabled.
Set one of these environment variables before installing:
* LIBARROW_MINIMAL=false (for all optional features, including 'snappy')
* ARROW_WITH_SNAPPY=ON (for just 'snappy')
See https://arrow.apache.org/docs/r/articles/install.html for details
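Before reinstalling, it may be worth confirming which codecs the loaded build actually supports. A sketch, assuming arrow is attached; `codec_is_available()` and `arrow_info()` are exported by the arrow R package, and the conditional reinstall simply follows the error message's advice. As far as I understand it, LIBARROW_MINIMAL is read only when arrow is compiled from source, so it presumably has no effect on a prebuilt binary such as the one installed from the package manager in the session above:

```r
library(arrow)

# TRUE if the libarrow in this installation was built with snappy support.
has_snappy <- codec_is_available("snappy")
print(has_snappy)

# Prints the arrow version plus the capabilities and codecs compiled in.
print(arrow_info())

# Remedy suggested by the error message: reinstall from CRAN *source* with
# all optional features enabled. Only runs when snappy is missing.
if (!has_snappy) {
  Sys.setenv(LIBARROW_MINIMAL = "false")
  install.packages("arrow", repos = "https://cloud.r-project.org",
                   type = "source")
}
```

After a reinstall, R needs to be restarted before the rebuilt package is loaded.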
root@5fa84c3f4a41:/data# R
R version 4.1.1 (2021-08-10) -- "Kick Things"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
$ devtools::install_version("arrow", "5.0.0.2")
Downloading package from url:
https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/Archive/arrow/arrow_5.0.0.2.tar.gz
These packages have more recent versions available.
It is recommended to update all of them.
Which would you like to update?
1: All
2: CRAN packages only
3: None
4: rlang (0.4.11 -> 0.4.12) [CRAN]
Enter one or more numbers, or an empty line to skip updates:
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
* installing *binary* package ‘arrow’ ...
* DONE (arrow)
$ library(arrow)
Attaching package: ‘arrow’
The following object is masked from ‘package:utils’:
timestamp
$ sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS
Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] arrow_5.0.0.2
loaded via a namespace (and not attached):
[1] magrittr_2.0.1 usethis_2.0.1 devtools_2.4.2 tidyselect_1.1.1
[5] bit_4.0.4 pkgload_1.2.2 R6_2.5.1 rlang_0.4.11
[9] fastmap_1.1.0 tools_4.1.1 pkgbuild_1.2.0 sessioninfo_1.1.1
[13] cli_3.0.1 withr_2.4.2 ellipsis_0.3.2 remotes_2.4.0
[17] bit64_4.0.5 rprojroot_2.0.2 assertthat_0.2.1 lifecycle_1.0.1
[21] crayon_1.4.1 processx_3.5.2 purrr_0.3.4 callr_3.7.0
[25] vctrs_0.3.8 fs_1.5.0 ps_1.6.0 testthat_3.0.4
[29] memoise_2.0.0 glue_1.4.2 cachem_1.0.6 compiler_4.1.1
[33] desc_1.3.0 prettyunits_1.1.1
$ read_parquet("x.parquet")
  A  B
1 0 10
2 1 11
3 2 12
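The update menu that devtools::install_version() shows in the second session waits for keyboard input, which can stall a scripted setup such as a Dockerfile build. A sketch that pins the same version non-interactively, assuming the remotes package is installed (devtools wraps `remotes::install_version()`); `upgrade = "never"` skips the menu and leaves already-installed dependencies alone:

```r
# Pin arrow to the version that reads snappy files in this container,
# without the interactive "Which would you like to update?" prompt.
remotes::install_version("arrow", version = "5.0.0.2", upgrade = "never")

# Confirm which version is now installed.
print(packageVersion("arrow"))
```

Pinning is only a stopgap here; it avoids the regression without explaining why the 6.0.0.2 binary lacks snappy.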
On Mon, Nov 1, 2021 at 7:05 AM Neal Richardson wrote:
> Hi Chris,
> Could you share the output from when you installed the package? Snappy and
> the other compression libraries should be on in the binaries (see
> https://github.com/ursa-labs/arrow-r-nightly/runs/4052316735?check_suite_focus=true#step:4:625
> for example), so I'm curious if there's anything in the install logs that
> helps us understand what's up.
>
> Neal
>
> On Sun, Oct 31, 2021 at 7:06 PM Chris Berthiaume wrote:
>
>> Hello,
>>
>> After upgrading arrow from 5.0.0.2 to 6.0.0.2 in a Bioconductor 3.13 Docker
>> container, I started seeing new errors when reading Parquet files that
>> use snappy compression. I'm using the prebuilt Linux binary by setting
>> LIBARROW_BINARY=true during installation. Building arrow using the latest
>> nightly source fixes the issue. Is it possible the 6.0.0.2 prebuilt Linux
>> binary does not have snappy compression support enabled? The error is
>> copied below.
>>
>> Error: NotImplemented: Support for codec 'snappy' not built
>> In order to read this file, you will need to reinstall arrow with
>> additional features enabled.
>> Set one of these environment variables before installing:
>>
>> * LIBARROW_MINIMAL=false (for all optional features, including 'snappy')
>> * ARROW_WITH_SNAPPY=ON (for just 'snappy')
>>
>> See https://arrow.apache.org/docs/r/articles/install.html for details
>> Backtrace:
>> 1. popcycle::get.vct.by.file(db, vct_dir, "2018_176/2018-06-25T20-03-48+00-00") test_files.R:210:2
>> 4. arrow::read_parquet(...)
>> 5. base::tryCatch(reader$ReadTable(), error = read_compressed_error)
>> 6. base:::tryCatchList(expr, classes, parentenv, handlers)
>> 7. base:::tryCatchOne(expr, names, parentenv, handlers[[1L]])
>> 8. value[[3L]](cond)
>>
>> Thanks,
>> Chris Berthiaume
>>
>