Hi Neal,

Here's a reproducible example using a fresh Docker container for
bioconductor/bioconductor_docker:RELEASE_3_13. I start the container, start
R, install arrow (which pulls 6.0.0.2), attach it, then try to read a simple
Parquet file I had just created separately in RStudio on macOS with arrow
5.0.0. The read fails with the snappy error. I then restart R, install arrow
5.0.0.2 with devtools::install_version(), attach it, and verify that I can
read the same Parquet file.

I've changed the R prompt character below from ">" to "$" to prevent any
text from being interpreted as an email reply.

# Creating the Parquet file in RStudio on macOS
$ x <- data.frame(A=seq(0, 2), B=seq(10,12))
$ x
  A  B
1 0 10
2 1 11
3 2 12
$ arrow::write_parquet(x, "~/Desktop/arrowtest/x.parquet")

# Run the test in a Docker container
docker run -it --rm -v ~/Desktop/arrowtest:/data \
  bioconductor/bioconductor_docker:RELEASE_3_13 bash
root@5fa84c3f4a41:/# cd /data
root@5fa84c3f4a41:/data# R

R version 4.1.1 (2021-08-10) -- "Kick Things"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

$ install.packages('arrow')
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
also installing the dependencies ‘bit’, ‘assertthat’, ‘bit64’

trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/bit_4.0.4.tar.gz'
Content type 'binary/octet-stream' length 691644 bytes (675 KB)
==================================================
downloaded 675 KB

trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/assertthat_0.2.1.tar.gz'
Content type 'binary/octet-stream' length 52329 bytes (51 KB)
==================================================
downloaded 51 KB

trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/bit64_4.0.5.tar.gz'
Content type 'binary/octet-stream' length 573106 bytes (559 KB)
==================================================
downloaded 559 KB

trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/arrow_6.0.0.2.tar.gz'
Content type 'binary/octet-stream' length 23646684 bytes (22.6 MB)
==================================================
downloaded 22.6 MB

* installing *binary* package ‘bit’ ...
* DONE (bit)
* installing *binary* package ‘assertthat’ ...
* DONE (assertthat)
* installing *binary* package ‘bit64’ ...
* DONE (bit64)
* installing *binary* package ‘arrow’ ...
* DONE (arrow)

The downloaded source packages are in
‘/tmp/Rtmp8HkDvX/downloaded_packages’
$ library(arrow)
See arrow_info() for available features

Attaching package: ‘arrow’

The following object is masked from ‘package:utils’:

    timestamp

$ sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] arrow_6.0.0.2

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.1 bit_4.0.4        compiler_4.1.1   magrittr_2.0.1
 [5] assertthat_0.2.1 R6_2.5.1         tools_4.1.1      glue_1.4.2
 [9] bit64_4.0.5      vctrs_0.3.8      rlang_0.4.11     purrr_0.3.4
$ read_parquet("x.parquet")
Error: NotImplemented: Support for codec 'snappy' not built
In order to read this file, you will need to reinstall arrow with
additional features enabled.
Set one of these environment variables before installing:

 * LIBARROW_MINIMAL=false (for all optional features, including 'snappy')
 * ARROW_WITH_SNAPPY=ON (for just 'snappy')

See https://arrow.apache.org/docs/r/articles/install.html for details

root@5fa84c3f4a41:/data# R

R version 4.1.1 (2021-08-10) -- "Kick Things"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

$ devtools::install_version("arrow", "5.0.0.2")
Downloading package from url:
https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/Archive/arrow/arrow_5.0.0.2.tar.gz
These packages have more recent versions available.
It is recommended to update all of them.
Which would you like to update?

1: All
2: CRAN packages only
3: None
4: rlang (0.4.11 -> 0.4.12) [CRAN]

Enter one or more numbers, or an empty line to skip updates:
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
* installing *binary* package ‘arrow’ ...
* DONE (arrow)
$ library(arrow)

Attaching package: ‘arrow’

The following object is masked from ‘package:utils’:

    timestamp

$ sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] arrow_5.0.0.2

loaded via a namespace (and not attached):
 [1] magrittr_2.0.1    usethis_2.0.1     devtools_2.4.2    tidyselect_1.1.1
 [5] bit_4.0.4         pkgload_1.2.2     R6_2.5.1          rlang_0.4.11
 [9] fastmap_1.1.0     tools_4.1.1       pkgbuild_1.2.0    sessioninfo_1.1.1
[13] cli_3.0.1         withr_2.4.2       ellipsis_0.3.2    remotes_2.4.0
[17] bit64_4.0.5       rprojroot_2.0.2   assertthat_0.2.1  lifecycle_1.0.1
[21] crayon_1.4.1      processx_3.5.2    purrr_0.3.4       callr_3.7.0
[25] vctrs_0.3.8       fs_1.5.0          ps_1.6.0          testthat_3.0.4
[29] memoise_2.0.0     glue_1.4.2        cachem_1.0.6      compiler_4.1.1
[33] desc_1.3.0        prettyunits_1.1.1
$ read_parquet("x.parquet")
  A  B
1 0 10
2 1 11
3 2 12
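
The two installs could also be compared directly, without needing a test
file. The following is a minimal sketch that was not run as part of the
transcript above. It assumes the arrow package is installed, and the helper
name snappy_status() is made up for illustration. It relies on
codec_is_available(), which the arrow R package exports.

```r
# Sketch: check whether the installed arrow build includes snappy support.
# snappy_status() is a hypothetical helper name, not part of arrow.
snappy_status <- function() {
  # Return NA instead of failing if arrow is not installed at all.
  if (!requireNamespace("arrow", quietly = TRUE)) {
    return(NA)
  }
  # codec_is_available() reports whether the bundled C++ library was built
  # with the named compression codec.
  arrow::codec_is_available("snappy")
}

status <- snappy_status()
print(status)
```

Given the read_parquet() results above, this would presumably print FALSE in
the 6.0.0.2 session and TRUE in the 5.0.0.2 session. arrow_info(), which the
package mentions when it is attached, prints the full list of features.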

On Mon, Nov 1, 2021 at 7:05 AM Neal Richardson <[email protected]>
wrote:

> Hi Chris,
> Could you share the output from when you installed the package? Snappy and
> the other compression libraries should be on in the binaries (see
> https://github.com/ursa-labs/arrow-r-nightly/runs/4052316735?check_suite_focus=true#step:4:625
> for example), so I'm curious if there's anything in the install logs that
> help us understand what's up.
>
> Neal
>
> On Sun, Oct 31, 2021 at 7:06 PM Chris Berthiaume <[email protected]> wrote:
>
>> Hello,
>>
>> After upgrading Arrow 5.0.0.2 to 6.0.0.2 in a Bioconductor 3.13 Docker
>> container, I started to see some new errors when reading Parquet files that
>> use snappy compression. I'm using the prebuilt Linux binary by setting
>> LIBARROW_BINARY=true during installation. Building arrow using the latest
>> nightly source fixes the issue. Is it possible the 6.0.0.2 prebuilt Linux
>> binary does not have snappy compression support enabled? The error is
>> copied below.
>>
>> Error: NotImplemented: Support for codec 'snappy' not built
>> In order to read this file, you will need to reinstall arrow with
>> additional features enabled.
>> Set one of these environment variables before installing:
>>
>>  * LIBARROW_MINIMAL=false (for all optional features, including 'snappy')
>>  * ARROW_WITH_SNAPPY=ON (for just 'snappy')
>>
>> See https://arrow.apache.org/docs/r/articles/install.html for details
>> Backtrace:
>>  1. popcycle::get.vct.by.file(db, vct_dir,
>> "2018_176/2018-06-25T20-03-48+00-00") test_files.R:210:2
>>  4. arrow::read_parquet(...)
>>  5. base::tryCatch(reader$ReadTable(), error = read_compressed_error)
>>  6. base:::tryCatchList(expr, classes, parentenv, handlers)
>>  7. base:::tryCatchOne(expr, names, parentenv, handlers[[1L]])
>>  8. value[[3L]](cond)
>>
>> Thanks,
>> Chris Berthiaume
>>
>
