Hello!

I am confused by the schema() factory function in the R arrow package: I
think what I'm doing should work, but it doesn't. I have inlined the code
below. If there is an alternative way to set the data types of a schema in
R, I would welcome recommendations for that as well.

In brief, I want to create tables from matrices that have anywhere from
hundreds to thousands of columns, so specifying the schema inline is not
practical. I figured I could build a named list of data types and pass it
to the schema factory function, but doing so always fails with an error
("Error: !is.null(nms <- names(.list)) is not TRUE").

I could update to arrow 5.0.0, but I assume that my problem shouldn't be a
problem in arrow 4.0.1.

Thanks for any help!

Working code:

Create an example data frame:
library(arrow)

sample_df <- data.frame(
     SRR12=c(0)
    ,SRR20=c(0)
    ,SRR24=c(4)
    ,SRR27=c(223)
    ,row.names=c('ENSG3')
)

sample_df

>       SRR12 SRR20 SRR24   SRR27
> ENSG3     0     0     4     223


Create an arrow table, specify the schema inline:
sample_table <- Table$create(
     sample_df
    ,schema=schema(
          SRR12=uint16()
         ,SRR20=uint16()
         ,SRR24=uint16()
         ,SRR27=uint16()
     )
)

sample_table

> Table
> 1 rows x 4 columns
> $SRR12 <uint16>
> $SRR20 <uint16>
> $SRR24 <uint16>
> $SRR27 <uint16>
>

Create a schema from a list, because we sometimes want > 1000 columns:
schema_fields <- list(
     SRR12=uint16()
    ,SRR20=uint16()
    ,SRR24=uint16()
    ,SRR27=uint16()
)
sample_schema <- schema(schema_fields)

> Error: !is.null(nms <- names(.list)) is not TRUE
>

schema_fields

> $SRR12
> UInt16
> uint16
>
> $SRR20
> UInt16
> uint16
>
> $SRR24
> UInt16
> uint16
>
> $SRR27
> UInt16
> uint16
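
A possible workaround, sketched under an assumption I have not verified
against the arrow 4.0.1 sources: schema() appears to expect each field as a
separately named argument rather than as one list, so base R's do.call()
could be used to spread the named list into named arguments. The names
column_names and sample_schema are only illustrative.

```r
library(arrow)

# Column names would normally come from the matrix, e.g. colnames(my_matrix);
# they are hard-coded here only to keep the sketch self-contained.
column_names <- c("SRR12", "SRR20", "SRR24", "SRR27")

# Build a named list with one uint16() data type per column.
schema_fields <- lapply(setNames(nm = column_names), function(name) uint16())

# Assumption: schema() takes fields as named arguments (name = type), so
# do.call() passes each list element as its own named argument.
sample_schema <- do.call(schema, schema_fields)

sample_schema
```

If this works, the resulting schema could be passed as the schema argument
of Table$create(), in the same way as the inline schema above.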



Package information (system is a MacBook M1):
> brew info apache-arrow

apache-arrow: stable 5.0.0 (bottled), HEAD
Columnar in-memory analytics layer designed to accelerate big data
https://arrow.apache.org/
/opt/homebrew/Cellar/apache-arrow/4.0.1_2 (534 files, 92.9MB) *
  Poured from bottle on 2021-07-07 at 16:10:51
From:
https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/apache-arrow.rb
License: Apache-2.0
==> Dependencies
Build: boost ✔, cmake ✘, llvm ✘
Required: brotli ✔, glog ✔, grpc ✘, lz4 ✔, numpy ✘, openssl@1.1 ✔,
protobuf ✔, python@3.9 ✔, rapidjson ✔, re2 ✘, snappy ✔, thrift ✔,
utf8proc ✔, zstd ✔
==> Options
--HEAD
        Install HEAD version
==> Analytics
install: 1,715 (30 days), 5,687 (90 days), 18,191 (365 days)
install-on-request: 994 (30 days), 3,232 (90 days), 10,314 (365 days)
build-error: 0 (30 days)


> arrow::arrow_info()

Arrow package version: 4.0.1

Capabilities:

dataset    TRUE
parquet    TRUE
s3        FALSE
utf8proc   TRUE
re2        TRUE
snappy     TRUE
gzip       TRUE
brotli     TRUE
zstd       TRUE
lz4        TRUE
lz4_frame  TRUE
lzo       FALSE
bz2        TRUE
jemalloc   TRUE
mimalloc  FALSE

Memory:

Allocator  jemalloc
Current   256 bytes
Max         2.31 Kb

Runtime:

SIMD Level          none
Detected SIMD Level none



Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
