Hello!
I am confused by the schema factory function in R: I think what I'm doing
should work, but it doesn't. I have inlined the code below; if there is an
alternative way to set the data types of a schema in R, I would welcome
recommendations for that as well.
In brief, I want to create tables from matrices that have anywhere from
hundreds to thousands of columns, so specifying the schema inline is not
practical. I figured I could build a named list and pass it to the schema
factory function, but I always get an error when I try ("Error:
!is.null(nms <- names(.list)) is not TRUE").
I could update to arrow 5.0.0, but I assume this shouldn't be a problem in
arrow 4.0.1 either.
Thanks for any help!
Working code:
Create an example data frame:
sample_df <- data.frame(
   SRR12=c(0)
  ,SRR20=c(0)
  ,SRR24=c(4)
  ,SRR27=c(223)
  ,row.names=c('ENSG3')
)
sample_df
>       SRR12 SRR20 SRR24 SRR27
> ENSG3     0     0     4   223
Create an arrow table, specify the schema inline:
sample_table <- Table$create(
   sample_df
  ,schema=schema(
     SRR12=uint16()
    ,SRR20=uint16()
    ,SRR24=uint16()
    ,SRR27=uint16()
  )
)
sample_table
> Table
> 1 rows x 4 columns
> $SRR12 <uint16>
> $SRR20 <uint16>
> $SRR24 <uint16>
> $SRR27 <uint16>
>
Create a schema from a list (this is the step that fails), because we
sometimes want more than 1000 columns:
schema_fields <- list(SRR12=uint16(), SRR20=uint16(), SRR24=uint16(),
                      SRR27=uint16())
sample_schema <- schema(schema_fields)
> Error: !is.null(nms <- names(.list)) is not TRUE
>
schema_fields
> $SRR12
> UInt16
> uint16
>
> $SRR20
> UInt16
> uint16
>
> $SRR24
> UInt16
> uint16
>
> $SRR27
> UInt16
> uint16
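A possible workaround, sketched under the assumption that the error comes
from schema() receiving the whole list as a single unnamed argument rather
than as one named argument per field: build the named list from the column
names, then spread it into named arguments with do.call(). The choice of
uint16 for every column is only for illustration and mirrors the inline
example above.

```r
library(arrow)

# Same example data frame as above.
sample_df <- data.frame(
  SRR12 = c(0), SRR20 = c(0), SRR24 = c(4), SRR27 = c(223),
  row.names = c('ENSG3')
)

# One uint16 type per column, named after the data frame's columns.
# This scales to any number of columns without writing them out by hand.
schema_fields <- lapply(colnames(sample_df), function(col) uint16())
names(schema_fields) <- colnames(sample_df)

# do.call() passes each list element as its own named argument, which is
# equivalent to writing schema(SRR12=uint16(), SRR20=uint16(), ...).
sample_schema <- do.call(schema, schema_fields)

sample_table <- Table$create(sample_df, schema = sample_schema)
```

If this assumption holds, sample_table should have the same four uint16
columns as the table built with the inline schema.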
Package information (system is a MacBook with an M1 chip):
> brew info apache-arrow
apache-arrow: stable 5.0.0 (bottled), HEAD
Columnar in-memory analytics layer designed to accelerate big data
https://arrow.apache.org/
/opt/homebrew/Cellar/apache-arrow/4.0.1_2 (534 files, 92.9MB) *
Poured from bottle on 2021-07-07 at 16:10:51
From:
https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/apache-arrow.rb
License: Apache-2.0
==> Dependencies
Build: boost ✔, cmake ✘, llvm ✘
Required: brotli ✔, glog ✔, grpc ✘, lz4 ✔, numpy ✘, [email protected] ✔, protobuf
✔, [email protected] ✔, rapidjson ✔, re2 ✘, snappy ✔, thrift ✔, utf8proc ✔, zstd ✔
==> Options
--HEAD
Install HEAD version
==> Analytics
install: 1,715 (30 days), 5,687 (90 days), 18,191 (365 days)
install-on-request: 994 (30 days), 3,232 (90 days), 10,314 (365 days)
build-error: 0 (30 days)
> arrow::arrow_info()
Arrow package version: 4.0.1
Capabilities:
  dataset    TRUE
  parquet    TRUE
  s3         FALSE
  utf8proc   TRUE
  re2        TRUE
  snappy     TRUE
  gzip       TRUE
  brotli     TRUE
  zstd       TRUE
  lz4        TRUE
  lz4_frame  TRUE
  lzo        FALSE
  bz2        TRUE
  jemalloc   TRUE
  mimalloc   FALSE
Memory:
  Allocator  jemalloc
  Current    256 bytes
  Max        2.31 Kb
Runtime:
  SIMD Level           none
  Detected SIMD Level  none
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz