Re: R - how to create a schema with many columns?

Ian Cook Tue, 17 Aug 2021 15:17:12 -0700

Hi Aldrin,

Please try this:


sample_schema <- schema(!!!schema_fields)

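The schema() function now uses rlang functions to evaluate its arguments,
so a list held in a variable needs to be unquoted and spliced into the call
with !!!. Passing the list as-is gives schema() a single unnamed argument,
which is why the names check in the error message fails.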
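For the many-columns case, here is a minimal sketch, assuming the arrow R package is installed. It builds the named list from the data frame's column names instead of typing it out. The lapply()/names() construction and the do.call() alternative are my illustrations, not something from your message:

```r
library(arrow)

# Example data from the question. With a matrix, as.data.frame(m) gives a
# data frame whose column names can be used the same way.
sample_df <- data.frame(
  SRR12 = c(0), SRR20 = c(0), SRR24 = c(4), SRR27 = c(223),
  row.names = c("ENSG3")
)

# One uint16() type per column, named after the columns. This scales to
# thousands of columns without writing each one by hand.
schema_fields <- lapply(colnames(sample_df), function(col) uint16())
names(schema_fields) <- colnames(sample_df)

# Splice the named list into schema() so each element becomes its own
# name = type argument.
sample_schema <- schema(!!!schema_fields)

# Base-R alternative that should give the same schema: do.call() passes
# each list element to schema() as a separate named argument.
sample_schema_alt <- do.call(schema, schema_fields)

sample_table <- Table$create(sample_df, schema = sample_schema)
print(sample_table)
```

Both forms rely on schema() collecting its `...` arguments as individual name = type pairs, so the list has to be expanded before schema() sees it.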

Ian


On Tue, Aug 17, 2021 at 5:22 PM Aldrin wrote:

> Hello!
>
> I am pretty confused by the schema factory function in R, because I think
> what I'm doing should work, but it doesn't seem to. I have inlined the code
> below, but if there's an alternative way of setting the data types of a
> schema in R, I would welcome recommendations for that as well.
>
> Anyway, in brief: I want to create tables from matrices that will have
> anywhere from hundreds to thousands of columns, so specifying the schema
> inline is not practical. I figured I should be able to create a named list
> and pass it to the schema factory function, but I always get an error when
> I try ("Error: !is.null(nms <- names(.list)) is not TRUE").
>
> I could update to arrow 5.0.0, but I assume this should also work in
> arrow 4.0.1.
>
> Thanks for any help!
>
> Working code:
>
> Create an example data frame:
> sample_df <- data.frame(
>      SRR12=c(0)
>     ,SRR20=c(0)
>     ,SRR24=c(4)
>     ,SRR27=c(223)
>     ,row.names=c('ENSG3')
> )
>
> sample_df
>
>>       SRR12 SRR20 SRR24   SRR27
>> ENSG3     0     0     4     223
>
>
> Create an arrow table, specify the schema inline:
> sample_table <- Table$create(
>      sample_df
>     ,schema=schema(
>           SRR12=uint16()
>          ,SRR20=uint16()
>          ,SRR24=uint16()
>          ,SRR27=uint16()
>      )
> )
>
> sample_table
>
>> Table
>> 1 rows x 4 columns
>> $SRR12 <uint16>
>> $SRR20 <uint16>
>> $SRR24 <uint16>
>> $SRR27 <uint16>
>>
>
> Create a schema from a list, because we want > 1000 columns sometimes:
> schema_fields <- list(SRR12=uint16(), SRR20=uint16(), SRR24=uint16(),
> SRR27=uint16())
> sample_schema <- schema(schema_fields)
>
>> Error: !is.null(nms <- names(.list)) is not TRUE
>>
>
> schema_fields
>
>> $SRR12
>> UInt16
>> uint16
>>
>> $SRR20
>> UInt16
>> uint16
>>
>> $SRR24
>> UInt16
>> uint16
>>
>> $SRR27
>> UInt16
>> uint16
>
>
>
> Package information (system is a MacBook M1):
> > brew info apache-arrow
>
> apache-arrow: stable 5.0.0 (bottled), HEAD
> Columnar in-memory analytics layer designed to accelerate big data
> https://arrow.apache.org/
> /opt/homebrew/Cellar/apache-arrow/4.0.1_2 (534 files, 92.9MB) *
>   Poured from bottle on 2021-07-07 at 16:10:51
> From:
> https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/apache-arrow.rb
> License: Apache-2.0
> ==> Dependencies
> Build: boost ✔, cmake ✘, llvm ✘
> Required: brotli ✔, glog ✔, grpc ✘, lz4 ✔, numpy ✘, [email protected] ✔,
> protobuf ✔, [email protected] ✔, rapidjson ✔, re2 ✘, snappy ✔, thrift ✔,
> utf8proc ✔, zstd ✔
> ==> Options
> --HEAD
>         Install HEAD version
> ==> Analytics
> install: 1,715 (30 days), 5,687 (90 days), 18,191 (365 days)
> install-on-request: 994 (30 days), 3,232 (90 days), 10,314 (365 days)
> build-error: 0 (30 days)
>
>
> > arrow::arrow_info()
>
> Arrow package version: 4.0.1
>
> Capabilities:
>
> dataset    TRUE
> parquet    TRUE
> s3        FALSE
> utf8proc   TRUE
> re2        TRUE
> snappy     TRUE
> gzip       TRUE
> brotli     TRUE
> zstd       TRUE
> lz4        TRUE
> lz4_frame  TRUE
> lzo       FALSE
> bz2        TRUE
> jemalloc   TRUE
> mimalloc  FALSE
>
> Memory:
>
> Allocator  jemalloc
> Current   256 bytes
> Max         2.31 Kb
>
> Runtime:
>
> SIMD Level          none
> Detected SIMD Level none
>
>
>
> Aldrin Montana
> Computer Science PhD Student
> UC Santa Cruz
>
