Re: R - how to create a schema with many columns?

Aldrin Tue, 17 Aug 2021 17:36:36 -0700

Wow, that works! I really appreciate the help!

🎉🎉🎉


Aldrin Montana
Computer Science PhD Student
UC Santa Cruz


On Tue, Aug 17, 2021 at 3:17 PM Ian Cook <[email protected]> wrote:

> Hi Aldrin,
>
> Please try this:
>
> sample_schema <- schema(!!!schema_fields)
>
> The schema() function now uses rlang functions to evaluate its arguments,
> so a list of fields stored in a variable needs to be spliced in with !!!
>
> Ian
>
>
> On Tue, Aug 17, 2021 at 5:22 PM Aldrin <[email protected]> wrote:
>
>> Hello!
>>
>> I am pretty confused by the schema factory function in R, because I think
>> what I'm doing should work, but it doesn't seem to. I have inlined the code
>> below, but if there's an alternate way to set the data types of a
>> schema in R, then I would welcome recommendations for that as well.
>>
>> Anyways, the brief overview is that I want to create tables from matrices
>> that will have anywhere from hundreds of columns to thousands, and
>> specifying the schema inline is not going to be useful. I figure I should
>> be able to create a named list and then pass it to the schema factory
>> function, but I always get an error when trying to do so ("Error:
>> !is.null(nms <- names(.list)) is not TRUE").
>>
>> I could update to arrow 5.0.0, but I assume that my problem shouldn't be
>> a problem in arrow 4.0.1.
>>
>> Thanks for any help!
>>
>> Working code:
>>
>> Create an example data frame:
>> sample_df <- data.frame(
>>      SRR12=c(0)
>>     ,SRR20=c(0)
>>     ,SRR24=c(4)
>>     ,SRR27=c(223)
>>     ,row.names=c('ENSG3')
>> )
>>
>> sample_df
>>
>>>       SRR12 SRR20 SRR24   SRR27
>>> ENSG3     0     0     4     223
>>
>>
>> Create an arrow table, specify the schema inline:
>> sample_table <- Table$create(
>>      sample_df
>>     ,schema=schema(
>>           SRR12=uint16()
>>          ,SRR20=uint16()
>>          ,SRR24=uint16()
>>          ,SRR27=uint16()
>>      )
>> )
>>
>> sample_table
>>
>>> Table
>>> 1 rows x 4 columns
>>> $SRR12 <uint16>
>>> $SRR20 <uint16>
>>> $SRR24 <uint16>
>>> $SRR27 <uint16>
>>>
>>
>> Create a schema from a list, because we want > 1000 columns sometimes:
>> schema_fields <- list(SRR12=uint16(), SRR20=uint16(), SRR24=uint16(),
>> SRR27=uint16())
>> sample_schema <- schema(schema_fields)
>>
>>> Error: !is.null(nms <- names(.list)) is not TRUE
>>>
>>
>> schema_fields
>>
>>> $SRR12
>>> UInt16
>>> uint16
>>>
>>> $SRR20
>>> UInt16
>>> uint16
>>>
>>> $SRR24
>>> UInt16
>>> uint16
>>>
>>> $SRR27
>>> UInt16
>>> uint16
>>
>>
>>
>> Package information (system is a MacBook M1):
>> > brew info apache-arrow
>>
>> apache-arrow: stable 5.0.0 (bottled), HEAD
>> Columnar in-memory analytics layer designed to accelerate big data
>> https://arrow.apache.org/
>> /opt/homebrew/Cellar/apache-arrow/4.0.1_2 (534 files, 92.9MB) *
>>   Poured from bottle on 2021-07-07 at 16:10:51
>> From:
>> https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/apache-arrow.rb
>> License: Apache-2.0
>> ==> Dependencies
>> Build: boost ✔, cmake ✘, llvm ✘
>> Required: brotli ✔, glog ✔, grpc ✘, lz4 ✔, numpy ✘, [email protected] ✔,
>> protobuf ✔, [email protected] ✔, rapidjson ✔, re2 ✘, snappy ✔, thrift ✔,
>> utf8proc ✔, zstd ✔
>> ==> Options
>> --HEAD
>>         Install HEAD version
>> ==> Analytics
>> install: 1,715 (30 days), 5,687 (90 days), 18,191 (365 days)
>> install-on-request: 994 (30 days), 3,232 (90 days), 10,314 (365 days)
>> build-error: 0 (30 days)
>>
>>
>> > arrow::arrow_info()
>>
>> Arrow package version: 4.0.1
>>
>> Capabilities:
>>
>> dataset    TRUE
>> parquet    TRUE
>> s3        FALSE
>> utf8proc   TRUE
>> re2        TRUE
>> snappy     TRUE
>> gzip       TRUE
>> brotli     TRUE
>> zstd       TRUE
>> lz4        TRUE
>> lz4_frame  TRUE
>> lzo       FALSE
>> bz2        TRUE
>> jemalloc   TRUE
>> mimalloc  FALSE
>>
>> Memory:
>>
>> Allocator  jemalloc
>> Current   256 bytes
>> Max         2.31 Kb
>>
>> Runtime:
>>
>> SIMD Level          none
>> Detected SIMD Level none
>>
>>
>>
>> Aldrin Montana
>> Computer Science PhD Student
>> UC Santa Cruz
>>
>
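
The fix discussed in this thread can be sketched end to end for the many-column case. This is a minimal sketch, not a definitive recipe: sample_matrix is a hypothetical stand-in for the real data, every column is assumed to be uint16, and the do.call() variant is a base R alternative that is not mentioned in the thread itself.

```r
library(arrow)

# Hypothetical stand-in for a wide matrix; real data would have hundreds
# or thousands of columns.
sample_matrix <- matrix(
  c(0, 0, 4, 223),
  nrow = 1,
  dimnames = list("ENSG3", c("SRR12", "SRR20", "SRR24", "SRR27"))
)
sample_df <- as.data.frame(sample_matrix)

# Build a named list with one data type per column
# (assumption: every column should be uint16).
col_names <- colnames(sample_df)
schema_fields <- setNames(
  lapply(col_names, function(nm) uint16()),
  col_names
)

# Splice the named list into schema() so that each element is passed as
# its own `name = type` argument, as suggested in the reply above.
sample_schema <- schema(!!!schema_fields)

# Base R alternative (assumption, not from the thread): do.call() also
# passes each list element as a separate named argument.
alt_schema <- do.call(schema, schema_fields)

# Use the generated schema when creating the table.
sample_table <- Table$create(sample_df, schema = sample_schema)
sample_table
```

Because the schema is derived from colnames(), the same code should apply unchanged whether the matrix has four columns or several thousand.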
