This is an automated email from the ASF dual-hosted git repository.

kou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git

The following commit(s) were added to refs/heads/master by this push:
     new aab4946  ARROW-4506: [Ruby] Add Arrow::RecordBatch#raw_records
aab4946 is described below

commit aab4946ad283946be948119c9107471acd64333c
Author: Kenta Murata <m...@mrkn.jp>
AuthorDate: Sat Mar 16 17:58:27 2019 +0900

    ARROW-4506: [Ruby] Add Arrow::RecordBatch#raw_records

    I want to add Arrow::RecordBatch#raw_records method to convert a record batch object to a nested array.
    This is the first step to implement the feature.

    The following things are out of scope of this pull-request.

    - Conversion of half-float values to Ruby's float.
    - Unit treatment of Time32 and Time64
    - Conversion of the following compound data types to Ruby objects:
      - ListType
      - StructType
      - UnionType
      - DictionaryType

    ### TODO:

    - [x] Extracting raw values of HalfFloatArray
    - [x] Extracting ListArray
    - [x] Extracting StructArray
    - [x] Extracting SparseUnionArray
    - [x] Extracting DenseUnionArray
    - [x] FixedSizeBinary
    - [x] Date32
    - [x] Date64
    - [x] Timestamp
    - [x] Decimal128
    - [x] Struct
    - [x] Dictionary
    - [x] Extracting indices of DictionaryArray
    - [x] Make CI passed
    - [x] Add benchmark script

    Author: Kenta Murata <m...@mrkn.jp>
    Author: Kouhei Sutou <k...@clear-code.com>

    Closes #3587 from mrkn/raw_records and squashes the following commits:

    00197e4b <Kouhei Sutou> Split test files
    0d0d5170 <Kouhei Sutou> Replace a large StructArray test with small tests
    61f9774b <Kouhei Sutou> Use {"field_name" => value} for union value
    5fa9ba03 <Kouhei Sutou> Finish replacing tests for StructArray
    83bf55a3 <Kouhei Sutou> Add support for nested StructArray
    c065a8a2 <Kouhei Sutou> Add tests for StructArray#raw_records
    795c96a8 <Kouhei Sutou> Add support for "_" for data type name
    87ab55da <Kouhei Sutou> Add support for nil value as NULL for struct field value
    08b85c81 <Kouhei Sutou> Add support for nested list
    4d78da49 <Kouhei Sutou> Remove resolved TODO
    024506f1 <Kouhei Sutou> Remove needless tests
    e30a9e73 <Kouhei Sutou> Add support for nil in ListArrayBuilder#append_values
    55066950 <Kouhei Sutou> Add primitive array tests
    67808a2b <Kouhei Sutou> Add support for building BinaryArray
    44779252 <Kouhei Sutou> Add tests for primitive arrays
    cb3cb471 <Kouhei Sutou> Add support for NullArray
    728b82c9 <Kouhei Sutou> Reduce scope
    021dbee2 <Kouhei Sutou> Use .cpp for C++
    4d01f85b <Kouhei Sutou> Use constexpr
    7f5d6cc3 <Kouhei Sutou> Fix style
    db1e1c25 <Kouhei Sutou> Remove needless reference
    028e3c4a <Kouhei Sutou> Use auto
    ce67b5bb <Kouhei Sutou> Stop reusing block argument name
    750afff5 <Kouhei Sutou> Use .cpp for C++ extension
    a2032754 <Kouhei Sutou> Fold a long line
    46d63aeb <Kouhei Sutou> Use auto
    03233bd5 <Kouhei Sutou> Use Red Arrow in build directly
    3a776306 <Kouhei Sutou> Fix package name for MSYS2
    022dd073 <Kenta Murata> Fix the benchmark for dictionary array
    3e18e85f <Kenta Murata> Rename a benchmark file
    2c6142c8 <Kenta Murata> Rename a directory
    40595617 <Kenta Murata> Fix benchmark task
    5d380fdc <Kenta Murata> Use values between 2**16 and 2**32-1 for testing UInt32Array
    d0b8d0fc <Kenta Murata> Fix styling
    f41550d0 <Kenta Murata> Stop using precomputed scales of time unit
    bdf7090e <Kenta Murata> Remove needless scope blocks
    8e81e9b2 <Kouhei Sutou> Implement converters based on visitor
    31ca243a <Kenta Murata> Add extension files in the gem package
    ec11062b <Kenta Murata> Add arrow ext dir in $LOAD_PATH
    d0a4733b <Kenta Murata> Fix benchmark against the removal of convert_decimal: option
    24d14800 <Kenta Murata> Guard RVAL2GOBJ by rb::protect
    73063288 <Kenta Murata> Drop convert_decimal: option
    ba96767a <Kenta Murata> Remove a needless member function
    dfa6b4ae <Kenta Murata> Introduce require_extension_library method to load arrow.so
    ee8dcccf <Kenta Murata> Avoid using rb_str_new_cstr
    7e51f7a8 <Kenta Murata> Fix variable names in test
    8d1c36e4 <Kenta Murata> Use #pragma once
    bba7eb4e <Kenta Murata> Rename a variable
    b47fbfeb <Kenta Murata> Use GOBJ2RVAL_UNREF correctly
    4a73d5eb <Kenta Murata> Use auto instead of VALUE
    d59d72be <Kenta Murata> Put some codes out side of rb::protect block
    9995dec9 <Kenta Murata> Use rb_enc_str_new with rb_ascii8bit_encoding for binary string creation
    e0b26563 <Kenta Murata> Replace assert with DCHECK
    70ccc327 <Kenta Murata> Make cArrowRecordBatch a local variable
    16cbe8d9 <Kenta Murata> Use rb::RawMethod
    d6b7d3da <Kenta Murata> Remove needless require
    ed507393 <Kenta Murata> Fix styling
    da2c49f6 <Kenta Murata> Rename rb_cDate to cDate
    4645c17c <Kenta Murata> Rename cRecordBatch to cArrowRecordBatch
    2c410cf6 <Kenta Murata> Remove namespace comments
    55df564d <Kenta Murata> Rename files
    0a4016b3 <Kenta Murata> Replace license headers
    180c7c3e <Kenta Murata> Use static timestamp_range in benchmark
    1413aa6b <Kouhei Sutou> Set PKG_CONFIG_PATH to build Red Arrow
    509af8ff <Kenta Murata> Add benchmark task
    e3fa4e62 <Kenta Murata> Fix word usage
    31c8a270 <Kenta Murata> Use double quotations
    9568a371 <Kenta Murata> Remove redundant sub test cases
    7fb64a9b <Kenta Murata> Remove parentheses with empty argument
    5bdd9952 <Kenta Murata> Remove needless require
    56634502 <Kenta Murata> Remove arrow_ruby_compile function
    794c7a27 <Kenta Murata> Revert needless changes
    be026ed1 <Kouhei Sutou> Fix style
    c0b7442d <Kouhei Sutou> Add "compile" task
    99442745 <Kouhei Sutou> Add support "rake clean" and "rake clobber"
    56bbc15b <Kouhei Sutou> Run extconf.rb automatically in test/run-test.rb
    4d9440d0 <Kouhei Sutou> Use Ext++
    d16b6e1d <Kouhei Sutou> Sort alphabetically
    6217e862 <Kouhei Sutou> Add support for auto package install
    08818907 <Kouhei Sutou> Remove rake-compiler dependency
    38d6f530 <Kenta Murata> Make the default value of conver_decimal true
    f3cd6724 <Kenta Murata> Move gem entries from Gemfile into gemspec
    2b97d3e3 <Kenta Murata> Fix benchmarks
    0a9f9ece <Kenta Murata> Fix the random state of Faker in benchmark
    26c7ea17 <Kenta Murata> Add benchmark scripts
    7eaa0690 <Kenta Murata> Separate raw_records test
    de943457 <Kenta Murata> Fix missing const modifiers
    40c4966c <Kenta Murata> Support Struct in Union
    cba224a3 <Kenta Murata> Support Dictionary in Union
    95517e3a <Kenta Murata> Add tests for dense union in dense union
    86f8bc11 <Kenta Murata> Fix travis script
    82ce1bcb <Kenta Murata> Add license comment
    9df5765a <Kenta Murata> Refactoring test
    5016c6a9 <Kenta Murata> Support Dictionary in UnionArray
    462480d6 <Kenta Murata> Support Date32, Date64, and Timestamp in UnionArray
    0d358a3c <Kenta Murata> Refactoring
    5a2276bf <Kenta Murata> Use non-default field name for a list in a record batch
    6b9de32d <Kenta Murata> Add support of FixedSizeBinary in Union
    786e4388 <Kenta Murata> Refactoring of Decimal128 converter
    fcd3ab95 <Kenta Murata> Support SparseUnion
    ce10bcbd <Kenta Murata> Support Decimal128 in DenseUnion
    5b14d106 <Kenta Murata> Add partial support of DenseUnion
    432f05ae <Kenta Murata> Fix encoding bug
    94fe1c48 <Kenta Murata> Add tentative support of HalfFloat
    72a64f5e <Kenta Murata> Refactoring
    61f0bc50 <Kenta Murata> Support Dictionary indices
    63876209 <Kenta Murata> Support Struct
    223db821 <Kenta Murata> Use RETURN_NOT_OK
    cead59bb <Kenta Murata> Save errinfo if rb::error created from state
    0186066c <Kenta Murata> Extract ArrayConverter class
    294c7496 <Kenta Murata> Supply PKG_CONFIG_PATH to rake compile
    53fa1ffd <Kenta Murata> Fix CI script
    17a69c2a <Kenta Murata> Support List
    88119d5a <Kenta Murata> Remove pure-Ruby version
    aa52a174 <Kenta Murata> Tweak comment and error message
    6c8a2e2c <Kenta Murata> Add tentative supports of Time32 and Time64
    d33d77d2 <Kenta Murata> Support Timestamp
    72935bc9 <Kenta Murata> Support Date32 and Date64
    93518946 <Kenta Murata> Use rb_jump_tag to raise deferred exception
    476f544b <Kenta Murata> Add convert_decimal kwarg
    54e59e78 <Kenta Murata> Fix VisitValue for nil
    5f7a78ca <Kenta Murata> Use RawRecordsBuilder
    d8d54d67 <Kenta Murata> Add RawRecordsBuilder
    4d6d6392 <Kenta Murata> Update test case
    8d68d5bc <Kenta Murata> Add a partial native implementation of
    RecordBatch#raw_records
    25a1925c <Kenta Murata> Add test and tentative implementation of RecordBatch#raw_records
---
 ci/travis_script_ruby.sh                           |  11 +-
 ruby/red-arrow-cuda/test/run-test.rb               |   2 +
 ruby/red-arrow/.gitignore                          |   2 +
 ruby/red-arrow/Rakefile                            |  53 +-
 ruby/red-arrow/benchmark/raw-records/boolean.yml   |  65 ++
 .../red-arrow/benchmark/raw-records/decimal128.yml |  66 ++
 .../red-arrow/benchmark/raw-records/dictionary.yml |  73 ++
 ruby/red-arrow/benchmark/raw-records/int64.yml     |  65 ++
 ruby/red-arrow/benchmark/raw-records/list.yml      |  68 ++
 ruby/red-arrow/benchmark/raw-records/string.yml    |  65 ++
 ruby/red-arrow/benchmark/raw-records/timestamp.yml |  72 ++
 ruby/red-arrow/dependency-check/Rakefile           |  43 --
 ruby/red-arrow/ext/arrow/arrow.cpp                 |  43 ++
 ruby/red-arrow/ext/arrow/extconf.rb                |  46 ++
 ruby/red-arrow/ext/arrow/record-batch.cpp          | 756 +++++++++++++++++++++
 ruby/red-arrow/ext/arrow/red-arrow.hpp             |  52 ++
 .../arrow/binary-array-builder.rb}                 |  39 +-
 ruby/red-arrow/lib/arrow/data-type.rb              |  12 +-
 ruby/red-arrow/lib/arrow/list-array-builder.rb     |   2 +-
 ruby/red-arrow/lib/arrow/loader.rb                 |   6 +
 ruby/red-arrow/lib/arrow/struct-array-builder.rb   |   6 +-
 ruby/red-arrow/red-arrow.gemspec                   |   8 +-
 .../raw-records/record-batch/test-basic-arrays.rb  | 349 ++++++++++
 .../record-batch/test-dense-union-array.rb         | 487 +++++++++++++
 .../raw-records/record-batch/test-list-array.rb    | 499 ++++++++++++++
 .../record-batch/test-multiple-columns.rb          |  49 ++
 .../record-batch/test-sparse-union-array.rb        | 475 +++++++++++++
 .../raw-records/record-batch/test-struct-array.rb  | 427 ++++++++++++
 ruby/red-arrow/test/run-test.rb                    |  21 +
 ruby/red-arrow/test/test-data-type.rb              |   5 +
 ruby/red-gandiva/test/run-test.rb                  |   2 +
 ruby/red-parquet/test/run-test.rb                  |   2 +
 ruby/red-plasma/test/run-test.rb                   |   2 +
 33 files changed, 3793 insertions(+), 80 deletions(-)

diff --git a/ci/travis_script_ruby.sh b/ci/travis_script_ruby.sh
index 7d69bee..0ae85b4 100755
--- a/ci/travis_script_ruby.sh
+++ b/ci/travis_script_ruby.sh
@@ -23,11 +23,16 @@ source $TRAVIS_BUILD_DIR/ci/travis_env_common.sh
 
 arrow_ruby_run_test()
 {
-  local arrow_c_glib_lib_dir=$1
+  local arrow_c_glib_lib_dir="$1"
 
-  export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$arrow_c_glib_lib_dir
-  export GI_TYPELIB_PATH=$arrow_c_glib_lib_dir/girepository-1.0
+  local ld_library_path_keep="$LD_LIBRARY_PATH"
+  local pkg_config_path_keep="$PKG_COFNIG_PATH"
+  LD_LIBRARY_PATH="${arrow_c_glib_lib_dir}:${LD_LIBRARY_PATH}"
+  PKG_CONFIG_PATH="${arrow_c_glib_lib_dir}/pkgconfig:${PKG_CONFIG_PATH}"
+  export GI_TYPELIB_PATH="${arrow_c_glib_lib_dir}/girepository-1.0"
   test/run-test.rb
+  LD_LIBRARY_PATH="$ld_library_path_keep"
+  PKG_CONFIG_PATH="$pkg_config_path_keep"
 }
 
 export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ARROW_CPP_INSTALL/lib
diff --git a/ruby/red-arrow-cuda/test/run-test.rb b/ruby/red-arrow-cuda/test/run-test.rb
index b826f3e..a4f7f76 100755
--- a/ruby/red-arrow-cuda/test/run-test.rb
+++ b/ruby/red-arrow-cuda/test/run-test.rb
@@ -28,7 +28,9 @@ lib_dir = base_dir + "lib"
 test_dir = base_dir + "test"
 
 arrow_lib_dir = arrow_base_dir + "lib"
+arrow_ext_dir = arrow_base_dir + "ext" + "arrow"
 
+$LOAD_PATH.unshift(arrow_ext_dir.to_s)
 $LOAD_PATH.unshift(arrow_lib_dir.to_s)
 $LOAD_PATH.unshift(lib_dir.to_s)
 
diff --git a/ruby/red-arrow/.gitignore b/ruby/red-arrow/.gitignore
index 68e4b5c..e41483f 100644
--- a/ruby/red-arrow/.gitignore
+++ b/ruby/red-arrow/.gitignore
@@ -17,4 +17,6 @@
 
 /.yardoc/
 /doc/reference/
+/ext/arrow/Makefile
+/ext/arrow/mkmf.log
 /pkg/
diff --git a/ruby/red-arrow/Rakefile b/ruby/red-arrow/Rakefile
index a3ece36..af7ed9b 100644
--- a/ruby/red-arrow/Rakefile
+++ b/ruby/red-arrow/Rakefile
@@ -17,27 +17,72 @@
 # specific language governing permissions and limitations
 # under the License.
 
-require "rubygems" require "bundler/gem_helper" +require "rake/clean" require "yard" base_dir = File.join(__dir__) helper = Bundler::GemHelper.new(base_dir) helper.install +spec = helper.gemspec release_task = Rake::Task["release"] release_task.prerequisites.replace(["build", "release:rubygem_push"]) +def run_extconf(extension_dir, *arguments) + cd(extension_dir) do + ruby("extconf.rb", *arguments) + end +end + +spec.extensions.each do |extension| + extension_dir = File.dirname(extension) + CLOBBER << File.join(extension_dir, "Makefile") + CLOBBER << File.join(extension_dir, "mkmf.log") + + makefile = File.join(extension_dir, "Makefile") + file makefile do + run_extconf(extension_dir) + end + + desc "Configure" + task :configure do + run_extconf(extension_dir) + end + + desc "Compile" + task :compile => makefile do + cd(extension_dir) do + sh("make") + end + end + + task :clean do + cd(extension_dir) do + sh("make", "clean") if File.exist?("Makefile") + end + end +end + desc "Run tests" task :test do - cd("dependency-check") do - ruby("-S", "rake") - end ruby("test/run-test.rb") end task default: :test +desc "Run benchmarks" +task :benchmark do + benchmarks = if ENV["BENCHMARKS"] + ENV["BENCHMARKS"].split + else + FileList["benchmark/{,*/**/}*.yml"] + end + benchmarks.each do |benchmark| + sh("benchmark-driver", benchmark) + end +end + YARD::Rake::YardocTask.new do |task| end diff --git a/ruby/red-arrow/benchmark/raw-records/boolean.yml b/ruby/red-arrow/benchmark/raw-records/boolean.yml new file mode 100644 index 0000000..5e2551e --- /dev/null +++ b/ruby/red-arrow/benchmark/raw-records/boolean.yml @@ -0,0 +1,65 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. 
The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +contexts: + - name: master + prelude: | + $LOAD_PATH.unshift(File.expand_path("ext/arrow")) + $LOAD_PATH.unshift(File.expand_path("lib")) +prelude: |- + require "arrow" + require "faker" + + state = ENV.fetch("FAKER_RANDOM_SEED", 17).to_i + Faker::Config.random = Random.new(state) + + n_rows = 1000 + n_columns = 10 + type = :boolean + + fields = {} + arrays = {} + n_columns.times do |i| + column_name = "column_#{i}" + fields[column_name] = type + arrays[column_name] = n_rows.times.map { Faker::Boolean.boolean } + end + record_batch = Arrow::RecordBatch.new(fields, arrays) + + def pure_ruby_raw_records(record_batch) + n_rows = record_batch.n_rows + n_columns = record_batch.n_columns + columns = record_batch.columns + records = [] + i = 0 + while i < n_rows + record = [] + j = 0 + while j < n_columns + record << columns[j][i] + j += 1 + end + records << record + i += 1 + end + records + end +benchmark: + pure_ruby: |- + pure_ruby_raw_records(record_batch) + raw_records: |- + record_batch.raw_records diff --git a/ruby/red-arrow/benchmark/raw-records/decimal128.yml b/ruby/red-arrow/benchmark/raw-records/decimal128.yml new file mode 100644 index 0000000..9b2fb2e --- /dev/null +++ b/ruby/red-arrow/benchmark/raw-records/decimal128.yml @@ -0,0 +1,66 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. 
See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +contexts: + - name: master + prelude: | + $LOAD_PATH.unshift(File.expand_path("ext/arrow")) + $LOAD_PATH.unshift(File.expand_path("lib")) +prelude: |- + require "arrow" + require "faker" + + state = ENV.fetch("FAKER_RANDOM_SEED", 17).to_i + Faker::Config.random = Random.new(state) + + n_rows = 1000 + n_columns = 10 + type = Arrow::Decimal128DataType.new(10, 5) + + fields = {} + arrays = {} + n_columns.times do |i| + column_name = "column_#{i}" + fields[column_name] = type + arrays[column_name] = n_rows.times.map { Faker::Number.decimal(10, 5) } + end + record_batch = Arrow::RecordBatch.new(fields, arrays) + + def pure_ruby_raw_records(record_batch) + n_rows = record_batch.n_rows + n_columns = record_batch.n_columns + columns = record_batch.columns + records = [] + i = 0 + while i < n_rows + record = [] + j = 0 + while j < n_columns + x = columns[j][i] + record << BigDecimal(x.to_s) + j += 1 + end + records << record + i += 1 + end + records + end +benchmark: + pure_ruby: |- + pure_ruby_raw_records(record_batch) + raw_records: |- + record_batch.raw_records() diff --git a/ruby/red-arrow/benchmark/raw-records/dictionary.yml b/ruby/red-arrow/benchmark/raw-records/dictionary.yml new file mode 100644 index 0000000..3b60abd --- /dev/null +++ 
b/ruby/red-arrow/benchmark/raw-records/dictionary.yml @@ -0,0 +1,73 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +contexts: + - name: master + prelude: | + $LOAD_PATH.unshift(File.expand_path("ext/arrow")) + $LOAD_PATH.unshift(File.expand_path("lib")) +prelude: |- + require "arrow" + require "faker" + + state = ENV.fetch("FAKER_RANDOM_SEED", 17).to_i + Faker::Config.random = Random.new(state) + + n_rows = 1000 + n_columns = 10 + dictionary = Arrow::StringArray.new( + 100.times.map { Faker::Book.genre }.uniq.sort + ) + type = Arrow::DictionaryDataType.new(:int8, dictionary, true) + + fields = n_columns.times.map {|i| ["column_#{i}".to_sym, type] }.to_h + schema = Arrow::Schema.new(**fields) + arrays = n_columns.times.map do + Arrow::DictionaryArray.new( + type, + Arrow::Int8Array.new( + n_rows.times.map { + Faker::Number.within(0 ... 
dictionary.length) + } + ) + ) + end + record_batch = Arrow::RecordBatch.new(schema, n_rows, arrays) + + def pure_ruby_raw_records(record_batch) + n_rows = record_batch.n_rows + n_columns = record_batch.n_columns + columns = record_batch.columns + records = [] + i = 0 + while i < n_rows + record = [] + j = 0 + while j < n_columns + record << columns[j].indices[i] + j += 1 + end + records << record + i += 1 + end + records + end +benchmark: + pure_ruby: |- + pure_ruby_raw_records(record_batch) + raw_records: |- + record_batch.raw_records diff --git a/ruby/red-arrow/benchmark/raw-records/int64.yml b/ruby/red-arrow/benchmark/raw-records/int64.yml new file mode 100644 index 0000000..65d7b11 --- /dev/null +++ b/ruby/red-arrow/benchmark/raw-records/int64.yml @@ -0,0 +1,65 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ +contexts: + - name: master + prelude: | + $LOAD_PATH.unshift(File.expand_path("ext/arrow")) + $LOAD_PATH.unshift(File.expand_path("lib")) +prelude: |- + require "arrow" + require "faker" + + state = ENV.fetch("FAKER_RANDOM_SEED", 17).to_i + Faker::Config.random = Random.new(state) + + n_rows = 1000 + n_columns = 10 + type = :int64 + + fields = {} + arrays = {} + n_columns.times do |i| + column_name = "column_#{i}" + fields[column_name] = type + arrays[column_name] = n_rows.times.map { Faker::Number.number(18).to_i } + end + record_batch = Arrow::RecordBatch.new(fields, arrays) + + def pure_ruby_raw_records(record_batch) + n_rows = record_batch.n_rows + n_columns = record_batch.n_columns + columns = record_batch.columns + records = [] + i = 0 + while i < n_rows + record = [] + j = 0 + while j < n_columns + record << columns[j][i] + j += 1 + end + records << record + i += 1 + end + records + end +benchmark: + pure_ruby: |- + pure_ruby_raw_records(record_batch) + raw_records: |- + record_batch.raw_records diff --git a/ruby/red-arrow/benchmark/raw-records/list.yml b/ruby/red-arrow/benchmark/raw-records/list.yml new file mode 100644 index 0000000..f29b26f --- /dev/null +++ b/ruby/red-arrow/benchmark/raw-records/list.yml @@ -0,0 +1,68 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. 
See the License for the +# specific language governing permissions and limitations +# under the License. + +contexts: + - name: master + prelude: | + $LOAD_PATH.unshift(File.expand_path("ext/arrow")) + $LOAD_PATH.unshift(File.expand_path("lib")) +prelude: |- + require "arrow" + require "faker" + + state = ENV.fetch("FAKER_RANDOM_SEED", 17).to_i + Faker::Config.random = Random.new(state) + + n_rows = 1000 + n_columns = 10 + type = Arrow::ListDataType.new(name: "values", type: :double) + + fields = {} + arrays = {} + n_columns.times do |i| + column_name = "column_#{i}" + fields[column_name] = type + arrays[column_name] = n_rows.times.map { + len = Faker::Number.within(1 ... 100) + len.times.map { Faker::Number.normal(0, 1e+6) } + } + end + record_batch = Arrow::RecordBatch.new(fields, arrays) + + def pure_ruby_raw_records(record_batch) + n_rows = record_batch.n_rows + n_columns = record_batch.n_columns + columns = record_batch.columns + records = [] + i = 0 + while i < n_rows + record = [] + j = 0 + while j < n_columns + record << columns[j][i] + j += 1 + end + records << record + i += 1 + end + records + end +benchmark: + pure_ruby: |- + pure_ruby_raw_records(record_batch) + raw_records: |- + record_batch.raw_records diff --git a/ruby/red-arrow/benchmark/raw-records/string.yml b/ruby/red-arrow/benchmark/raw-records/string.yml new file mode 100644 index 0000000..2854a37 --- /dev/null +++ b/ruby/red-arrow/benchmark/raw-records/string.yml @@ -0,0 +1,65 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +contexts: + - name: master + prelude: | + $LOAD_PATH.unshift(File.expand_path("ext/arrow")) + $LOAD_PATH.unshift(File.expand_path("lib")) +prelude: |- + require "arrow" + require "faker" + + state = ENV.fetch("FAKER_RANDOM_SEED", 17).to_i + Faker::Config.random = Random.new(state) + + n_rows = 1000 + n_columns = 10 + type = :string + + fields = {} + arrays = {} + n_columns.times do |i| + column_name = "column_#{i}" + fields[column_name] = type + arrays[column_name] = n_rows.times.map { Faker::Name.name } + end + record_batch = Arrow::RecordBatch.new(fields, arrays) + + def pure_ruby_raw_records(record_batch) + n_rows = record_batch.n_rows + n_columns = record_batch.n_columns + columns = record_batch.columns + records = [] + i = 0 + while i < n_rows + record = [] + j = 0 + while j < n_columns + record << columns[j][i] + j += 1 + end + records << record + i += 1 + end + records + end +benchmark: + pure_ruby: |- + pure_ruby_raw_records(record_batch) + raw_records: |- + record_batch.raw_records diff --git a/ruby/red-arrow/benchmark/raw-records/timestamp.yml b/ruby/red-arrow/benchmark/raw-records/timestamp.yml new file mode 100644 index 0000000..b57570f --- /dev/null +++ b/ruby/red-arrow/benchmark/raw-records/timestamp.yml @@ -0,0 +1,72 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. 
The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +contexts: + - name: master + prelude: | + $LOAD_PATH.unshift(File.expand_path("ext/arrow")) + $LOAD_PATH.unshift(File.expand_path("lib")) +prelude: |- + require "arrow" + require "faker" + + state = ENV.fetch("FAKER_RANDOM_SEED", 17).to_i + Faker::Config.random = Random.new(state) + + n_rows = 1000 + n_columns = 10 + type = Arrow::TimestampDataType.new(:micro) + base_timestamp = Time.at(Faker::Number.within(0 ... 1_000_000_000)) + thirty_days_in_sec = 30*24*3600 + timestamp_range = [base_timestamp - thirty_days_in_sec, base_timestamp + thirty_days_in_sec] + + fields = {} + arrays = {} + n_columns.times do |i| + column_name = "column_#{i}" + fields[column_name] = type + arrays[column_name] = n_rows.times.map { + sec = Faker::Time.between(*timestamp_range).to_i + micro = Faker::Number.within(0 ... 
1_000_000) + sec * 1_000_000 + micro + } + end + record_batch = Arrow::RecordBatch.new(fields, arrays) + + def pure_ruby_raw_records(record_batch) + n_rows = record_batch.n_rows + n_columns = record_batch.n_columns + columns = record_batch.columns + records = [] + i = 0 + while i < n_rows + record = [] + j = 0 + while j < n_columns + record << columns[j][i] + j += 1 + end + records << record + i += 1 + end + records + end +benchmark: + pure_ruby: |- + pure_ruby_raw_records(record_batch) + raw_records: |- + record_batch.raw_records diff --git a/ruby/red-arrow/dependency-check/Rakefile b/ruby/red-arrow/dependency-check/Rakefile deleted file mode 100644 index e80e732..0000000 --- a/ruby/red-arrow/dependency-check/Rakefile +++ /dev/null @@ -1,43 +0,0 @@ -# -*- ruby -*- -# -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. 
- -require "pkg-config" -require "native-package-installer" - -case RUBY_PLATFORM -when /mingw|mswin/ - task :default => "nothing" -else - task :default => "dependency:check" -end - -task :nothing do -end - -namespace :dependency do - desc "Check dependency" - task :check do - unless PKGConfig.check_version?("arrow-glib", 0, 9, 0) - unless NativePackageInstaller.install(:debian => "libarrow-glib-dev", - :redhat => "arrow-glib-devel") - exit(false) - end - end - end -end diff --git a/ruby/red-arrow/ext/arrow/arrow.cpp b/ruby/red-arrow/ext/arrow/arrow.cpp new file mode 100644 index 0000000..48b98fb --- /dev/null +++ b/ruby/red-arrow/ext/arrow/arrow.cpp @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +#include "red-arrow.hpp" + +#include <ruby.hpp> + +namespace red_arrow { + VALUE cDate; + ID id_BigDecimal; + ID id_jd; + ID id_to_datetime; +} + +extern "C" void Init_arrow() { + auto mArrow = rb_const_get_at(rb_cObject, rb_intern("Arrow")); + auto cArrowRecordBatch = rb_const_get_at(mArrow, rb_intern("RecordBatch")); + rb_define_method(cArrowRecordBatch, "raw_records", + reinterpret_cast<rb::RawMethod>(red_arrow::record_batch_raw_records), + 0); + + red_arrow::cDate = rb_const_get(rb_cObject, rb_intern("Date")); + + red_arrow::id_BigDecimal = rb_intern("BigDecimal"); + red_arrow::id_jd = rb_intern("jd"); + red_arrow::id_to_datetime = rb_intern("to_datetime"); +} diff --git a/ruby/red-arrow/ext/arrow/extconf.rb b/ruby/red-arrow/ext/arrow/extconf.rb new file mode 100644 index 0000000..a8b9a0b --- /dev/null +++ b/ruby/red-arrow/ext/arrow/extconf.rb @@ -0,0 +1,46 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ +require "extpp" +require "mkmf-gnome2" + +unless required_pkg_config_package("arrow", + debian: "libarrow-dev", + redhat: "arrow-devel", + homebrew: "apache-arrow", + msys2: "arrow") + exit(false) +end + +unless required_pkg_config_package("arrow-glib", + debian: "libarrow-glib-dev", + redhat: "arrow-glib-devel", + homebrew: "apache-arrow-glib", + msys2: "arrow") + exit(false) +end + +[ + ["glib2", "ext/glib2"], +].each do |name, relative_source_dir| + spec = find_gem_spec(name) + source_dir = File.join(spec.full_gem_path, relative_source_dir) + build_dir = source_dir + add_depend_package_path(name, source_dir, build_dir) +end + +create_makefile("arrow") diff --git a/ruby/red-arrow/ext/arrow/record-batch.cpp b/ruby/red-arrow/ext/arrow/record-batch.cpp new file mode 100644 index 0000000..506c8e1 --- /dev/null +++ b/ruby/red-arrow/ext/arrow/record-batch.cpp @@ -0,0 +1,756 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +#include "red-arrow.hpp" + +#include <ruby.hpp> +#include <ruby/encoding.h> + +#include <arrow-glib/error.hpp> + +#include <arrow/util/logging.h> + +namespace red_arrow { + namespace { + using Status = arrow::Status; + + void check_status(const Status&& status, const char* context) { + GError* error = nullptr; + if (!garrow_error_check(&error, status, context)) { + RG_RAISE_ERROR(error); + } + } + + class ListArrayValueConverter; + class StructArrayValueConverter; + class UnionArrayValueConverter; + class DictionaryArrayValueConverter; + + class ArrayValueConverter { + public: + ArrayValueConverter() + : decimal_buffer_(), + list_array_value_converter_(nullptr), + struct_array_value_converter_(nullptr), + union_array_value_converter_(nullptr), + dictionary_array_value_converter_(nullptr) { + } + + void set_sub_value_converters(ListArrayValueConverter* list_array_value_converter, + StructArrayValueConverter* struct_array_value_converter, + UnionArrayValueConverter* union_array_value_converter, + DictionaryArrayValueConverter* dictionary_array_value_converter) { + list_array_value_converter_ = list_array_value_converter; + struct_array_value_converter_ = struct_array_value_converter; + union_array_value_converter_ = union_array_value_converter; + dictionary_array_value_converter_ = dictionary_array_value_converter; + } + + inline VALUE convert(const arrow::NullArray& array, + const int64_t i) { + return Qnil; + } + + inline VALUE convert(const arrow::BooleanArray& array, + const int64_t i) { + return array.Value(i) ? 
Qtrue : Qfalse; + } + + inline VALUE convert(const arrow::Int8Array& array, + const int64_t i) { + return INT2NUM(array.Value(i)); + } + + inline VALUE convert(const arrow::Int16Array& array, + const int64_t i) { + return INT2NUM(array.Value(i)); + } + + inline VALUE convert(const arrow::Int32Array& array, + const int64_t i) { + return INT2NUM(array.Value(i)); + } + + inline VALUE convert(const arrow::Int64Array& array, + const int64_t i) { + return LL2NUM(array.Value(i)); + } + + inline VALUE convert(const arrow::UInt8Array& array, + const int64_t i) { + return UINT2NUM(array.Value(i)); + } + + inline VALUE convert(const arrow::UInt16Array& array, + const int64_t i) { + return UINT2NUM(array.Value(i)); + } + + inline VALUE convert(const arrow::UInt32Array& array, + const int64_t i) { + return UINT2NUM(array.Value(i)); + } + + inline VALUE convert(const arrow::UInt64Array& array, + const int64_t i) { + return ULL2NUM(array.Value(i)); + } + + // TODO + // inline VALUE convert(const arrow::HalfFloatArray& array, + // const int64_t i) { + // } + + inline VALUE convert(const arrow::FloatArray& array, + const int64_t i) { + return DBL2NUM(array.Value(i)); + } + + inline VALUE convert(const arrow::DoubleArray& array, + const int64_t i) { + return DBL2NUM(array.Value(i)); + } + + inline VALUE convert(const arrow::BinaryArray& array, + const int64_t i) { + int32_t length; + const auto value = array.GetValue(i, &length); + // TODO: encoding support + return rb_enc_str_new(reinterpret_cast<const char*>(value), + length, + rb_ascii8bit_encoding()); + } + + inline VALUE convert(const arrow::StringArray& array, + const int64_t i) { + int32_t length; + const auto value = array.GetValue(i, &length); + return rb_utf8_str_new(reinterpret_cast<const char*>(value), + length); + } + + inline VALUE convert(const arrow::FixedSizeBinaryArray& array, + const int64_t i) { + return rb_enc_str_new(reinterpret_cast<const char*>(array.Value(i)), + array.byte_width(), + 
rb_ascii8bit_encoding()); + } + + constexpr static int32_t JULIAN_DATE_UNIX_EPOCH = 2440588; + inline VALUE convert(const arrow::Date32Array& array, + const int64_t i) { + const auto value = array.Value(i); + const auto days_in_julian = value + JULIAN_DATE_UNIX_EPOCH; + return rb_funcall(cDate, id_jd, 1, LONG2NUM(days_in_julian)); + } + + inline VALUE convert(const arrow::Date64Array& array, + const int64_t i) { + const auto value = array.Value(i); + auto msec = LL2NUM(value); + auto sec = rb_rational_new(msec, INT2NUM(1000)); + auto time_value = rb_time_num_new(sec, Qnil); + return rb_funcall(time_value, id_to_datetime, 0, 0); + } + + inline VALUE convert(const arrow::Time32Array& array, + const int64_t i) { + // TODO: unit treatment + const auto value = array.Value(i); + return INT2NUM(value); + } + + inline VALUE convert(const arrow::Time64Array& array, + const int64_t i) { + // TODO: unit treatment + const auto value = array.Value(i); + return LL2NUM(value); + } + + inline VALUE convert(const arrow::TimestampArray& array, + const int64_t i) { + const auto type = + arrow::internal::checked_cast<const arrow::TimestampType*>(array.type().get()); + auto scale = time_unit_to_scale(type->unit()); + if (NIL_P(scale)) { + rb_raise(rb_eArgError, "Invalid TimeUnit"); + } + auto value = array.Value(i); + auto sec = rb_rational_new(LL2NUM(value), scale); + return rb_time_num_new(sec, Qnil); + } + + // TODO + // inline VALUE convert(const arrow::IntervalArray& array, + // const int64_t i) { + // }; + + VALUE convert(const arrow::ListArray& array, + const int64_t i); + + VALUE convert(const arrow::StructArray& array, + const int64_t i); + + VALUE convert(const arrow::UnionArray& array, + const int64_t i); + + VALUE convert(const arrow::DictionaryArray& array, + const int64_t i); + + inline VALUE convert(const arrow::Decimal128Array& array, + const int64_t i) { + decimal_buffer_ = array.FormatValue(i); + return rb_funcall(rb_cObject, + id_BigDecimal, + 1, + 
rb_enc_str_new(decimal_buffer_.data(), + decimal_buffer_.length(), + rb_ascii8bit_encoding())); + } + + private: + std::string decimal_buffer_; + ListArrayValueConverter* list_array_value_converter_; + StructArrayValueConverter* struct_array_value_converter_; + UnionArrayValueConverter* union_array_value_converter_; + DictionaryArrayValueConverter* dictionary_array_value_converter_; + }; + + class ListArrayValueConverter : public arrow::ArrayVisitor { + public: + explicit ListArrayValueConverter(ArrayValueConverter* converter) + : array_value_converter_(converter), + offset_(0), + length_(0), + result_(Qnil) {} + + VALUE convert(const arrow::ListArray& array, const int64_t index) { + auto values = array.values().get(); + auto offset_keep = offset_; + auto length_keep = length_; + offset_ = array.value_offset(index); + length_ = array.value_length(index); + auto result_keep = result_; + result_ = rb_ary_new_capa(length_); + check_status(values->Accept(this), + "[raw-records][list-array]"); + offset_ = offset_keep; + length_ = length_keep; + auto result_return = result_; + result_ = result_keep; + return result_return; + } + +#define VISIT(TYPE) \ + Status Visit(const arrow::TYPE ## Array& array) override { \ + return visit_value(array); \ + } + + VISIT(Null) + VISIT(Boolean) + VISIT(Int8) + VISIT(Int16) + VISIT(Int32) + VISIT(Int64) + VISIT(UInt8) + VISIT(UInt16) + VISIT(UInt32) + VISIT(UInt64) + // TODO + // VISIT(HalfFloat) + VISIT(Float) + VISIT(Double) + VISIT(Binary) + VISIT(String) + VISIT(FixedSizeBinary) + VISIT(Date32) + VISIT(Date64) + VISIT(Time32) + VISIT(Time64) + VISIT(Timestamp) + // TODO + // VISIT(Interval) + VISIT(List) + VISIT(Struct) + VISIT(Union) + VISIT(Dictionary) + VISIT(Decimal128) + // TODO + // VISIT(Extension) + +#undef VISIT + + private: + template <typename ArrayType> + inline VALUE convert_value(const ArrayType& array, + const int64_t i) { + return array_value_converter_->convert(array, i); + } + + template <typename ArrayType> + 
Status visit_value(const ArrayType& array) { + if (array.null_count() > 0) { + for (int64_t i = 0; i < length_; ++i) { + auto value = Qnil; + if (!array.IsNull(i + offset_)) { + value = convert_value(array, i + offset_); + } + rb_ary_push(result_, value); + } + } else { + for (int64_t i = 0; i < length_; ++i) { + rb_ary_push(result_, convert_value(array, i + offset_)); + } + } + return Status::OK(); + } + + ArrayValueConverter* array_value_converter_; + int32_t offset_; + int32_t length_; + VALUE result_; + }; + + class StructArrayValueConverter : public arrow::ArrayVisitor { + public: + explicit StructArrayValueConverter(ArrayValueConverter* converter) + : array_value_converter_(converter), + key_(Qnil), + index_(0), + result_(Qnil) {} + + VALUE convert(const arrow::StructArray& array, + const int64_t index) { + auto index_keep = index_; + auto result_keep = result_; + index_ = index; + result_ = rb_hash_new(); + const auto struct_type = array.struct_type(); + const auto n = struct_type->num_children(); + for (int i = 0; i < n; ++i) { + const auto field_type = struct_type->child(i).get(); + const auto& field_name = field_type->name(); + auto key_keep = key_; + key_ = rb_utf8_str_new(field_name.data(), field_name.length()); + const auto field_array = array.field(i).get(); + check_status(field_array->Accept(this), + "[raw-records][struct-array]"); + key_ = key_keep; + } + auto result_return = result_; + result_ = result_keep; + index_ = index_keep; + return result_return; + } + +#define VISIT(TYPE) \ + Status Visit(const arrow::TYPE ## Array& array) override { \ + fill_field(array); \ + return Status::OK(); \ + } + + VISIT(Null) + VISIT(Boolean) + VISIT(Int8) + VISIT(Int16) + VISIT(Int32) + VISIT(Int64) + VISIT(UInt8) + VISIT(UInt16) + VISIT(UInt32) + VISIT(UInt64) + // TODO + // VISIT(HalfFloat) + VISIT(Float) + VISIT(Double) + VISIT(Binary) + VISIT(String) + VISIT(FixedSizeBinary) + VISIT(Date32) + VISIT(Date64) + VISIT(Time32) + VISIT(Time64) + VISIT(Timestamp) + 
// TODO + // VISIT(Interval) + VISIT(List) + VISIT(Struct) + VISIT(Union) + VISIT(Dictionary) + VISIT(Decimal128) + // TODO + // VISIT(Extension) + +#undef VISIT + + private: + template <typename ArrayType> + inline VALUE convert_value(const ArrayType& array, + const int64_t i) { + return array_value_converter_->convert(array, i); + } + + template <typename ArrayType> + void fill_field(const ArrayType& array) { + if (array.IsNull(index_)) { + rb_hash_aset(result_, key_, Qnil); + } else { + rb_hash_aset(result_, key_, convert_value(array, index_)); + } + } + + ArrayValueConverter* array_value_converter_; + VALUE key_; + int64_t index_; + VALUE result_; + }; + + class UnionArrayValueConverter : public arrow::ArrayVisitor { + public: + explicit UnionArrayValueConverter(ArrayValueConverter* converter) + : array_value_converter_(converter), + index_(0), + result_(Qnil) {} + + VALUE convert(const arrow::UnionArray& array, + const int64_t index) { + const auto index_keep = index_; + const auto result_keep = result_; + index_ = index; + switch (array.mode()) { + case arrow::UnionMode::SPARSE: + convert_sparse(array); + break; + case arrow::UnionMode::DENSE: + convert_dense(array); + break; + default: + rb_raise(rb_eArgError, "Invalid union mode"); + break; + } + auto result_return = result_; + index_ = index_keep; + result_ = result_keep; + return result_return; + } + +#define VISIT(TYPE) \ + Status Visit(const arrow::TYPE ## Array& array) override { \ + convert_value(array); \ + return Status::OK(); \ + } + + VISIT(Null) + VISIT(Boolean) + VISIT(Int8) + VISIT(Int16) + VISIT(Int32) + VISIT(Int64) + VISIT(UInt8) + VISIT(UInt16) + VISIT(UInt32) + VISIT(UInt64) + // TODO + // VISIT(HalfFloat) + VISIT(Float) + VISIT(Double) + VISIT(Binary) + VISIT(String) + VISIT(FixedSizeBinary) + VISIT(Date32) + VISIT(Date64) + VISIT(Time32) + VISIT(Time64) + VISIT(Timestamp) + // TODO + // VISIT(Interval) + VISIT(List) + VISIT(Struct) + VISIT(Union) + VISIT(Dictionary) + VISIT(Decimal128) + 
// TODO + // VISIT(Extension) + +#undef VISIT + private: + template <typename ArrayType> + inline void convert_value(const ArrayType& array) { + auto result = rb_hash_new(); + if (array.IsNull(index_)) { + rb_hash_aset(result, field_name_, Qnil); + } else { + rb_hash_aset(result, + field_name_, + array_value_converter_->convert(array, index_)); + } + result_ = result; + } + + uint8_t compute_child_index(const arrow::UnionArray& array, + arrow::UnionType* type, + const char* tag) { + const auto type_id = array.raw_type_ids()[index_]; + const auto& type_codes = type->type_codes(); + for (uint8_t i = 0; i < type_codes.size(); ++i) { + if (type_codes[i] == type_id) { + return i; + } + } + check_status(Status::Invalid("Unknown type ID: ", type_id), + tag); + return 0; + } + + void convert_sparse(const arrow::UnionArray& array) { + const auto type = + std::static_pointer_cast<arrow::UnionType>(array.type()).get(); + const auto tag = "[raw-records][union-sparse-array]"; + const auto child_index = compute_child_index(array, type, tag); + const auto child_field = type->child(child_index).get(); + const auto& field_name = child_field->name(); + const auto field_name_keep = field_name_; + field_name_ = rb_utf8_str_new(field_name.data(), field_name.length()); + const auto child_array = array.child(child_index).get(); + check_status(child_array->Accept(this), tag); + field_name_ = field_name_keep; + } + + void convert_dense(const arrow::UnionArray& array) { + const auto type = + std::static_pointer_cast<arrow::UnionType>(array.type()).get(); + const auto tag = "[raw-records][union-dense-array]"; + const auto child_index = compute_child_index(array, type, tag); + const auto child_field = type->child(child_index).get(); + const auto& field_name = child_field->name(); + const auto field_name_keep = field_name_; + field_name_ = rb_utf8_str_new(field_name.data(), field_name.length()); + const auto child_array = array.child(child_index); + const auto index_keep = index_; + index_ = 
array.value_offset(index_); + check_status(child_array->Accept(this), tag); + index_ = index_keep; + field_name_ = field_name_keep; + } + + ArrayValueConverter* array_value_converter_; + int64_t index_; + VALUE field_name_; + VALUE result_; + }; + + class DictionaryArrayValueConverter : public arrow::ArrayVisitor { + public: + explicit DictionaryArrayValueConverter(ArrayValueConverter* converter) + : array_value_converter_(converter), + index_(0), + result_(Qnil) { + } + + VALUE convert(const arrow::DictionaryArray& array, + const int64_t index) { + index_ = index; + auto indices = array.indices().get(); + check_status(indices->Accept(this), + "[raw-records][dictionary-array]"); + return result_; + } + + // TODO: Convert to real value. +#define VISIT(TYPE) \ + Status Visit(const arrow::TYPE ## Array& array) override { \ + result_ = convert_value(array, index_); \ + return Status::OK(); \ + } + + VISIT(Int8) + VISIT(Int16) + VISIT(Int32) + VISIT(Int64) + +#undef VISIT + + private: + template <typename ArrayType> + inline VALUE convert_value(const ArrayType& array, + const int64_t i) { + return array_value_converter_->convert(array, i); + } + + ArrayValueConverter* array_value_converter_; + int64_t index_; + VALUE result_; + }; + + VALUE ArrayValueConverter::convert(const arrow::ListArray& array, + const int64_t i) { + return list_array_value_converter_->convert(array, i); + } + + VALUE ArrayValueConverter::convert(const arrow::StructArray& array, + const int64_t i) { + return struct_array_value_converter_->convert(array, i); + } + + VALUE ArrayValueConverter::convert(const arrow::UnionArray& array, + const int64_t i) { + return union_array_value_converter_->convert(array, i); + } + + VALUE ArrayValueConverter::convert(const arrow::DictionaryArray& array, + const int64_t i) { + return dictionary_array_value_converter_->convert(array, i); + } + + class RawRecordsBuilder : public arrow::ArrayVisitor { + public: + explicit RawRecordsBuilder(VALUE records, int n_columns) 
+ : array_value_converter_(), + list_array_value_converter_(&array_value_converter_), + struct_array_value_converter_(&array_value_converter_), + union_array_value_converter_(&array_value_converter_), + dictionary_array_value_converter_(&array_value_converter_), + records_(records), + n_columns_(n_columns) { + array_value_converter_. + set_sub_value_converters(&list_array_value_converter_, + &struct_array_value_converter_, + &union_array_value_converter_, + &dictionary_array_value_converter_); + } + + void build(const arrow::RecordBatch& record_batch) { + rb::protect([&] { + const auto n_rows = record_batch.num_rows(); + for (int64_t i = 0; i < n_rows; ++i) { + auto record = rb_ary_new_capa(n_columns_); + rb_ary_push(records_, record); + } + for (int i = 0; i < n_columns_; ++i) { + const auto array = record_batch.column(i).get(); + column_index_ = i; + check_status(array->Accept(this), + "[raw-records]"); + } + return Qnil; + }); + } + +#define VISIT(TYPE) \ + Status Visit(const arrow::TYPE ## Array& array) override { \ + convert(array); \ + return Status::OK(); \ + } + + VISIT(Null) + VISIT(Boolean) + VISIT(Int8) + VISIT(Int16) + VISIT(Int32) + VISIT(Int64) + VISIT(UInt8) + VISIT(UInt16) + VISIT(UInt32) + VISIT(UInt64) + // TODO + // VISIT(HalfFloat) + VISIT(Float) + VISIT(Double) + VISIT(Binary) + VISIT(String) + VISIT(FixedSizeBinary) + VISIT(Date32) + VISIT(Date64) + VISIT(Time32) + VISIT(Time64) + VISIT(Timestamp) + // TODO + // VISIT(Interval) + VISIT(List) + VISIT(Struct) + VISIT(Union) + VISIT(Dictionary) + VISIT(Decimal128) + // TODO + // VISIT(Extension) + +#undef VISIT + + private: + template <typename ArrayType> + inline VALUE convert_value(const ArrayType& array, + const int64_t i) { + return array_value_converter_.convert(array, i); + } + + template <typename ArrayType> + void convert(const ArrayType& array) { + const auto n = array.length(); + if (array.null_count() > 0) { + for (int64_t i = 0; i < n; ++i) { + auto value = Qnil; + if 
(!array.IsNull(i)) { + value = convert_value(array, i); + } + auto record = rb_ary_entry(records_, i); + rb_ary_store(record, column_index_, value); + } + } else { + for (int64_t i = 0; i < n; ++i) { + auto record = rb_ary_entry(records_, i); + rb_ary_store(record, column_index_, convert_value(array, i)); + } + } + } + + ArrayValueConverter array_value_converter_; + ListArrayValueConverter list_array_value_converter_; + StructArrayValueConverter struct_array_value_converter_; + UnionArrayValueConverter union_array_value_converter_; + DictionaryArrayValueConverter dictionary_array_value_converter_; + + // Destination for converted records. + VALUE records_; + + // The current column index. + int column_index_; + + // The number of columns. + const int n_columns_; + }; + } + + VALUE + record_batch_raw_records(VALUE rb_record_batch) { + auto garrow_record_batch = GARROW_RECORD_BATCH(RVAL2GOBJ(rb_record_batch)); + auto record_batch = garrow_record_batch_get_raw(garrow_record_batch).get(); + const auto n_rows = record_batch->num_rows(); + const auto n_columns = record_batch->num_columns(); + auto records = rb_ary_new_capa(n_rows); + + try { + RawRecordsBuilder builder(records, n_columns); + builder.build(*record_batch); + } catch (rb::State& state) { + state.jump(); + } + + return records; + } +} diff --git a/ruby/red-arrow/ext/arrow/red-arrow.hpp b/ruby/red-arrow/ext/arrow/red-arrow.hpp new file mode 100644 index 0000000..5c9b846 --- /dev/null +++ b/ruby/red-arrow/ext/arrow/red-arrow.hpp @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#pragma once + +#include <arrow/api.h> + +#include <arrow-glib/arrow-glib.hpp> +#include <rbgobject.h> + +namespace red_arrow { + extern VALUE cDate; + + extern ID id_BigDecimal; + extern ID id_jd; + extern ID id_to_datetime; + + VALUE record_batch_raw_records(VALUE obj); + + inline VALUE time_unit_to_scale(arrow::TimeUnit::type unit) { + switch (unit) { + case arrow::TimeUnit::SECOND: + return INT2FIX(1); + case arrow::TimeUnit::MILLI: + return INT2FIX(1000); + case arrow::TimeUnit::MICRO: + return INT2FIX(1000 * 1000); + case arrow::TimeUnit::NANO: + // NOTE: INT2FIX works for 1e+9 because: FIXNUM_MAX >= (1<<30) - 1 > 1e+9 + return INT2FIX(1000 * 1000 * 1000); + default: + break; // NOT REACHED + } + return Qnil; + } +} diff --git a/ruby/red-arrow/test/run-test.rb b/ruby/red-arrow/lib/arrow/binary-array-builder.rb old mode 100755 new mode 100644 similarity index 66% copy from ruby/red-arrow/test/run-test.rb copy to ruby/red-arrow/lib/arrow/binary-array-builder.rb index 9551f60..c780374 --- a/ruby/red-arrow/test/run-test.rb +++ b/ruby/red-arrow/lib/arrow/binary-array-builder.rb @@ -1,5 +1,3 @@ -#!/usr/bin/env ruby -# # Licensed to the Apache Software Foundation (ASF) under one # or more contributor license agreements. See the NOTICE file # distributed with this work for additional information @@ -17,21 +15,22 @@ # specific language governing permissions and limitations # under the License. 
-ENV["TZ"] = "Asia/Tokyo" - -$VERBOSE = true - -require "pathname" - -base_dir = Pathname.new(__dir__).parent.expand_path - -lib_dir = base_dir + "lib" -test_dir = base_dir + "test" - -$LOAD_PATH.unshift(lib_dir.to_s) - -require_relative "helper" - -ENV["TEST_UNIT_MAX_DIFF_TARGET_STRING_SIZE"] ||= "10000" - -exit(Test::Unit::AutoRunner.run(true, test_dir.to_s)) +module Arrow + class BinaryArrayBuilder + def append_values(values, is_valids=nil) + if is_valids + is_valids.each_with_index do |is_valid, i| + if is_valid + append_value(values[i]) + else + append_null + end + end + else + values.each do |value| + append_value(value) + end + end + end + end +end diff --git a/ruby/red-arrow/lib/arrow/data-type.rb b/ruby/red-arrow/lib/arrow/data-type.rb index 03960e4..5b1c873 100644 --- a/ruby/red-arrow/lib/arrow/data-type.rb +++ b/ruby/red-arrow/lib/arrow/data-type.rb @@ -114,14 +114,18 @@ module Arrow private def resolve_class(data_type) - data_type_name = data_type.to_s.capitalize.gsub(/\AUint/, "UInt") + components = data_type.to_s.split("_").collect(&:capitalize) + data_type_name = components.join.gsub(/\AUint/, "UInt") data_type_class_name = "#{data_type_name}DataType" unless Arrow.const_defined?(data_type_class_name) available_types = [] Arrow.constants.each do |name| - if name.to_s.end_with?("DataType") - available_types << name.to_s.gsub(/DataType\z/, "").downcase.to_sym - end + name = name.to_s + next if name == "DataType" + next unless name.end_with?("DataType") + name = name.gsub(/DataType\z/, "") + components = name.scan(/(UInt[0-9]+|[A-Z][a-z\d]+)/).flatten + available_types << components.collect(&:downcase).join("_").to_sym end message = "unknown type: #{data_type.inspect}: " + diff --git a/ruby/red-arrow/lib/arrow/list-array-builder.rb b/ruby/red-arrow/lib/arrow/list-array-builder.rb index 1fa507f..d889c8a 100644 --- a/ruby/red-arrow/lib/arrow/list-array-builder.rb +++ b/ruby/red-arrow/lib/arrow/list-array-builder.rb @@ -56,7 +56,7 @@ module Arrow when 
::Array append_value_raw @value_builder ||= value_builder - @value_builder.append_values(value, nil) + @value_builder.append(*value) else message = "list value must be nil or Array: #{value.inspect}" raise ArgumentError, message diff --git a/ruby/red-arrow/lib/arrow/loader.rb b/ruby/red-arrow/lib/arrow/loader.rb index 6e0bf29..280229b 100644 --- a/ruby/red-arrow/lib/arrow/loader.rb +++ b/ruby/red-arrow/lib/arrow/loader.rb @@ -28,11 +28,13 @@ module Arrow private def post_load(repository, namespace) require_libraries + require_extension_library end def require_libraries require "arrow/array" require "arrow/array-builder" + require "arrow/binary-array-builder" require "arrow/chunked-array" require "arrow/column" require "arrow/compression-type" @@ -79,6 +81,10 @@ module Arrow require "arrow/writable" end + def require_extension_library + require "arrow.so" + end + def load_object_info(info) super diff --git a/ruby/red-arrow/lib/arrow/struct-array-builder.rb b/ruby/red-arrow/lib/arrow/struct-array-builder.rb index b56056c..0ed37ec 100644 --- a/ruby/red-arrow/lib/arrow/struct-array-builder.rb +++ b/ruby/red-arrow/lib/arrow/struct-array-builder.rb @@ -71,17 +71,17 @@ module Arrow when ::Array append_value_raw value.each_with_index do |sub_value, i| - self[i].append_value(sub_value) + self[i].append(sub_value) end when Arrow::Struct append_value_raw value.values.each_with_index do |sub_value, i| - self[i].append_value(sub_value) + self[i].append(sub_value) end when Hash append_value_raw value.each do |name, sub_value| - self[name].append_value(sub_value) + self[name].append(sub_value) end else message = diff --git a/ruby/red-arrow/red-arrow.gemspec b/ruby/red-arrow/red-arrow.gemspec index 9451c9c..7c6320e 100644 --- a/ruby/red-arrow/red-arrow.gemspec +++ b/ruby/red-arrow/red-arrow.gemspec @@ -39,17 +39,21 @@ Gem::Specification.new do |spec| spec.license = "Apache-2.0" spec.files = ["README.md", "Rakefile", "Gemfile", "#{spec.name}.gemspec"] spec.files += ["LICENSE.txt", 
"NOTICE.txt"] + spec.files += Dir.glob("ext/**/*.{cpp,hpp,rb}") spec.files += Dir.glob("lib/**/*.rb") spec.files += Dir.glob("image/*.*") spec.files += Dir.glob("doc/text/*") spec.test_files += Dir.glob("test/**/*") - spec.extensions = ["dependency-check/Rakefile"] + spec.extensions = ["ext/arrow/extconf.rb"] + spec.add_runtime_dependency("extpp") spec.add_runtime_dependency("gobject-introspection", ">= 3.3.5") - spec.add_runtime_dependency("pkg-config") spec.add_runtime_dependency("native-package-installer") + spec.add_runtime_dependency("pkg-config") + spec.add_development_dependency("benchmark-driver") spec.add_development_dependency("bundler") + spec.add_development_dependency("faker") spec.add_development_dependency("rake") spec.add_development_dependency("redcarpet") spec.add_development_dependency("test-unit") diff --git a/ruby/red-arrow/test/raw-records/record-batch/test-basic-arrays.rb b/ruby/red-arrow/test/raw-records/record-batch/test-basic-arrays.rb new file mode 100644 index 0000000..eee2699 --- /dev/null +++ b/ruby/red-arrow/test/raw-records/record-batch/test-basic-arrays.rb @@ -0,0 +1,349 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ +class RawRecordsRecordBatchBasicArraysTest < Test::Unit::TestCase + test("NullArray") do + records = [ + [nil], + [nil], + [nil], + [nil], + ] + array = Arrow::NullArray.new(records.size) + schema = Arrow::Schema.new(column: :null) + record_batch = Arrow::RecordBatch.new(schema, + records.size, + [array]) + assert_equal(records, record_batch.raw_records) + end + + test("BooleanArray") do + records = [ + [true], + [nil], + [false], + ] + record_batch = Arrow::RecordBatch.new({column: :boolean}, + records) + assert_equal(records, record_batch.raw_records) + end + + test("Int8Array") do + records = [ + [-(2 ** 7)], + [nil], + [(2 ** 7) - 1], + ] + record_batch = Arrow::RecordBatch.new({column: :int8}, + records) + assert_equal(records, record_batch.raw_records) + end + + test("UInt8Array") do + records = [ + [0], + [nil], + [(2 ** 8) - 1], + ] + record_batch = Arrow::RecordBatch.new({column: :uint8}, + records) + assert_equal(records, record_batch.raw_records) + end + + test("Int16Array") do + records = [ + [-(2 ** 15)], + [nil], + [(2 ** 15) - 1], + ] + record_batch = Arrow::RecordBatch.new({column: :int16}, + records) + assert_equal(records, record_batch.raw_records) + end + + test("UInt16Array") do + records = [ + [0], + [nil], + [(2 ** 16) - 1], + ] + record_batch = Arrow::RecordBatch.new({column: :uint16}, + records) + assert_equal(records, record_batch.raw_records) + end + + test("Int32Array") do + records = [ + [-(2 ** 31)], + [nil], + [(2 ** 31) - 1], + ] + record_batch = Arrow::RecordBatch.new({column: :int32}, + records) + assert_equal(records, record_batch.raw_records) + end + + test("UInt32Array") do + records = [ + [0], + [nil], + [(2 ** 32) - 1], + ] + record_batch = Arrow::RecordBatch.new({column: :uint32}, + records) + assert_equal(records, record_batch.raw_records) + end + + test("Int64Array") do + records = [ + [-(2 ** 63)], + [nil], + [(2 ** 63) - 1], + ] + record_batch = Arrow::RecordBatch.new({column: :int64}, + records) + assert_equal(records, 
record_batch.raw_records) + end + + test("UInt64Array") do + records = [ + [0], + [nil], + [(2 ** 64) - 1], + ] + record_batch = Arrow::RecordBatch.new({column: :uint64}, + records) + assert_equal(records, record_batch.raw_records) + end + + test("FloatArray") do + records = [ + [-1.0], + [nil], + [1.0], + ] + record_batch = Arrow::RecordBatch.new({column: :float}, + records) + assert_equal(records, record_batch.raw_records) + end + + test("DoubleArray") do + records = [ + [-1.0], + [nil], + [1.0], + ] + record_batch = Arrow::RecordBatch.new({column: :double}, + records) + assert_equal(records, record_batch.raw_records) + end + + test("BinaryArray") do + records = [ + ["\x00".b], + [nil], + ["\xff".b], + ] + record_batch = Arrow::RecordBatch.new({column: :binary}, + records) + assert_equal(records, record_batch.raw_records) + end + + test("StringArray") do + records = [ + ["Ruby"], + [nil], + ["\u3042"], # U+3042 HIRAGANA LETTER A + ] + record_batch = Arrow::RecordBatch.new({column: :string}, + records) + assert_equal(records, record_batch.raw_records) + end + + test("Date32Array") do + records = [ + [Date.new(1960, 1, 1)], + [nil], + [Date.new(2017, 8, 23)], + ] + record_batch = Arrow::RecordBatch.new({column: :date32}, + records) + assert_equal(records, record_batch.raw_records) + end + + test("Date64Array") do + records = [ + [DateTime.new(1960, 1, 1, 2, 9, 30)], + [nil], + [DateTime.new(2017, 8, 23, 14, 57, 2)], + ] + record_batch = Arrow::RecordBatch.new({column: :date64}, + records) + assert_equal(records, record_batch.raw_records) + end + + sub_test_case("TimestampArray") do + test("second") do + records = [ + [Time.parse("1960-01-01T02:09:30Z")], + [nil], + [Time.parse("2017-08-23T14:57:02Z")], + ] + record_batch = Arrow::RecordBatch.new({ + column: { + type: :timestamp, + unit: :second, + } + }, + records) + assert_equal(records, record_batch.raw_records) + end + + test("milli") do + records = [ + [Time.parse("1960-01-01T02:09:30.123Z")], + [nil], + 
[Time.parse("2017-08-23T14:57:02.987Z")], + ] + record_batch = Arrow::RecordBatch.new({ + column: { + type: :timestamp, + unit: :milli, + } + }, + records) + assert_equal(records, record_batch.raw_records) + end + + test("micro") do + records = [ + [Time.parse("1960-01-01T02:09:30.123456Z")], + [nil], + [Time.parse("2017-08-23T14:57:02.987654Z")], + ] + record_batch = Arrow::RecordBatch.new({ + column: { + type: :timestamp, + unit: :micro, + } + }, + records) + assert_equal(records, record_batch.raw_records) + end + + test("nano") do + records = [ + [Time.parse("1960-01-01T02:09:30.123456789Z")], + [nil], + [Time.parse("2017-08-23T14:57:02.987654321Z")], + ] + record_batch = Arrow::RecordBatch.new({ + column: { + type: :timestamp, + unit: :nano, + } + }, + records) + assert_equal(records, record_batch.raw_records) + end + end + + sub_test_case("Time32Array") do + test("second") do + records = [ + [60 * 10], # 00:10:00 + [nil], + [60 * 60 * 2 + 9], # 02:00:09 + ] + record_batch = Arrow::RecordBatch.new({ + column: { + type: :time32, + unit: :second, + } + }, + records) + assert_equal(records, record_batch.raw_records) + end + + test("milli") do + records = [ + [(60 * 10) * 1000 + 123], # 00:10:00.123 + [nil], + [(60 * 60 * 2 + 9) * 1000 + 987], # 02:00:09.987 + ] + record_batch = Arrow::RecordBatch.new({ + column: { + type: :time32, + unit: :milli, + } + }, + records) + assert_equal(records, record_batch.raw_records) + end + end + + sub_test_case("Time64Array") do + test("micro") do + records = [ + [(60 * 10) * 1_000_000 + 123_456], # 00:10:00.123456 + [nil], + [(60 * 60 * 2 + 9) * 1_000_000 + 987_654], # 02:00:09.987654 + ] + record_batch = Arrow::RecordBatch.new({ + column: { + type: :time64, + unit: :micro, + } + }, + records) + assert_equal(records, record_batch.raw_records) + end + + test("nano") do + records = [ + [(60 * 10) * 1_000_000_000 + 123_456_789], # 00:10:00.123456789 + [nil], + [(60 * 60 * 2 + 9) * 1_000_000_000 + 987_654_321], # 02:00:09.987654321 + 
] + record_batch = Arrow::RecordBatch.new({ + column: { + type: :time64, + unit: :nano, + } + }, + records) + assert_equal(records, record_batch.raw_records) + end + end + + test("Decimal128Array") do + records = [ + [BigDecimal("92.92")], + [nil], + [BigDecimal("29.29")], + ] + record_batch = Arrow::RecordBatch.new({ + column: { + type: :decimal128, + precision: 8, + scale: 2, + } + }, + records) + assert_equal(records, record_batch.raw_records) + end +end diff --git a/ruby/red-arrow/test/raw-records/record-batch/test-dense-union-array.rb b/ruby/red-arrow/test/raw-records/record-batch/test-dense-union-array.rb new file mode 100644 index 0000000..8fdf02e --- /dev/null +++ b/ruby/red-arrow/test/raw-records/record-batch/test-dense-union-array.rb @@ -0,0 +1,487 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+
+class RawRecordsRecordBatchDenseUnionArrayTest < Test::Unit::TestCase
+  def fields(type, type_codes)
+    field_description = {}
+    if type.is_a?(Hash)
+      field_description = field_description.merge(type)
+    else
+      field_description[:type] = type
+    end
+    {
+      column: {
+        type: :dense_union,
+        fields: [
+          field_description.merge(name: "0"),
+          field_description.merge(name: "1"),
+        ],
+        type_codes: type_codes,
+      },
+    }
+  end
+
+  # TODO: Use Arrow::RecordBatch.new(fields(type), records)
+  def build_record_batch(type, records)
+    type_codes = [0, 1]
+    schema = Arrow::Schema.new(fields(type, type_codes))
+    type_ids = []
+    offsets = []
+    arrays = schema.fields[0].data_type.fields.collect do |field|
+      sub_schema = Arrow::Schema.new([field])
+      sub_records = []
+      records.each do |record|
+        column = record[0]
+        next if column.nil?
+        next unless column.key?(field.name)
+        sub_records << [column[field.name]]
+      end
+      sub_record_batch = Arrow::RecordBatch.new(sub_schema,
+                                                sub_records)
+      sub_record_batch.columns[0]
+    end
+    records.each do |record|
+      column = record[0]
+      if column.nil?
+ type_ids << nil + offsets << 0 + elsif column.key?("0") + type_id = type_codes[0] + type_ids << type_id + offsets << (type_ids.count(type_id) - 1) + elsif column.key?("1") + type_id = type_codes[1] + type_ids << type_id + offsets << (type_ids.count(type_id) - 1) + end + end + # TODO + # union_array = Arrow::DenseUnionArray.new(schema.fields[0].data_type, + # Arrow::Int8Array.new(type_ids), + # Arrow::Int32Array.new(offsets), + # arrays) + union_array = Arrow::DenseUnionArray.new(Arrow::Int8Array.new(type_ids), + Arrow::Int32Array.new(offsets), + arrays) + schema = Arrow::Schema.new(column: union_array.value_data_type) + Arrow::RecordBatch.new(schema, + records.size, + [union_array]) + end + + test("NullArray") do + omit("Need to add support for NullArrayBuilder") + records = [ + [{"0" => nil}], + [nil], + ] + record_batch = build_record_batch(:null, records) + assert_equal(records, record_batch.raw_records) + end + + test("BooleanArray") do + records = [ + [{"0" => true}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:boolean, records) + assert_equal(records, record_batch.raw_records) + end + + test("Int8Array") do + records = [ + [{"0" => -(2 ** 7)}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:int8, records) + assert_equal(records, record_batch.raw_records) + end + + test("UInt8Array") do + records = [ + [{"0" => (2 ** 8) - 1}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:uint8, records) + assert_equal(records, record_batch.raw_records) + end + + test("Int16Array") do + records = [ + [{"0" => -(2 ** 15)}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:int16, records) + assert_equal(records, record_batch.raw_records) + end + + test("UInt16Array") do + records = [ + [{"0" => (2 ** 16) - 1}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:uint16, records) + assert_equal(records, record_batch.raw_records) + end + + test("Int32Array") do + records = [ + 
[{"0" => -(2 ** 31)}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:int32, records) + assert_equal(records, record_batch.raw_records) + end + + test("UInt32Array") do + records = [ + [{"0" => (2 ** 32) - 1}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:uint32, records) + assert_equal(records, record_batch.raw_records) + end + + test("Int64Array") do + records = [ + [{"0" => -(2 ** 63)}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:int64, records) + assert_equal(records, record_batch.raw_records) + end + + test("UInt64Array") do + records = [ + [{"0" => (2 ** 64) - 1}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:uint64, records) + assert_equal(records, record_batch.raw_records) + end + + test("FloatArray") do + records = [ + [{"0" => -1.0}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:float, records) + assert_equal(records, record_batch.raw_records) + end + + test("DoubleArray") do + records = [ + [{"0" => -1.0}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:double, records) + assert_equal(records, record_batch.raw_records) + end + + test("BinaryArray") do + records = [ + [{"0" => "\xff".b}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:binary, records) + assert_equal(records, record_batch.raw_records) + end + + test("StringArray") do + records = [ + [{"0" => "Ruby"}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:string, records) + assert_equal(records, record_batch.raw_records) + end + + test("Date32Array") do + records = [ + [{"0" => Date.new(1960, 1, 1)}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:date32, records) + assert_equal(records, record_batch.raw_records) + end + + test("Date64Array") do + records = [ + [{"0" => DateTime.new(1960, 1, 1, 2, 9, 30)}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:date64, records) + 
assert_equal(records, record_batch.raw_records) + end + + sub_test_case("TimestampArray") do + test("second") do + records = [ + [{"0" => Time.parse("1960-01-01T02:09:30Z")}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch({ + type: :timestamp, + unit: :second, + }, + records) + assert_equal(records, record_batch.raw_records) + end + + test("milli") do + records = [ + [{"0" => Time.parse("1960-01-01T02:09:30.123Z")}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch({ + type: :timestamp, + unit: :milli, + }, + records) + assert_equal(records, record_batch.raw_records) + end + + test("micro") do + records = [ + [{"0" => Time.parse("1960-01-01T02:09:30.123456Z")}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch({ + type: :timestamp, + unit: :micro, + }, + records) + assert_equal(records, record_batch.raw_records) + end + + test("nano") do + records = [ + [{"0" => Time.parse("1960-01-01T02:09:30.123456789Z")}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch({ + type: :timestamp, + unit: :nano, + }, + records) + assert_equal(records, record_batch.raw_records) + end + end + + sub_test_case("Time32Array") do + test("second") do + records = [ + [{"0" => 60 * 10}], # 00:10:00 + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch({ + type: :time32, + unit: :second, + }, + records) + assert_equal(records, record_batch.raw_records) + end + + test("milli") do + records = [ + [{"0" => (60 * 10) * 1000 + 123}], # 00:10:00.123 + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch({ + type: :time32, + unit: :milli, + }, + records) + assert_equal(records, record_batch.raw_records) + end + end + + sub_test_case("Time64Array") do + test("micro") do + records = [ + [{"0" => (60 * 10) * 1_000_000 + 123_456}], # 00:10:00.123456 + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch({ + type: :time64, + unit: :micro, + }, + records) + assert_equal(records, 
record_batch.raw_records) + end + + test("nano") do + records = [ + # 00:10:00.123456789 + [{"0" => (60 * 10) * 1_000_000_000 + 123_456_789}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch({ + type: :time64, + unit: :nano, + }, + records) + assert_equal(records, record_batch.raw_records) + end + end + + test("Decimal128Array") do + records = [ + [{"0" => BigDecimal("92.92")}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch({ + type: :decimal128, + precision: 8, + scale: 2, + }, + records) + assert_equal(records, record_batch.raw_records) + end + + test("ListArray") do + records = [ + [{"0" => [true, nil, false]}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch({ + type: :list, + field: { + name: :sub_element, + type: :boolean, + }, + }, + records) + assert_equal(records, record_batch.raw_records) + end + + test("StructArray") do + records = [ + [{"0" => {"sub_field" => true}}], + [nil], + [{"1" => nil}], + [{"0" => {"sub_field" => nil}}], + ] + record_batch = build_record_batch({ + type: :struct, + fields: [ + { + name: :sub_field, + type: :boolean, + }, + ], + }, + records) + assert_equal(records, record_batch.raw_records) + end + + test("SparseUnionArray") do + omit("Need to add support for SparseUnionArrayBuilder") + records = [ + [{"0" => {"field1" => true}}], + [nil], + [{"1" => nil}], + [{"0" => {"field2" => nil}}], + ] + record_batch = build_record_batch({ + type: :sparse_union, + fields: [ + { + name: :field1, + type: :boolean, + }, + { + name: :field2, + type: :uint8, + }, + ], + type_codes: [0, 1], + }, + records) + assert_equal(records, record_batch.raw_records) + end + + test("DenseUnionArray") do + omit("Need to add support for DenseUnionArrayBuilder") + records = [ + [{"0" => {"field1" => true}}], + [nil], + [{"1" => nil}], + [{"0" => {"field2" => nil}}], + ] + record_batch = build_record_batch({ + type: :dense_union, + fields: [ + { + name: :field1, + type: :boolean, + }, + { + name: 
:field2, + type: :uint8, + }, + ], + type_codes: [0, 1], + }, + records) + assert_equal(records, record_batch.raw_records) + end + + test("DictionaryArray") do + omit("Need to add support for DictionaryArrayBuilder") + records = [ + [{"0" => "Ruby"}], + [nil], + [{"1" => nil}], + [{"0" => "GLib"}], + ] + dictionary = Arrow::StringArray.new(["GLib", "Ruby"]) + record_batch = build_record_batch({ + type: :dictionary, + index_data_type: :int8, + dictionary: dictionary, + ordered: true, + }, + records) + assert_equal(records, record_batch.raw_records) + end +end diff --git a/ruby/red-arrow/test/raw-records/record-batch/test-list-array.rb b/ruby/red-arrow/test/raw-records/record-batch/test-list-array.rb new file mode 100644 index 0000000..bf1af36 --- /dev/null +++ b/ruby/red-arrow/test/raw-records/record-batch/test-list-array.rb @@ -0,0 +1,499 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ +class RawRecordsRecordBatchListArrayTest < Test::Unit::TestCase + def fields(type) + field_description = { + name: :element, + } + if type.is_a?(Hash) + field_description = field_description.merge(type) + else + field_description[:type] = type + end + { + column: { + type: :list, + field: field_description, + }, + } + end + + test("NullArray") do + omit("Need to add support for NullArrayBuilder") + records = [ + [[nil, nil, nil]], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(:null), + records) + assert_equal(records, record_batch.raw_records) + end + + test("BooleanArray") do + records = [ + [[true, nil, false]], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(:boolean), + records) + assert_equal(records, record_batch.raw_records) + end + + test("Int8Array") do + records = [ + [[-(2 ** 7), nil, (2 ** 7) - 1]], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(:int8), + records) + assert_equal(records, record_batch.raw_records) + end + + test("UInt8Array") do + records = [ + [[0, nil, (2 ** 8) - 1]], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(:uint8), + records) + assert_equal(records, record_batch.raw_records) + end + + test("Int16Array") do + records = [ + [[-(2 ** 15), nil, (2 ** 15) - 1]], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(:int16), + records) + assert_equal(records, record_batch.raw_records) + end + + test("UInt16Array") do + records = [ + [[0, nil, (2 ** 16) - 1]], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(:uint16), + records) + assert_equal(records, record_batch.raw_records) + end + + test("Int32Array") do + records = [ + [[-(2 ** 31), nil, (2 ** 31) - 1]], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(:int32), + records) + assert_equal(records, record_batch.raw_records) + end + + test("UInt32Array") do + records = [ + [[0, nil, (2 ** 32) - 1]], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(:uint32), + records) + 
assert_equal(records, record_batch.raw_records) + end + + test("Int64Array") do + records = [ + [[-(2 ** 63), nil, (2 ** 63) - 1]], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(:int64), + records) + assert_equal(records, record_batch.raw_records) + end + + test("UInt64Array") do + records = [ + [[0, nil, (2 ** 64) - 1]], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(:uint64), + records) + assert_equal(records, record_batch.raw_records) + end + + test("FloatArray") do + records = [ + [[-1.0, nil, 1.0]], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(:float), + records) + assert_equal(records, record_batch.raw_records) + end + + test("DoubleArray") do + records = [ + [[-1.0, nil, 1.0]], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(:double), + records) + assert_equal(records, record_batch.raw_records) + end + + test("BinaryArray") do + records = [ + [["\x00".b, nil, "\xff".b]], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(:binary), + records) + assert_equal(records, record_batch.raw_records) + end + + test("StringArray") do + records = [ + [ + [ + "Ruby", + nil, + "\u3042", # U+3042 HIRAGANA LETTER A + ], + ], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(:string), + records) + assert_equal(records, record_batch.raw_records) + end + + test("Date32Array") do + records = [ + [ + [ + Date.new(1960, 1, 1), + nil, + Date.new(2017, 8, 23), + ], + ], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(:date32), + records) + assert_equal(records, record_batch.raw_records) + end + + test("Date64Array") do + records = [ + [ + [ + DateTime.new(1960, 1, 1, 2, 9, 30), + nil, + DateTime.new(2017, 8, 23, 14, 57, 2), + ], + ], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(:date64), + records) + assert_equal(records, record_batch.raw_records) + end + + sub_test_case("TimestampArray") do + test("second") do + records = [ + [ + [ + Time.parse("1960-01-01T02:09:30Z"), + nil, + 
Time.parse("2017-08-23T14:57:02Z"), + ], + ], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(type: :timestamp, + unit: :second), + records) + assert_equal(records, record_batch.raw_records) + end + + test("milli") do + records = [ + [ + [ + Time.parse("1960-01-01T02:09:30.123Z"), + nil, + Time.parse("2017-08-23T14:57:02.987Z"), + ], + ], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(type: :timestamp, + unit: :milli), + records) + assert_equal(records, record_batch.raw_records) + end + + test("micro") do + records = [ + [ + [ + Time.parse("1960-01-01T02:09:30.123456Z"), + nil, + Time.parse("2017-08-23T14:57:02.987654Z"), + ], + ], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(type: :timestamp, + unit: :micro), + records) + assert_equal(records, record_batch.raw_records) + end + + test("nano") do + records = [ + [ + [ + Time.parse("1960-01-01T02:09:30.123456789Z"), + nil, + Time.parse("2017-08-23T14:57:02.987654321Z"), + ], + ], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(type: :timestamp, + unit: :nano), + records) + assert_equal(records, record_batch.raw_records) + end + end + + sub_test_case("Time32Array") do + test("second") do + records = [ + [ + [ + 60 * 10, # 00:10:00 + nil, + 60 * 60 * 2 + 9, # 02:00:09 + ], + ], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(type: :time32, + unit: :second), + records) + assert_equal(records, record_batch.raw_records) + end + + test("milli") do + records = [ + [ + [ + (60 * 10) * 1000 + 123, # 00:10:00.123 + nil, + (60 * 60 * 2 + 9) * 1000 + 987, # 02:00:09.987 + ], + ], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(type: :time32, + unit: :milli), + records) + assert_equal(records, record_batch.raw_records) + end + end + + sub_test_case("Time64Array") do + test("micro") do + records = [ + [ + [ + (60 * 10) * 1_000_000 + 123_456, # 00:10:00.123456 + nil, + (60 * 60 * 2 + 9) * 1_000_000 + 987_654, # 02:00:09.987654 + ], + ], + [nil], + ] + 
record_batch = Arrow::RecordBatch.new(fields(type: :time64, + unit: :micro), + records) + assert_equal(records, record_batch.raw_records) + end + + test("nano") do + records = [ + [ + [ + (60 * 10) * 1_000_000_000 + 123_456_789, # 00:10:00.123456789 + nil, + (60 * 60 * 2 + 9) * 1_000_000_000 + 987_654_321, # 02:00:09.987654321 + ], + ], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(type: :time64, + unit: :nano), + records) + assert_equal(records, record_batch.raw_records) + end + end + + test("Decimal128Array") do + records = [ + [ + [ + BigDecimal("92.92"), + nil, + BigDecimal("29.29"), + ], + ], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(type: :decimal128, + precision: 8, + scale: 2), + records) + assert_equal(records, record_batch.raw_records) + end + + test("ListArray") do + records = [ + [ + [ + [ + true, + nil, + ], + nil, + [ + nil, + false, + ], + ], + ], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(type: :list, + field: { + name: :sub_element, + type: :boolean, + }), + records) + assert_equal(records, record_batch.raw_records) + end + + test("StructArray") do + records = [ + [ + [ + {"field" => true}, + nil, + {"field" => nil}, + ], + ], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(type: :struct, + fields: [ + { + name: :field, + type: :boolean, + }, + ]), + records) + assert_equal(records, record_batch.raw_records) + end + + test("SparseUnionArray") do + omit("Need to add support for SparseUnionArrayBuilder") + records = [ + [ + [ + {"field1" => true}, + nil, + {"field2" => nil}, + ], + ], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(type: :sparse_union, + fields: [ + { + name: :field1, + type: :boolean, + }, + { + name: :field2, + type: :uint8, + }, + ], + type_codes: [0, 1]), + records) + assert_equal(records, record_batch.raw_records) + end + + test("DenseUnionArray") do + omit("Need to add support for DenseUnionArrayBuilder") + records = [ + [ + [ + {"field1" => true}, + nil, 
+ {"field2" => nil}, + ], + ], + [nil], + ] + record_batch = Arrow::RecordBatch.new(fields(type: :dense_union, + fields: [ + { + name: :field1, + type: :boolean, + }, + { + name: :field2, + type: :uint8, + }, + ], + type_codes: [0, 1]), + records) + assert_equal(records, record_batch.raw_records) + end + + test("DictionaryArray") do + omit("Need to add support for DictionaryArrayBuilder") + records = [ + [ + [ + "Ruby", + nil, + "GLib", + ], + ], + [nil], + ] + dictionary = Arrow::StringArray.new(["GLib", "Ruby"]) + record_batch = Arrow::RecordBatch.new(fields(type: :dictionary, + index_data_type: :int8, + dictionary: dictionary, + ordered: true), + records) + assert_equal(records, record_batch.raw_records) + end +end diff --git a/ruby/red-arrow/test/raw-records/record-batch/test-multiple-columns.rb b/ruby/red-arrow/test/raw-records/record-batch/test-multiple-columns.rb new file mode 100644 index 0000000..c0e3631 --- /dev/null +++ b/ruby/red-arrow/test/raw-records/record-batch/test-multiple-columns.rb @@ -0,0 +1,49 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+
+class RawRecordsRecordBatchMultipleColumnsTest < Test::Unit::TestCase
+  test("3 elements") do
+    records = [
+      [true, nil, "Ruby"],
+      [nil, 0, "GLib"],
+      [false, 2 ** 8 - 1, nil],
+    ]
+    record_batch = Arrow::RecordBatch.new([
+                                            {name: :column0, type: :boolean},
+                                            {name: :column1, type: :uint8},
+                                            {name: :column2, type: :string},
+                                          ],
+                                          records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("4 elements") do
+    records = [
+      [true, nil, "Ruby", -(2 ** 63)],
+      [nil, 0, "GLib", nil],
+      [false, 2 ** 8 - 1, nil, (2 ** 63) - 1],
+    ]
+    record_batch = Arrow::RecordBatch.new([
+                                            {name: :column0, type: :boolean},
+                                            {name: :column1, type: :uint8},
+                                            {name: :column2, type: :string},
+                                            {name: :column3, type: :int64},
+                                          ],
+                                          records)
+    assert_equal(records, record_batch.raw_records)
+  end
+end
diff --git a/ruby/red-arrow/test/raw-records/record-batch/test-sparse-union-array.rb b/ruby/red-arrow/test/raw-records/record-batch/test-sparse-union-array.rb
new file mode 100644
index 0000000..3a6191d
--- /dev/null
+++ b/ruby/red-arrow/test/raw-records/record-batch/test-sparse-union-array.rb
@@ -0,0 +1,475 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+ +class RawRecordsRecordBatchSparseUnionArrayTest < Test::Unit::TestCase + def fields(type, type_codes) + field_description = {} + if type.is_a?(Hash) + field_description = field_description.merge(type) + else + field_description[:type] = type + end + { + column: { + type: :sparse_union, + fields: [ + field_description.merge(name: "0"), + field_description.merge(name: "1"), + ], + type_codes: type_codes, + }, + } + end + + # TODO: Use Arrow::RecordBatch.new(fields(type), records) + def build_record_batch(type, records) + type_codes = [0, 1] + schema = Arrow::Schema.new(fields(type, type_codes)) + type_ids = [] + arrays = schema.fields[0].data_type.fields.collect do |field| + sub_schema = Arrow::Schema.new([field]) + sub_records = records.collect do |record| + [record[0].nil? ? nil : record[0][field.name]] + end + sub_record_batch = Arrow::RecordBatch.new(sub_schema, + sub_records) + sub_record_batch.columns[0] + end + records.each do |record| + column = record[0] + if column.nil? + type_ids << nil + elsif column.key?("0") + type_ids << type_codes[0] + elsif column.key?("1") + type_ids << type_codes[1] + end + end + # TODO + # union_array = Arrow::SparseUnionArray.new(schema.fields[0].data_type, + # Arrow::Int8Array.new(type_ids), + # arrays) + union_array = Arrow::SparseUnionArray.new(Arrow::Int8Array.new(type_ids), + arrays) + schema = Arrow::Schema.new(column: union_array.value_data_type) + Arrow::RecordBatch.new(schema, + records.size, + [union_array]) + end + + test("NullArray") do + omit("Need to add support for NullArrayBuilder") + records = [ + [{"0" => nil}], + [nil], + ] + record_batch = build_record_batch(:null, records) + assert_equal(records, record_batch.raw_records) + end + + test("BooleanArray") do + records = [ + [{"0" => true}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:boolean, records) + assert_equal(records, record_batch.raw_records) + end + + test("Int8Array") do + records = [ + [{"0" => -(2 ** 7)}], + [nil], + [{"1" 
=> nil}], + ] + record_batch = build_record_batch(:int8, records) + assert_equal(records, record_batch.raw_records) + end + + test("UInt8Array") do + records = [ + [{"0" => (2 ** 8) - 1}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:uint8, records) + assert_equal(records, record_batch.raw_records) + end + + test("Int16Array") do + records = [ + [{"0" => -(2 ** 15)}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:int16, records) + assert_equal(records, record_batch.raw_records) + end + + test("UInt16Array") do + records = [ + [{"0" => (2 ** 16) - 1}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:uint16, records) + assert_equal(records, record_batch.raw_records) + end + + test("Int32Array") do + records = [ + [{"0" => -(2 ** 31)}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:int32, records) + assert_equal(records, record_batch.raw_records) + end + + test("UInt32Array") do + records = [ + [{"0" => (2 ** 32) - 1}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:uint32, records) + assert_equal(records, record_batch.raw_records) + end + + test("Int64Array") do + records = [ + [{"0" => -(2 ** 63)}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:int64, records) + assert_equal(records, record_batch.raw_records) + end + + test("UInt64Array") do + records = [ + [{"0" => (2 ** 64) - 1}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:uint64, records) + assert_equal(records, record_batch.raw_records) + end + + test("FloatArray") do + records = [ + [{"0" => -1.0}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:float, records) + assert_equal(records, record_batch.raw_records) + end + + test("DoubleArray") do + records = [ + [{"0" => -1.0}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:double, records) + assert_equal(records, record_batch.raw_records) + end + + test("BinaryArray") 
do + records = [ + [{"0" => "\xff".b}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:binary, records) + assert_equal(records, record_batch.raw_records) + end + + test("StringArray") do + records = [ + [{"0" => "Ruby"}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:string, records) + assert_equal(records, record_batch.raw_records) + end + + test("Date32Array") do + records = [ + [{"0" => Date.new(1960, 1, 1)}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:date32, records) + assert_equal(records, record_batch.raw_records) + end + + test("Date64Array") do + records = [ + [{"0" => DateTime.new(1960, 1, 1, 2, 9, 30)}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch(:date64, records) + assert_equal(records, record_batch.raw_records) + end + + sub_test_case("TimestampArray") do + test("second") do + records = [ + [{"0" => Time.parse("1960-01-01T02:09:30Z")}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch({ + type: :timestamp, + unit: :second, + }, + records) + assert_equal(records, record_batch.raw_records) + end + + test("milli") do + records = [ + [{"0" => Time.parse("1960-01-01T02:09:30.123Z")}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch({ + type: :timestamp, + unit: :milli, + }, + records) + assert_equal(records, record_batch.raw_records) + end + + test("micro") do + records = [ + [{"0" => Time.parse("1960-01-01T02:09:30.123456Z")}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch({ + type: :timestamp, + unit: :micro, + }, + records) + assert_equal(records, record_batch.raw_records) + end + + test("nano") do + records = [ + [{"0" => Time.parse("1960-01-01T02:09:30.123456789Z")}], + [nil], + [{"1" => nil}], + ] + record_batch = build_record_batch({ + type: :timestamp, + unit: :nano, + }, + records) + assert_equal(records, record_batch.raw_records) + end + end + + sub_test_case("Time32Array") do + test("second") do + 
records = [
+        [{"0" => 60 * 10}], # 00:10:00
+        [nil],
+        [{"1" => nil}],
+      ]
+      record_batch = build_record_batch({
+                                          type: :time32,
+                                          unit: :second,
+                                        },
+                                        records)
+      assert_equal(records, record_batch.raw_records)
+    end
+
+    test("milli") do
+      records = [
+        [{"0" => (60 * 10) * 1000 + 123}], # 00:10:00.123
+        [nil],
+        [{"1" => nil}],
+      ]
+      record_batch = build_record_batch({
+                                          type: :time32,
+                                          unit: :milli,
+                                        },
+                                        records)
+      assert_equal(records, record_batch.raw_records)
+    end
+  end
+
+  sub_test_case("Time64Array") do
+    test("micro") do
+      records = [
+        [{"0" => (60 * 10) * 1_000_000 + 123_456}], # 00:10:00.123456
+        [nil],
+        [{"1" => nil}],
+      ]
+      record_batch = build_record_batch({
+                                          type: :time64,
+                                          unit: :micro,
+                                        },
+                                        records)
+      assert_equal(records, record_batch.raw_records)
+    end
+
+    test("nano") do
+      records = [
+        # 00:10:00.123456789
+        [{"0" => (60 * 10) * 1_000_000_000 + 123_456_789}],
+        [nil],
+        [{"1" => nil}],
+      ]
+      record_batch = build_record_batch({
+                                          type: :time64,
+                                          unit: :nano,
+                                        },
+                                        records)
+      assert_equal(records, record_batch.raw_records)
+    end
+  end
+
+  test("Decimal128Array") do
+    records = [
+      [{"0" => BigDecimal("92.92")}],
+      [nil],
+      [{"1" => nil}],
+    ]
+    record_batch = build_record_batch({
+                                        type: :decimal128,
+                                        precision: 8,
+                                        scale: 2,
+                                      },
+                                      records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("ListArray") do
+    records = [
+      [{"0" => [true, nil, false]}],
+      [nil],
+      [{"1" => nil}],
+    ]
+    record_batch = build_record_batch({
+                                        type: :list,
+                                        field: {
+                                          name: :sub_element,
+                                          type: :boolean,
+                                        },
+                                      },
+                                      records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("StructArray") do
+    records = [
+      [{"0" => {"sub_field" => true}}],
+      [nil],
+      [{"1" => nil}],
+      [{"0" => {"sub_field" => nil}}],
+    ]
+    record_batch = build_record_batch({
+                                        type: :struct,
+                                        fields: [
+                                          {
+                                            name: :sub_field,
+                                            type: :boolean,
+                                          },
+                                        ],
+                                      },
+                                      records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("SparseUnionArray") do
+    omit("Need to add support for SparseUnionArrayBuilder")
+    records = [
+      [{"0" => {"field1" => true}}],
+      [nil],
+      [{"1" => nil}],
+      [{"0" => {"field2" => nil}}],
+    ]
+    record_batch = build_record_batch({
+                                        type: :sparse_union,
+                                        fields: [
+                                          {
+                                            name: :field1,
+                                            type: :boolean,
+                                          },
+                                          {
+                                            name: :field2,
+                                            type: :uint8,
+                                          },
+                                        ],
+                                        type_codes: [0, 1],
+                                      },
+                                      records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("DenseUnionArray") do
+    omit("Need to add support for DenseUnionArrayBuilder")
+    records = [
+      [{"0" => {"field1" => true}}],
+      [nil],
+      [{"1" => nil}],
+      [{"0" => {"field2" => nil}}],
+    ]
+    record_batch = build_record_batch({
+                                        type: :dense_union,
+                                        fields: [
+                                          {
+                                            name: :field1,
+                                            type: :boolean,
+                                          },
+                                          {
+                                            name: :field2,
+                                            type: :uint8,
+                                          },
+                                        ],
+                                        type_codes: [0, 1],
+                                      },
+                                      records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("DictionaryArray") do
+    omit("Need to add support for DictionaryArrayBuilder")
+    records = [
+      [{"0" => "Ruby"}],
+      [nil],
+      [{"1" => nil}],
+      [{"0" => "GLib"}],
+    ]
+    dictionary = Arrow::StringArray.new(["GLib", "Ruby"])
+    record_batch = build_record_batch({
+                                        type: :dictionary,
+                                        index_data_type: :int8,
+                                        dictionary: dictionary,
+                                        ordered: true,
+                                      },
+                                      records)
+    assert_equal(records, record_batch.raw_records)
+  end
+end
diff --git a/ruby/red-arrow/test/raw-records/record-batch/test-struct-array.rb b/ruby/red-arrow/test/raw-records/record-batch/test-struct-array.rb
new file mode 100644
index 0000000..bccd0d9
--- /dev/null
+++ b/ruby/red-arrow/test/raw-records/record-batch/test-struct-array.rb
@@ -0,0 +1,427 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+class RawRecordsRecordBatchStructArrayTest < Test::Unit::TestCase
+  def fields(type)
+    field_description = {
+      name: :field,
+    }
+    if type.is_a?(Hash)
+      field_description = field_description.merge(type)
+    else
+      field_description[:type] = type
+    end
+    {
+      column: {
+        type: :struct,
+        fields: [
+          field_description,
+        ],
+      },
+    }
+  end
+
+  test("NullArray") do
+    omit("Need to add support for NullArrayBuilder")
+    records = [
+      [{"field" => nil}],
+      [nil],
+    ]
+    record_batch = Arrow::RecordBatch.new(fields(:null),
+                                          records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("BooleanArray") do
+    records = [
+      [{"field" => true}],
+      [nil],
+      [{"field" => nil}],
+    ]
+    record_batch = Arrow::RecordBatch.new(fields(:boolean),
+                                          records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("Int8Array") do
+    records = [
+      [{"field" => -(2 ** 7)}],
+      [nil],
+      [{"field" => nil}],
+    ]
+    record_batch = Arrow::RecordBatch.new(fields(:int8),
+                                          records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("UInt8Array") do
+    records = [
+      [{"field" => (2 ** 8) - 1}],
+      [nil],
+      [{"field" => nil}],
+    ]
+    record_batch = Arrow::RecordBatch.new(fields(:uint8),
+                                          records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("Int16Array") do
+    records = [
+      [{"field" => -(2 ** 15)}],
+      [nil],
+      [{"field" => nil}],
+    ]
+    record_batch = Arrow::RecordBatch.new(fields(:int16),
+                                          records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("UInt16Array") do
+    records = [
+      [{"field" => (2 ** 16) - 1}],
+      [nil],
+      [{"field" => nil}],
+    ]
+    record_batch = Arrow::RecordBatch.new(fields(:uint16),
+                                          records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("Int32Array") do
+    records = [
+      [{"field" => -(2 ** 31)}],
+      [nil],
+      [{"field" => nil}],
+    ]
+    record_batch = Arrow::RecordBatch.new(fields(:int32),
+                                          records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("UInt32Array") do
+    records = [
+      [{"field" => (2 ** 32) - 1}],
+      [nil],
+      [{"field" => nil}],
+    ]
+    record_batch = Arrow::RecordBatch.new(fields(:uint32),
+                                          records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("Int64Array") do
+    records = [
+      [{"field" => -(2 ** 63)}],
+      [nil],
+      [{"field" => nil}],
+    ]
+    record_batch = Arrow::RecordBatch.new(fields(:int64),
+                                          records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("UInt64Array") do
+    records = [
+      [{"field" => (2 ** 64) - 1}],
+      [nil],
+      [{"field" => nil}],
+    ]
+    record_batch = Arrow::RecordBatch.new(fields(:uint64),
+                                          records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("FloatArray") do
+    records = [
+      [{"field" => -1.0}],
+      [nil],
+      [{"field" => nil}],
+    ]
+    record_batch = Arrow::RecordBatch.new(fields(:float),
+                                          records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("DoubleArray") do
+    records = [
+      [{"field" => -1.0}],
+      [nil],
+      [{"field" => nil}],
+    ]
+    record_batch = Arrow::RecordBatch.new(fields(:double),
+                                          records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("BinaryArray") do
+    records = [
+      [{"field" => "\xff".b}],
+      [nil],
+      [{"field" => nil}],
+    ]
+    record_batch = Arrow::RecordBatch.new(fields(:binary),
+                                          records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("StringArray") do
+    records = [
+      [{"field" => "Ruby"}],
+      [nil],
+      [{"field" => nil}],
+    ]
+    record_batch = Arrow::RecordBatch.new(fields(:string),
+                                          records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("Date32Array") do
+    records = [
+      [{"field" => Date.new(1960, 1, 1)}],
+      [nil],
+      [{"field" => nil}],
+    ]
+    record_batch = Arrow::RecordBatch.new(fields(:date32),
+                                          records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("Date64Array") do
+    records = [
+      [{"field" => DateTime.new(1960, 1, 1, 2, 9, 30)}],
+      [nil],
+      [{"field" => nil}],
+    ]
+    record_batch = Arrow::RecordBatch.new(fields(:date64),
+                                          records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  sub_test_case("TimestampArray") do
+    test("second") do
+      records = [
+        [{"field" => Time.parse("1960-01-01T02:09:30Z")}],
+        [nil],
+        [{"field" => nil}],
+      ]
+      record_batch = Arrow::RecordBatch.new(fields(type: :timestamp,
+                                                   unit: :second),
+                                            records)
+      assert_equal(records, record_batch.raw_records)
+    end
+
+    test("milli") do
+      records = [
+        [{"field" => Time.parse("1960-01-01T02:09:30.123Z")}],
+        [nil],
+        [{"field" => nil}],
+      ]
+      record_batch = Arrow::RecordBatch.new(fields(type: :timestamp,
+                                                   unit: :milli),
+                                            records)
+      assert_equal(records, record_batch.raw_records)
+    end
+
+    test("micro") do
+      records = [
+        [{"field" => Time.parse("1960-01-01T02:09:30.123456Z")}],
+        [nil],
+        [{"field" => nil}],
+      ]
+      record_batch = Arrow::RecordBatch.new(fields(type: :timestamp,
+                                                   unit: :micro),
+                                            records)
+      assert_equal(records, record_batch.raw_records)
+    end
+
+    test("nano") do
+      records = [
+        [{"field" => Time.parse("1960-01-01T02:09:30.123456789Z")}],
+        [nil],
+        [{"field" => nil}],
+      ]
+      record_batch = Arrow::RecordBatch.new(fields(type: :timestamp,
+                                                   unit: :nano),
+                                            records)
+      assert_equal(records, record_batch.raw_records)
+    end
+  end
+
+  sub_test_case("Time32Array") do
+    test("second") do
+      records = [
+        [{"field" => 60 * 10}], # 00:10:00
+        [nil],
+        [{"field" => nil}],
+      ]
+      record_batch = Arrow::RecordBatch.new(fields(type: :time32,
+                                                   unit: :second),
+                                            records)
+      assert_equal(records, record_batch.raw_records)
+    end
+
+    test("milli") do
+      records = [
+        [{"field" => (60 * 10) * 1000 + 123}], # 00:10:00.123
+        [nil],
+        [{"field" => nil}],
+      ]
+      record_batch = Arrow::RecordBatch.new(fields(type: :time32,
+                                                   unit: :milli),
+                                            records)
+      assert_equal(records, record_batch.raw_records)
+    end
+  end
+
+  sub_test_case("Time64Array") do
+    test("micro") do
+      records = [
+        [{"field" => (60 * 10) * 1_000_000 + 123_456}], # 00:10:00.123456
+        [nil],
+        [{"field" => nil}],
+      ]
+      record_batch = Arrow::RecordBatch.new(fields(type: :time64,
+                                                   unit: :micro),
+                                            records)
+      assert_equal(records, record_batch.raw_records)
+    end
+
+    test("nano") do
+      records = [
+        # 00:10:00.123456789
+        [{"field" => (60 * 10) * 1_000_000_000 + 123_456_789}],
+        [nil],
+        [{"field" => nil}],
+      ]
+      record_batch = Arrow::RecordBatch.new(fields(type: :time64,
+                                                   unit: :nano),
+                                            records)
+      assert_equal(records, record_batch.raw_records)
+    end
+  end
+
+  test("Decimal128Array") do
+    records = [
+      [{"field" => BigDecimal("92.92")}],
+      [nil],
+      [{"field" => nil}],
+    ]
+    record_batch = Arrow::RecordBatch.new(fields(type: :decimal128,
+                                                 precision: 8,
+                                                 scale: 2),
+                                          records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("ListArray") do
+    records = [
+      [{"field" => [true, nil, false]}],
+      [nil],
+      [{"field" => nil}],
+    ]
+    record_batch = Arrow::RecordBatch.new(fields(type: :list,
+                                                 field: {
+                                                   name: :sub_element,
+                                                   type: :boolean,
+                                                 }),
+                                          records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("StructArray") do
+    records = [
+      [{"field" => {"sub_field" => true}}],
+      [nil],
+      [{"field" => nil}],
+      [{"field" => {"sub_field" => nil}}],
+    ]
+    record_batch = Arrow::RecordBatch.new(fields(type: :struct,
+                                                 fields: [
+                                                   {
+                                                     name: :sub_field,
+                                                     type: :boolean,
+                                                   },
+                                                 ]),
+                                          records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("SparseUnionArray") do
+    omit("Need to add support for SparseUnionArrayBuilder")
+    records = [
+      [{"field" => {"field1" => true}}],
+      [nil],
+      [{"field" => nil}],
+      [{"field" => {"field2" => nil}}],
+    ]
+    record_batch = Arrow::RecordBatch.new(fields(type: :sparse_union,
+                                                 fields: [
+                                                   {
+                                                     name: :field1,
+                                                     type: :boolean,
+                                                   },
+                                                   {
+                                                     name: :field2,
+                                                     type: :uint8,
+                                                   },
+                                                 ],
+                                                 type_codes: [0, 1]),
+                                          records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("DenseUnionArray") do
+    omit("Need to add support for DenseUnionArrayBuilder")
+    records = [
+      [{"field" => {"field1" => true}}],
+      [nil],
+      [{"field" => nil}],
+      [{"field" => {"field2" => nil}}],
+    ]
+    record_batch = Arrow::RecordBatch.new(fields(type: :dense_union,
+                                                 fields: [
+                                                   {
+                                                     name: :field1,
+                                                     type: :boolean,
+                                                   },
+                                                   {
+                                                     name: :field2,
+                                                     type: :uint8,
+                                                   },
+                                                 ],
+                                                 type_codes: [0, 1]),
+                                          records)
+    assert_equal(records, record_batch.raw_records)
+  end
+
+  test("DictionaryArray") do
+    omit("Need to add support for DictionaryArrayBuilder")
+    records = [
+      [{"field" => "Ruby"}],
+      [nil],
+      [{"field" => nil}],
+      [{"field" => "GLib"}],
+    ]
+    dictionary = Arrow::StringArray.new(["GLib", "Ruby"])
+    record_batch = Arrow::RecordBatch.new(fields(type: :dictionary,
+                                                 index_data_type: :int8,
+                                                 dictionary: dictionary,
+                                                 ordered: true),
+                                          records)
+    assert_equal(records, record_batch.raw_records)
+  end
+end
diff --git a/ruby/red-arrow/test/run-test.rb b/ruby/red-arrow/test/run-test.rb
index 9551f60..4712d49 100755
--- a/ruby/red-arrow/test/run-test.rb
+++ b/ruby/red-arrow/test/run-test.rb
@@ -26,8 +26,29 @@ require "pathname"
 base_dir = Pathname.new(__dir__).parent.expand_path
 
 lib_dir = base_dir + "lib"
+ext_dir = base_dir + "ext" + "arrow"
 test_dir = base_dir + "test"
 
+make = nil
+if ENV["NO_MAKE"] != "yes"
+  if ENV["MAKE"]
+    make = ENV["MAKE"]
+  elsif system("type gmake > /dev/null")
+    make = "gmake"
+  elsif system("type make > /dev/null")
+    make = "make"
+  end
+end
+if make
+  Dir.chdir(ext_dir.to_s) do
+    unless File.exist?("Makefile")
+      system(RbConfig.ruby, "extconf.rb", "--enable-debug-build") or exit(false)
+    end
+    system("#{make} > /dev/null") or exit(false)
+  end
+end
+
+$LOAD_PATH.unshift(ext_dir.to_s)
 $LOAD_PATH.unshift(lib_dir.to_s)
 
 require_relative "helper"
diff --git a/ruby/red-arrow/test/test-data-type.rb b/ruby/red-arrow/test/test-data-type.rb
index 747eff8..bcffea2 100644
--- a/ruby/red-arrow/test/test-data-type.rb
+++ b/ruby/red-arrow/test/test-data-type.rb
@@ -43,6 +43,11 @@ class DataTypeTest < Test::Unit::TestCase
       assert_equal(Arrow::ListDataType.new(field),
                    Arrow::DataType.resolve(type: :list, field: field))
     end
+
+    test("_") do
+      assert_equal(Arrow::FixedSizeBinaryDataType.new(10),
+                   Arrow::DataType.resolve([:fixed_size_binary, 10]))
+    end
   end
 
   sub_test_case("instance methods") do
diff --git a/ruby/red-gandiva/test/run-test.rb b/ruby/red-gandiva/test/run-test.rb
index b826f3e..a4f7f76 100755
--- a/ruby/red-gandiva/test/run-test.rb
+++ b/ruby/red-gandiva/test/run-test.rb
@@ -28,7 +28,9 @@ lib_dir = base_dir + "lib"
 test_dir = base_dir + "test"
 
 arrow_lib_dir = arrow_base_dir + "lib"
+arrow_ext_dir = arrow_base_dir + "ext" + "arrow"
 
+$LOAD_PATH.unshift(arrow_ext_dir.to_s)
 $LOAD_PATH.unshift(arrow_lib_dir.to_s)
 $LOAD_PATH.unshift(lib_dir.to_s)
 
diff --git a/ruby/red-parquet/test/run-test.rb b/ruby/red-parquet/test/run-test.rb
index b826f3e..a4f7f76 100755
--- a/ruby/red-parquet/test/run-test.rb
+++ b/ruby/red-parquet/test/run-test.rb
@@ -28,7 +28,9 @@ lib_dir = base_dir + "lib"
 test_dir = base_dir + "test"
 
 arrow_lib_dir = arrow_base_dir + "lib"
+arrow_ext_dir = arrow_base_dir + "ext" + "arrow"
 
+$LOAD_PATH.unshift(arrow_ext_dir.to_s)
 $LOAD_PATH.unshift(arrow_lib_dir.to_s)
 $LOAD_PATH.unshift(lib_dir.to_s)
 
diff --git a/ruby/red-plasma/test/run-test.rb b/ruby/red-plasma/test/run-test.rb
index b826f3e..a4f7f76 100755
--- a/ruby/red-plasma/test/run-test.rb
+++ b/ruby/red-plasma/test/run-test.rb
@@ -28,7 +28,9 @@ lib_dir = base_dir + "lib"
 test_dir = base_dir + "test"
 
 arrow_lib_dir = arrow_base_dir + "lib"
+arrow_ext_dir = arrow_base_dir + "ext" + "arrow"
 
+$LOAD_PATH.unshift(arrow_ext_dir.to_s)
 $LOAD_PATH.unshift(arrow_lib_dir.to_s)
 $LOAD_PATH.unshift(lib_dir.to_s)
 
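
Note on the ruby/red-arrow/test/run-test.rb hunk in this commit: before building the extension it picks a make command in a fixed order (NO_MAKE=yes disables the build, then an explicit MAKE, then gmake, then make). The following is only a minimal sketch of that selection order for readers of the diff. The method name `detect_make` and the `command_exists` callable are hypothetical and do not appear in the commit, which inlines this logic at the top level of the script.

```ruby
# Hypothetical helper, NOT part of the commit: mirrors the order in which
# ruby/red-arrow/test/run-test.rb chooses a make command.
#
# env            - a Hash-like object such as ENV
# command_exists - a callable returning true when the named command is usable
def detect_make(env, command_exists)
  # NO_MAKE=yes skips the extension build entirely.
  return nil if env["NO_MAKE"] == "yes"
  # An explicit MAKE setting wins over auto-detection.
  return env["MAKE"] if env["MAKE"]
  # Otherwise prefer gmake, then fall back to make; nil if neither exists.
  ["gmake", "make"].find { |name| command_exists.call(name) }
end

# Usage with the same shell probe the script uses ("type NAME > /dev/null").
on_path = ->(name) { system("type #{name} > /dev/null 2>&1") }
puts detect_make(ENV, on_path).inspect
```

Passing the probe as a callable is a choice made for this sketch only, so the ordering can be checked without touching the real environment.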