Thanks Aldrin and Weston!  Following your suggestions, I was able to encode
the schema so that Athena recognized it. In case it helps anyone, here's
some sample code:

import { Schema, Field, Utf8, Table, RecordBatchStreamWriter, Int32, Bool,
  DateMillisecond, DateDay } from 'apache-arrow';

// Build the schema for the table Athena will see.
const s = new Schema([
  new Field('name', new Utf8()),
  new Field('address', new Utf8()),
  new Field('active', new Bool()),
  new Field('count', new Int32()),
  new Field('birthday', new DateDay()),
  new Field('created', new DateMillisecond())
]);

// Write an empty table (schema only, no record batches) to an IPC stream,
// then base64-encode the bytes for the GetTableResponse payload.
const w = new RecordBatchStreamWriter();
w.write(new Table(s));
const encodedSchema = Buffer.from(w.toUint8Array(true)).toString('base64');
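
For completeness, here is a sketch of how the encoded schema might be placed
into the GetTableResponse payload, mirroring the shape of the Python example
quoted below. This is just an illustration: buildGetTableResponse is a
hypothetical helper, and the catalog/database/table names and requestType
value are placeholders, not values from a real connector.

```javascript
// Hypothetical sketch of the GetTableResponse shape, based on the Python
// example in this thread. All names below are placeholders.
function buildGetTableResponse(catalogName, databaseName, tableName,
                               encodedSchema, partitionColumns) {
  return {
    '@type': 'GetTableResponse',
    catalogName: catalogName,
    // Athena's "tableName" is an object holding both schema and table names.
    tableName: { schemaName: databaseName, tableName: tableName },
    // The base64-encoded Arrow schema goes in a nested "schema" field.
    schema: { schema: encodedSchema },
    partitionColumns: partitionColumns,
    requestType: 'GET_TABLE'  // placeholder; the Python example passes this in
  };
}

// Example usage with a dummy base64 string:
const resp = buildGetTableResponse('my_catalog', 'my_db', 'my_table',
                                   'AAAA', []);
console.log(resp.schema.schema); // 'AAAA'
```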



On Fri, May 6, 2022 at 3:53 PM Aldrin <[email protected]> wrote:

> I didn't think of this as a possible solution, for some reason, but I
> think it actually makes a lot of sense. Just as a reference, this is
> something I currently do when storing data in a key-value interface:
>
>    - I write a buffer with no batches
>    - Write batches in separate buffers
>       - these are sized to fully utilize the space for each key-value
>
> It is possible to then read the key-value that only contains a schema.
>
> I believe my approach for doing this can be seen in [1], and I use the
> StreamWriter because I want it to use an in-memory format that is
> streamable.
>
> [1]:
> https://gitlab.com/skyhookdm/skytether-singlecell/-/blob/mainline/src/cpp/processing/dataformats.cpp#L16
>
> Aldrin Montana
> Computer Science PhD Student
> UC Santa Cruz
>
>
> On Fri, May 6, 2022 at 12:04 PM Weston Pace <[email protected]> wrote:
>
>> Can you serialize the schema by creating an IPC file with zero record
>> batches?  I apologize, but I do not know the JS API as well.  Maybe
>> you can create a table from just a schema (or a schema and a set of
>> empty arrays) and then turn that into an IPC file?  This shouldn't add
>> too much overhead.
>>
>> On Thu, May 5, 2022 at 8:23 AM Howard Engelhart
>> <[email protected]> wrote:
>> >
>> > I'm looking to implement an Athena federated query custom connector
>> > using the Arrow JS lib.  I'm getting stuck on figuring out how to
>> > encode a Schema properly for the Athena GetTableResponse.  I have
>> > found an example using Python that does something like this
>> > (paraphrasing...)
>> >
>> > import pyarrow as pa
>> > .....
>> >        return {
>> >            "@type": "GetTableResponse",
>> >            "catalogName": self.catalogName,
>> >            "tableName": {'schemaName': self.databaseName,
>> >                          'tableName': self.tableName},
>> >            "schema": {"schema": base64.b64encode(
>> >                pa.schema(....args...).serialize().slice(4)
>> >            ).decode("utf-8")},
>> >            "partitionColumns": self.partitions,
>> >            "requestType": self.request_type
>> >        }
>> > What I'm looking for is the JS equivalent of
>> > pa.schema(....args...).serialize()
>> >
>> > Is there one?  If not, could someone point me in the right direction
>> > of how to code up something similar?
>>
>
