Re: SQL datatime fields in json -> timestamp in parquet ? (CTAS)

2015-07-13 Thread Stefán Baxter
I see, thank you :)
On Mon, Jul 13, 2015 at 11:02 PM, Jacques Nadeau wrote:
> Wrong line in the code. Actual code:
> https://github.com/apache/drill/blob/master/exec/java-exec/src/test/resources/vector/complex/extended.json#L8
> On Mon, Jul 13, 2015 at 3:48 PM, Jacques Nadeau wrote:

Re: SQL datatime fields in json -> timestamp in parquet ? (CTAS)

2015-07-13 Thread Jacques Nadeau
Wrong line in the code. Actual code:
https://github.com/apache/drill/blob/master/exec/java-exec/src/test/resources/vector/complex/extended.json#L8
On Mon, Jul 13, 2015 at 3:48 PM, Jacques Nadeau wrote:
> If you use extended JSON in your JSON file, Drill will automatically
> convert to TIMESTAM

Re: SQL datatime fields in json -> timestamp in parquet ? (CTAS)

2015-07-13 Thread Jacques Nadeau
If you use extended JSON in your JSON file, Drill will automatically convert to TIMESTAMP_MILLIS. You can see an example of the JSON format for this at [1]. To check the result, one of the parquet-tools options will do it; I can't remember which one offhand.
[1] https://github.com/apache/drill/blob
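A minimal sketch of preparing such a file, assuming the source values are strings like `2015-07-12 15:34:59` in UTC and that Drill reads an object of the form `{"$date": "<ISO-8601 string>"}` as a timestamp; verify the exact shape against the extended.json file linked in this thread. The helpers `to_extended_json_date` and `rewrite_record` are hypothetical, and the field name `occurred_at` is borrowed from the log in the original question.

```python
import json
from datetime import datetime, timezone

def to_extended_json_date(value):
    """Wrap a 'YYYY-MM-DD HH:MM:SS' string (assumed UTC) in a {"$date": ...} object."""
    dt = datetime.strptime(value, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    # ISO-8601 with millisecond precision and a trailing 'Z' for UTC
    iso = dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{dt.microsecond // 1000:03d}Z"
    return {"$date": iso}

def rewrite_record(record, date_fields):
    """Return a copy of record with each string field in date_fields wrapped as $date."""
    out = dict(record)
    for field in date_fields:
        if isinstance(out.get(field), str):
            out[field] = to_extended_json_date(out[field])
    return out

if __name__ == "__main__":
    row = {"id": 1, "occurred_at": "2015-07-12 15:34:59"}
    # Emit one JSON object per line
    print(json.dumps(rewrite_record(row, ["occurred_at"])))
    # {"id": 1, "occurred_at": {"$date": "2015-07-12T15:34:59.000Z"}}
```

Keeping the rewrite outside Drill makes the conversion rule (input format, time zone) explicit in one place.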

Re: SQL datatime fields in json -> timestamp in parquet ? (CTAS)

2015-07-13 Thread Stefán Baxter
Hi Jacques,
How can I tell if it has that annotation, and is there a way for me to set the defaults for the conversion of JSON datetime fields?
Regards,
-Stefan
On Mon, Jul 13, 2015 at 3:19 PM, Jacques Nadeau wrote:
> There are two different settings inside a Parquet file: physical storage
> and l

Re: SQL datatime fields in json -> timestamp in parquet ? (CTAS)

2015-07-13 Thread Jacques Nadeau
There are two different settings inside a Parquet file: the physical storage type and the logical annotation. A timestamp should be stored as a physical INT64 with the TIMESTAMP_MILLIS annotation. See here:
https://github.com/apache/parquet-format/blob/master/src/thrift/parquet.thrift#L105
On Mon, Jul 13, 20
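To check a column for both settings, a minimal sketch: assuming you already have the file's schema as text in the form parquet-mr prints it, for example `optional int64 occurred_at (TIMESTAMP_MILLIS);` (the thread mentions parquet-tools for getting at this), the hypothetical helpers below report each primitive column's physical type and annotation. The regular expression is an assumption about that text layout and deliberately skips group lines, `fixed_len_byte_array(n)` fields, and parameterised annotations such as `DECIMAL(9,2)`.

```python
import re

# One primitive field in parquet-mr's schema text, e.g.
#   optional int64 occurred_at (TIMESTAMP_MILLIS);
# Groups, fixed_len_byte_array(n) and parameterised annotations are not matched.
FIELD_RE = re.compile(
    r"^\s*(required|optional|repeated)\s+(\w+)\s+(\w+)(?:\s*\((\w+)\))?\s*;\s*$"
)

def column_types(schema_text):
    """Map column name -> (physical type, logical annotation or None)."""
    result = {}
    for line in schema_text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            _repetition, physical, name, annotation = match.groups()
            result[name] = (physical.upper(), annotation)
    return result

def is_millis_timestamp(schema_text, column):
    """True only if the column is physically INT64 *and* annotated TIMESTAMP_MILLIS."""
    return column_types(schema_text).get(column) == ("INT64", "TIMESTAMP_MILLIS")

if __name__ == "__main__":
    # plain_count is a hypothetical un-annotated INT64 column for contrast
    schema = """message root {
  optional int64 occurred_at (TIMESTAMP_MILLIS);
  optional int64 plain_count;
}"""
    print(column_types(schema))
    print(is_millis_timestamp(schema, "occurred_at"))  # True
    print(is_millis_timestamp(schema, "plain_count"))  # False
```

The point of checking both fields together is that an INT64 column on its own says nothing about whether readers will treat it as a timestamp.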

Re: SQL datatime fields in json -> timestamp in parquet ? (CTAS)

2015-07-13 Thread Stefán Baxter
Thank you. I had seen this. I was just expecting the list to say 'TIMESTAMP_MILLI' :) (that would up the confidence level for a newbie)
Regards,
-Stefan
On Mon, Jul 13, 2015 at 2:44 PM, Kristine Hahn wrote:
> Expected, I think.
> https://drill.apache.org/docs/parquet-format/#sql-types-to-pa

Re: SQL datatime fields in json -> timestamp in parquet ? (CTAS)

2015-07-13 Thread Kristine Hahn
Expected, I think. https://drill.apache.org/docs/parquet-format/#sql-types-to-parquet-logical-types says that the TIMESTAMP type is mapped to the Parquet TIMESTAMP_MILLIS logical type, which is stored as a Unix timestamp in milliseconds (int64). Take a look at https://drill.apache.org/docs/data-type-conversion/#to_timestamp and the Timez
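As a minimal sketch of what "a Unix timestamp (int64)" means for TIMESTAMP_MILLIS: the stored value is the count of milliseconds since 1970-01-01T00:00:00Z. The helper names are hypothetical, and the example instant reuses the time from the log in the original question, treated as UTC only for illustration.

```python
from datetime import datetime, timedelta, timezone

# TIMESTAMP_MILLIS values count milliseconds since the Unix epoch (UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_timestamp_millis(dt):
    """Encode a timezone-aware datetime as an int64-style millisecond count."""
    return (dt - EPOCH) // timedelta(milliseconds=1)

def from_timestamp_millis(millis):
    """Decode a millisecond count back into a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)

if __name__ == "__main__":
    occurred_at = datetime(2015, 7, 12, 15, 34, 59, tzinfo=timezone.utc)
    millis = to_timestamp_millis(occurred_at)
    print(millis)  # 1436715299000
    print(from_timestamp_millis(millis).isoformat())  # 2015-07-12T15:34:59+00:00
```

So a plain-looking INT64 such as 1436715299000 is exactly what a correctly annotated timestamp column holds; the annotation is what tells readers to decode it this way.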

SQL datatime fields in json -> timestamp in parquet ? (CTAS)

2015-07-13 Thread Stefán Baxter
Hi,
I have a JSON file that contains a SQL timestamp. When I use it to create a Parquet file, it seems to become an INT64:
Jul 12, 2015 3:34:59 PM INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 153,728B for [occurred_at] INT64: 28,910 values, 231,288B raw, 153,681B comp, 1 pages, encoding
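The log line already hints at the physical layout: 231,288 raw bytes for 28,910 values is about 8 bytes per value, the width of an INT64. A minimal sketch that pulls those numbers out, assuming the exact field layout of the line quoted above (other parquet-mr versions may print it differently); the regex and `summarize` are hypothetical. Note that the visible part of the line reports only the physical type, not any logical annotation.

```python
import re

# Layout copied from the INFO line in the question; other versions may differ.
LOG_RE = re.compile(
    r"written ([\d,]+)B for \[(\w+)\] (\w+): ([\d,]+) values, "
    r"([\d,]+)B raw, ([\d,]+)B comp"
)

def _to_int(text):
    """Parse a number written with thousands separators, e.g. '231,288'."""
    return int(text.replace(",", ""))

def summarize(line):
    """Extract per-column sizes from a ColumnChunkPageWriteStore log line, or None."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    written, column, physical, values, raw, comp = match.groups()
    values, raw, comp = _to_int(values), _to_int(raw), _to_int(comp)
    return {
        "column": column,
        "physical_type": physical,
        "values": values,
        "written_bytes": _to_int(written),
        "raw_bytes_per_value": raw / values,  # about 8 here, the width of an INT64
        "compressed_to_raw": comp / raw,
    }

if __name__ == "__main__":
    line = ("INFO: parquet.hadoop.ColumnChunkPageWriteStore: written 153,728B for "
            "[occurred_at] INT64: 28,910 values, 231,288B raw, 153,681B comp, 1 pages")
    stats = summarize(line)
    print(stats["physical_type"], round(stats["raw_bytes_per_value"], 2))  # INT64 8.0
```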