Re: Querying Delimited Sequence file

rahul challapalli Tue, 30 Aug 2016 11:07:51 -0700

You should be able to use split_part function (I haven't tried it
myself...but it is supported). With this function you can extract
individual columns. Unfortunately I couldn't find the documentation for
this function as well. But it should be similar to how other databases
implement this function.


Also as you have observed, split does not support delimiters with more than
one character. You can raise a jira and mark it as documentation related.

Rahul



On Tue, Aug 30, 2016 at 8:58 AM, Robin Moffatt <
robin.moff...@rittmanmead.com> wrote:

> Hi,
>
> Thanks - I think SPLIT gets me some of the way, but after the FLATTEN I
> want to PIVOT, so instead of :
>
> 0: jdbc:drill:zk=cdh57-01-node-01.moffatt.me:> select
> flatten(split(version,'.')) from sys.version;
> +---------+
> | EXPR$0  |
> +---------+
> | 1       |
> | 7       |
> | 0       |
> +---------+
>
> I'd get something like:
>
> +---------+---------+---------+
> | EXPR$0  | EXPR$1  | EXPR$2  |
> +---------+---------+---------+
> | 1       | 7       | 0       |
> +---------+---------+---------+
>
> I'm guessing this isn't possible in Drill yet?
>
> Also, what would be be the syntax to enter the \x01 character in the SPLIT
> function? Entered literally I get an error:
>
> 0: jdbc:drill:zk=cdh57-01-node-01.moffatt.me:> SELECT
> split(CONVERT_FROM(binary_value, 'UTF8'),'\x01') from
>  `/user/oracle/seq/pdb.soe.logon` limit 5;
> Error: SYSTEM ERROR: IllegalArgumentException: Only single character
> delimiters are supported for split()
>
> BTW I didn't realise SPLIT was supported, and it's not listed in
> https://drill.apache.org/docs/string-manipulation/ or
> https://drill.apache.org/search/?q=split -- is there somewhere I should
> log
> this kind of documentation issue?
>
> thanks, Robin.
>
>
> On 30 August 2016 at 16:07, Zelaine Fong <zf...@maprtech.com> wrote:
>
> > If the column is delimited by some character, you can use the SPLIT()
> > function to separate the value into an array of values.  You can then use
> > the FLATTEN() function to separate the array of values into individual
> > records.
> >
> > E.g., if your column has the value "a:b", where your delimiter is ":",
> you
> > would run the following query:
> >
> > 0: jdbc:drill:zk=local> select flatten(split(columns[0],':')) from
> > `/tmp/foo.csv`;
> > +---------+
> > | EXPR$0  |
> > +---------+
> > | a       |
> > | b       |
> > +---------+
> > 2 rows selected (0.319 seconds)
> >
> > Is that what you had in mind?
> >
> > -- Zelaine
> >
> > On Tue, Aug 30, 2016 at 7:17 AM, Robin Moffatt <
> > robin.moff...@rittmanmead.com> wrote:
> >
> > > Hi,
> > >
> > > I'm trying to read a sequence file, in which the key is null and the
> > value
> > > holds multiple columns [1], delimited by \x01. In Hive I simply define
> it
> > > as :
> > >
> > > CREATE EXTERNAL TABLE foo (col1 string, col2 string, col3 timestamp)
> > > ROW FORMAT DELIMITED
> > > STORED as sequencefile
> > > LOCATION '/user/oracle/foo/bar';
> > >
> > > In Drill I've got as far as
> > >
> > > SELECT CONVERT_FROM(binary_value, 'UTF8') from  `/user/oracle/foo/bar`
> > >
> > > which yields the data but as a single column. I can cast it to
> individual
> > > columns but this is no use if the field positions change
> > >
> > > SELECT substr(CONVERT_FROM(binary_value, 'UTF8'),5,1) as
> > > col0,substr(CONVERT_FROM(binary_value, 'UTF8'),7,13) as
> > > col1,substr(CONVERT_FROM(binary_value, 'UTF8'),20,20) as col3 from
> > >  `/user/oracle/seq/pdb.soe.logon` limit 5;
> > > +-------+----------------+-----------------------+
> > > | col0  |      col1      |         col3          |
> > > +-------+----------------+-----------------------+
> > > | I     | PDB.SOE.LOGON  | 2016-07-29 13:36:40  |
> > >
> > >
> > > Is there a way to treat a column as delimited and burst it out into
> > > multiple columns? Presumably I could somehow dump the string contents
> to
> > > CSV and then re-read it - but I'm interested here in using Drill the
> > query
> > > existing data; wrangling it to suit Drill isn't really what I'm looking
> > for
> > > (and maybe Drill just isn't the right tool here?).
> > >
> > >
> > > thanks,
> > >
> > > Robin.
> > >
> > > [1]
> > > https://docs.oracle.com/goldengate/bd1221/gg-bd/GADBD/
> > > GUID-85A82B2E-CD51-463A-8674-3D686C3C0EC0.htm#GADBD-GUID-
> > > 4CAFC347-0F7D-49AB-B293-EFBCE95B66D6
> > >
> >
>

Re: Querying Delimited Sequence file

Reply via email to