Re: Querying Delimited Sequence file

rahul challapalli Tue, 30 Aug 2016 11:09:06 -0700

Also you can refer to [1] for the list of string functions implemented.

[1]
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java


On Tue, Aug 30, 2016 at 11:06 AM, rahul challapalli <
challapallira...@gmail.com> wrote:

> You should be able to use split_part function (I haven't tried it
> myself...but it is supported). With this function you can extract
> individual columns. Unfortunately I couldn't find the documentation for
> this function as well. But it should be similar to how other databases
> implement this function.
>
> Also as you have observed, split does not support delimiters with more
> than one character. You can raise a jira and mark it as documentation
> related.
>
> Rahul
>
>
>
> On Tue, Aug 30, 2016 at 8:58 AM, Robin Moffatt <
> robin.moff...@rittmanmead.com> wrote:
>
>> Hi,
>>
>> Thanks - I think SPLIT gets me some of the way, but after the FLATTEN I
>> want to PIVOT, so instead of :
>>
>> 0: jdbc:drill:zk=cdh57-01-node-01.moffatt.me:> select
>> flatten(split(version,'.')) from sys.version;
>> +---------+
>> | EXPR$0  |
>> +---------+
>> | 1       |
>> | 7       |
>> | 0       |
>> +---------+
>>
>> I'd get something like:
>>
>> +---------+---------+---------+
>> | EXPR$0  | EXPR$1  | EXPR$2  |
>> +---------+---------+---------+
>> | 1       | 7       | 0       |
>> +---------+---------+---------+
>>
>> I'm guessing this isn't possible in Drill yet?
>>
>> Also, what would be be the syntax to enter the \x01 character in the SPLIT
>> function? Entered literally I get an error:
>>
>> 0: jdbc:drill:zk=cdh57-01-node-01.moffatt.me:> SELECT
>> split(CONVERT_FROM(binary_value, 'UTF8'),'\x01') from
>>  `/user/oracle/seq/pdb.soe.logon` limit 5;
>> Error: SYSTEM ERROR: IllegalArgumentException: Only single character
>> delimiters are supported for split()
>>
>> BTW I didn't realise SPLIT was supported, and it's not listed in
>> https://drill.apache.org/docs/string-manipulation/ or
>> https://drill.apache.org/search/?q=split -- is there somewhere I should
>> log
>> this kind of documentation issue?
>>
>> thanks, Robin.
>>
>>
>> On 30 August 2016 at 16:07, Zelaine Fong <zf...@maprtech.com> wrote:
>>
>> > If the column is delimited by some character, you can use the SPLIT()
>> > function to separate the value into an array of values.  You can then
>> use
>> > the FLATTEN() function to separate the array of values into individual
>> > records.
>> >
>> > E.g., if your column has the value "a:b", where your delimiter is ":",
>> you
>> > would run the following query:
>> >
>> > 0: jdbc:drill:zk=local> select flatten(split(columns[0],':')) from
>> > `/tmp/foo.csv`;
>> > +---------+
>> > | EXPR$0  |
>> > +---------+
>> > | a       |
>> > | b       |
>> > +---------+
>> > 2 rows selected (0.319 seconds)
>> >
>> > Is that what you had in mind?
>> >
>> > -- Zelaine
>> >
>> > On Tue, Aug 30, 2016 at 7:17 AM, Robin Moffatt <
>> > robin.moff...@rittmanmead.com> wrote:
>> >
>> > > Hi,
>> > >
>> > > I'm trying to read a sequence file, in which the key is null and the
>> > value
>> > > holds multiple columns [1], delimited by \x01. In Hive I simply
>> define it
>> > > as :
>> > >
>> > > CREATE EXTERNAL TABLE foo (col1 string, col2 string, col3 timestamp)
>> > > ROW FORMAT DELIMITED
>> > > STORED as sequencefile
>> > > LOCATION '/user/oracle/foo/bar';
>> > >
>> > > In Drill I've got as far as
>> > >
>> > > SELECT CONVERT_FROM(binary_value, 'UTF8') from  `/user/oracle/foo/bar`
>> > >
>> > > which yields the data but as a single column. I can cast it to
>> individual
>> > > columns but this is no use if the field positions change
>> > >
>> > > SELECT substr(CONVERT_FROM(binary_value, 'UTF8'),5,1) as
>> > > col0,substr(CONVERT_FROM(binary_value, 'UTF8'),7,13) as
>> > > col1,substr(CONVERT_FROM(binary_value, 'UTF8'),20,20) as col3 from
>> > >  `/user/oracle/seq/pdb.soe.logon` limit 5;
>> > > +-------+----------------+-----------------------+
>> > > | col0  |      col1      |         col3          |
>> > > +-------+----------------+-----------------------+
>> > > | I     | PDB.SOE.LOGON  | 2016-07-29 13:36:40  |
>> > >
>> > >
>> > > Is there a way to treat a column as delimited and burst it out into
>> > > multiple columns? Presumably I could somehow dump the string contents
>> to
>> > > CSV and then re-read it - but I'm interested here in using Drill the
>> > query
>> > > existing data; wrangling it to suit Drill isn't really what I'm
>> looking
>> > for
>> > > (and maybe Drill just isn't the right tool here?).
>> > >
>> > >
>> > > thanks,
>> > >
>> > > Robin.
>> > >
>> > > [1]
>> > > https://docs.oracle.com/goldengate/bd1221/gg-bd/GADBD/
>> > > GUID-85A82B2E-CD51-463A-8674-3D686C3C0EC0.htm#GADBD-GUID-
>> > > 4CAFC347-0F7D-49AB-B293-EFBCE95B66D6
>> > >
>> >
>>
>
>

Re: Querying Delimited Sequence file

Reply via email to