Also you can refer to [1] for the list of string functions implemented. [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java
On Tue, Aug 30, 2016 at 11:06 AM, rahul challapalli < challapallira...@gmail.com> wrote: > You should be able to use split_part function (I haven't tried it > myself...but it is supported). With this function you can extract > individual columns. Unfortunately I couldn't find the documentation for > this function as well. But it should be similar to how other databases > implement this function. > > Also as you have observed, split does not support delimiters with more > than one character. You can raise a jira and mark it as documentation > related. > > Rahul > > > > On Tue, Aug 30, 2016 at 8:58 AM, Robin Moffatt < > robin.moff...@rittmanmead.com> wrote: > >> Hi, >> >> Thanks - I think SPLIT gets me some of the way, but after the FLATTEN I >> want to PIVOT, so instead of : >> >> 0: jdbc:drill:zk=cdh57-01-node-01.moffatt.me:> select >> flatten(split(version,'.')) from sys.version; >> +---------+ >> | EXPR$0 | >> +---------+ >> | 1 | >> | 7 | >> | 0 | >> +---------+ >> >> I'd get something like: >> >> +---------+---------+---------+ >> | EXPR$0 | EXPR$1 | EXPR$2 | >> +---------+---------+---------+ >> | 1 | 7 | 0 | >> +---------+---------+---------+ >> >> I'm guessing this isn't possible in Drill yet? >> >> Also, what would be be the syntax to enter the \x01 character in the SPLIT >> function? Entered literally I get an error: >> >> 0: jdbc:drill:zk=cdh57-01-node-01.moffatt.me:> SELECT >> split(CONVERT_FROM(binary_value, 'UTF8'),'\x01') from >> `/user/oracle/seq/pdb.soe.logon` limit 5; >> Error: SYSTEM ERROR: IllegalArgumentException: Only single character >> delimiters are supported for split() >> >> BTW I didn't realise SPLIT was supported, and it's not listed in >> https://drill.apache.org/docs/string-manipulation/ or >> https://drill.apache.org/search/?q=split -- is there somewhere I should >> log >> this kind of documentation issue? >> >> thanks, Robin. >> >> >> On 30 August 2016 at 16:07, Zelaine Fong <zf...@maprtech.com> wrote: >> >> > If the column is delimited by some character, you can use the SPLIT() >> > function to separate the value into an array of values. You can then >> use >> > the FLATTEN() function to separate the array of values into individual >> > records. >> > >> > E.g., if your column has the value "a:b", where your delimiter is ":", >> you >> > would run the following query: >> > >> > 0: jdbc:drill:zk=local> select flatten(split(columns[0],':')) from >> > `/tmp/foo.csv`; >> > +---------+ >> > | EXPR$0 | >> > +---------+ >> > | a | >> > | b | >> > +---------+ >> > 2 rows selected (0.319 seconds) >> > >> > Is that what you had in mind? >> > >> > -- Zelaine >> > >> > On Tue, Aug 30, 2016 at 7:17 AM, Robin Moffatt < >> > robin.moff...@rittmanmead.com> wrote: >> > >> > > Hi, >> > > >> > > I'm trying to read a sequence file, in which the key is null and the >> > value >> > > holds multiple columns [1], delimited by \x01. In Hive I simply >> define it >> > > as : >> > > >> > > CREATE EXTERNAL TABLE foo (col1 string, col2 string, col3 timestamp) >> > > ROW FORMAT DELIMITED >> > > STORED as sequencefile >> > > LOCATION '/user/oracle/foo/bar'; >> > > >> > > In Drill I've got as far as >> > > >> > > SELECT CONVERT_FROM(binary_value, 'UTF8') from `/user/oracle/foo/bar` >> > > >> > > which yields the data but as a single column. I can cast it to >> individual >> > > columns but this is no use if the field positions change >> > > >> > > SELECT substr(CONVERT_FROM(binary_value, 'UTF8'),5,1) as >> > > col0,substr(CONVERT_FROM(binary_value, 'UTF8'),7,13) as >> > > col1,substr(CONVERT_FROM(binary_value, 'UTF8'),20,20) as col3 from >> > > `/user/oracle/seq/pdb.soe.logon` limit 5; >> > > +-------+----------------+-----------------------+ >> > > | col0 | col1 | col3 | >> > > +-------+----------------+-----------------------+ >> > > | I | PDB.SOE.LOGON | 2016-07-29 13:36:40 | >> > > >> > > >> > > Is there a way to treat a column as delimited and burst it out into >> > > multiple columns? Presumably I could somehow dump the string contents >> to >> > > CSV and then re-read it - but I'm interested here in using Drill the >> > query >> > > existing data; wrangling it to suit Drill isn't really what I'm >> looking >> > for >> > > (and maybe Drill just isn't the right tool here?). >> > > >> > > >> > > thanks, >> > > >> > > Robin. >> > > >> > > [1] >> > > https://docs.oracle.com/goldengate/bd1221/gg-bd/GADBD/ >> > > GUID-85A82B2E-CD51-463A-8674-3D686C3C0EC0.htm#GADBD-GUID- >> > > 4CAFC347-0F7D-49AB-B293-EFBCE95B66D6 >> > > >> > >> > >