RE: REGEX search Operator

2016-02-02 Thread masahide.miura
Hi, ANSI SQL doesn't define a regex operator. Neither does Drill. Isn't the 'LIKE' operator enough? Or the REGEXP_REPLACE/SUBSTR functions may help you. https://drill.apache.org/docs/string-manipulation/ -- Miura, Masahide -Original Message- From: Nicolas Paris [mailto:nipari...@gmail.com] Sen

Re: REGEX search Operator

2016-02-02 Thread Nicolas Paris
> ANSI SQL doesn't define regex operator. > Drill neither. > Drill has SQL function extensions like "REPEATED_CONTAINS" that appear to handle regex. Could a regex operator be added as a new SQL extension? I guess I could create my own functions in Java, right? Maybe push it into github then

Re: REGEX search Operator

2016-02-02 Thread John Omernik
I would like to see something like this as well, even if it's an included UDF like REGEX(field, pattern) using Java's library for regex like Hive does. That would be EXTREMELY helpful. On Tue, Feb 2, 2016 at 6:55 AM, Nicolas Paris wrote: > > ANSI SQL doesn't define regex operator. > > Drill

Re: REGEX search Operator

2016-02-04 Thread Jason Altekruse
I didn't realize that we were lacking this functionality. As the repeated_contains operator handles wildcards it makes sense to add such a function to drill. It should be simple to implement, would someone like to open a JIRA and submit a PR for this? - Jason On Tue, Feb 2, 2016 at 8:56 AM, John

Re: REGEX search Operator

2016-02-04 Thread Nicolas Paris
Well, I am creating a UDF. Good exercise. I hope to send a PR soon. 2016-02-04 16:37 GMT+01:00 Jason Altekruse : > I didn't realize that we were lacking this functionality. As the > repeated_contains operator handles wildcards it makes sense to add such a > function to drill. > > It should be simple to imple

Re: REGEX search Operator

2016-02-04 Thread Jason Altekruse
Awesome, thanks! On Thu, Feb 4, 2016 at 7:44 AM, Nicolas Paris wrote: > Well I am creating a udf > good exercise > I hope a PR soon > > 2016-02-04 16:37 GMT+01:00 Jason Altekruse : > > > I didn't realize that we were lacking this functionality. As the > > repeated_contains operator handles wildc

Re: REGEX search Operator

2016-02-04 Thread Nicolas Paris
Jason, I have it working. Just tell me how to proceed with the PR. 1. Where do I put my Maven project? Which folder in my Drill GitHub fork? 2. Do I need a JIRA? How do I proceed? For now, I have only published it on my GitHub account as a separate project. Thanks 2016-02-04 16:52 GMT+01:00 Jason Altekru

Re: REGEX search Operator

2016-02-04 Thread Jason Altekruse
I think you should actually just put the function in Drill itself. System native functions are implemented in the same interface as UDFs, because our mechanism for evaluating them is very efficient (we code generate code blocks by linking together the bodies of the individual functions to evaluate

Re: REGEX search Operator

2016-02-04 Thread John Omernik
I'd be curious how you are implementing the regex... using Java's regex libraries? etc. I know one thing with Hive that always bothered me was the need to double escape things. '\d\d\d\d-\d\d-\d\d' needed to be '\\d\\d\\d\\d-\\d\\d-\\d\\d'. If we can avoid that it would be AWESOME. On Thu, Feb
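
A note on where the double escaping can come from: java.util.regex itself only needs one backslash per escape. The doubling is introduced by whichever layer parses the string literal before the regex engine sees it (the Java compiler for Java source and, as far as I understand, the SQL literal parser in Hive). The sketch below illustrates this in plain Java; the class and method names are hypothetical and are not part of Drill or Hive.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Drill or Hive.
class EscapeDemo {
    // In Java source, "\\d" is a two-character string: a backslash followed by 'd'.
    // The regex engine therefore receives \d\d\d\d-\d\d-\d\d (18 characters).
    static final String DATE_REGEX = "\\d\\d\\d\\d-\\d\\d-\\d\\d";

    // Returns true when the whole input looks like a yyyy-mm-dd date.
    static boolean isDate(String input) {
        return Pattern.matches(DATE_REGEX, input);
    }
}
```

If the query layer passes the pattern through to the function without interpreting backslashes, a user could write '\d\d\d\d-\d\d-\d\d' directly and the regex engine would receive the same 18 characters.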

Re: REGEX search Operator

2016-02-04 Thread Nicolas Paris
John, Jason, 2016-02-04 18:47 GMT+01:00 John Omernik : > I'd be curious how you are implementing the regex... using Java's regex > libraries? etc. > Yeah, I use java.util.regex > I know one thing with Hive that always bothered me was the need to double > escape things. > > '\d\d\d\d-\d\d-\d

Re: REGEX search Operator

2016-02-04 Thread Jason Altekruse
Tip for navigating large Github repos. You can type 't' when looking at the folder structure to open a fast global search. Searching for the functions is a little extra-complicated in Drill because we actually generate a bunch of them to cover all of the types. This means that source code templates

Re: REGEX search Operator

2016-02-04 Thread John Omernik
So my question on the double escape: is there no way to handle that so the user can use single-escaped regex? I know many folks who use big data platforms to test large complex regexes for things like security appliances, and having to convert the regex seems like a lot of work if you consider every

Re: REGEX search Operator

2016-02-04 Thread Nicolas Paris
You mean: userRegex=>javaRegex "\d" => "\\d" "\w" => "\\w" "\n" => "\n" I can do that thanks to regex, I guess. I will give it a try. 2016-02-04 19:37 GMT+01:00 John Omernik : > So my question on the double escape, is there no way to handle that so the > user can use single escaped regex? I know many

Re: REGEX search Operator

2016-02-04 Thread John Omernik
Ya, do you see where I am coming from here? Let's let the users submit regex in the pure form if possible, and code the nuances of java regex behind the scenes. I think it would be a great way to make Drill very accessible and desirable. I think what happened in Hive is the regex commands started

Re: REGEX search Operator

2016-02-05 Thread Nicolas Paris
John, Sorry for that, this already works as expected. Give it a try, it is easy to deploy:

SELECT first_name FROM cp.`employee.json` WHERE contains(first_name,'\w+') LIMIT 5;

first_name |
-----------|
Sheri      |
Derrick    |
Michael    |
Maya       |
Roberta    |

2016-02-04 20:41 GMT+01:00

Re: REGEX search Operator

2016-02-09 Thread John Omernik
Nicolas, not really sure what's happening here. It compiled fine, but when I run it I get this error. The jar is distributed to my bits, I validated that... it's in the DRILL_HOME/jars/3rdparty folder on every bit... do I need to do something more than that? select count(1) from view_myview wher

Re: REGEX search Operator

2016-02-09 Thread Nicolas Paris
Hi John, There are actually two jars to put in the folder (generated in /target). Have you restarted Drill afterwards? 2016-02-09 16:20 GMT+01:00 John Omernik : > Nicolas, not really sure what's happening here. it compiled fine, but when > I run it I get this error. The jar is distributed to my bi

Re: REGEX search Operator

2016-02-09 Thread John Omernik
I copied both files and it appears to work, but after some testing, I am getting inconsistent results, see below. I ran three queries. First, a LIKE looking for domain names that end in .com (domain_name like '%.com'), which returned a count of 9.8 million. Then I tried the contains, with '\.com$' whi

Re: REGEX search Operator

2016-02-09 Thread Nicolas Paris
John, About the escape, I will explore that question. About your query, you may try this pattern : select count(1) from view_mydata where srcday = '2016-02-05' and contains(domain_name, '.*\\.com$'); 2016-02-09 17:19 GMT+01:00 John Omernik : > I copied both files and it appears to work, but aft

Re: REGEX search Operator

2016-02-09 Thread Nicolas Paris
John, I realized I needed to make a modification for your query to work, so I updated the GitHub project. select count(1) from view_mydata where srcday = '2016-02-05' and contains(domain_name, '\\.com$'); will work now. (just redeploy the jars) I will try to make : select count(1) from view_mydata whe
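
One plausible reading of why a leading '.*' made the pattern match before the fix (an assumption; the thread does not show the UDF source) is the difference between whole-string and substring matching in java.util.regex. Matcher.matches() requires the pattern to cover the entire input, while Matcher.find() accepts a match anywhere in it. A minimal sketch, with hypothetical class and method names:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; it does not reproduce the UDF discussed in the thread.
class FindVsMatches {
    // True only when the regex matches the entire input.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True when the regex matches any substring of the input.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }
}
```

Under whole-string matching, the regex \.com$ fails on "example.com" because it does not account for the leading "example", whereas .*\.com$ succeeds. Under substring matching, \.com$ alone is enough, which is closer to what LIKE '%.com' expresses.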