Thanks Charles, that worked even on my 1.8.

Drill folks: We need to do some documentation updates. We've added functions like REGEXP_MATCHES (it's in 1.8, so I am not sure when it was added) and SPLIT, and yet there is no mention of them in https://drill.apache.org/docs/string-manipulation/
So, yes, this is "meh" work compared to programming all the cool things in Drill. But there are a number of reasons, beyond common practice, that this needs to be done.

1. Users, and more importantly POTENTIAL users, get frustrated when trying to use Drill for the first time. Coming from other Big Data systems like Hive, not having regex, split, and other functions is frustrating. But what is more frustrating is to find that they actually exist and are just not documented. Nothing will turn people off faster.

2. Without knowledge of these functions, people try "hacky" workarounds like what I did, killing performance and putting Drill in a bad light.

3. It gives an overall feeling of lack of effort by the community. I know that resources are not unlimited, and these things need to be addressed by "someone", but issues like this are really important for getting more people into the community who may be able to help contribute!

4. I think that, as part of developer review, pull requests that add functions/functionality should be required to also provide a documentation update. This helps ensure that the docs stay up to date, and keeps users apprised of what is happening... i.e. it's a good "feeling" to see a great tool like Drill "improving" with new functionality.

Please, folks, we need to do some one-time cleanup (go back through pull requests to ensure all functions are documented up to now) and then get processes in place to ensure ongoing updates.

Thanks

John Omernik

On Tue, Feb 28, 2017 at 10:15 AM, Charles Givre <[email protected]> wrote:

> Hi John,
> I believe that Drill 1.9 includes a REGEXP_MATCHES( <source>, <pattern> )
> function which does what you'd expect it to. I'm not sure when this was
> introduced, so it may be in earlier versions of Drill.
> Best,
> -- C
>
> On Tue, Feb 28, 2017 at 11:03 AM, John Omernik <[email protected]> wrote:
>
> > I have a data set that has birthdays in YYYY-MM-DD format.
> >
> > Most of this data is great. I am trying to compute the age using
> >
> > EXTRACT(year from age(dob))
> >
> > But some of my data is crapola... let's call it alternative data...
> >
> > When I try to run the EXTRACT function, I get
> >
> > Error: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear
> > must be in the range [1,12]
> >
> > Fragment 5:17
> >
> > [Error Id: 62f90784-c9f4-4362-9710-a37464fc801a on drillnode:20005]
> >
> > I've tried an ugly where clause, and this works:
> >
> > where
> >
> > (dob LIKE '%-01-%' or dob LIKE '%-02-%' or dob LIKE '%-03-%' or dob LIKE
> > '%-04-%' or dob LIKE '%-05-%' or dob LIKE '%-06-%' or dob LIKE '%-07-%' or
> > dob LIKE '%-08-%' or dob LIKE '%-09-%' or
> >
> > dob LIKE '%-1-%' or dob LIKE '%-2-%' or dob LIKE '%-3-%' or dob LIKE
> > '%-4-%' or dob LIKE '%-5-%' or dob LIKE '%-6-%' or dob LIKE '%-7-%' or dob
> > LIKE '%-8-%' or dob LIKE '%-9-%' or
> >
> > dob LIKE '%-10-%' or dob LIKE '%-11-%' or dob LIKE '%-12-%')
> >
> > But WOW is that ugly. I could add the jar for regex contains, and make it
> > much easier (do we have a regex search function built into Drill? I think
> > we should at this point...)
> >
> > Is there another way to say: try the EXTRACT function, catch a failure,
> > and ignore on failure? What if we had a cast function that returned NULL
> > on failure, so we could use it in the where clause? Any other more elegant
> > ways to handle this?
> >
> > Thanks!
> >
> > John
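
For reference, here is a rough sketch of how the LIKE chain in the quoted message might collapse into a single REGEXP_MATCHES filter. It is not a tested query. The table path dfs.tmp.`people` is hypothetical, dob is assumed to be a VARCHAR, and the pattern is anchored at both ends so that it should behave the same whether the function matches the whole string or searches within it.

```sql
-- Sketch only: dfs.tmp.`people` is a hypothetical table; dob is assumed VARCHAR.
-- Keep only rows whose month part is 1-12 (with or without a leading zero),
-- mirroring what the original LIKE chain checked.
SELECT dob,
       EXTRACT(year FROM age(dob)) AS age_years
FROM dfs.tmp.`people`
WHERE REGEXP_MATCHES(dob, '^[0-9]{4}-(0?[1-9]|1[0-2])-[0-9]{1,2}$')
```

Like the LIKE version, this only checks the shape of the value and the month range. A bad day (such as 00) or an impossible date would still reach EXTRACT, so the day part of the pattern may need tightening for messier data.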
