Anyone know if translate takes ranges, like some implementations? e.g., translate ('[a-z]', '[A-Z]')
Of course, that probably doesn't work for non-ascii characters. On Fri, Mar 1, 2013 at 11:24 AM, Tom Hall <thattommyh...@gmail.com> wrote: > Thanks Dean, > > I dont think translate would work as the set of things to remove is > massive. > Yeah, it's a one-off cleanup job while exporting to try redshift on our > datasets. > My guess is it's something about the way hive handles strings? Tried > "\\ufffd" as the replacement str but no joy either. > > Cheers again, > Tom > > > > On 1 March 2013 17:08, Dean Wampler <dean.wamp...@thinkbiganalytics.com>wrote: > >> I think this should work, but you might investigate using the translate >> function instead. I suspect it will provide much better performance than >> using regexps. Also, Are you planning to do this once to create your final >> tables? If so, the performance overhead won't matter much. >> >> dean >> >> >> On Fri, Mar 1, 2013 at 10:52 AM, Tom Hall <thattommyh...@gmail.com>wrote: >> >>> I would like to remove unicode chars that are outside the Basic >>> Multilingual Plane [1] >>> >>> I thought >>> select regexp_replace(some_column,"[^\\u0000-\\uffff]","\ufffd") from >>> my_table >>> would work but while the regexp does work the replacement str does not >>> (I can paste in the literal �, which you may or may not be able to see here >>> but it somehow did not fell right) >>> >>> I saw Deans previous post on using octals [2] but I think \ufffd is >>> outside the allowable range. >>> >>> Cheers, >>> Tom >>> >>> >>> [1] >>> http://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane >>> [2] >>> http://grokbase.com/t/hive/dev/131a4n562y/unicode-character-as-delimiter >>> >> >> >> >> -- >> *Dean Wampler, Ph.D.* >> thinkbiganalytics.com >> +1-312-339-1330 >> >> > -- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330