Re: regexp_replace with unicode chars

Dean Wampler Fri, 01 Mar 2013 09:32:25 -0800

Anyone know if translate takes ranges, like some implementations? e.g.,

translate ('[a-z]', '[A-Z]')


Of course, that probably doesn't work for non-ascii characters.

On Fri, Mar 1, 2013 at 11:24 AM, Tom Hall <thattommyh...@gmail.com> wrote:

> Thanks Dean,
>
> I dont think translate would work as the set of things to remove is
> massive.
> Yeah, it's a one-off cleanup job while exporting to try redshift on our
> datasets.
> My guess is it's something about the way hive handles strings? Tried
> "\\ufffd" as the replacement str but no joy either.
>
> Cheers again,
> Tom
>
>
>
> On 1 March 2013 17:08, Dean Wampler <dean.wamp...@thinkbiganalytics.com>wrote:
>
>> I think this should work, but you might investigate using the translate
>> function instead. I suspect it will provide much better performance than
>> using regexps. Also, Are you planning to do this once to create your final
>> tables? If so, the performance overhead won't matter much.
>>
>> dean
>>
>>
>> On Fri, Mar 1, 2013 at 10:52 AM, Tom Hall <thattommyh...@gmail.com>wrote:
>>
>>> I would like to remove unicode chars that are outside the Basic
>>> Multilingual Plane [1]
>>>
>>> I thought
>>> select regexp_replace(some_column,"[^\\u0000-\\uffff]","\ufffd") from
>>> my_table
>>> would work but while the regexp does work the replacement str does not
>>> (I can paste in the literal �, which you may or may not be able to see here
>>> but it somehow did not fell right)
>>>
>>> I saw Deans previous post on using octals [2] but I think \ufffd is
>>> outside the allowable range.
>>>
>>> Cheers,
>>> Tom
>>>
>>>
>>> [1]
>>> http://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane
>>> [2]
>>> http://grokbase.com/t/hive/dev/131a4n562y/unicode-character-as-delimiter
>>>
>>
>>
>>
>> --
>> *Dean Wampler, Ph.D.*
>> thinkbiganalytics.com
>> +1-312-339-1330
>>
>>
>


-- 
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330

Re: regexp_replace with unicode chars

Reply via email to