On Fri, May 15, 2009 at 11:18 PM, Barry Brevik <bbre...@stellarmicro.com> wrote:
> I am running Active Perl 5.8.8.
>
> I am converting a large enterprise database into a new system and have
> run across a free-form text field in which users have entered all manner
> of garbage.
>
> One scenario is where two sentences have been run together with no
> ending '.' or space. Here are some examples:
>
>    madeStyle
>    facilitatedOne
>    Anti-magneticQuality
>
> As you can see, the new sentence begins with an upper-case letter, so if
> I can just break apart the construct like this I'll be OK:  "madeStyle"
> should become  "made. Style".
>
> Difficulty: the fields contain hundreds of words both preceding and
> following the "bad" words, so I have to be able to pick out the
> lower-case words that contain one embedded upper-case character.
>
> Any ideas?
>
> Barry Brevik

Hi Barry,

Maybe something like this would help:

$ cat test.txt
madeStyle
facilitatedOne
Anti-magneticQuality

$ perl -pe 's/(\w+)([A-Z])/$1. $2/g' test.txt
made. Style
facilitated. One
Anti-magnetic. Quality
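
One caveat with a pattern of the shape (\w+)([A-Z]): \w also matches upper-case letters, digits and underscores, so an all-caps token such as "USA" would come out as "US. A". A tighter variant, offered only as a sketch, inserts the ". " solely where a lower-case letter is immediately followed by an upper-case one. The helper name split_run_together is made up for illustration, and the sketch assumes plain ASCII text. It would still split legitimate mixed-case words such as "McDonald", so the results are worth spot-checking against the real data.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original thread): insert ". " at every
# position where a lower-case letter is directly followed by an upper-case one.
sub split_run_together {
    my ($text) = @_;
    # (?<=[a-z]) : the character before this position is lower-case
    # (?=[A-Z])  : the character after this position is upper-case
    # Both are zero-width, so no letters are consumed or moved.
    $text =~ s/(?<=[a-z])(?=[A-Z])/. /g;
    return $text;
}

# The three samples from the question, plus an all-caps token that
# should be left alone.
print split_run_together($_), "\n"
    for ('madeStyle', 'facilitatedOne', 'Anti-magneticQuality', 'USA today');
```

For a whole file, the same substitution can be used in place of the one above with perl -pe.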

Regards,
Ari Constancio
_______________________________________________
ActivePerl mailing list
ActivePerl@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
