On Fri, May 15, 2009 at 11:18 PM, Barry Brevik <bbre...@stellarmicro.com> wrote:
> I am running Active Perl 5.8.8.
>
> I am converting a large enterprise database into a new system and have
> run across a free-form text field in which users have entered all manner
> of garbage.
>
> One scenario is where two sentences have been run together with no
> ending '.' or space. Here are some examples:
>
>   madeStyle
>   facilitatedOne
>   Anti-magneticQuality
>
> As you can see, the new sentence begins with an upper-case letter, so if
> I can just break apart the construct like this I'll be OK: "madeStyle"
> should become "made. Style".
>
> Difficulty: the fields contain hundreds of words both preceding and
> following the "bad" words, so I have to be able to pick out the
> lower-case words that contain one embedded upper-case character.
>
> Any ideas?
>
> Barry Brevik

Hi Barry,

Maybe something like this would help:

$ cat test.txt
madeStyle
facilitatedOne
Anti-magneticQuality

$ perl -pe 's/(\w+)([A-Z])/$1. $2/g' test.txt
made. Style
facilitated. One
Anti-magnetic. Quality

Regards,
Ari Constancio