Hi,

Marv Boyes wrote:
> Hello, all. I've been tasked with migrating a large MS Works
> "database" into the 21st century. The thing's original setup didn't
> enforce any sort of standardization in data entry, so there are nearly
> as many different formats and styles in the data as there have been
> people entering it. My best bet seems to be to hammer things into
> shape with a CSV version of the data before even thinking of trying to
> drop it into a new database app. Since it's plain text, Vim seems the
> perfect tool for the job. :)
>
> I could use some pointers on search and replace with regular
> expressions. I'm sure this will be painfully basic to most of you, but
> I can't seem to get the hang of it for this particular job. Most of
> the problem is with dates, in that I have a mishmash of formats. Most
> of them are in dashed format, but there's not even much uniformity
> _there_: some are MM-DD-YYYY, some are M-D-YY, and so on. What I'd
> like to do is reformat them en masse as MM/DD/YYYY; preserving the
> original values, replacing dashes with slashes, putting zeroes in
> front of existing single digits, and expanding two-digit years into
> four digits by bolting on "20" at the front.
>
> For example, let's say I have some dates that look like this:
>
>           7-30-05
>           12-5-2006
>           10-2-06
>
> What I'd like to end up with is this...
>
>           07/30/2005
>           12/05/2006
>           10/02/2006
>
> ...without, of course, having to re-type every single one by hand. ;)

If you are sure that there are no dates from before 2000, the following
command should do the job (all on one line):

  :%s,\<\(\d\+\)[-/]\(\d\+\)[-/]\%(20\)\?\(\d\d\)\>,\=(strlen(submatch(1)) < 2 ? '0' : '') . submatch(1) . '/' . (strlen(submatch(2)) < 2 ? '0' : '') . submatch(2) . '/' . '20' . submatch(3),

I have used commas as separators so that there is no need to escape the
slashes used between the parts for month, day, and year. The regex part
is quite easy: we look for something word-like ("\<...\>") which
consists of one or more digits ("\d\+"), a dash or a slash ("[-/]"),
some digits, a second dash or slash, and two or four digits; if the
third number has four digits, the first two must be 20
("\%(20\)\?\(\d\d\)"). I used Vim's non-capturing parentheses to make
clear that the content of "\%(20\)" is not needed later.

If this expression matches, the submatches 1, 2, and 3 contain month,
day, and year, respectively.
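
To check what the pattern captures before running it on the whole file,
something like the following sketch can be sourced or typed line by line.
The variable name "pat" is only an illustration; matchlist() is a built-in
that returns the whole match followed by the submatches.

```vim
" Same pattern as in the :s command, as a single-quoted string so the
" backslashes do not need doubling.
let pat = '\<\(\d\+\)[-/]\(\d\+\)[-/]\%(20\)\?\(\d\d\)\>'

" Items 0..3 are: whole match, month, day, two-digit year.
echo matchlist('7-30-05', pat)[0:3]
" ['7-30-05', '7', '30', '05']
echo matchlist('12-5-2006', pat)[0:3]
" ['12-5-2006', '12', '5', '06']
```

In both cases the year submatch holds only the last two digits, whether or
not the "20" was present in the input.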

Generating the replacement is simple, too; the expression is just
longer, so I have split it across three lines here:

  \=(strlen(submatch(1)) < 2 ? '0' : '') . submatch(1) . '/' .
    (strlen(submatch(2)) < 2 ? '0' : '') . submatch(2) . '/' .
    '20' . submatch(3)

The leading "\=" tells :s to evaluate the rest of the replacement as an
expression (see ":help sub-replace-expression"). "submatch(1)" contains
the month. If it has only one digit, it is prefixed with a zero; testing
the length rather than the value keeps a month that is already written
as "07" from becoming "007". The same is done for the day in
"submatch(2)". "submatch(3)" contains only the last two digits of the
year, because the optional "20" sits inside the non-capturing
parentheses. So we always have to prefix it with '20'. Those three
strings are then concatenated with slashes between them, as requested.
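
If the long replacement expression is hard to read, the padding could
also be moved into a small function. This is only a sketch: "PadDate" is
a made-up name, not something from Vim itself, and it assumes the same
"no dates before 2000" condition as above. It joins the parts with
slashes, as the question asked for.

```vim
" Hypothetical helper, not part of the original command.
function! PadDate(month, day, year)
  " str2nr() parses base 10 by default, so '08' is read as 8;
  " %02d then pads to two digits. a:year is the two-digit submatch.
  return printf('%02d/%02d/20%s', str2nr(a:month), str2nr(a:day), a:year)
endfunction

" Usage, after sourcing the function (all on one line):
" :%s,\<\(\d\+\)[-/]\(\d\+\)[-/]\%(20\)\?\(\d\d\)\>,\=PadDate(submatch(1), submatch(2), submatch(3)),
```

With this, ":echo PadDate('7', '30', '05')" should print 07/30/2005. If a
line can contain more than one date, append the "g" flag after the final
comma of the :s command so that every match on the line is replaced.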

Regards,
Jürgen

-- 
Sometimes I think the surest sign that intelligent life exists elsewhere
in the universe is that none of it has tried to contact us.     (Calvin)
