Hi,

Marv Boyes wrote:
> Hello, all. I've been tasked with migrating a large MS Works
> "database" into the 21st century. The thing's original setup didn't
> enforce any sort of standardization in data entry, so there are nearly
> as many different formats and styles in the data as there have been
> people entering it. My best bet seems to be to hammer things into
> shape with a CSV version of the data before even thinking of trying to
> drop it into a new database app. Since it's plain text, Vim seems the
> perfect tool for the job. :)
>
> I could use some pointers on search and replace with regular
> expressions. I'm sure this will be painfully basic to most of you, but
> I can't seem to get the hang of it for this particular job. Most of
> the problem is with dates, in that I have a mishmash of formats. Most
> of them are in dashed format, but there's not even much uniformity
> _there_: some are MM-DD-YYYY, some are M-D-YY, and so on. What I'd
> like to do is reformat them en masse as MM/DD/YYYY; preserving the
> original values, replacing dashes with slashes, putting zeroes in
> front of existing single digits, and expanding two-digit years into
> four digits by bolting on "20" at the front.
>
> For example, let's say I have some dates that look like this:
>
> 7-30-05
> 12-5-2006
> 10-2-06
>
> What I'd like to end up with is this...
>
> 07/30/2005
> 12/05/2006
> 10/02/2006
>
> ...without, of course, having to re-type every single one by hand. ;)

If you are sure that there are no dates from before 2000, the following
command should do the job (all on one line):

    :%s,\<\(\d\+\)[-/]\(\d\+\)[-/]\%(20\)\?\(\d\d\)\>,\=(strlen(submatch(1)) < 2 ? '0' : '') . submatch(1) . '/' . (strlen(submatch(2)) < 2 ? '0' : '') . submatch(2) . '/' . '20' . submatch(3),

I have used commas as separators so that there is no need to escape the
slashes used between the parts for month, day, and year. If a line can
contain more than one date, append "g" after the final comma.

The regex part is quite easy: we look for something word-like
("\<...\>") which consists of one or more digits ("\d\+"), a dash or a
slash ("[-/]"), some more digits, a second dash or slash, and two or
four digits; if the third number has four digits, the first two must be
20 ("\%(20\)\?\(\d\d\)"). I used Vim's non-capturing parentheses to make
clear that the content of "\%(20\)" is not needed later. If this
expression matches, the submatches 1, 2, and 3 contain month, day, and
the last two digits of the year, respectively.

Generating the replacement is simple, too; the expression is only longer
(therefore I have split it on three lines here):

    \=(strlen(submatch(1)) < 2 ? '0' : '') . submatch(1) . '/'
    . (strlen(submatch(2)) < 2 ? '0' : '') . submatch(2) . '/'
    . '20' . submatch(3)

A replacement that starts with "\=" is evaluated as an expression (see
":help sub-replace-expression"). "submatch(1)" contains the month. If it
has only one digit, the month is prefixed with a zero. Testing the
length rather than the numeric value keeps an already padded month such
as "07" from becoming "007". The same is true for the day in
"submatch(2)". "submatch(3)" contains only the last two digits of the
year, because we used non-capturing parentheses for the "20". So we
always have to prefix it with '20'. Those three strings are then
concatenated with slashes between them, which gives the MM/DD/YYYY form
you asked for.

Regards,
Jürgen

-- 
Sometimes I think the surest sign that intelligent life exists elsewhere
in the universe is that none of it has tried to contact us. (Calvin)
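
As a cross-check outside Vim, the same transformation can be sketched in
Python and tried on a few sample values before running the substitution
on the real file. This is only a minimal sketch under the same
assumption as above (no dates from before 2000). The names DATE_RE and
normalize_dates are made up for illustration, and the pattern is a
translation of the Vim regex into Python's re syntax, not part of the
original command.

```python
import re

# Hypothetical names, for illustration only.
# Mirrors the Vim pattern: digits, dash or slash, digits, dash or slash,
# an optional non-captured "20", then the last two digits of the year.
DATE_RE = re.compile(r"\b(\d+)[-/](\d+)[-/](?:20)?(\d\d)\b")


def normalize_dates(text: str) -> str:
    """Rewrite M-D-YY style dates as MM/DD/YYYY, assuming every year is 20xx."""

    def fix(match: re.Match) -> str:
        month, day, year = match.groups()
        # zfill(2) pads a single digit and leaves "07" or "12" untouched.
        return f"{month.zfill(2)}/{day.zfill(2)}/20{year}"

    return DATE_RE.sub(fix, text)


if __name__ == "__main__":
    for sample in ("7-30-05", "12-5-2006", "10-2-06"):
        print(sample, "->", normalize_dates(sample))
```

A value such as 1-2-1999 does not match and is left alone, since the
third number is neither two digits nor four digits starting with 20.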