I have a set of documents with filenames that give a good indication of content.
A filename of 12 digits (I think this is [0-9]{12} as a regular expression)
with the extension html is a troubleshooting guide, the number being an error
code.
A filename with two or three letters, then a minus (which would be [a-z]{2,3}-
I think), then a known string means the document is about a particular subject;
I have a list of the known strings matched to subjects.
What I would like to do is have my indexer create a field named "category",
populated with either the string "troubleshooting" or the known string
extracted from the filename.
Examples:
For a file named 000000000111.html the indexer adds the field "category" with
the value "troubleshooting".
For a file named xxx-cal-123.html the indexer adds the field "category" with
the value "cal".
For a file named xx-qv-(9).html the indexer adds the field "category" with the
value "qv".
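
To make the rule concrete, here is a rough sketch in Python of the mapping I
have in mind. It is only an illustration of the filename logic, not of how any
particular indexer would hook it in. The names category_for and KNOWN_STRINGS
are placeholders I made up, the set holds only the two strings from the
examples, and I am assuming the known string is lower-case letters followed by
another minus.

```python
import re

# Placeholder set of known strings; the real list maps each one to a subject.
KNOWN_STRINGS = {"cal", "qv"}

# Exactly 12 digits followed by the html extension.
TROUBLESHOOTING_RE = re.compile(r"[0-9]{12}\.html")

# Two or three letters, a minus, the candidate known string, another minus,
# then anything up to the html extension (assumed shape, see above).
SUBJECT_RE = re.compile(r"[a-z]{2,3}-([a-z]+)-.*\.html")


def category_for(filename):
    """Return the value for the "category" field, or None if no rule matches."""
    if TROUBLESHOOTING_RE.fullmatch(filename):
        return "troubleshooting"
    match = SUBJECT_RE.fullmatch(filename)
    if match and match.group(1) in KNOWN_STRINGS:
        return match.group(1)
    return None
```

With this, category_for("xxx-cal-123.html") gives "cal", and a filename that
fits neither pattern gives None, so no field would be added for it.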
Is there a way to do that?