If the documents follow certain rules or patterns, and you are looking for specific words or skills in the resume, wouldn't you be better served by regular-expression-based classifiers rather than NLP-based ones?
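
To make the regular expression idea concrete, here is a minimal sketch in Python. The pattern table is purely hypothetical (I borrowed a few skills and job names from your sample), and counting matching patterns is just one simple way to score a resume, not a tuned method.

```python
import re

# Hypothetical skill patterns per job name; replace with your own skill list.
SKILL_PATTERNS = {
    "Tile layer": [r"\btiles?\b", r"\bclinkers?\b"],
    "Alarm operator": [r"\bCTI\b", r"\bcomputer telephony\b"],
    "Sports centre manager": [r"\bgardening\b", r"\bstudy circle\b"],
}

# Compile once, case-insensitively, so "Gardening" and "gardening" both match.
COMPILED = {
    job: [re.compile(pattern, re.IGNORECASE) for pattern in patterns]
    for job, patterns in SKILL_PATTERNS.items()
}


def classify(text):
    """Return (job, hit_count) pairs, best match first; jobs with no hits are omitted."""
    scores = {}
    for job, patterns in COMPILED.items():
        hits = sum(1 for pattern in patterns if pattern.search(text))
        if hits:
            scores[job] = hits
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


resume = "Five years of experience setting tiles and clinkers; some gardening."
print(classify(resume))
```

Each job is scored by how many of its patterns occur in the text, so a resume that mentions several skills can match more than one job.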
HTH.
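
P.S. On the format question: as far as I recall from the OpenNLP documentation, DoccatTrainer expects one document per line, with the category as the first whitespace-delimited token and the text after it. Your sample has the skill text first and a multi-word job name last, so it would probably need reordering. Below is a rough Python sketch of such a conversion. The function name is made up, and the splitting rule (the job name starts at the last word that begins with a capital letter) is only a heuristic that happens to fit the five sample lines.

```python
import re


def to_doccat_line(line):
    """Turn 'skill text Job name .' into 'Job_name skill text', or None if no split is found."""
    line = line.strip().rstrip(".").strip()
    # Heuristic (assumption): the job name begins at the LAST whitespace
    # that is followed by a capital letter.
    last = None
    for last in re.finditer(r"\s+(?=[A-Z])", line):
        pass
    if last is None:
        return None
    skill, job = line[:last.start()], line[last.end():]
    # The category must be a single token, so join its words with underscores.
    category = re.sub(r"[\s,]+", "_", job)
    return f"{category} {skill}"


sample = [
    "Tiles and clinkers, setting experience Tile layer .",
    "Silk screen printing Lead typesetter, printing shop .",
    "CTI, computer telephony Alarm operator .",
]
for row in sample:
    print(to_doccat_line(row))
```

Even with the format fixed, a statistical model generally needs many example lines per category, so one line per job is unlikely to give useful results.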

On 5/29/2013 2:56 AM, Florin Langa wrote:
Hello everyone!

I have a question... maybe it is a silly question, but I don't know how to
manage it. I need to build a classifier for CVs. In order to do this I
assume that I need to build a model file containing a set of skills. I have
a list of skills but I don't know how to build the input file. Here is a
sample of my input file:

Tiles and clinkers, setting experience Tile layer .
Silk screen printing Lead typesetter, printing shop .
CTI, computer telephony Alarm operator .
GifBuilder animation program Specialist book writer .
Gardening, study circle leadership Sports centre manager .
........
etc.

The first part, until the next capital letter is the skill name and the
second part is the job name.
Ex: Gardening, study circle leadership - skill name, Sports centre manager
- job name.

In order to create the actual model file I use the following command:

opennlp DoccatTrainer -encoding UTF-8 -lang en -data /tmp/jobs.txt -model /tmp/en-language-jobs.bin

Now, my question is if the input file I am providing to the above command
has the right format.

Also, please note that I was able to create the model file, but when
running the command

opennlp Doccat /tmp/en-language-jobs.bin < /tmp/programmer.txt

the results are 100% irrelevant.

Best regards,
Florin

