Re: sentence detector with abbreviations not working

Adi Tue, 08 Jan 2013 07:42:47 -0800

James Kosin <james.kosin@...> writes:


> How many sentences do you have in the training set used to train your model?
> What parameters did you use?
> Do the sentences have a variation of sentences with and without 
> abbreviations?
> 
> James
> 


Hi James,

I had a training corpus of around 1200 sentences.
Some of these sentences had abbreviations, but I'm trying to get it to perform
better with unseen abbreviations, which is why I'm trying the abbreviations
dictionary.

I did not supply any training parameters; the documentation on what exactly to
supply is a little unclear. Could this be the issue?

________________________________________________________________________________
// Load the abbreviation dictionary from its XML file.
Dictionary abbrDict = new Dictionary(new FileInputStream(new File(pathToAbbr)));

// Read the training data line by line.
ObjectStream<String> lineStream =
    new PlainTextByLineStream(new FileInputStream(pathToData), "UTF-8");
ObjectStream<SentenceSample> sampleStream = new SentenceSampleStream(lineStream);

// useTokenEnd = true, no custom end-of-sentence characters (null).
SentenceDetectorFactory sdfac =
    new SentenceDetectorFactory("en", true, abbrDict, null);
// No parameters are set on this object.
TrainingParameters trainParams = new TrainingParameters();
model = SentenceDetectorME.train("en", sampleStream, sdfac, trainParams);


_______________________________________________________________________________


This is the format of the abbreviations XML file I'm using. I got it from one
of the forums as well.

<?xml version="1.0" encoding="UTF-8"?>
<dictionary case_sensitive="false">
        <entry>
                <token>tel.</token>
        </entry>
        <entry>
                <token>Jr.</token>
        </entry>
        <entry>
                <token>Mrs.</token>
        </entry>
</dictionary>

_______________________________________________________________________________
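
One way to rule out a problem with the file itself might be to parse it with
the JDK's XML parser and list the tokens it contains. This is only a sketch:
it does not use OpenNLP's Dictionary class, so it says nothing about how
OpenNLP interprets the entries, and the names AbbrevDictCheck and readTokens
are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of OpenNLP.
public class AbbrevDictCheck {

    // Returns the trimmed text of every <token> element, in document order.
    public static List<String> readTokens(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
            NodeList nodes = doc.getElementsByTagName("token");
            List<String> tokens = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                tokens.add(nodes.item(i).getTextContent().trim());
            }
            return tokens;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // A malformed or unreadable file ends up here.
            throw new IllegalArgumentException("Could not parse dictionary XML", e);
        }
    }

    public static void main(String[] args) {
        // Same structure as the dictionary file shown above.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<dictionary case_sensitive=\"false\">"
                + "<entry><token>tel.</token></entry>"
                + "<entry><token>Jr.</token></entry>"
                + "<entry><token>Mrs.</token></entry>"
                + "</dictionary>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(readTokens(in)); // prints [tel., Jr., Mrs.]
    }
}
```

If the printed list matches the entries in the file, the XML is at least
well-formed and the tokens are where a reader would expect them.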

Thanks again for your assistance.

Adi
