SV: Sentence detection for user generated content

Svetoslav Marinov Tue, 21 Jan 2014 02:01:50 -0800

You should not use the general models, especially for data like tweets and 
Facebook posts. They are trained on a completely different kind of data, so it 
is no wonder you get a lot of errors.


My advice is to create a training set and retrain the sentence detector. The 
results will get much better. You will most probably need three models: one for 
tweets, one for Facebook and one for blogs, since the underlying data can be 
quite different in each of these.
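
As a rough illustration of what "create a training set" could look like, here is a minimal sketch that groups hand-segmented posts by source and writes one sentence per line, with a blank line between posts. The one-sentence-per-line layout is an assumption about what the trainer expects (it is what OpenNLP's sentence detector trainer reads), and the function name and the `(source, sentences)` input shape are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def build_training_sets(posts):
    """Group hand-segmented posts by source, one sentence per line.

    `posts` is an iterable of (source, sentences) pairs, where `sentences`
    is the list of sentences an annotator marked in a single post.
    Returns {source: training_text}; posts are separated by a blank line.
    """
    grouped = defaultdict(list)
    for source, sentences in posts:
        # Drop empty entries and collapse internal whitespace/newlines so
        # that every sentence occupies exactly one line.
        cleaned = [" ".join(s.split()) for s in sentences if s.strip()]
        if cleaned:
            grouped[source].append("\n".join(cleaned))
    return {src: "\n\n".join(docs) + "\n" for src, docs in grouped.items()}


if __name__ == "__main__":
    annotated = [
        ("tweets", ["omg ......", "did you see that ?!!?!!"]),
        ("blogs", ["First sentence.", "Second one."]),
    ]
    for source, text in build_training_sets(annotated).items():
        # One file per source, so each can be used to train its own model.
        with open(f"{source}.train", "w", encoding="utf-8") as handle:
            handle.write(text)
```

Keeping a separate file per source makes it straightforward to train one model per kind of content and compare them.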

But maybe someone has some models that they can freely distribute…

Hope this helps.

Regards,

Svetoslav
________________________________________
From: Yasen Kiprov <[email protected]>
Sent: 21 January 2014 10:45
To: [email protected]
Subject: Sentence detection for user generated content

Hello,

Has anyone tried training the sentence detector on user-generated
content such as tweets, Facebook or forum posts? Is there a model available
for this purpose? With the default models I see lots of errors with
things like ...... , ?!!?!! , !!!, and also with incomplete sentences.

All the best,
Yasen
