I have tried to train an NER model for Italian addresses using the following
training data; this is just an extract, because the actual training file has
50,000 records.

VIA <START:street> FRANCESCO ZANARDI <END> <START:number> 985 <END> <START:zip> 40131 <END> <START:town> BOLOGNA <END> <START:province> BO <END>
VIA <START:street> STEFANO BORGIA <END> <START:number> 151 <END> <START:zip> 00168 <END> <START:town> ROMA <END> <START:province> RM <END>
VIALE <START:street> ITALIA <END> <START:number> 40 <END> <START:zip> 83100 <END> <START:town> AVELLINO <END> <START:province> AV <END>
PIAZZA <START:street> ROMA <END> <START:number> 15 <END> <START:zip> 63100 <END> <START:town> ASCOLI PICENO <END> <START:province> AP <END>
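
To rule out problems in the training file itself, a quick format check along
these lines might help. This is only a sketch in Python, not part of OpenNLP;
the function name check_line and the list of allowed types are my own
assumptions. It checks that every tag is a separate whitespace-delimited
token and that every <START:type> is closed by an <END> on the same line,
since the name finder training format expects one record per line.

```python
def check_line(line, allowed=("street", "number", "zip", "town", "province")):
    """Return a list of problems found in one training line (empty if none)."""
    problems = []
    open_type = None   # entity type of the span currently open, if any
    span_len = 0       # number of tokens seen inside the open span
    for token in line.split():
        if token.startswith("<START:") and token.endswith(">"):
            name = token[len("<START:"):-1]
            if open_type is not None:
                problems.append("START before previous END")
            if name not in allowed:
                problems.append("unknown type: " + name)
            open_type, span_len = name, 0
        elif token == "<END>":
            if open_type is None:
                problems.append("END without START")
            elif span_len == 0:
                problems.append("empty span: " + open_type)
            open_type = None
        elif "<START" in token or "<END" in token:
            # A tag glued to a word, e.g. "<START:street>ROMA"
            problems.append("tag not separated by spaces: " + token)
        elif open_type is not None:
            span_len += 1
    if open_type is not None:
        problems.append("missing END for " + open_type)
    return problems
```

For example, a record broken across two lines right after 83100 would be
reported with "missing END for zip" on its first half.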


I used the following command line to train:

C:\Programmi\apache-opennlp-1.5.2-incubating\bin>opennlp.bat TokenNameFinderTrainer -encoding UTF-8 -lang it -data ../traindata/it-ner-address.train -model ../models/it/it-ner-address.bin
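
To measure the model on records with known answers, part of the training
file could be held out before training. Below is a minimal Python sketch,
assuming one record per line; split_records, strip_tags, the 10% fraction
and the seed are illustrative choices of mine, not OpenNLP features.

```python
import random

def strip_tags(line):
    """Remove <START:type> and <END> markers, keeping the plain tokens."""
    return " ".join(t for t in line.split()
                    if not (t.startswith("<START:") or t == "<END>"))

def split_records(lines, held_out_fraction=0.1, seed=0):
    """Shuffle the non-empty records and return (train, held_out) lists."""
    records = [l.strip() for l in lines if l.strip()]
    random.Random(seed).shuffle(records)   # fixed seed keeps the split repeatable
    cut = int(len(records) * held_out_fraction)
    return records[cut:], records[:cut]
```

The held-out records, passed through strip_tags, would serve as input for
the name finder, and their tagged originals as the reference to compare
against.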


Then I ran the Name Finder tool with the following command:

C:\Programmi\apache-opennlp-1.5.2-incubating\bin>opennlp.bat TokenNameFinder ../models/it/it-ner-address.bin < ../input/it-ner-address.txt > ../output/it-ner-address.txt

using a small file of 100 records, and I received the following results
(again, this is just an extract):

PZA <START:number> GIOVANNI FONTANA <END> <START:zip> 1 <END> <START:town> 60125 <END> <START:province> ANCONA <END> <START:province> AN <END>
VIA <START:number> A. GARIBALDI <END> <START:zip> 56 <END> <START:town> 60019 <END> <START:province> SENIGALLIA <END> <START:province> AN <END>
VIA <START:number> A. GARIBALDI <END> <START:zip> 56 <END> <START:town> 60019 <END> <START:province> SENIGALLIA <END> <START:province> AN <END>
VIA <START:zip> ACHILLE GRANDI <END> <START:zip> 21 <END> <START:street> INT <END> <START:number> INT A <END> <START:street> 23891 BARZANO' <END> <START:street> LC <END>
VIA <START:number> AGRARIA <END> <START:zip> 2 <END> <START:town> 60035 <END> <START:province> JESI <END> <START:province> AN <END>
VIA <START:number> AGRARIA <END> <START:zip> 2 <END> <START:town> 60035 <END> <START:province> JESI <END> <START:province> AN <END>
VIA <START:street> ALBERTO DA GIUSSANO <END> <START:number> 39 INT <END> <START:zip> I <END> <START:town> 20030 <END> <START:street> SEVESO <END> <START:street> MB <END>
VIA <START:number> AMEDEO <END> <START:zip> 51A <END> <START:town> 24040 <END> <START:province> VERDELLINO <END> <START:province> BG <END>
VIA <START:street> AMEDEO DI SAVOIA 15 INT <END> <START:zip> INT <END> <START:town> 46040 <END> <START:street> CASALROMANO <END> <START:street> MN <END>
VIA <START:number> ANTONIO GRAMSCI <END> <START:zip> 14 <END> <START:town> 61040 <END> <START:town> MONDAVIO PU <END>
VIA <START:town> ARNETTA <END> <START:zip> 20 <END> <START:street> INT <END> <START:number> INT <END> <START:zip> B <END> <START:town> 21045 <END> <START:province> GAZZADA SCHIANNO <END> <START:province> VA <END>
VIA <START:number> BRESCIA <END> <START:zip> 31 <END> <START:town> 26013 <END> <START:province> CREMA <END> <START:province> CR <END>
VIA <START:zip> C. CAVOUR <END> <START:zip> 6 <END> <START:street> PRESSO <END> <START:number> INT <END> <START:zip> FARMA <END> <START:town> 60033 <END> <START:province> CHIARAVALLE <END> <START:province> AN <END>
VIA <START:number> CAMERANO <END> <START:zip> 7 <END> <START:town> 62019 <END> <START:province> RECANATI <END> <START:province> MC <END>
VIA <START:town> CANDIA <END> <START:street> 350 <END> <START:street> INT <END> <START:zip> INT E <END> <START:town> 60131 <END> <START:province> ANCONA <END> <START:province> AN <END>
VIA <START:number> CESARE BECCARIA <END> <START:zip> 49 <END> <START:town> 60019 <END> <START:province> SENIGALLIA <END> <START:province> AN <END>
VIA <START:zip> CESARE PAVESE <END> <START:zip> 28 <END> <START:street> INT <END> <START:zip> INT INT <END> <START:town> 46030 <END> <START:town> BIGARELLO MN <END>
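
To put a number on how far off the output is, each tagged line can be parsed
into (type, text) pairs and compared with a reference tagging. The following
is a rough Python sketch under my own assumptions: parse_spans and
count_correct are hypothetical helper names, and the reference line in the
usage note is my hand-tagged version following the pattern of the training
data, not something produced by OpenNLP.

```python
def parse_spans(line):
    """Return (type, text) pairs for each <START:type> ... <END> span in a line."""
    spans, current, words = [], None, []
    for token in line.split():
        if token.startswith("<START:") and token.endswith(">"):
            current, words = token[len("<START:"):-1], []
        elif token == "<END>":
            if current is not None:
                spans.append((current, " ".join(words)))
            current = None
        elif current is not None:
            words.append(token)
    return spans

def count_correct(predicted_line, reference_line):
    """Count reference spans reproduced exactly (same type and same text)."""
    predicted = parse_spans(predicted_line)
    reference = parse_spans(reference_line)
    hits = sum(1 for span in reference if span in predicted)
    return hits, len(reference)
```

Applied to the first output record above, with a reference that tags
GIOVANNI FONTANA as street, 1 as number, 60125 as zip, ANCONA as town and AN
as province, count_correct returns (1, 5): only the province AN matches.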




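The results are clearly not good: in many records the labels are shifted,
e.g. the street name is tagged as number, the house number as zip, and the
zip code as town. Do you have any idea how I could improve them? I am new to
OpenNLP; is there any parameter that I should use when running the training?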

Mauro
