I was thinking of training with a file made in this way:

via<START:street>massarenti<END> <START:number>300<END>,<START:town>Bologna<END>,<START:province>BO<END>,<START:Country>IT<END>
piazza<START:street>maggiore<END> <START:number>3<END>,<START:town>Trento<END>,<START:province>TN<END>,<START:Country>IT<END>
............

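A minimal sketch of how such training lines could be generated from already-split addresses. It assumes the OpenNLP name finder training format, which expects one sample per line with tokens and the <START:type> / <END> markers separated by whitespace. The helper name to_training_line and the lowercase label names are hypothetical, chosen only for illustration.

```python
def to_training_line(items):
    """Build one whitespace-tokenized training line.

    items is a list whose elements are either plain strings (untagged
    tokens such as "via" or ",") or (label, text) tuples (tagged parts).
    """
    out = []
    for item in items:
        if isinstance(item, tuple):
            label, text = item
            # Tags and tokens are separated by single spaces.
            out.append(f"<START:{label}> {text} <END>")
        else:
            out.append(item)
    return " ".join(out)


# Hypothetical sample built from the first address above.
line = to_training_line([
    "via",
    ("street", "massarenti"),
    ("number", "300"),
    ",",
    ("town", "Bologna"),
    ",",
    ("province", "BO"),
    ",",
    ("country", "IT"),
])
print(line)
```

The descriptor "via" and the commas are passed as plain strings, so they stay outside every tagged span.
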
via (meaning "street") and piazza (meaning "square") are two descriptors that, in my opinion, need not be classified.

ciao

On Fri, Apr 20, 2012 at 3:29 PM, Jim - FooBar(); <[email protected]> wrote:

> On 20/04/12 14:16, mauro fraboni wrote:
>
>> I am investigating whether it is possible to use OpenNLP to parse Italian
>> postal addresses.
>> I do not want to validate the input address against an official address
>> database; I just need to divide a single address string into its
>> individual component parts, and I thought of using NameFinder.
>> My idea was to train NameFinder on some Italian addresses, marking in the
>> training data the parts such as Street, Town, Province, Post Code, Country.
>> Do you think that it can work? Does anyone have experience with this?
>>
>> Thanks and ciao.
>>
>
> Hmmm, that sounds like it should work... however, you don't want to
> separate your entities into Street, Town, Province, Post Code, Country etc.,
> cos then how are you going to join them to get your 'real' entity
> (address)? I would say keep the whole address as 1 entity and produce some
> training data that marks the whole thing... of course, if you already have
> some training data that is better, otherwise you will spend a bit of time
> creating your annotated corpus...
>
> My logic says that this is the way to go - maybe I'm wrong in some way...
> Any different opinions, anyone?
>
> Jim
>
> ps. In your first sentence did you by any chance mean to say "recognise"
> instead of "parse"?
