I was thinking of training with a file formatted like this:

via<START:street>massarenti<END><START:number>300<END>,<START:town>Bologna<END>,<START:province>BO<END>,<START:Country>IT<END>.
piazza<START:street>maggiore<END><START:number>3<END>,<START:town>Trento<END>,<START:province>TN<END>,<START:Country>IT<END>
............


via (meaning "street") and piazza (meaning "square") are two descriptors
that, in my opinion, can be left unclassified.
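
A minimal sketch of how such training lines could be generated, in Python
purely for illustration. It assumes the NameFinder training format wants one
sample per line with whitespace-separated tokens, where <START:type> and
<END> are tokens of their own, so the tags in the sample above would need
spaces around them. The function name and the list-of-pairs input are
hypothetical, not part of OpenNLP.

```python
def to_training_line(parts):
    """Build one whitespace-tokenized training line.

    parts: list of (label, text) pairs. A label of None means the text
    is emitted as a plain token and left unclassified (e.g. "via").
    """
    tokens = []
    for label, text in parts:
        if label is None:
            tokens.append(text)
        else:
            # Assumption: tags must be separated from the text by spaces.
            tokens.append(f"<START:{label}> {text} <END>")
    return " ".join(tokens)


# Hypothetical input built from the first address in the sample.
address = [
    (None, "via"),
    ("street", "massarenti"),
    ("number", "300"),
    (None, ","),
    ("town", "Bologna"),
    (None, ","),
    ("province", "BO"),
    (None, ","),
    ("Country", "IT"),
]

line = to_training_line(address)
print(line)
```

Descriptors such as "via" and "piazza" are passed with a label of None, so
they stay in the line as ordinary context tokens outside any tag.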

ciao

On Fri, Apr 20, 2012 at 3:29 PM, Jim - FooBar(); <[email protected]> wrote:

>  On 20/04/12 14:16, mauro fraboni wrote:
>
>> I am investigating whether it is possible to use OpenNLP to parse Italian
>> postal addresses.
>> I do not want to validate the input address against an official address
>> database; I just need to divide a single address string into its
>> individual component parts, and I thought of using NameFinder.
>> My idea was to train NameFinder on some Italian addresses, marking in the
>> training data the parts such as Street, Town, Province, Post Code, Country.
>> Do you think that it can work? Does anyone have experience with it?
>>
>> Thanks and ciao.
>>
>>
> Hmmm, that sounds like it should work... however, you don't want to
> separate your entities into Street, Town, Province, Post Code, Country etc.,
> because then how are you going to join them to get your 'real' entity
> (address)? I would say keep the whole address as 1 entity and produce some
> training data that marks the whole thing... of course, if you already have
> some training data, that is better; otherwise you will spend a bit of time
> creating your annotated corpus...
>
> My logic says that this is the way to go - maybe I'm wrong in some way...
> Any different opinions, anyone?
>
> Jim
>
> ps. In your first sentence did you by any chance mean to say "recognise"
> instead of "parse"?
>
