Hello,

I gathered from some recent discussions that parser training has to be customized for each language with regard to head rules and punctuation markers.
Before I open a Jira issue, I have a question. Does all this per-language customization make sense, given that the real differences come from the different POS models? If I understand the code correctly, I have to provide the punctuation types, but those depend entirely on the POS model. In my case I use an STTS tagger, where punctuation is tagged with $., $( or $, . Furthermore, the ( causes problems within the constituent stack, so I need to encode them.

Now the question: would it be easier to just replace the punctuation tags where they are hardcoded in the head rules class? Or would it be better to refactor the head rules class so that we can use two external files (one for the rules and one for the tag set, or for the punctuation tags within the tag set)?

Thanks for any kind of advice,
Andreas

--
Andreas Niekler, Dipl. Ing. (FH)
NLP Group | Department of Computer Science
University of Leipzig
Johannisgasse 26 | 04103 Leipzig
mail: [email protected]
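
To make the second option more concrete, here is a minimal sketch of what an external punctuation tag file could look like on the code side. Everything in it is an assumption for illustration: the class PunctuationTagSet, its methods, and the one-tag-per-line file format are hypothetical and not part of the existing parser code. The -LRB-/-RRB- replacement follows the Penn Treebank convention for brackets and is only one possible way to encode the ( in $(.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper (not an existing parser class): holds the punctuation
// tags of one tag set, read from an external source instead of being hardcoded.
class PunctuationTagSet {

    private final Set<String> tags = new LinkedHashSet<>();

    // Assumed file format: one tag per line, blank lines are skipped.
    static PunctuationTagSet load(Reader source) {
        PunctuationTagSet set = new PunctuationTagSet();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String tag = line.trim();
                if (!tag.isEmpty()) {
                    set.tags.add(tag);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return set;
    }

    // True if the given (unencoded) tag was listed as punctuation.
    boolean isPunctuation(String tag) {
        return tags.contains(tag);
    }

    // Replaces brackets inside a tag so that the tag cannot be mistaken
    // for a bracket of the tree notation, e.g. "$(" becomes "$-LRB-".
    static String encode(String tag) {
        return tag.replace("(", "-LRB-").replace(")", "-RRB-");
    }

    // Reverses encode().
    static String decode(String tag) {
        return tag.replace("-LRB-", "(").replace("-RRB-", ")");
    }
}
```

For STTS the file would then just contain the three lines $., $, and $(, and the head rules code would ask isPunctuation() instead of comparing against hardcoded tag strings.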
