Hi Yakov,

Yes, you can use the POS tagger to tag with whatever categories you choose.

If we take your original example, "The quick brown fox_animal jumps_action over the lazy dog_animal", make sure you tag all tokens, e.g. "The_NA quick_NA brown_NA fox_animal jumps_action over_NA the_NA lazy_NA dog_animal", where NA just means not applicable. You choose your own categories. Afterwards you can extract only the words and categories you are interested in.

You will then need to tag some data to train on, about 15,000 examples or more. In addition you can supply a POS tagging dictionary, which helps narrow the search space of possible tags for a token. You can use the same tags across languages, but each language should have its own training data and dictionary.

However, I am not sure how successful the approach will be when you only need partial annotation. What do you want to use it for? Maybe there are better options...

Svetoslav

________________________________________
From: Yakov Keranchuk <[email protected]>
Sent: 29 August 2013 12:44
To: [email protected]
Subject: Re: category tagging

So I found a simple example in the sources: WordTagSampleStreamTest.java parses the string "This_x1 is_x2 a_x3 test_x4 sentence_x5 ._x6" using POSSample.

As I understand it, the normal approach has a few steps for each language:

1. collect data for the model
2. create a POS dictionary like this:

   <dictionary>
   <entry tags="x1">
   <token>This</token>
   </entry>
   <entry tags="x2">
   <token>is</token>
   </entry>
   <entry tags="x3">
   <token>a</token>
   </entry>
   ...

3. train the model with this dictionary

Is this the right approach? Is the POS tagger appropriate for this task?

Thanks in advance,
Yakov

On Tue, Aug 27, 2013 at 6:31 PM, Yakov Keranchuk <[email protected]> wrote:
> Hi
>
> Is it possible to tag tokens according to our own rules?
> Example: *The quick brown fox_animal jumps_action over the lazy dog_animal*
>
> Do we need to create a custom dictionary for the POS tagger?
> If so, can there be a single dictionary for several languages?
>
> Best regards,
> Yakov
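
For illustration, here is a minimal sketch of the "tag every token, then keep only the categories you are interested in" step described in the reply at the top of this thread. It deliberately uses only the Java standard library and not the OpenNLP API, so it says nothing about how the tagger itself is trained or called. The class name CategoryFilter, the method name extract, and the decision to split each token on its last underscore are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP.
final class CategoryFilter {

    // Placeholder tag for tokens that carry no category of interest.
    static final String NOT_APPLICABLE = "NA";

    /**
     * Takes a whitespace-separated sentence in "word_tag" form and returns
     * the tokens whose tag is not NA, in their original order.
     * Every token must carry a tag; an untagged token is rejected.
     */
    static List<String> extract(String taggedSentence) {
        List<String> kept = new ArrayList<>();
        for (String token : taggedSentence.trim().split("\\s+")) {
            // Assumption: the tag follows the LAST underscore in the token.
            int sep = token.lastIndexOf('_');
            if (sep <= 0 || sep == token.length() - 1) {
                throw new IllegalArgumentException("Untagged token: " + token);
            }
            String tag = token.substring(sep + 1);
            if (!NOT_APPLICABLE.equals(tag)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String sentence = "The_NA quick_NA brown_NA fox_animal jumps_action "
                + "over_NA the_NA lazy_NA dog_animal";
        // Prints fox_animal, jumps_action and dog_animal, one per line.
        for (String token : extract(sentence)) {
            System.out.println(token);
        }
    }
}
```

Rejecting untagged tokens instead of silently skipping them is a design choice that mirrors the advice to tag all tokens: a sentence such as "lazy dog_animal" fails fast, so gaps in hand-annotated training data show up early.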
