Hi All, Sorry for the naive questions...
I am new to NLP and have been reading about the maximum entropy (MaxEnt) approach to POS tagging. My questions are:

1. The maximum entropy method produces a "model" from the training data (an already-tagged body of text). So is this model *the* probability distribution that satisfies all the feature constraints and also maximizes the entropy?

2. What does this "model/probability distribution function" look like? Is there an example, or somewhere we can actually see what is inside the model? Does it look like a table? A collection of probabilities and features?

3. How is this "model" actually used? Is there a simple example somewhere? For instance, if the sentence is "I am confused", is the model somehow checked or searched so that it tags all three words? And what happens if a new sentence that does NOT match any existing feature is submitted to the model?

I have been googling, hoping to find a simple "Hello World" kind of example that goes from building the training data, to the model, to how the model is actually used, but no luck. I also read the paper by Adam Berger, but I still don't really understand what the model looks like or how it is actually used.

I appreciate any help! Thanks!
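
To make questions 2 and 3 concrete, here is a toy sketch of the kind of end-to-end example being asked about. It is only an illustration under stated assumptions, not the method from any particular paper: the training sentences, the tag set, the feature names (`word=`, `prev=`, `suffix=`) and the function names are all made up for this post, and the weights are fitted with plain gradient ascent on the conditional log-likelihood rather than the iterative scaling procedures usually described in the MaxEnt literature. In this sketch the "model" is just a table of weights, one per (feature, tag) pair, and tagging means scoring every tag for each word and normalizing.

```python
import math
from collections import defaultdict

# Hypothetical toy training data: sentences of (word, tag) pairs.
TRAIN = [
    [("I", "PRP"), ("am", "VBP"), ("happy", "JJ")],
    [("I", "PRP"), ("am", "VBP"), ("confused", "JJ")],
    [("you", "PRP"), ("are", "VBP"), ("happy", "JJ")],
]
TAGS = sorted({tag for sent in TRAIN for _, tag in sent})

def context_features(words, i):
    """Names of the binary features that fire at position i (made-up feature set)."""
    prev_word = words[i - 1] if i > 0 else "<s>"
    return [
        "word=" + words[i].lower(),
        "prev=" + prev_word.lower(),
        "suffix=" + words[i][-2:].lower(),
    ]

def tag_distribution(weights, feats):
    """p(tag | context) = exp(sum of weights of active (feature, tag) pairs) / Z."""
    scores = {t: sum(weights.get((f, t), 0.0) for f in feats) for t in TAGS}
    z = sum(math.exp(s) for s in scores.values())
    return {t: math.exp(s) / z for t, s in scores.items()}

def train(data, epochs=200, lr=0.1):
    """Batch gradient ascent: gradient = observed feature counts - expected counts."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for sent in data:
            words = [w for w, _ in sent]
            for i, (_, gold) in enumerate(sent):
                feats = context_features(words, i)
                probs = tag_distribution(weights, feats)
                for f in feats:
                    grad[(f, gold)] += 1.0        # count seen in the training data
                    for t in TAGS:
                        grad[(f, t)] -= probs[t]  # count the current model expects
        for key, g in grad.items():
            weights[key] += lr * g
    return dict(weights)

def tag(weights, words):
    """Tag each word independently with its most probable tag (no sequence search)."""
    tagged = []
    for i in range(len(words)):
        probs = tag_distribution(weights, context_features(words, i))
        tagged.append(max(probs, key=probs.get))
    return tagged

model = train(TRAIN)

# The "model" is a table of (feature, tag) -> weight; show the largest few entries.
for (feat, t), w in sorted(model.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{feat:15s} {t:4s} {w:+.2f}")

print(tag(model, ["I", "am", "confused"]))

# A context whose features were never seen has no weights in the table,
# so every tag gets the same score and the distribution is uniform.
print(tag_distribution(model, ["word=zzz"]))
```

In this sketch, a sentence whose features were never seen in training does not break anything: no weights apply, so the model falls back to a uniform distribution over the tags. A real tagger would use a much richer feature set and would typically search over whole tag sequences instead of choosing each tag independently.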
