On 09/06/2013 11:53 AM, Tobias Ednersson wrote:
> I have a master's degree in computational linguistics and basic knowledge of Java.
> I am looking for a way to learn Java "properly" while contributing to some
> NLP project.
> I am more of a linguist than a mathematician.
> What I would like to do to begin with is simple QA, fixing simple bugs and such.
> I have browsed through the Apache documentation on how to contribute, but any
> further pointers would be greatly appreciated.
> Where do I start to get to know the project? The most straightforward approach
> would of course be to check out the source code and have a look at it.
> Is this a good strategy?

A good way to get started is to train a component on your own and then
use it to tag some sample text. That should teach you the
very basics of how to use OpenNLP.
Reading through our source code is probably the best way to get a deeper
understanding of how things work. There are a few patterns which are
repeated over and over again throughout our components. The easiest way
to understand them is to read the code of some of the simpler components,
such as the Document Categorizer, Tokenizer or Sentence Detector.
Have a look at our issue tracker to find open bugs or features which
might be of interest for you to work on.
Here is a list of all open issues:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20OPENNLP%20AND%20status%20%3D%20Open%20ORDER%20BY%20priority%20DESC
It would be very valuable to find a new contributor/committer for our
machine learning code; maybe that could be something for you.
We have a serious bug in our L-BFGS training code: have a look at
OPENNLP-338 and the follow-up issue to fix the bug, OPENNLP-569.
That might be a difficult issue to fix, though.
Another interesting issue could be OPENNLP-31, which is about writing
evaluation code for the parser component. The parser is another area which
is currently lacking maintenance.
HTH,
Jörn