Re: Regarding performance of opennlp entity extraction modals

2015-03-16 Thread Joern Kottmann
Hello, I don't have any numbers for you. The performance depends highly on the model you are using, the configured feature generation and the number of features in your training data. To get a good number you probably have to run a test on your machines. All modern CPUs have multiple cores these

Re: Student looking to contribute toward OpenNLP

2015-03-16 Thread Rohit Shinde
Okay, I have no problem with that. I'll look over some other issues. In the meantime, I think I would like to work on medical de-identification. How would I go about starting this work? What all would I need to know?

Re: Student looking to contribute toward OpenNLP

2015-03-16 Thread Joern Kottmann
Hello, thanks for your interest in OpenNLP. We already have a lot of candidates for those GSoC issues. You are welcome to suggest something you would like to work on here on the dev list, create an issue for it, and contribute some code to solve it. The best way to get started is probably to

Regarding performance of opennlp entity extraction modals

2015-03-16 Thread Anuj Chopra
Hi, I wanted some information regarding the performance of the OpenNLP entity extraction models in documents/second and MB/second. Currently I am using the person, location, organisation and money extraction models. If possible, please also tell me the speeds when a combination of models is used. Thank you.

Re: Student looking to contribute toward OpenNLP

2015-03-16 Thread Rohit Shinde
I would certainly like to get involved in this then. I looked over the paper and its results were highly positive. So does this mean that we would be implementing their model that gave such good results? Also, I was looking at the OpenNLP issues on the JIRA page and I really liked this one--

Re: Student looking to contribute toward OpenNLP

2015-03-16 Thread andy mcmurry
OpenNLP is a standard library used by many Apache NLP projects. The clinical text engine (ctakes.apache.org) is one such use of OpenNLP. There is a medical data privacy engine (de-identification) that does medical concept recognition and privacy features described in the paper. We used it to conduct