Re: Please review the 1.5.3 release announcement

2013-04-17 Thread Joern Kottmann
Yes, we are ready, everything is done. Let's send the announcement. Jörn On Wed, Apr 17, 2013 at 2:44 PM, William Colen wrote: > Jörn, thank you for updating the web site. I already added a news item. Now > are we ready to send the announce? > > > > On Mon, Apr 15, 2013 at 6:52 PM, Jörn Kottmann

Re: TokenNameFinder and Span probs

2014-05-07 Thread Joern Kottmann
Hello Mark, +1 for your second solution. I believe that is much more intuitive than calling a method afterwards to retrieve the prob for a Span. It is easier to use because the prob is delivered as part of the result and no user action is required to obtain it. We could use this solution everywhe
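A minimal Java sketch of the second solution discussed above, in which the probability travels with the span instead of being fetched by a later call. ScoredSpan and its accessors are hypothetical names chosen for illustration; they are not taken from the OpenNLP API.

```java
public class SpanProbSketch {

    /** Hypothetical immutable span that carries its own probability. */
    public static final class ScoredSpan {
        private final int start;   // index of the first token (inclusive)
        private final int end;     // index after the last token (exclusive)
        private final String type; // e.g. "person"
        private final double prob; // confidence delivered with the result

        public ScoredSpan(int start, int end, String type, double prob) {
            if (start < 0 || end < start) {
                throw new IllegalArgumentException("invalid span bounds");
            }
            if (prob < 0.0 || prob > 1.0) {
                throw new IllegalArgumentException("prob must be in [0, 1]");
            }
            this.start = start;
            this.end = end;
            this.type = type;
            this.prob = prob;
        }

        public int getStart() { return start; }
        public int getEnd() { return end; }
        public String getType() { return type; }
        public double getProb() { return prob; }
    }

    public static void main(String[] args) {
        // The caller reads the probability directly from the result object.
        ScoredSpan name = new ScoredSpan(0, 2, "person", 0.93);
        System.out.println(name.getType() + " [" + name.getStart() + ".."
                + name.getEnd() + ") prob=" + name.getProb());
    }
}
```

With this shape no second call is needed after the find step, which is the usability argument made in the message.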

Re: Build failed in Jenkins: OpenNLP #476

2014-10-27 Thread Joern Kottmann
On Mon, 2014-10-27 at 19:15 +, Rodrigo Agerri wrote: > Hi, > > This is not caused by my latest commit, is it not? Your last commit just triggered the build. The build itself was successful. It failed afterwards when it tried to deploy the artifacts to the snapshot repo with: "503 Service Temp

What should we do with the SF models?

2014-10-28 Thread Joern Kottmann
Hi all, OpenNLP always came with a couple of trained models which were ready to use for a few languages. The performance a user encounters with those models heavily depends on their input text. Especially the English name finder models which were trained on MUC 6/7 data perform very poorly these

Re: Jenkins build is back to normal : OpenNLP_java8 #2

2014-10-29 Thread Joern Kottmann
Hello, I added an OpenNLP Java 8 build to the build server. This will hopefully inform us about problems with Java 8 in the future. Jörn On Wed, 2014-10-29 at 20:25 +, Apache Jenkins Server wrote: > See >

Re: 1.6.0 maven repo

2014-11-19 Thread Joern Kottmann
Hello, yes, that should be the current state. Can you please elaborate on the issue you have? Do you get an old version? We should try to make a release of 1.6.0, I think most issues are already solved and remaining bugs we will uncover during the manual testing phase. Jörn On Wed, 2014-11-19

Re: Need to speed up the model creation process of OpenNLP

2014-11-19 Thread Joern Kottmann
The runtime scales almost linearly with the number of cores your CPU has. If you have a 4 core CPU you might come down from 3 hours to 1 hour. To enable it you need to train with the -params argument and provide a config file for the learner. There are samples shipped with OpenNLP. Jörn On Wed, 201

Build changed opennlp/pom.xml moved to root directory

2014-11-20 Thread Joern Kottmann
Hello everybody, we changed the structure of the project slightly. The main pom.xml used to be located in opennlp/pom.xml. This was done because an Eclipse workspace can't have files at the root level. The Maven convention is to have the file at the root level. I think it is time to move this file

Next release (was: Re: 1.6.0 maven repo)

2014-11-20 Thread Joern Kottmann
, 2014-11-20 at 07:33 +, Rodrigo Agerri wrote: > +1 to start making a release. I would like to be involved too. > > R > On 19 Nov 2014 23:40, "Joern Kottmann" wrote: > > > Hello, > > > > yes, that should be the current state. > > > > Can

Re: Word Sense Disambiguation

2015-01-19 Thread Joern Kottmann
Hello, +1 from me to just go ahead and implement the proposed approach. One goal of this implementation will be to figure out the interface we want to have in OpenNLP for WSD. We can later extend OpenNLP with more implementations which are taking different approaches. Jörn On Thu, 2015-01-15 at

Re: svn commit: r1655238 - /opennlp/trunk/

2015-01-28 Thread Joern Kottmann
You didn't remove any entries in your recent commit to them. We moved the main pom.xml from the opennlp folder to the root of the project. Now using eclipse with m2e creates the project files there and I thought it would be nice to have them in svn ignore. Maybe it is possible to consolidate the

Re: OpenNLP 1.6.0 RC 2 ready for testing

2015-01-28 Thread Joern Kottmann
The training and tagging test for the parser showed that it behaves differently than the 1.5.3 version. Does anybody know which change causes that? Otherwise I will start bisecting. Jörn On Thu, 2015-01-22 at 17:55 -0200, William Colen wrote: > Hi all, > > Our second release candidate is ready

Re: svn commit: r1655546 - in /opennlp/trunk/opennlp-tools: pom.xml src/test/java/opennlp/tools/ngram/ src/test/java/opennlp/tools/ngram/NGramModelTest.java src/test/resources/opennlp/tools/ngram/ src

2015-01-29 Thread Joern Kottmann
On Thu, 2015-01-29 at 08:02 +, tomm...@apache.org wrote: > +String modelString = IOUtils.toString(nGramModelStream); > +String outputString = > out.toString(Charset.defaultCharset().name()); The XML serialization writes it in UTF-8. Shouldn't you use UTF-8 for this test too instead of
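A minimal Java sketch of the point raised above: bytes that were written as UTF-8 should be decoded as UTF-8, not with the platform default charset, otherwise the comparison can fail on machines whose default differs. The class and method names are hypothetical and the code is not taken from NGramModelTest.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {

    /** Writes text as UTF-8 bytes and decodes it again with the same charset. */
    public static String roundTrip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
        // Decode explicitly as UTF-8 instead of Charset.defaultCharset().
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "J\u00f6rn"; // non-ASCII character, escaped for portability
        System.out.println(original.equals(roundTrip(original)));
    }
}
```

Naming the charset on both the write and the read side keeps the test independent of the machine it runs on.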

Re: svn commit: r1655546 - in /opennlp/trunk/opennlp-tools: pom.xml src/test/java/opennlp/tools/ngram/ src/test/java/opennlp/tools/ngram/NGramModelTest.java src/test/resources/opennlp/tools/ngram/ src

2015-01-29 Thread Joern Kottmann
This file should have an AL header. Jörn On Thu, 2015-01-29 at 08:02 +, tomm...@apache.org wrote: > Added: > opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ngram/ngram-model.xml > URL: > http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/ng

Re: svn commit: r1655546 - in /opennlp/trunk/opennlp-tools: pom.xml src/test/java/opennlp/tools/ngram/ src/test/java/opennlp/tools/ngram/NGramModelTest.java src/test/resources/opennlp/tools/ngram/ src

2015-01-29 Thread Joern Kottmann
Or if that is a problem for the test, you could also tell RAT to ignore it. On my machine the test fails. The two strings don't match. Jörn On Thu, 2015-01-29 at 09:59 +0100, Tommaso Teofili wrote: > right, thanks I'll fix both. > > Tommaso > > 2015-01-29 9:54

Re: svn commit: r1655546 - in /opennlp/trunk/opennlp-tools: pom.xml src/test/java/opennlp/tools/ngram/ src/test/java/opennlp/tools/ngram/NGramModelTest.java src/test/resources/opennlp/tools/ngram/ src

2015-01-29 Thread Joern Kottmann
It still fails in the assert. I didn't check but I guess the build server has the same problem. Jörn On Thu, 2015-01-29 at 10:25 +0100, Tommaso Teofili wrote: > even after my latest commit? If so I'll rearrange the test a bit. > > Tommaso > > 2015-01-29 10:21

Re: svn commit: r1655546 - in /opennlp/trunk/opennlp-tools: pom.xml src/test/java/opennlp/tools/ngram/ src/test/java/opennlp/tools/ngram/NGramModelTest.java src/test/resources/opennlp/tools/ngram/ src

2015-01-29 Thread Joern Kottmann
wrote: > I've just disabled that test, I'll fix it and re-enable it when done. > > Regards, > Tommaso > > 2015-01-29 10:51 GMT+01:00 Joern Kottmann : > > > It still fails in the assert. I didn't check but I guess the build > > server has the same pro

Re: Text Summarization module?

2015-02-02 Thread Joern Kottmann
He created this jira issue: https://issues.apache.org/jira/browse/OPENNLP-752 And the code is also attached to it. Thanks for the contribution. I will have a look and give my feedback directly on the issue. Jörn On Mon, 2015-01-19 at 11:43 -0800, Ramakrishna Soma wrote: > Hi Developers, > > W

Parser performance bug

2015-02-16 Thread Joern Kottmann
Hi all, the performance of the parser changed a bit. The output of the current version in 1.6.0 RC2 is different from the output of the 1.5.3 release, even though there shouldn't have been any difference as far as I can see. The question of what caused that difference came up and I started to bisect it

Re: Word Sense Disambiguation

2015-02-16 Thread Joern Kottmann
On Sat, 2015-02-14 at 11:09 +0100, Aliaksandr Autayeu wrote: > Since you're perhaps deeper in this than others you seem to be the > best > candidate to make a proposal, to check the state of the art algorithms > and > devise general enough interface for all or most of them. One way could > be > to

Re: Word Sense Disambiguation

2015-02-16 Thread Joern Kottmann
On Mon, 2015-02-16 at 16:29 +0100, Aliaksandr Autayeu wrote: > Jörn, to avoid ambiguity in case you addressed me to propose a WSD > interface. I'd prefer Anthony to come up with a proposal, because he is > closer to the multiple WSD algorithms that would be nice to include in the > analysis. Sorry

Re: [GSoC2015] OPENNLP-758

2015-03-05 Thread Joern Kottmann
Hello, we already have two students for those two GSOC WSD tasks. They contacted us a while ago (see the WSD thread on this list) and set up the tasks so they can apply for it. I am not sure if it makes much sense to break the WSD tasks further down. Do you have something else in mind you could w

Re: Parser performance bug

2015-03-06 Thread Joern Kottmann
the parameters in > that file. We can't discard the possibility that there was a bug that was > fixed with the changes. > > > Regards, > William > > 2015-02-16 12:17 GMT-02:00 Joern Kottmann : > > > Hi all, > > > > the performance of the p

Re: Parser performance bug

2015-03-09 Thread Joern Kottmann
On Fri, 2015-03-06 at 21:07 +0100, Joern Kottmann wrote: > The parser still uses the old style of setting the beam size via the > constructor. Due to the changes to move that to the training time it > doesn't work anymore. The parser has to be changed to set the beam > size >

Re: Regarding performance of opennlp entity extraction modals

2015-03-16 Thread Joern Kottmann
Hello, I don't have any numbers for you. The performance depends highly on the model you are using, the configured feature generation and the number of features in your training data. To get a good number you probably have to run a test on your machines. All modern CPUs have multiple cores these

Re: Student looking to contribute toward OpenNLP

2015-03-16 Thread Joern Kottmann
Hello, thanks for your interest in OpenNLP. We already have a lot of candidates for those GSOC issues. You are welcome to suggest something you would like to work on here on the dev list, create an issue for it and contribute some code to solve it. The best way to get started is probably to look

Re: svn commit: r1670574 - /opennlp/trunk/opennlp-uima/src/main/java/opennlp/uima/namefind/NameFinder.java

2015-04-01 Thread Joern Kottmann
The adaptive data is cleared in the documentDone method. The statement in the issue that it is not cleared is not true afaik. Jörn On Wed, Apr 1, 2015 at 9:47 AM, wrote: > Author: tommaso > Date: Wed Apr 1 07:47:41 2015 > New Revision: 1670574 > > URL: http://svn.apache.org/r1670574 > Log: > O

Automated testing with public data

2015-04-14 Thread Joern Kottmann
Hi all, this time the progress with the testing for 1.6.0 is rather slow. Most tests are done now and I believe we are in good shape to build RC3. Anyway it would have been better to be at that stage months ago. To improve the situation in the future I would like to propose to automate all tests

Re: Automated testing with public data

2015-04-15 Thread Joern Kottmann
sers must be able to trust that > they incur no license restrictions beyond the ASL. > > Cheers, > > -- Richard > > On 14.04.2015, at 23:47, Joern Kottmann wrote: > > > Hi all, > > > > this time the progress with the testing for 1.6.0 is rather slow. Mos

Re: Automated testing with public data

2015-04-15 Thread Joern Kottmann
e can share it among the committers. Is that what you mean with proprietary data? Jörn On Wed, Apr 15, 2015 at 10:05 AM, Richard Eckart de Castilho < richard.eck...@gmail.com> wrote: > On 15.04.2015, at 09:39, Joern Kottmann wrote: > > > Some data sets are publicly available but

Re: Is it ok to post a job offer here?

2015-04-27 Thread Joern Kottmann
Should be fine. Any objections? Jörn On Thu, 2015-04-23 at 17:26 +0200, Thilo Goetz wrote: > Is it acceptable to post a job offer (NLP related) to this list? Thanks. > > --Thilo

Re: Automated testing with public data

2015-04-29 Thread Joern Kottmann
chard Eckart de Castilho < > richard.eck...@gmail.com>: > > > On 15.04.2015, at 10:23, Joern Kottmann wrote: > > > > > With publicly accessible data I mean a corpus you can somehow acquire, > > > opposed to the data you create on your own for a project. > >

Re: How to start contributing to OpenNLP

2015-05-12 Thread Joern Kottmann
Hello, the best way to start is to find something you feel comfortable doing. That could be fixing a bug or implementing a certain feature. Yes, have a look at JIRA; there are many issues. Is there some component you would prefer working on? HTH, Jörn On Tue, May 12, 2015 at 5:34 PM, Haider Al

Re: GSoC 2015 - WSD Module

2015-05-22 Thread Joern Kottmann
Hello, one of the tasks we should start with is to define the interface for the WSD component. Please have a look at the other components in OpenNLP and try to propose an interface in a similar style. Can we use one interface for all the different implementations? Jörn On Mon, May 18, 2015 at 3

W2VClassesDictionary class

2015-05-22 Thread Joern Kottmann
Hello, looks like this class was renamed into WordClusterDictionary. Can the class W2VClassesDictionary be removed? We shouldn't include it in RC4 when it is not necessary. Thanks, Jörn

OpenNLP RC4

2015-05-22 Thread Joern Kottmann
Hello, we should now be in a good state to do RC4. We finally solved the performance problems with the parser and a couple of very minor things were fixed as well (e.g. NOTICE file update). A major addition since RC3 is the automated evaluation tests to speed up our release process. I hope this

Re: OpenNLP 1.6.0 RC 4 ready for testing

2015-05-28 Thread Joern Kottmann
The chunker and parser tests are fine now. Do you know what's the deal with the sentence detector? The compatibility test is marked as failed. Can we leave it like that or do we have to fix some bugs? Jörn On May 23, 2015 5:35 AM, "William Colen" wrote: > Our fourth release candidate is ready

Re: GIS API

2015-05-28 Thread Joern Kottmann
Yes, or we can also remove it directly. The package name changed anyway, we will not be able to support existing code using EventStream with 1.6.0 without breaking it. Please open a jira for it. Jörn On Thu, May 28, 2015 at 3:49 PM, Russ, Daniel (NIH/CIT) [E] < dr...@mail.nih.gov> wrote: > Sorry

Re: GIS API

2015-05-28 Thread Joern Kottmann
Looks like there are a couple of classes depending on it which can be all removed. Let's clean that up for 1.6.0. Jörn On Thu, May 28, 2015 at 3:51 PM, Joern Kottmann wrote: > Yes, or we can also remove it directly. The package name changed anyway, > we will not be able to support existin

Re: OpenNLP 1.6.0 RC 4 ready for testing

2015-05-29 Thread Joern Kottmann
t; > > Output in 1.6.0: > Imagem inline 3 > > > > > > I would ignore the change. > > > > > Thank you, > William > > > > 2015-05-28 7:28 GMT-03:00 Joern Kottmann : > The chunker and parser tests are fine now. > >

Re: GSoC 2015 - WSD Module

2015-06-01 Thread Joern Kottmann
Hello, I had a look at your APIs. Let's start with the WSDisambiguator. Should that be an interface? // returns the senses ordered by their score (best one first or only 1 in supervised case) String[] disambiguate(String inputText,int inputWordposition); Shouldn't we have a tokenized input? Or i
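A rough Java sketch of what a tokenized-input variant of that method might look like, as the question above suggests. The Disambiguator interface, the FirstSenseDisambiguator class and the sense identifiers are all hypothetical and serve only to illustrate the signature; they are not the API proposed in the thread or adopted by the project.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class WsdInterfaceSketch {

    /** Hypothetical interface: the caller passes tokens plus the index of the target word. */
    public interface Disambiguator {
        /** Returns candidate senses for tokens[targetIndex], best one first. */
        String[] disambiguate(String[] tokens, int targetIndex);
    }

    /** Toy implementation that looks the target word up in a fixed sense table. */
    public static final class FirstSenseDisambiguator implements Disambiguator {
        private final Map<String, String[]> senses;

        public FirstSenseDisambiguator(Map<String, String[]> senses) {
            this.senses = new HashMap<>(senses);
        }

        @Override
        public String[] disambiguate(String[] tokens, int targetIndex) {
            String key = tokens[targetIndex].toLowerCase(Locale.ROOT);
            String[] found = senses.get(key);
            // Unknown words yield an empty result rather than null.
            return found == null ? new String[0] : found.clone();
        }
    }

    public static void main(String[] args) {
        Map<String, String[]> table = new HashMap<>();
        table.put("bank", new String[] {"bank#finance", "bank#river"});
        Disambiguator wsd = new FirstSenseDisambiguator(table);
        String[] tokens = {"She", "went", "to", "the", "bank"};
        System.out.println(wsd.disambiguate(tokens, 4)[0]);
    }
}
```

Taking tokens instead of raw text leaves tokenization to the caller, so the component does not have to guess word boundaries itself.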

Re: GSoC 2015 - WSD Module

2015-06-03 Thread Joern Kottmann
We should not use remote resources. A remote service adds severe limits to the WSD component. A remote resource will be slow to query (compared to disk or memory), queries might be expensive (pay per request), the license might not allow usage in a way the ASL promises to our users. Another issue i

Re: GSoC 2015 - WSD Module

2015-06-05 Thread Joern Kottmann
Hello, yes, wordnet is fine, we already depend on it. I just think that remote resources are particularly problematic. For local resources it boils down to their license. Here is the wordnet one: http://wordnet.princeton.edu/wordnet/license/ We might even be able to redistribute this here at Apac

Re: GSoC 2015 - WSD Module

2015-06-10 Thread Joern Kottmann
You can attach the patch to one of the issues, or you can create a new issue. In the end it doesn't matter much; what is important is that we make progress here and get the initial code into our repository. Subsequent changes can then be done in a patch series. Please try to submit the patch as quickly

Re: GSoC 2015 - WSD Module

2015-06-19 Thread Joern Kottmann
Hello, I will dedicate time tonight to get this pulled in the sandbox and will then also provide some feedback. We can then create new patches against the sandbox to fix further issues. Jörn On Fri, Jun 19, 2015 at 11:02 AM, Anthony Beylerian < anthonybeyler...@hotmail.com> wrote: > Thank you f

Re: WSD - Supervised techniques

2015-06-24 Thread Joern Kottmann
On Fri, 2015-06-19 at 21:42 +0900, Mondher Bouazizi wrote: > Hi, > > Actually I have finished the implementation of most of the parts of the IMS > approach. I also made a parser for the Senseval-3 data. > > However I am currently working on two main points: > > - I am trying to figure out how to

Re: GSoC 2015 - WSD Module

2015-06-25 Thread Joern Kottmann
On Wed, 2015-06-10 at 22:13 +0900, Anthony Beylerian wrote: > Hi, > > I attached an initial patch to OPENNLP-758. > However, we are currently modifying things a bit since many approaches need > to be supported, but would like your recommendations. > Here are some notes : > > 1 - We used extJWNL

Re: GSoC 2015 - WSD Module

2015-06-25 Thread Joern Kottmann
On Mon, 2015-06-22 at 00:55 +0900, Anthony Beylerian wrote: > Dear Jörn, > Thank you for that. > > After further surveying, I was thinking of beginning the implementation of an > approach based on context clustering as a next step. > Maybe similar to the one in [1] which relies on a public (CC-A

Re: GSoC 2015 - WSD Module

2015-06-25 Thread Joern Kottmann
On Mon, 2015-06-22 at 00:55 +0900, Anthony Beylerian wrote: > Dear Jörn, > Thank you for that. > > After further surveying, I was thinking of beginning the implementation of an > approach based on context clustering as a next step. > Maybe similar to the one in [1] which relies on a public (CC-A

Re: GSoC 2015 - WSD Module

2015-06-28 Thread Joern Kottmann
Yes, the performance testing has to be there, otherwise it is hard to tell if it works or not. Jörn On Mon, 2015-06-29 at 02:02 +0900, Anthony Beylerian wrote: > Dear Jörn, > > As a first milestone, for now we have the main interface with two > implementations (one unsupervised, one supervised)

Re: GSoC 2015 - WSD Module

2015-06-30 Thread Joern Kottmann
Can you please open some jira issues so we can better keep track of what has to be done. Jörn On Jun 28, 2015 10:23 PM, "Joern Kottmann" wrote: > Yes, the performance testing has to be there, otherwise it is hard to > tell if it works or not. > > Jörn > > On Mo

Re: [VOTE] Release OpenNLP 1.6.0 RC 6

2015-06-30 Thread Joern Kottmann
+1. In addition to the other tests, I verified all the hashes and signatures. They are all good. Jörn On Jun 16, 2015 4:51 PM, "William Colen" wrote: > Hello, > > Let's vote to release RC 6 as OpenNLP 1.6.0. > > The testing of it is documented here: > https://cwiki.apache.org/confluence/display/OPE

Re: GSoC 2015 - WSD Module

2015-07-09 Thread Joern Kottmann
Please open a jira issue for this, and for other GSOC tasks. I would like to use jira to plan the outstanding tasks. Are you working on this currently? Jörn On Mon, 2015-06-22 at 00:55 +0900, Anthony Beylerian wrote: > Dear Jörn, > Thank you for that. > > After further surveying, I was thinkin

Re: Word Sense Disambiguator

2015-07-24 Thread Joern Kottmann
It would be nice if you could share instructions on how to run it. I also would like to give it a try. Jörn On Fri, Jul 24, 2015 at 4:54 AM, Anthony Beylerian < anthonybeyler...@hotmail.com> wrote: > Hello, > Yes for the moment we are only using WordNet for sense definitions. The > plan is to com

Re: svn commit: r1681259 - in /opennlp/trunk: opennlp-distr/pom.xml opennlp-docs/pom.xml opennlp-tools/pom.xml opennlp-uima/pom.xml pom.xml

2015-09-03 Thread Joern Kottmann
Hello, yes the github apache/opennlp repository is always synchronized with our subversion repository here at Apache. If you have a look you will see recent changes in there. Jörn On Tue, May 26, 2015 at 6:07 AM, Ethan Wang wrote: > Hey folks, > > is g...@github.com:apache/opennlp.git still an

Re: mallet addon

2015-09-29 Thread Joern Kottmann
Hello, this doesn't work with the 1.6.0 release. I built it for testing one of the first drafts of the machine learning rewrite work we did for 1.6.0. There have been a few changes afterwards. Anyway, if you have a need for it I am happy to fix it up. We can also move it to the sandbox, releasi

Re: Out of Bounds Exception in BioCodec.class

2015-10-07 Thread Joern Kottmann
Hello, I can't see the exception. Can you post it just as text please. Thanks, Jörn On Wed, 2015-10-07 at 10:56 -0400, Blizzard, Zach wrote: > Hey Dev team, > > > > I have a quick question about the BioCodec class: I’m trying to create > my own model to train the OpenNLP program, but I’m run

Re: mallet addon

2015-10-12 Thread Joern Kottmann
wrote: > Hi, > > On Tue, Sep 29, 2015 at 3:41 PM, Joern Kottmann > wrote: > > We can also move > > it to the sandbox, releasing it at Apache might be more difficult since > > mallet pulls in incompatible licensed dependencies. But maybe that > changed, > > we

Re: mallet addon

2015-10-20 Thread Joern Kottmann
y-do-you-hate-crfs/ > > but if results are also worse in Maxent, that is intriguing. I will > look at the Mallet implementation to see if I find out something. > > R > > > > On Mon, Oct 12, 2015 at 4:07 PM, Joern Kottmann > wrote: > > Hello, > > > > fixed

Re: Question about OpenNLP and comparison to e.g., NTLK, Stanford NER, etc.

2015-11-11 Thread Joern Kottmann
Hello, It is definitely true that OpenNLP has existed for a long time (more than 10 years), but that doesn't mean it wasn't improved. Actually it changed a lot in that period. The core strength of OpenNLP was always that it can be used really easily to perform one of the supported NLP tasks. This was

Re: Question about OpenNLP and comparison to e.g., NTLK, Stanford NER, etc.

2015-11-12 Thread Joern Kottmann
On Thu, 2015-11-12 at 19:50 +, Jason Baldridge wrote: > Having said that, there is a lot of activity in the deep learning > space, > where old techniques (neural nets) are now viable in ways they weren't > previously, and they are outperforming linear classifiers in task > after > task. I'm cur

Re: Question about OpenNLP and comparison to e.g., NTLK, Stanford NER, etc.

2015-11-12 Thread Joern Kottmann
On Thu, 2015-11-12 at 15:43 +, Russ, Daniel (NIH/CIT) [E] wrote: > 1) I use the old sourceforge models. I find that the source of error > in my analysis are usually not due to mistakes in sentence detection or > POS tagging. I don’t have the annotated data or the time/money to > build custom m

Language Model contribution

2016-02-17 Thread Joern Kottmann
Hello, I saw the language model commit. Thanks for contributing that! Would it be possible to get a short introduction to it? The interface is supposed to take a StringList. Wouldn't it be better if a user can just pass in a String instead? Otherwise he has to worry about tokenizing a string in

Re: Language Model contribution

2016-02-17 Thread Joern Kottmann
Oops, I confused the language model you were working on with language detection. I think the interface is good as it is. Jörn On Wed, Feb 17, 2016 at 10:00 AM, Joern Kottmann wrote: > Hello, > > I saw the language model commit. Thanks for contributing that! > > Would it be possible

Re: Question about deprecated NameFinderME constructors

2016-03-08 Thread Joern Kottmann
There is a custom xml element where it can load a user defined class for feature generation. So you would add an element like this: https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training.featuregen I think we should remove the deprecated training methods so

Re: GSoC 2016: OpenNLP Sentiment Analysis

2016-04-26 Thread Joern Kottmann
I will be able to join as well. Jörn On Tue, Apr 26, 2016 at 5:28 AM, Mattmann, Chris A (3980) < chris.a.mattm...@jpl.nasa.gov> wrote: > Hey Anastasija, > > To be honest 9am EST is a little aggressive, I will likely be able > to do 6:40 am PT (am traveling back from DC as I type this) which > is

Re: GSoC 2016: OpenNLP Sentiment Analysis

2016-04-26 Thread Joern Kottmann
The Large Movie Review Dataset might be interesting for this as well: http://ai.stanford.edu/~amaas/data/sentiment/ Jörn On Tue, Apr 26, 2016 at 4:26 PM, Anthony Beylerian < anthony.beyler...@gmail.com> wrote: > sentiment analysis discussion doc : > > > https://docs.google.com/document/d/1Gi59Yq

Re: svn commit: r1734600 - in /opennlp/sandbox/opennlp-wsd/src: main/java/opennlp/tools/disambiguator/ main/java/opennlp/tools/disambiguator/ims/ main/java/opennlp/tools/disambiguator/mfs/ main/java/o

2016-04-26 Thread Joern Kottmann
Please always mention the issue number in the commit message. Thanks, Jörn On Fri, 2016-03-11 at 17:37 +, beyler...@apache.org wrote: > Author: beylerian > Date: Fri Mar 11 17:37:07 2016 > New Revision: 1734600 > > URL: http://svn.apache.org/viewvc?rev=1734600&view=rev > Log: > added unit te

Re: svn commit: r1731145 - in /opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools: lemmatizer/ util/

2016-04-26 Thread Joern Kottmann
Hello Rodrigo, you are adding a couple of java files in this commit, and I think more in other commits for the lemmatizer. All new java files must have the AL header. Could you please add the header to files where it is missing. Thanks, Jörn On Thu, 2016-02-18 at 21:02 +, rage...@apache.org

Re: Permissions in JIRA?

2016-05-18 Thread Joern Kottmann
Hello, I added you and Anastasija as contributors, you can now assign issues to yourself. HTH, Jörn On Tue, May 17, 2016 at 6:51 AM, Mattmann, Chris A (3980) < chris.a.mattm...@jpl.nasa.gov> wrote: > Hi, > > I would like to assign OPENNLP-840 the GSoC project I’m mentoring > to myself. Would yo

Re: Performances of OpenNLP tools

2016-06-21 Thread Joern Kottmann
Just don't use the very old existing models. To get good results you have to train on your own data, especially if the domain of the data used for training and the data which should be processed doesn't match. The old models are trained on 90s news, those don't work well on today's news and probably

Re: Performances of OpenNLP tools

2016-06-21 Thread Joern Kottmann
There are some research papers which study and compare the performance of NLP toolkits, but be careful: often they don't train the NLP tools on the same data, and the training data makes a big difference in the performance. Jörn On Tue, Jun 21, 2016 at 5:44 PM, Joern Kottmann wrote: > Ju

Re: Performances of OpenNLP tools

2016-06-22 Thread Joern Kottmann
: > > https://github.com/scalanlp/chalk/wiki/Chalk-command-line-tutorial > > Chalk (now defunct) provided a Scala wrapper around OpenNLP functionality, > so the instructions there should make it fairly straightforward to adapt > MASC data to OpenNLP. > > -Jason > > On

Re: SentimentAnalysisParser updates

2016-07-01 Thread Joern Kottmann
Hello, would be nice to get a pull request for the work you did. Thanks, Jörn On Wed, Jun 29, 2016 at 8:08 PM, Anastasija Mensikova < mensikova.anastas...@gmail.com> wrote: > Hi everyone, > > Some updates on our SentimentAnalysisParser. > > For the past week I worked on making a pull request to

Re: DeepLearning4J as a ML for OpenNLP

2016-07-01 Thread Joern Kottmann
Hello, the people from deeplearning4j are rather nice and I discussed with them for a while how it can be used for OpenNLP. The state back then was that they don't properly support the sparse feature vectors we use in OpenNLP today. Instead we would need to use word embeddings. In the end I never

Re: Performances of OpenNLP tools

2016-07-04 Thread Joern Kottmann
rwise, if anyone would like to suggest proper data-sets for testing > each component that would be really helpful > > Anthony > > On Thu, Jun 23, 2016 at 12:18 AM, Joern Kottmann > wrote: > > > It would be nice to get MASC support into the OpenNLP formats package. > > >

Re: AdaptiveFeatureGenerator and FeatureGeneratorAdapter

2016-07-04 Thread Joern Kottmann
Hello, as far as I understand the proposed change will break backward compatibility for users who implement AdaptiveFeatureGenerator. Is that correct? Anyway, I always like the idea of making things simpler. In Java 8 it is possible to declare default methods in an interface. http://docs.oracle.c
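A minimal Java 8 sketch of the default-method idea mentioned above: optional callbacks get empty default bodies in the interface, so an implementer only has to supply the one required method and no separate adapter base class is needed. SketchFeatureGenerator and TokenFeatureGenerator are hypothetical names with simplified signatures; this is not the actual AdaptiveFeatureGenerator interface.

```java
import java.util.ArrayList;
import java.util.List;

public class DefaultMethodSketch {

    /** Hypothetical feature generator interface using Java 8 default methods. */
    public interface SketchFeatureGenerator {
        /** The only method an implementer is required to provide. */
        void createFeatures(List<String> features, String[] tokens, int index);

        /** Optional hook; does nothing unless overridden. */
        default void updateAdaptiveData(String[] tokens, String[] outcomes) {
        }

        /** Optional hook; does nothing unless overridden. */
        default void clearAdaptiveData() {
        }
    }

    /** Implements only createFeatures and inherits the no-op defaults. */
    public static final class TokenFeatureGenerator implements SketchFeatureGenerator {
        @Override
        public void createFeatures(List<String> features, String[] tokens, int index) {
            features.add("w=" + tokens[index]);
        }
    }

    /** Convenience helper that runs the generator for one token position. */
    public static List<String> featuresFor(String[] tokens, int index) {
        SketchFeatureGenerator generator = new TokenFeatureGenerator();
        List<String> features = new ArrayList<>();
        generator.createFeatures(features, tokens, index);
        generator.clearAdaptiveData(); // inherited default, a no-op here
        return features;
    }

    public static void main(String[] args) {
        System.out.println(featuresFor(new String[] {"Hello", "world"}, 1));
    }
}
```

Adding default methods to an existing interface does not force current implementers to change, which is relevant to the backward-compatibility question raised in the message.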

Re: Model to detect the gender

2016-07-04 Thread Joern Kottmann
Hello, there are also other interesting properties e.g. person title (e.g. professor, doctor), job title/position, company legal form. And much more for other entity types. Maybe it would be worth it to build a dedicated component to extract properties from entities. Jörn On Fri, Jul 1, 2016 at

Re: Model to detect the gender

2016-07-04 Thread Joern Kottmann
her name-related > entities too? > > OR > > Find the entities with a dictionary and then train a maxent model that > finds other properties like person title, job position etc? > > Thanks for the clarification. > > > 2016-07-04 12:15 GMT+02:00 Joern Kottmann : >

Re: Model to detect the gender

2016-07-04 Thread Joern Kottmann
The co-referencer we used to have in opennlp-tools has a model to detect the gender of names. That could be extracted and put into a standalone component. Jörn On Mon, Jul 4, 2016 at 2:41 PM, Joern Kottmann wrote: > I was speaking about the second case. We could build a dedica

Re: Migrate to Git?

2016-07-04 Thread Joern Kottmann
Hello all, do we still want to do this? Has been a while since we discussed it. I am happy to get it done if we reach consensus on it again. My +1 again. Jörn On Thu, Dec 20, 2012 at 4:40 PM, Tommaso Teofili wrote: > in my opinion that would be good, +1 > Tommaso > > > 2012/12/19 Jörn Kottman

Re: Migrate to Git?

2016-07-04 Thread Joern Kottmann
hern California, Los Angeles, CA 90089 USA > WWW: http://irds.usc.edu/ > ++++++ > > > > > > > > > > > On 7/4/16, 7:36 AM, "Joern Kottmann" wrote: > > > Hello all, > > > > do we still want to do this? H

Re: Migrate to Git?

2016-07-04 Thread Joern Kottmann
ion Retrieval and Data Science Group (IRDS) > Adjunct Associate Professor, Computer Science Department > University of Southern California, Los Angeles, CA 90089 USA > WWW: http://irds.usc.edu/ > ++++++ > > >

Re: Migrate to Git?

2016-07-04 Thread Joern Kottmann
Here is the jira issue for this: https://issues.apache.org/jira/browse/INFRA-12209 Jörn On Wed, 2012-12-19 at 21:09 +0100, Jörn Kottmann wrote: > Hi all, > > I heard at ApacheCon Europe that it should be possible to migrate > from > Subversion to Git. > > Is there any interest in doing that? If

Re: Migrate to Git?

2016-08-18 Thread Joern Kottmann
ieval and Data Science Group (IRDS) > Adjunct Associate Professor, Computer Science Department > University of Southern California, Los Angeles, CA 90089 USA > WWW: http://irds.usc.edu/ > ++++++ > > > > > > > >

Re: Migrate to Git?

2016-08-19 Thread Joern Kottmann
ave some SentimentAnalysis stuff to hopefully commit > and > get refactored. Hopefully after that’s done we can ship a release soon and > publish to Central. > > > > On 8/18/16, 5:50 AM, "Joern Kottmann" wrote: > > We made some progress here, the repository is now swi

Re: Migrate to Git?

2016-08-19 Thread Joern Kottmann
lly after that’s done we can ship a release soon and > publish to Central. > > > > On 8/18/16, 5:50 AM, "Joern Kottmann" wrote: > > We made some progress here, the repository is now switched to git. > > Please have a look here: > https://issues.apa

Re: Migrate to Git?

2016-08-19 Thread Joern Kottmann
wrote: > > > > > we can use branches instead of repositories. > > > > Thanks, > > Madhawa > > > > Madhawa > > > > On Fri, Aug 19, 2016 at 1:54 PM, Joern Kottmann > > > > wrote: > > > > > > > > Yes, i

Re: Migrate to Git?

2016-08-19 Thread Joern Kottmann
On Fri, 2016-08-19 at 18:09 +0200, Aliaksandr Autayeu wrote: > P.S. On convenience. Cloning into single directory and setting up > single > project makes it work just as well. Decent IDEs handle this easily. > > On tracking history. The need to track history of experimental code > obfuscates its

Re: Migrate to Git?

2016-08-19 Thread Joern Kottmann
On Fri, 2016-08-19 at 18:01 +0200, Aliaksandr Autayeu wrote: > Separating site and code is not enough. Different code requires > different > levels of maintenance, that's why it's better to separate sandbox and > add-ons from trunk too. Sandbox might become outdated or might not > compile. > It mig

Re: Is sentence detection process really needed?

2016-08-26 Thread Joern Kottmann
The name finder has the concept of "adaptive data" in the feature generation. The feature generators can remember things from previous sentences and use them to generate features. Usually that helps the recognition rate if you have names that are repeated. You can tweak this to y

Re: Access to Git

2016-09-09 Thread Joern Kottmann
Hello, yes, you can use it. The add-ons and other things are not set up yet as far as I know; I have to ping the infra team about it. Please have a look at the issue I posted to see how to access it. I will work on this on Monday. HTH Jörn On Sep 9, 2016 19:10, "William Colen" wrote: > Hello, > >

Re: Access to Git

2016-09-14 Thread Joern Kottmann
Sorry, it took me a little while to figure this out. This link explains how it works: https://reference.apache.org/committer/git The repo name is opennlp; we will soon also have the other repos opennlp-addons and opennlp-sandbox. Jörn On Fri, Sep 9, 2016 at 10:58 PM, Joern Kottmann wrote: > He

Re: Access to Git

2016-09-19 Thread Joern Kottmann
The opennlp-addons repo is now also available, and opennlp-sandbox will be available soon. Jörn On Thu, 2016-09-15 at 01:12 +0200, Joern Kottmann wrote: > Sorry, it took me a little while to figure this out. > > This link explains how it works: > https://reference.apache.org/committer/gi

Re: Morfologik Addon

2016-10-13 Thread Joern Kottmann
We could distribute it with our main release, similar to how we do with opennlp-uima. I think that would make sense. If people would like to use it they can add it as an extra dependency. There are probably also other things we can distribute in a similar fashion with the next release. Jörn On Fr

Moving brat annotator to opennlp.git

2016-10-18 Thread Joern Kottmann
Hello all, what do you think about including the brat NER annotator in the 1.6.1 release? I believe it is important that we include it to allow our users to more easily run custom annotation projects. As part of the move we need to extend the documentation so everyone can easily get it up and running

Re: Moving brat annotator to opennlp.git

2016-10-19 Thread Joern Kottmann
> > > +1 > > > > Madhawa > > > > On Wed, Oct 19, 2016 at 2:20 PM, "Shuo Xu" wrote: > > > > > +1 > > > > > > > > > On Wed, Oct 19, 2016 at 12:46 AM, Joern Kottmann > > > wrote: > > > > > >

Re: Moving brat annotator to opennlp.git

2016-10-19 Thread Joern Kottmann
Have a look at this page: http://brat.nlplab.org/standoff.html Jörn On Wed, Oct 19, 2016 at 9:06 PM, Richard Eckart de Castilho < richard.eck...@gmail.com> wrote: > On 19.10.2016, at 20:59, Joern Kottmann wrote: > > > > There is a dedicated servlet which implements exac

Re: Moving brat annotator to opennlp.git

2016-10-19 Thread Joern Kottmann
not format... I do know the brat format. > > Best, > > -- Richard > > > On 19.10.2016, at 21:30, Joern Kottmann wrote: > > > > Have a look at this page: > > http://brat.nlplab.org/standoff.html > > > > Jörn > > > > > > On Wed, O

Re: Moving brat annotator to opennlp.git

2016-10-19 Thread Joern Kottmann
Looks like POS Tagging is not supported by Brat. Jörn On Wed, Oct 19, 2016 at 8:59 PM, Joern Kottmann wrote: > There is a dedicated servlet which implements exactly the protocol brat > requires. We can extend it to make it available for other tools. > > Do you know any other anno
