sentence detector newline behavior

2013-05-21 Thread Miller, Timothy
The sentence detector always ends a sentence where there are newlines. This is a problem for some notes (e.g. MIMIC radiology notes) where a line can wrap in the middle of a sentence at specified character offsets. In the comments for SentenceDetector, it seems to be split up very logically in tha
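To make the reported problem concrete, here is a minimal sketch of a splitter that, like the behavior described above, always ends a sentence at a newline. It is not the cTAKES SentenceDetector; the function name and the sample note are invented for illustration.

```python
import re

def split_forcing_newlines(text):
    """Toy splitter: break after . ! ? and unconditionally at every newline."""
    # Either whitespace that follows sentence punctuation, or any newline.
    pieces = re.split(r"(?<=[.!?])\s+|\n", text)
    return [p.strip() for p in pieces if p and p.strip()]

# A hard-wrapped line, as in notes that wrap at a fixed character offset.
note = "There is a small right pleural\neffusion. No pneumothorax."
for sentence in split_forcing_newlines(note):
    print(repr(sentence))
```

The wrapped first sentence comes out as two fragments ("There is a small right pleural" and "effusion."), which is the kind of error the thread is about.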

Re: sentence detector newline behavior

2013-05-21 Thread Steven Bethard
On May 21, 2013, at 6:07 AM, "Miller, Timothy" wrote: > The sentence detector always ends a sentence where there are newlines. > This is a problem for some notes (e.g. MIMIC radiology notes) where a > line can wrap in the middle of a sentence at specified character > offsets. In the comments for

RE: sentence detector newline behavior

2013-05-21 Thread Savova, Guergana
deals with this issue. --Guergana -Original Message- From: Steven Bethard [mailto:steven.beth...@colorado.edu] Sent: Tuesday, May 21, 2013 9:59 AM To: dev@ctakes.apache.org Subject: Re: sentence detector newline behavior On May 21, 2013, at 6:07 AM, "Miller, Timothy" wr

RE: sentence detector newline behavior

2013-05-21 Thread Masanz, James J.
, Timothy Sent: Tuesday, May 21, 2013 7:07 AM To: dev@ctakes.apache.org Subject: sentence detector newline behavior The sentence detector always ends a sentence where there are newlines. This is a problem for some notes (e.g. MIMIC radiology notes) where a line can wrap in the middle of a sentence at

Re: sentence detector newline behavior

2013-05-21 Thread Tim Miller
egative would negate extravascular findings -Original Message- From: dev-return-1605-Masanz.James=mayo@ctakes.apache.org [mailto:dev-return-1605-Masanz.James=mayo@ctakes.apache.org] On Behalf Of Miller, Timothy Sent: Tuesday, May 21, 2013 7:07 AM To: dev@ctakes.apache.org Subject: sent

Re: sentence detector newline behavior

2013-05-21 Thread Steven Bethard
And without breaking on the line ending, the word negative would negate >> extravascular findings >> >> >> -Original Message- >> From: dev-return-1605-Masanz.James=mayo....@ctakes.apache.org >> [mailto:dev-return-1605-Masanz.James=mayo@ctakes.apache.org] On Behal

RE: sentence detector newline behavior

2013-05-21 Thread Savova, Guergana
...@colorado.edu] Sent: Tuesday, May 21, 2013 11:38 AM To: dev@ctakes.apache.org Subject: Re: sentence detector newline behavior On May 21, 2013, at 9:02 AM, Tim Miller wrote: > I think the whole reason to use a machine learning approach for > sentence detection should be to help weigh evidenc

Re: sentence detector newline behavior

2013-05-21 Thread Steven Bethard
nt: Tuesday, May 21, 2013 11:38 AM > To: dev@ctakes.apache.org > Subject: Re: sentence detector newline behavior > > On May 21, 2013, at 9:02 AM, Tim Miller > wrote: >> I think the whole reason to use a machine learning approach for >> sentence detection should be to help weigh

RE: sentence detector newline behavior

2013-05-21 Thread Chen, Pei
es or retraining?) --Pei > -Original Message- > From: Steven Bethard [mailto:steven.beth...@colorado.edu] > Sent: Tuesday, May 21, 2013 12:07 PM > To: dev@ctakes.apache.org > Subject: Re: sentence detector newline behavior > > On May 21, 2013, at 9:53 AM, "Savova, Guer

RE: sentence detector newline behavior

2013-05-21 Thread Savova, Guergana
nt: Tuesday, May 21, 2013 11:38 AM > To: dev@ctakes.apache.org > Subject: Re: sentence detector newline behavior > > On May 21, 2013, at 9:02 AM, Tim Miller > wrote: >> I think the whole reason to use a machine learning approach for >> sentence detection should be to help weigh

Re: sentence detector newline behavior

2013-05-21 Thread Steven Bethard
ginal Message- > From: Steven Bethard [mailto:steven.beth...@colorado.edu] > Sent: Tuesday, May 21, 2013 12:07 PM > To: dev@ctakes.apache.org > Subject: Re: sentence detector newline behavior > > On May 21, 2013, at 9:53 AM, "Savova, Guergana" > wrote: >

Re: sentence detector newline behavior

2013-05-22 Thread Jörn Kottmann
On 05/21/2013 08:00 PM, Steven Bethard wrote: So perhaps we could re-train it to disambiguate newline characters as well? Yes, the OpenNLP Sentence Detector now supports that out of the box in the new 1.5.3 version: you can specify the set of EOS (end-of-sentence) chars to use, but the default is still: !?. If
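The idea of a configurable end-of-sentence character set can be sketched as follows. This only enumerates candidate boundary positions; in a learned detector a model would then accept or reject each candidate. It is an illustration in Python, not the OpenNLP API, and the names `DEFAULT_EOS_CHARS` and `candidate_boundaries` are assumptions.

```python
# Default candidates mirror the "!?." default mentioned in the message above.
DEFAULT_EOS_CHARS = (".", "!", "?")

def candidate_boundaries(text, eos_chars=DEFAULT_EOS_CHARS):
    """Return the offsets of characters that might end a sentence."""
    eos = set(eos_chars)
    return [i for i, ch in enumerate(text) if ch in eos]

text = "No acute\ndistress. Stable.\n"
print(candidate_boundaries(text))                          # punctuation only
print(candidate_boundaries(text, DEFAULT_EOS_CHARS + ("\n",)))  # newline is now a candidate too
```

With the newline added to the set, the line break inside "No acute / distress." becomes a position the classifier is asked about instead of being decided by a fixed rule.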

Re: sentence detector newline behavior

2013-05-22 Thread Miller, Timothy
That's awesome! It might be worth trying at least. How does the training process change? Previously the training data would be one sentence per line, but with newlines as possible mid-sentence characters that could be trouble. Is there a new representation for training data? Or would we have to use

Re: sentence detector newline behavior

2013-05-22 Thread Jörn Kottmann
On 05/22/2013 01:17 PM, Miller, Timothy wrote: That's awesome! It might be worth trying at least. How does the training process change? Previously the training data would be one sentence per line, but with newlines as possible mid-sentence characters that could be trouble, is there a new represen

Re: sentence detector newline behavior

2013-05-23 Thread Tim Miller
OK, I've started doing this and was able to get training working on a very small example; I will try a slightly bigger one next. Tim On 05/22/2013 08:03 AM, Jörn Kottmann wrote: On 05/22/2013 01:17 PM, Miller, Timothy wrote: That's awesome! It might be worth trying at least. How does the training process

Re: sentence detector newline behavior

2014-01-20 Thread Jörn Kottmann
Hi all, I currently have quite a bit of time to work on OpenNLP and would like to help you out with this issue. Here is the follow-up issue for this change: https://issues.apache.org/jira/browse/OPENNLP-602 I am still trying to figure out what would be the best option to implement this. In

Re: sentence detector newline behavior

2014-01-20 Thread vijay garla
The OpenNLP sentence detection model used by cTAKES does not split sentences at newlines - there is additional logic in the cTAKES sentence splitter that does this (and an alternative impl that doesn't is in the ytex branch). AFAIK no retraining / change to the feature representation is necessary.
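The two-stage arrangement described here (a model proposes sentence spans, then extra logic splits them at newlines) might look roughly like the sketch below. The function name and the example spans are hypothetical; this is not the cTAKES code.

```python
def split_spans_at_newlines(text, spans):
    """Post-process (start, end) spans so that no span contains a newline."""
    result = []
    for start, end in spans:
        pos = start
        for i in range(start, end):
            if text[i] == "\n":
                if i > pos:              # skip empty pieces
                    result.append((pos, i))
                pos = i + 1
        if end > pos:
            result.append((pos, end))
    return result

text = "pleural\neffusion. Stable."
model_spans = [(0, 17), (18, 25)]        # pretend output of a trained model
print(split_spans_at_newlines(text, model_spans))
```

Skipping this post-processing step, as the alternative implementation is said to do, would leave the model's spans untouched.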

Re: sentence detector newline behavior

2014-01-20 Thread Chen, Pei
I presume Joern was suggesting that if he supports new lines in the OpenNLP SentenceDetector (either part of the trained models or post processing with some rules?) cTAKES will be able to use it out of the box and we should be able to remove any additional custom logic that we currently have- whic

Re: sentence detector newline behavior

2014-01-21 Thread Jörn Kottmann
Yes, exactly, OPENNLP-602 is about training a sentence detector model which can use a new line as an end-of-sentence character. In case you have certain rules to split sentences we should have a look at them. The Sentence Detector could be extended to support a user-provided rule-based splitter.

RE: sentence detector newline behavior

2014-01-22 Thread Masanz, James J.
o:dev-return-2390-Masanz.James=mayo@ctakes.apache.org] On Behalf Of Jörn Kottmann Sent: Tuesday, January 21, 2014 4:29 AM To: dev@ctakes.apache.org Subject: Re: sentence detector newline behavior Yes, exactly, OPENNLP-602 is about training a sentence detector model which can use a new line as a end-of-s

RE: sentence detector newline behavior

2014-01-22 Thread Finan, Sean
Subject: RE: sentence detector newline behavior The only rule I know of is that cTAKES (prior to ytex integration) always forces a sentence break at a newline. This was because the clinical notes cTAKES originally processed never had newlines in the middle of a sentence, but did need sentence b

RE: sentence detector newline behavior

2014-01-22 Thread Masanz, James J.
...@childrens.harvard.edu] Sent: Wednesday, January 22, 2014 1:33 PM To: dev@ctakes.apache.org Subject: RE: sentence detector newline behavior Just whistling in the wind here ... Perhaps before any changes are made to universally toggle cTakes in one direction or the other, we can take a poll of when & w

RE: sentence detector newline behavior

2014-01-22 Thread Finan, Sean
been prescribed" -- 2 sentences, 5 lines Nothing can really be done for the last bit where punctuation is missing. -Original Message- From: Masanz, James J. [mailto:masanz.ja...@mayo.edu] Sent: Wednesday, January 22, 2014 3:07 PM To: 'dev@ctakes.apache.org' Subject: RE:

RE: sentence detector newline behavior

2014-01-22 Thread Finan, Sean
l and Tylenol have been prescribed" -- 2 sentences, 5 lines Nothing can really be done for the last bit where punctuation is missing. -Original Message- From: Masanz, James J. [mailto:masanz.ja...@mayo.edu] Sent: Wednesday, January 22, 2014 3:07 PM To: 'dev@ctakes.apache.org'

Re: sentence detector newline behavior

2014-01-23 Thread vijay garla
d.edu] > Sent: Wednesday, January 22, 2014 3:42 PM > To: dev@ctakes.apache.org > Subject: RE: sentence detector newline behavior > > Thanks James > > > but then no typical sentence ending punctuation at the end of the line > > Gotcha. > > > So simply using Lines w

Re: sentence detector newline behavior

2014-01-23 Thread Karthik Sarma
From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] > > Sent: Wednesday, January 22, 2014 3:42 PM > > To: dev@ctakes.apache.org > > Subject: RE: sentence detector newline behavior > > > > Thanks James > > > > > but then no typical sentence ending

Re: sentence detector newline behavior

2014-01-23 Thread Tim Miller
Message- From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] Sent: Wednesday, January 22, 2014 3:42 PM To: dev@ctakes.apache.org Subject: RE: sentence detector newline behavior Thanks James but then no typical sentence ending punctuation at the end of the line Gotcha. So simply using Li

Re: sentence detector newline behavior

2014-01-24 Thread Jörn Kottmann
On 01/23/2014 10:06 PM, Tim Miller wrote: Just an FYI, a while back I did some of these annotations myself on MIMIC to get around this issue. I replaced the newline character with a special (non-English) character, then pre-processed ctakes input to replace newlines with that character, then di
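The workaround quoted above (swap each newline for a special character before processing) can be sketched like this. The placeholder character is a hypothetical choice, since the thread does not say which one was used; a single-character placeholder keeps every character offset unchanged.

```python
PLACEHOLDER = "\u00a4"  # hypothetical; the thread does not name the character

def mask_newlines(text, placeholder=PLACEHOLDER):
    """Replace newlines with a single placeholder character."""
    if placeholder in text:
        raise ValueError("placeholder already occurs in the text")
    return text.replace("\n", placeholder)

def unmask_newlines(text, placeholder=PLACEHOLDER):
    """Restore the original newlines."""
    return text.replace(placeholder, "\n")

note = "Small right pleural\neffusion.\nNo pneumothorax."
masked = mask_newlines(note)
print(masked)
print(unmask_newlines(masked) == note)
```

Because the replacement is one character for one character, annotation offsets computed on the masked text apply directly to the original note.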

Re: sentence detector newline behavior

2014-01-24 Thread Jörn Kottmann
il and Tylenol have been prescribed" -- 2 sentences, 5 lines Nothing can really be done for the last bit where punctuation is missing. -Original Message- From: Masanz, James J. [mailto:masanz.ja...@mayo.edu] Sent: Wednesday, January 22, 2014 3:07 PM To: 'dev@ctakes.

Re: sentence detector newline behavior

2014-01-25 Thread Miller, Timothy
gt;>>> was in >>>> the opennlp model, but this requires training data. I don't know >>>> what the >>>> training data is, or if the training data has sentences that cross >>>> newline >>>> boundaries (if not, won't bu

Re: sentence detector newline behavior

2014-01-25 Thread Miller, Timothy
s requires training data. I don't know >>>> what the >>>> training data is, or if the training data has sentences that cross >>>> newline >>>> boundaries (if not, won't buy us anything). >>>> >>>> vijay >>>>

Re: sentence detector newline behavior

2014-01-25 Thread Jörn Kottmann
On 01/25/2014 01:33 PM, Miller, Timothy wrote: Thanks Joern, I'll try it. My understanding is I just need to give it my training data, with the special character I used replaced with the literal string "" and each line in the file is an example sentence. Yes, exactly. Just thinking about the
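A sketch of producing training data in the shape described here: one example sentence per line, with newlines inside a sentence written as a literal token. The token is elided in the message above, so `[NEWLINE]` below is a stand-in invented for this example, as is the function name.

```python
def format_training_data(sentences, newline_token="[NEWLINE]"):
    """Serialize sentences one per line, encoding interior newlines as a token.

    newline_token is a hypothetical stand-in; substitute whatever literal
    string the trainer actually expects.
    """
    return "".join(s.replace("\n", newline_token) + "\n" for s in sentences)

sentences = ["Small right pleural\neffusion.", "No pneumothorax."]
print(format_training_data(sentences), end="")
```

Each output line is then exactly one training example, regardless of how many line breaks the original sentence spanned.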

Re: sentence detector newline behavior

2014-01-25 Thread Jörn Kottmann
On 01/25/2014 03:03 PM, Miller, Timothy wrote: I'm running into one issue: it gets tripped up on sentences with line-ending spaces. I could easily remove them with a script but by default they are in there. It happens when a sentence example ends: ...BILAT HEMATOMAS. (There is a period, then

Re: sentence detector newline behavior

2014-01-25 Thread Miller, Timothy
On 01/25/2014 12:24 PM, Jörn Kottmann wrote: > The code which computes the spans tries to remove white space from it. > Removing the white space from a whitespace-only sentence is causing > the exception you are seeing. Which response would you expect from > the sentence detector? Should a white
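The failure mode described here, trimming whitespace from a span that contains nothing else, can be guarded against as in the sketch below. This is an illustration only, not the OpenNLP span code; returning `None` for an empty result is one possible answer to the question of what the detector should do.

```python
def trim_span(text, start, end):
    """Shrink [start, end) to exclude surrounding whitespace.

    Returns None when the span holds only whitespace, instead of
    producing an invalid span.
    """
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    return (start, end) if start < end else None

text = "HEMATOMAS. \n  \nNext."
print(trim_span(text, 0, 12))   # trailing space and newline removed
print(trim_span(text, 12, 15))  # whitespace-only span
```

A caller can then drop `None` results so that a whitespace-only line never becomes a sentence.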

Re: sentence detector newline behavior

2014-01-26 Thread Jörn Kottmann
On 01/25/2014 10:03 PM, Miller, Timothy wrote: On 01/25/2014 12:24 PM, Jörn Kottmann wrote: The code which computes the spans tries to remove white space from it. Removing the white space from a whitespace-only sentence is causing the exception you are seeing. Which response would you expect fr

Re: sentence detector newline behavior

2014-01-26 Thread Miller, Timothy
On 01/26/2014 09:59 AM, Jörn Kottmann wrote: > > The evaluation should ignore white spaces. I committed now my fix, it > would be nice if you can > test it. > > There might be still something wrong. In my test data I replaced all > question marks with white spaces, and the result > is slightly w

Re: sentence detector newline behavior

2014-01-27 Thread Jörn Kottmann
On 01/26/2014 11:29 PM, Miller, Timothy wrote: Yes, this fixes the whitespace sentence issue but the evaluation issue remains. I believe the problem is in SentenceSampleStream, where in the following block the whitespace trim happens before the character is replaced with the \n character. So tes
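Why the order of trimming and token replacement matters can be shown with a small sketch. These two functions are not the SentenceSampleStream code; the names and the `[NEWLINE]` token are assumptions made for the example.

```python
TOKEN = "[NEWLINE]"  # hypothetical stand-in for the newline token

def decode_trim_first(line, token=TOKEN):
    """Trim whitespace, then turn the token back into a newline."""
    return line.strip().replace(token, "\n")

def decode_replace_first(line, token=TOKEN):
    """Turn the token back into a newline, then trim whitespace."""
    return line.replace(token, "\n").strip()

line = "BILAT HEMATOMAS." + TOKEN
print(repr(decode_trim_first(line)))     # trailing newline survives the trim
print(repr(decode_replace_first(line)))  # trailing newline is trimmed away
```

When a sample ends with the token, trimming first leaves a trailing newline in the decoded sentence, so the reference spans can differ from the detector's trimmed output and the evaluation scores shift.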

Re: sentence detector newline behavior

2014-01-27 Thread Tim Miller
OK, with the most recent version I am able to replicate the performance I was getting before. Thanks a lot Jörn! Assuming this is in the next incremental release of opennlp, how quickly can we get a re-trained model into cTAKES? I heard from a researcher at AMIA who tried cTAKES and because of

RE: sentence detector newline behavior

2014-01-27 Thread digital paula
> Date: Mon, 27 Jan 2014 09:52:00 -0500 > From: timothy.mil...@childrens.harvard.edu > To: dev@ctakes.apache.org > Subject: Re: sentence detector newline behavior > > OK, with the most recent version I am able to replicate the performance > I was getting before. Thanks a lo

RE: sentence detector newline behavior

2014-01-27 Thread Masanz, James J.
or and just using OpenNLP directly. -- James -Original Message- From: Tim Miller [mailto:timothy.mil...@childrens.harvard.edu] Sent: Monday, January 27, 2014 8:52 AM To: dev@ctakes.apache.org Subject: Re: sentence detector newline behavior OK, with the most recent version I am able to replic

Re: sentence detector newline behavior

2014-01-27 Thread Tim Miller
AM To: dev@ctakes.apache.org Subject: Re: sentence detector newline behavior OK, with the most recent version I am able to replicate the performance I was getting before. Thanks a lot Jörn! Assuming this is in the next incremental release of opennlp, how quickly can we get a re-trained model into c

RE: sentence detector newline behavior

2014-01-27 Thread Masanz, James J.
> From: Tim Miller [mailto:timothy.mil...@childrens.harvard.edu] > Sent: Monday, January 27, 2014 8:52 AM > To: dev@ctakes.apache.org > Subject: Re: sentence detector newline behavior > > OK, with the most recent version I am able to replicate the performance > I was getting before. Thank

Re: sentence detector newline behavior

2014-01-27 Thread vijay garla
w with it. I've > been delving deeper into cTAKES on the machine learning aspect...I'm > struggling a bit with it and if anything I scratch my head and doubt my > competence. ;-) > > Regards, > Paula > > > Date: Mon, 27 Jan 2014 09:52:00 -0500 > > From: timo

Re: sentence detector newline behavior

2014-01-27 Thread Tim Miller
with it. I've been delving deeper into cTAKES on the machine learning aspect...I'm struggling a bit with it and if anything I scratch my head and doubt my competence. ;-) Regards, Paula Date: Mon, 27 Jan 2014 09:52:00 -0500 From: timothy.mil...@childrens.harvard.edu To: dev@ctakes.apac

Re: sentence detector newline behavior

2014-01-27 Thread vijay garla
month and worked around it with treating narrative as one string. > > Anyone who's looked at the code would appreciate and acknowledge that > cTAKES is a powerful and complex application. I'm overall impressed with > it and I intend to continue to use it, improve it, and gro

Re: sentence detector newline behavior

2014-01-27 Thread Tim Miller
Date: Mon, 27 Jan 2014 09:52:00 -0500 From: timothy.mil...@childrens.harvard.edu To: dev@ctakes.apache.org Subject: Re: sentence detector newline behavior OK, with the most recent version I am able to replicate the performance I was getting before. Thanks a lot Jörn! Assuming this is in the next

Re: sentence detector newline behavior

2014-01-29 Thread Jörn Kottmann
On 01/27/2014 08:44 PM, Tim Miller wrote: That is a good point, and something I was wondering about. Having now looked at both the ctakes and opennlp code for the sentence splitter it seems like there is a lot of overlap. I would've thought it was just a matter of converting annotations into

RE: sentence detector newline behavior

2014-01-29 Thread Chen, Pei
ry 29, 2014 3:55 PM > To: dev@ctakes.apache.org > Subject: Re: sentence detector newline behavior > > On 01/27/2014 08:44 PM, Tim Miller wrote: > > > > That is a good point, and something I was wondering about. Having now > > looked at both the ctakes and opennlp code fo

Re: sentence detector newline behavior

2014-01-29 Thread Jörn Kottmann
On 01/27/2014 03:52 PM, Tim Miller wrote: OK, with the most recent version I am able to replicate the performance I was getting before. Thanks a lot Jörn! Assuming this is in the next incremental release of opennlp, how quickly can we get a re-trained model into cTAKES? I am currently worki