Just to clarify: with the YTEX branch there are two sentence splitters, the
original cTAKES sentence splitter, which splits on newlines, and the YTEX
sentence splitter, which doesn't. The changes to other components in the YTEX
branch (dependency parser, assertion) work with both sentence splitters.
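To make the difference concrete, here is a minimal toy sketch of the two behaviours on a note where one sentence spans a line break. The functions `split_on_newlines` and `split_ignoring_newlines` are hypothetical regex-based stand-ins written for illustration; they are not the cTAKES or YTEX implementations, which use trained models rather than a single regex.

```python
import re

# Hypothetical toy splitters for illustration only; NOT the cTAKES or YTEX code.
# A sentence boundary here is terminal punctuation followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_on_newlines(text: str) -> list[str]:
    """Treat every newline as a hard sentence break, then split on punctuation."""
    sentences = []
    for line in text.split("\n"):
        sentences.extend(s for s in _BOUNDARY.split(line.strip()) if s)
    return sentences


def split_ignoring_newlines(text: str) -> list[str]:
    """Let sentences span lines: only punctuation ends a sentence.

    Internal whitespace (including newlines) is collapsed to single spaces.
    """
    return [" ".join(s.split()) for s in _BOUNDARY.split(text.strip()) if s]


if __name__ == "__main__":
    note = "Patient denies chest\npain. No fever."
    print(split_on_newlines(note))        # ['Patient denies chest', 'pain.', 'No fever.']
    print(split_ignoring_newlines(note))  # ['Patient denies chest pain.', 'No fever.']
```

The newline-splitting variant breaks "chest pain" across two "sentences", which is why it matters whether a training corpus has sentences spanning lines.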
I think we could possibly add some additional datasets for training. MIMIC
data does come to mind, though I can't remember off the top of my head
whether the MIMIC dataset has sentences spanning lines.
--
Karthik Sarma
UCLA Medical Scientist Training Program Class of 20??
Member, UCLA Medical Imaging
Just an FYI, a while back I did some of these annotations myself on
MIMIC to get around this issue. I replaced the newline character with a
special (non-English) character, then pre-processed the cTAKES input to
replace newlines with that character, then did sentence detection, then
added the