Re: Missing resources for script that extracts markables from a corpus for analysis [EXTERNAL]

2017-10-04 Thread Alexandru Zbarcea
Hi Tim, Because LuceneIndex is touched in several places within the code, I started by refactoring LuceneIndexReaderResourceImpl (see CTAKES-464 [1]). If you have time, could you also check CTAKES-334 [2]? I started to treat it as a prerequisite, because the patch provided actually will

Re: Missing resources for script that extracts markables from a corpus for analysis [EXTERNAL]

2017-10-04 Thread Alexandru Zbarcea
Thanks Tim, I will let you know about the progress. Alex
On Oct 4, 2017 06:34, "Miller, Timothy" <timothy.mil...@childrens.harvard.edu> wrote:
> I had in mind the notes in:
> /ctakes-examples-res/src/main/resources/org/apache/ctakes/examples/notes/rtf
>
> which I believe are the fake notes

Re: Missing resources for script that extracts markables from a corpus for analysis [EXTERNAL]

2017-10-04 Thread Miller, Timothy
I had in mind the notes in: /ctakes-examples-res/src/main/resources/org/apache/ctakes/examples/notes/rtf which I believe are the fake notes Dr. John Green wrote for us. I don't know why they are in RTF, but they are nice, non-toy-length notes. Tim From:

Re: Missing resources for script that extracts markables from a corpus for analysis [EXTERNAL]

2017-10-02 Thread Miller, Timothy
Yeah, it might be nice to build a Lucene index of all the sample notes in the ctakes-example module. I'll create a Jira issue for it but probably won't be able to get to it right away. Tim From: Alexandru Zbarcea Sent: Monday, October 2,

Re: Missing resources for script that extracts markables from a corpus for analysis [EXTERNAL]

2017-10-02 Thread Alexandru Zbarcea
Hi Tim, I understand; that makes sense. Is it possible to anonymize the data you have, or to come up with a separate body of test data, so we can generate a Lucene index and unit test the code? I think this would have the double benefit of the code being tested and showing developers and users how the code is supposed to be

Re: Missing resources for script that extracts markables from a corpus for analysis [EXTERNAL]

2017-10-02 Thread Miller, Timothy
Thanks Alex, This code is for processing a clinical text data corpus stored as a Lucene index -- data that cannot be redistributed for privacy reasons. Since it's so closely related to the coref stuff, I thought it should go alongside the coreference module. But maybe it makes more sense as an external