Custom dictionary no-"no" [was: Re: PREFTERMs not included in UMLS rare-word dictionary?]

2024-04-16 Thread Kean Kaufmann
flaw in the dictionary > creator tool. > > Time for a rebuild with the 5.0 release ... > > Thanks for the report, > > Sean > > > From: Kean Kaufmann > Sent: Wednesday, December 6, 2023 4:12 PM > To: dev@ctakes.apache.org > Su

PREFTERMs not included in UMLS rare-word dictionary?

2023-12-06 Thread Kean Kaufmann
cal terms that are in PREFTERM but not CUI_TERMs. Off the top: C0017168 gastroesophageal reflux disease, C0018802 congestive heart failure, C0022104 irritable bowel syndrome, ... We've been adding these to a supplementary BSV file as they come up, but there are many more. This HSQL query f

Re: Discrepancy in cTAKES Identification of 'Chemotherapy' SNOMED Codes

2023-11-17 Thread Kean Kaufmann
hoice (UMLS CUI: C1298669) > Any resources I can use to help me with other similar questions? UMLS Metathesaurus Browser: https://uts.nlm.nih.gov/uts/umls/home Signup is free.

Re: Initial CTakes analysis

2023-08-12 Thread Kean Kaufmann
t, extracting tumor info from tables, etc. The cTAKES RegexSectionizer might work for you. https://ctakes.apache.org/apidocs/4.0.0/org/apache/ctakes/core/ae/RegexSectionizer.html

Re: Apache cTAKES GitHub mirror is stuck in 2019 [EXTERNAL] [SUSPICIOUS] [SUSPICIOUS]

2022-06-06 Thread Kean Kaufmann
Is Git LFS an option? https://www.atlassian.com/git/tutorials/git-lfs#installing-git-lfs Needs an LFS-aware host e.g. Bitbucket; I don't know what the Apache hosting setup is like. On Fri, Jun 3, 2022 at 9:31 AM Finan, Sean wrote: > Hi Tim, > > >we ran into issues in previous attempts at migrat

Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

2021-05-19 Thread Kean Kaufmann
ommunity can apply ruta rules to their project. > > > > When I looked at it a few years ago it was for reason 2b. In the end we > went for different annotators like Peter and Kean outlined and just use > piper file changes to satisfy #2 as that is definitely much easier. > However, it do

Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

2021-05-19 Thread Kean Kaufmann
> yes, the line between "lookup" and rule execution is a little blurry sometimes. Sure is. I blur it with a set of annotators that extend dictionary annotations based on words or annotations covered by the same Chunk, e.g. DiseaseDisorderMention + /screen(ing)?/i = ProcedureMention MedicationMe
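The chunk-level rule mentioned above (DiseaseDisorderMention + /screen(ing)?/i = ProcedureMention) could be sketched roughly as follows. This is only an illustration, not the annotators described in the message: `Mention`, `ChunkRuleSketch`, and `extend` are hypothetical stand-ins for UIMA/cTAKES types, and the choice to keep the original mention and add a ProcedureMention spanning the whole chunk is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class ChunkRuleSketch {
    /** Hypothetical stand-in for a cTAKES annotation: a type name plus character offsets. */
    static final class Mention {
        final String type;
        final int begin;
        final int end;

        Mention(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    // The trigger regex quoted in the message: /screen(ing)?/i
    private static final Pattern SCREEN =
            Pattern.compile("screen(ing)?", Pattern.CASE_INSENSITIVE);

    /**
     * For one chunk, returns the original mentions plus a ProcedureMention
     * for every DiseaseDisorderMention when the chunk text contains the trigger.
     * Assumption: the new mention spans the whole chunk.
     */
    static List<Mention> extend(String chunkText, int chunkBegin, List<Mention> mentionsInChunk) {
        boolean triggered = SCREEN.matcher(chunkText).find();
        List<Mention> result = new ArrayList<>();
        for (Mention m : mentionsInChunk) {
            result.add(m);
            if (triggered && m.type.equals("DiseaseDisorderMention")) {
                result.add(new Mention("ProcedureMention",
                        chunkBegin, chunkBegin + chunkText.length()));
            }
        }
        return result;
    }
}
```

In a real pipeline the same logic would presumably live in an annotator that iterates over Chunk annotations and the mentions they cover; the plain-Java version only shows the rule itself.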

Re: Dictionary "bad" codes

2021-02-15 Thread Kean Kaufmann
> script and use it to massage the dictionary I created using the creator. > > Peter > > > > On Mon, Feb 15, 2021 at 4:16 PM Kean Kaufmann wrote: > > > FWIW, rather than editing the HSQLDB script, we use Sean's > > BsvRareWordDictionary to

Re: Dictionary "bad" codes

2021-02-15 Thread Kean Kaufmann
FWIW, rather than editing the HSQLDB script, we use Sean's BsvRareWordDictionary to add phrases with a BSV file: cTakesHsql.xml: AddPhrases org.apache.ctakes.dictionary.lookup2.dictionary.BsvRareWordDictionary

Re: Passing SectionsBsv to piper containing BsvRegexSectionizer [EXTERNAL]

2021-02-02 Thread Kean Kaufmann
tory"; for outpatient radiology, "History" maps to "Reason for Exam". A lot of people in the community don't dream in java I do, sometimes... but then I wake up screaming. ;-) Kean Kaufmann Chief Architect - NLP RecordsOne, Inc. On Sat, Jan 30, 2021 at 10:01 AM Fi

Re: Lab Value Finder

2020-12-27 Thread Kean Kaufmann
> I've attached the Junit test based off your unit test and its debug output. You'll have to change the package name, though. Hi Peter -- Yes, that does sound weird. Not seeing an attachment. Send it along and I'll give troubleshooting a shot. Happy holidays, everybody. On Thu, Dec 24, 2020 at

Re: What to do about 4.0.0 and UMLS

2020-12-09 Thread Kean Kaufmann
> > 3. for 4.0.0 users that compile their own, provide a tar file containing > the sources plus instructions for modifying xml files and removing obsolete > Junit file. Is it worth a quick email poll of 4.0.0 users? +1 for Option 3! Thanks Peter (and everybody)... On Wed, Dec 9, 2020 at 8:

Re: Disambiguation --alignment with SNOMED [EXTERNAL]

2020-12-03 Thread Kean Kaufmann
Peter says: the LabValueFinder. It has settings that allow it to clone procedures into > lab values or vice versa (I can't remember). The former... at least, when I contributed it. For potential lab values, it filters by TUIs: some procedures, others medications. Sean says: The only immediat

Re: I think I found a bug.

2020-08-31 Thread Kean Kaufmann
Hi Peter, I believe I've encountered this too; I never got around to tracking it down to the root cause, and didn't have the civic-mindedness to report it as you have. Thanks! To shut it up I implemented a brutal brute-force workaround, enclosed for your possible amusement. But it occurred to me

Re: Question about window size in term lookup

2020-08-24 Thread Kean Kaufmann
> > my question is whether there's a place where one can register specific two > character terms, for example BP or PT which will be found even with a > window size set to three. My brute-force approach is pretty brutal: Change the window size to two, annotate terms, then remove all two-letter an
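The clean-up step of this brute-force approach might look something like the sketch below. The snippet is cut off, so the details are guesses: the allow-list of wanted two-letter terms (such as BP or PT from the question) is an assumption, and `ShortTermFilterSketch` and `dropShortTerms` are hypothetical names that operate on plain strings rather than cTAKES annotations.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

final class ShortTermFilterSketch {
    /**
     * Keeps every term except two-letter ones that are not explicitly wanted.
     *
     * @param coveredTexts covered text of each annotation found with the smaller window size
     * @param keep         upper-case two-letter terms to retain (assumed allow-list)
     */
    static List<String> dropShortTerms(List<String> coveredTexts, Set<String> keep) {
        return coveredTexts.stream()
                .filter(t -> t.length() != 2 || keep.contains(t.toUpperCase(Locale.ROOT)))
                .collect(Collectors.toList());
    }
}
```

For example, with an allow-list of BP and PT, the input "BP", "or", "PT", "hypertension" would come back without "or".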

Re: Query on LabValueFinder

2018-03-19 Thread Kean Kaufmann
Gandhi, at first blush, I can't replicate your result using the code I submitted... but my code and config differ from trunk, so Sean is probably the best person to ask. I included unit tests with a mini-dictionary for ProcedureMentions, but they probably didn't play nicely with the rest of the fra

Re: UmlsOverlapLookupAnnotator + BsvRareWordDictionary: # tokens skipped varies? [EXTERNAL]

2018-03-07 Thread Kean Kaufmann
out. > That is very strange. I have no idea why adding an entry would change the > behavior. I will have to look at the code and run your examples. By the > way, thank you for the explicit examples! > > >Is this expected behavior? > No, and thanks for letting me know about it.

Re: UmlsOverlapLookupAnnotator + BsvRareWordDictionary: # tokens skipped varies?

2018-03-07 Thread Kean Kaufmann
P.S. Extra config bit: I also removed "CD" from the exclusionTags in the UmlsOverlapLookupAnnotator. On Wed, Mar 7, 2018 at 10:58 AM, Kean Kaufmann wrote: > Hi Sean, > > I'm perplexed. It seems as if the number of tokens that the > UmlsOverlapLookupAnnotator will s

UmlsOverlapLookupAnnotator + BsvRareWordDictionary: # tokens skipped varies?

2018-03-07 Thread Kean Kaufmann
Hi Sean, I'm perplexed. It seems as if the number of tokens that the UmlsOverlapLookupAnnotator will skip varies with the content of the RareWordDictionary. Here's my setup. I think I've included enough information to replicate my perplexity, if you have time/inclination to do that; let me know

Re: Lab Value - Range finder

2018-03-01 Thread Kean Kaufmann
same. > > 28 Feb 2018 15:04:04 INFO LabValueFinder - Set to value: > LabMention(349-352): HCT > > 28 Feb 2018 15:04:04 INFO LabValueFinder - Set to value: > RangeAnnotation(365-370): 42-52 > > 28 Feb 2018 15:04:04 INFO LabValueFinder - Set to value: > MeasurementA

Fwd: Lab Value - Range finder

2018-02-12 Thread Kean Kaufmann
Oops, forgot to cc: dev. Happy Monday... -- Forwarded message -- From: Kean Kaufmann Date: Mon, Feb 12, 2018 at 10:14 AM Subject: Re: Lab Value - Range finder To: abilash.mat...@cognizant.com > I could see setReferenceRangeNarrative method in LabMention class. Is that the

Re: Lab Value - Range finder

2018-02-09 Thread Kean Kaufmann
Hi Abilash, By design, the Lab Value annotator avoids ranges if possible: https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-core/src/main/java/org/apache/ctakes/core/ae/LabValueFinder.java // prefer non-range values, if any > value = candidateList.stream() >
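The "prefer non-range values" idea can be illustrated with a small stand-alone sketch. It is not the LabValueFinder code, which is truncated above; `Candidate`, `PreferNonRangeSketch`, and `pick` are hypothetical names, and falling back to the first range when no plain value exists is an assumption.

```java
import java.util.List;
import java.util.Optional;

final class PreferNonRangeSketch {
    /** Hypothetical candidate value: its text and whether it is a range such as "42-52". */
    static final class Candidate {
        final String text;
        final boolean isRange;

        Candidate(String text, boolean isRange) {
            this.text = text;
            this.isRange = isRange;
        }
    }

    /** Returns the first non-range candidate if there is one, otherwise the first candidate of any kind. */
    static Optional<Candidate> pick(List<Candidate> candidates) {
        Optional<Candidate> nonRange = candidates.stream()
                .filter(c -> !c.isRange)
                .findFirst();
        return nonRange.isPresent() ? nonRange : candidates.stream().findFirst();
    }
}
```

Given a reference range "42-52" and a plain value next to the same lab name, `pick` would return the plain value; the range is chosen only when nothing else is available.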

Re: Lab Value Finder dictionary

2017-12-19 Thread Kean Kaufmann
In my experience, the quick answer is: Certainly not all, but probably many. Different institutions will have different formats for lab reports, different panels they typically perform, and different labels. On Tue, Dec 19, 2017 at 8:21 AM, wrote: > Hi , > > Based on the experience , can anyon

Re: Lab report

2017-11-13 Thread Kean Kaufmann
I wrote a lab annotator that will be checked into the trunk at some point. Source, unit tests and description attached to this issue: https://issues.apache.org/jira/browse/CTAKES-441 On Mon, Nov 13, 2017 at 6:09 AM, wrote: > Hi All, > > Can CTAKES process LAB reports and able to extract key fin

Re: CAS Visual Debugger - [EXTERNAL]

2017-10-25 Thread Kean Kaufmann
+1 I point it at an engine descriptor .xml file (using the command-line option -desc) that refers to the type system file, but that's a hack... On Wed, Oct 25, 2017 at 1:49 PM, Dligach, Dmitriy wrote: > +1 > > Also, I’d love to be able to point CVD to a directory containing XMI files > at start

Re: false positive [EXTERNAL]

2017-10-25 Thread Kean Kaufmann
Sean, thanks! Blacklisting is essential, and making it category-specific is a really nice touch. Dispatch from the trenches, FWIW: a) The blacklist can get quite big, e.g. when mining common wordlists. To reduce bloat, might you allow comma-separated lists of semantic groups in the first field?

Re: Filter CVD output?

2017-07-17 Thread Kean Kaufmann
Hi A.S., Does the "Show Selected Annotations" menu item serve your purposes? https://uima.apache.org/d/uimaj-current/tools.html#cvd.toolsMenu On Mon, Jul 17, 2017 at 4:31 AM, Lacey A.S. wrote: > Hi - I spend a lot of time showing doctors the output of cTakes via what I > have parsed during po

Re: Problem with the PrecisionTermConsumer [EXTERNAL]

2017-07-14 Thread Kean Kaufmann
ecise PrecisionTermConsumer or do I need to > make my own custom TermConsumer? I also tried the > SemanticCleanupTermConsumer, but it gave the same results. > > Here's the code I'm using to extract phrases: > > JCas jcas = JCasFactory.createJCas(); > jcas.setDocumentText(note); > AggregateBuild

LVG questions

2017-07-14 Thread Kean Kaufmann
tomization; and I've skimmed the NLM documentation, but it doesn't seem to be intended for developers. Can anyone point me to more detailed docs? And: Has anyone tried plugging in another stemmer? To play nicely with the ctakes-dictionary-lookup-fast annotators, it seems as if all it would have to do would be to populate canonicalForm. Happy Friday, and thanks for any help you can provide! Kean Kaufmann NLP Developer RecordsOne, Inc.

Re: cTakes doesn't identify certain words like "fell" in clinical notes

2017-07-14 Thread Kean Kaufmann
I'd think LVG would come up with "fall" as the canonicalForm of "fell" and "fallen", but apparently it doesn't. The only terms associated with C0085639 in my custom-built dictionary are: sql> select cui, tui, text, prefterm from cui_terms c join tui t on t.cui = c.cui join prefterm p on p.cui =

Re: cTAKES as a dependency

2017-05-01 Thread Kean Kaufmann
> > On Fri, Apr 28, 2017 at 9:53 PM, Finan, Sean < > sean.fi...@childrens.harvard.edu> wrote: > Hey Kean, > > It is great to know that your project is out there! > Hey Sean! Very kind of you. Speaking of which, our BizDevVeep would like to see RecordsOne listed under "Companies" on the "Users of

Re: cTAKES 4.0.0 Release

2017-05-01 Thread Kean Kaufmann
> > For further information, please visit the project website at > > http://ctakes.apache.org/ > > > > -- The Apache cTAKES Team

Re: cTAKES as a dependency

2017-04-28 Thread Kean Kaufmann
> Has anybody tried to run a cTAKES pipeline without having a local cTAKES > installation? In other words, is it possible to set up a maven project that > will use cTAKES as an external dependency? > > Dima

Re: Labs annotator?

2017-03-30 Thread Kean Kaufmann
mmitter the code or have somebody review it remotely? The > "tweaks" may be something useful to ctakes, but if not I'm sure that we can > create a decent interfacing. > > > > Cheers, > > Sean > > > > -Original Message- > > From: Kean

Re: Labs annotator?

2017-03-29 Thread Kean Kaufmann
sure that people would love to see lab values in ctakes! Could you > please write a small summary of what it does? Maybe an example or two > could suffice. > > We can definitely put it into ctakes in release 4.1 - maybe next quarter? > > Cheers, > Sean > > -Origin

Labs annotator?

2017-03-28 Thread Kean Kaufmann
dev list a couple of years ago; did anything come of it? Happy to contribute if it's helpful.

Re: 2016AB UMLS (ctakessnorx)

2017-03-14 Thread Kean Kaufmann
To: dev@ctakes.apache.org > Subject: 2016AB UMLS (ctakessnorx) > > Hi, > > I've been using cTAKES for a bit now, but I still can't figure > out how to upgrade the UMLS version to the most recent one. > If I create my

ctakes-dictionary-lookup-fast-3.2.3-20170301.120140-151 runtime error

2017-03-02 Thread Kean Kaufmann
the time being. Thanks, Kean