Jim,

What does the output of the tokenize model look like?
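One way to check is to print the token array for the first sentence right before it goes into ".find()". Below is a minimal standalone sketch in plain Java. It is not OpenNLP code: the class and method names are made up for illustration, the token array in main() is only what I would assume the tokenizer returns, and the matcher is a naive case-insensitive, token-by-token comparison, which may differ from what the dictionary name finder really does.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch, not OpenNLP code. All names here are hypothetical.
final class TokenDump {

    // Renders each token with its index and brackets, so merged tokens
    // or stray whitespace/punctuation become visible.
    static String show(String[] tokens) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            sb.append(i).append(":[").append(tokens[i]).append("] ");
        }
        return sb.toString().trim();
    }

    // Returns every start index at which all tokens of the entry match
    // consecutive sentence tokens, ignoring case.
    static List<Integer> findEntry(String[] tokens, String[] entry) {
        List<Integer> starts = new ArrayList<>();
        if (entry.length == 0) {
            return starts; // an empty entry matches nothing
        }
        for (int i = 0; i + entry.length <= tokens.length; i++) {
            boolean match = true;
            for (int j = 0; j < entry.length; j++) {
                if (!tokens[i + j].equalsIgnoreCase(entry[j])) {
                    match = false;
                    break;
                }
            }
            if (match) {
                starts.add(i);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // Assumed tokenizer output for the first test sentence.
        String[] tokens = {"Folic", "acid", "is", "one", "variable", ",",
                           "but", "other", "factors", "remain", "."};
        System.out.println(show(tokens));
        System.out.println(findEntry(tokens, new String[] {"Folic", "acid"}));
    }
}
```

If "Folic" and "acid" do not show up as two separate, adjacent tokens in your real output (for example, if they arrive as the single token "Folic acid"), a token-by-token lookup like this one has nothing to match against the two-token entry.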
James

On 3/14/2012 7:34 AM, Dimitrios wrote:
> On 14/03/12 11:23, Jörn Kottmann wrote:
>> Can you re-produce your issue with a dictionary which only contains a
>> single entry?
>
> Yes i can indeed re-produce the issue with the following dictionary:
> --------------------------------------------------------------------------------------------
>
> <?xml version="1.0" encoding="UTF-8"?>
> <dictionary case_sensitive="false">
> <entry>
> <token>Folic</token>
> <token>acid</token>
> </entry>
> <entry>
> <token>Baclofen</token>
> </entry>
> </dictionary>
> --------------------------------------------------------------------------------------------
>
>
> The small paragraph i'm using for testing is this:
>
> "Folic acid is one variable, but other factors remain.
> Studies suggest that substances active at the GABA receptor may
> produce NTDs.
> To test this hypothesis pregnant rats were exposed to either the GABA
> a agonist muscimol (1, 2 or 4 mg/kg), the GABA a antagonist
> bicuculline (.5, 1, or 2 mg/kg), the GABA b agonist baclofen (15, 30,
> 60 mg/kg), or the GABA b antagonist hydroxysaclofen (1, 3, or 5 mg/kg)
> during neural tube formation.
> Normal saline was used as a control and valproic acid (600 mg/kg) as a
> positive control."
>
>
> The dictionary finds "baclofen" but it does not find "Folic acid"! The
> workflow is as follows:
>
> 1. get-sentences
> 2. tokenize -sentences
> 3. call dictionary name finder ".find()" method with an array of strings
> (tokens of a single sentence)
>
> Jim
>
>
