Jim,

What does the output of the tokenizer model look like?

James

On 3/14/2012 7:34 AM, Dimitrios wrote:
> On 14/03/12 11:23, Jörn Kottmann wrote:
>> Can you re-produce your issue with a dictionary which only contains a
>> single entry? 
>
> Yes, I can indeed reproduce the issue with the following dictionary:
> --------------------------------------------------------------------------------------------
>
> <?xml version="1.0" encoding="UTF-8"?>
> <dictionary case_sensitive="false">
> <entry>
> <token>Folic</token>
> <token>acid</token>
> </entry>
> <entry>
> <token>Baclofen</token>
> </entry>
> </dictionary>
> --------------------------------------------------------------------------------------------
>
>
> The small paragraph I'm using for testing is this:
>
> "Folic acid is one variable, but other factors remain.
> Studies suggest that substances active at the GABA receptor may
> produce NTDs.
> To test this hypothesis pregnant rats were exposed to either the GABA
> a agonist muscimol (1, 2 or 4 mg/kg), the GABA a antagonist
> bicuculline (.5, 1, or 2 mg/kg), the GABA b agonist baclofen (15, 30,
> 60 mg/kg), or the GABA b antagonist hydroxysaclofen (1, 3, or 5 mg/kg)
> during neural tube formation.
> Normal saline was used as a control and valproic acid (600 mg/kg) as a
> positive control."
>
>
> The dictionary finds "baclofen" but it does not find "Folic acid"! The
> workflow is as follows:
>
> 1. get the sentences
> 2. tokenize the sentences
> 3. call the dictionary name finder's ".find()" method with an array of
>    strings (the tokens of a single sentence)
>
> Jim
>
>
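
To make the question about the tokenizer output concrete, here is a minimal Python sketch of a case-insensitive, token-level dictionary lookup. It is not OpenNLP's implementation; the function name find_entries, its parameters, and the hand-written token arrays are assumptions made up for illustration. It only shows that a two-token entry such as "Folic acid" can match when "Folic" and "acid" arrive as separate, adjacent tokens.

```python
def find_entries(tokens, entries, case_sensitive=False):
    """Return (start, end) token spans, end exclusive, preferring the
    longest dictionary entry that matches at each position.

    Hypothetical helper for illustration only; not an OpenNLP API.
    """
    norm = (lambda s: s) if case_sensitive else str.lower
    # Each entry is a sequence of tokens, like <entry> in the XML dictionary.
    wanted = {tuple(norm(t) for t in entry) for entry in entries}
    max_len = max((len(e) for e in wanted), default=0)

    spans = []
    i = 0
    while i < len(tokens):
        match_end = None
        # Try the longest candidate window first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(norm(t) for t in tokens[i:i + n]) in wanted:
                match_end = i + n
                break
        if match_end is None:
            i += 1
        else:
            spans.append((i, match_end))
            i = match_end
    return spans


entries = [["Folic", "acid"], ["Baclofen"]]

# Assumed tokenization of the first test sentence (written by hand).
sentence1 = ["Folic", "acid", "is", "one", "variable", ",",
             "but", "other", "factors", "remain", "."]
print(find_entries(sentence1, entries))  # [(0, 2)]

# If the whole sentence arrives as a single string, nothing can match.
print(find_entries(["Folic acid is one variable"], entries))  # []
```

If the tokens passed to the name finder do not look like the first array (for example, if "Folic acid" arrives as one string, or a whole sentence is passed as a single element), a multi-token entry cannot line up with them, which is why the actual tokenizer output matters here.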
